
Performance Benchmarking


I have installed Rosetta 3 on an HPC cluster. How can I perform benchmarking tests to find out how many CPU or GPU nodes are required for running Rosetta on the HPC? Are these tests separate for each Rosetta application?

Wed, 2019-12-04 02:50
Tushar Kush

1. GPU nodes are useless for Rosetta.  Rosetta is currently CPU-only.  (We're working on GPU support.)

2.  Since even the same protocol can take wildly different amounts of time depending on your inputs and settings, there's no standard benchmark for determining the computational cost of doing something.  Set up whatever you want to do, do it once on one CPU, and multiply the time it takes to generate one sample by the number of samples that you want in order to get the total CPU-hours that you'll need.  Then divide by available CPUs to get wall-hours.  For example, if you want to predict a structure, let's say that you find that running AbRelax with -nstruct 1 (to produce 1 sample) takes 3 minutes on a single CPU.  If you want 100,000 samples, you'll need 5,000 CPU-hours.  So you could run for 50 hours on 100 CPUs, or 5 hours on 1,000, or whatnot.
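The back-of-the-envelope estimate above can be sketched in a few lines of shell arithmetic (the numbers are the hypothetical ones from the example, not measured Rosetta timings):

```shell
# Estimate total CPU-hours and wall-clock hours for a Rosetta run.
# Hypothetical inputs: 3 min/sample measured from a single -nstruct 1 run.
minutes_per_sample=3
nstruct=100000     # samples you want
cpus=100           # cores available

cpu_hours=$(( minutes_per_sample * nstruct / 60 ))
wall_hours=$(( cpu_hours / cpus ))

echo "CPU-hours:  ${cpu_hours}"    # 5000
echo "Wall-hours: ${wall_hours}"   # 50
```

Time one real sample with your actual inputs and flags first; the per-sample cost is the only number you can't guess.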

Be aware that to take advantage of multiple CPU cores, you need to do one of the following:
A.  Launch a separate instance of Rosetta for each CPU core that you have.
B.  Compile Rosetta with the extras=mpi option, and then launch 1 MPI process for each core that you have.  (This produces better load-balancing.  I'd recommend this as the best practice, provided the application that you're using supports MPI-based parallelism.  Most Rosetta applications do.)
C.  Compile Rosetta with the extras=cxx11thread option, and then launch 1 Rosetta instance per node.  Rosetta will automatically launch 1 thread per core.  (Note that currently, only the packer takes advantage of this, so this will not use CPUs very effectively.  This is not yet a recommended best practice.)
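The three launch modes might look roughly like this on an 8-core node. The application name, build paths, and flags file are illustrative placeholders — substitute whatever your own build and protocol actually use:

```shell
# A. One independent Rosetta process per core (8 cores assumed):
for i in $(seq 1 8); do
    AbinitioRelax.linuxgccrelease @flags -out:suffix _job${i} &
done
wait

# B. MPI build: compile with extras=mpi, then one MPI rank per core:
./scons.py -j8 mode=release extras=mpi bin
mpirun -np 8 AbinitioRelax.mpi.linuxgccrelease @flags

# C. Threaded build: compile with extras=cxx11thread, then one
#    process per node (Rosetta spawns one thread per core itself):
./scons.py -j8 mode=release extras=cxx11thread bin
AbinitioRelax.cxx11thread.linuxgccrelease @flags
```

In option A, the -out:suffix flag keeps each process writing to distinct output names so the independent runs don't collide.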

Wed, 2019-12-04 13:05

To further Vikram's comment: Rosetta is generally embarrassingly parallel in that it just runs N independent Monte Carlo trajectories on N processors.  So it will use as many processors as you give it, up to your requested nstruct.  (I am assuming you are doing something "standard" like docking or packing that has nstruct, etc.)
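Because the trajectories are independent, a scheduler job array is a natural fit on HPC. A hypothetical SLURM sketch (app name, flags file, and nstruct split are placeholders for your own setup):

```shell
#!/bin/bash
# 100 independent array tasks, each running its own Rosetta process.
#SBATCH --array=1-100
#SBATCH --ntasks=1

# Each task produces 1000 samples with a distinct output suffix,
# for 100,000 samples total across the array.
AbinitioRelax.linuxgccrelease @flags \
    -nstruct 1000 \
    -out:suffix _task${SLURM_ARRAY_TASK_ID}
```

Since each task is a separate process with its own random seed, no coordination between tasks is needed — you just concatenate or rescore the results afterward.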

Wed, 2019-12-04 13:07