How to uninstall and install rosetta using "extras=mpi"?

Dear Friends,

 

I compiled Rosetta using:

 

./scons.py -j <number_of_cores_to_use> mode=release bin

But since I have 20 cores and want to distribute the runs across all of them, I want to use the "MPI" option. Could you please let me know how I can uninstall Rosetta and reinstall it with the "MPI" option? Thanks!

Sun, 2019-09-08 03:42
Danielsebas

Rosetta does not have an "uninstall"; you just delete things you don't want anymore.

./scons.py -j <number_of_cores_to_use> mode=release extras=mpi bin

 

will do what you want.  All I've added is "extras=mpi".  It will put the build files in a sibling directory in build and put symlinks to the executables in bin/ as you expect; they will be named xyz.mpi.linuxgccrelease instead of xyz.default.linuxgccrelease (for example).

Sun, 2019-09-08 08:44
smlewis

So I should not delete anything and simply run the command? Thanks!

Sun, 2019-09-08 09:28
Danielsebas

Correct, there is no need to delete anything.  When you change any build setting, scons will put all the "new" files in different places or with different names.  You can have dozens of binary modes built at once assuming you are patient and have the disk space to waste.

 

Sun, 2019-09-08 10:12
smlewis

Ok, I have done this and now I want to run rosetta "abinitiorelax.mpi.*" on my 20 cores. Can you please let me know how to do that?

I ran this:

 

nohup ../../main/source/bin/AbinitioRelax.mpi.linuxgccrelease @options > log &

But when I run htop, I see abinitio running on only one CPU. Is there a way to make abinitio run on multiple CPUs now that I have the MPI build? Thanks

Mon, 2019-09-09 04:34
Danielsebas

I read more about Rosetta and came up with this command line for de novo structure prediction of a 382-residue protein:

I ran it in silent mode.
 

mpiexec -np 16 ../../main/source/bin/AbinitioRelax.mpi.linuxgccrelease -database ../../rosetta/rosetta_src_2019.31.60840_bundle/main/database/ @options -mpi_tracer_to_file log1 &

options:

-in
        -file
                -fasta sequence.fasta  # protein sequence in fasta format
                -frag3 t001_.200.3mers  # protein 3-residue fragments file
                -frag9 t001_.200.9mers  # protein 9-residue fragments file
-abinitio
        -relax
        -increase_cycles 10     # Increase the number of cycles at each stage in AbinitioRelax by this factor
        -rg_reweight 0.5        # Reweight contribution of radius of gyration to total score by this scale factor
        -rsd_wt_helix 0.5       # Reweight env, pair, and cb scores for helix residues by this factor
        -rsd_wt_loop 0.5        # Reweight env, pair, and cb scores for loop residues by this factor
-relax
        -fast   # At the end of the de novo protein_folding, do a relax step of type "FastRelax".  This has been shown to be the best deal for speed and robustness.
-out
        -nstruct 50000  # how many structures do you want to generate?  Usually want to fold at least 1,000.
        -file
                -silent abrelax.out # full path to silent file output
                -silent_struct_type binary      # we want binary silent files
                -scorefile score.sc
-overwrite      # overwrite any existing output with the same name you may have generated
-nstruct 50000

The program is running on 16 cores and has generated log files like this:

log1_0, log1_1, ..., log1_15

Can you please let me know if this is correct? And what do I need to be careful about once the run is finished? Thanks.

Mon, 2019-09-09 04:40
Danielsebas

I can't know what the most correct way to run MPI on your system is; it varies from implementation to implementation.  Certainly the observed behavior of 16 log files numbered 0-15 is correct.

With 16 processors and 50K structures it may take a long time to complete.

I would not use overwrite with a job of this type; overwrite is appropriate for debugging but not production runs at 50K outputs. It won't hurt anything unless you restart; if you restart the job it will waste time and possibly give redundant nstruct in your silent file.  

A single silent file with 50K outputs is going to be quite large and may be a little slow to manage when you get around to extracting the interesting models.  

Mon, 2019-09-09 11:24
smlewis

Thanks! 

I am running 50K since this is the recommended nstruct value in the Rosetta tutorial; however, since it is going to take a long time, how many structures would generally be a good number to produce for further analysis and clustering to get the best model?

 

Also, can I stop the program partway through, delete the files (silent file and score file), and rerun with a smaller nstruct? Would that affect the rerun? Thanks

Tue, 2019-09-10 02:35
Danielsebas

I can't tell you how many structures you need.  It's empirical.  The answer is "until it converges on the right answer", which can only be seen retrospectively.

Stopping Rosetta at any time is fine.  It writes structures to the silent and score files as it completes them.  Those files are perfectly good and analyzable even if you stop Rosetta partway through.  There is no reason to stop Rosetta, change to a lower nstruct, and restart - just let it run as long as you care to and then kill the process and look at the files.  You can see how many models have finished by peeking at the number of lines in score.sc.
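As a rough illustration of that line count, here is a minimal Python sketch. It assumes the usual Rosetta score-file layout, where each finished model is one line starting with "SCORE:" and each header line ends with the column name "description"; please check that against your own score.sc. The function name count_finished_models is made up for this example.

```python
def count_finished_models(scorefile):
    """Count model rows in a Rosetta-style score file.

    Assumption: model rows start with 'SCORE:' and header rows
    end with the column name 'description'.
    """
    count = 0
    with open(scorefile) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0] != "SCORE:":
                continue  # skip blank lines and e.g. SEQUENCE: lines
            if fields[-1] == "description":
                continue  # skip header rows (they may repeat)
            count += 1
    return count

# Example call (the file name is taken from the options file above):
# print(count_finished_models("score.sc"))
```

Skipping header rows matters because a plain line count is off by one or more whenever a header is repeated in the file.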

If you were to change a "science setting" you would want to move or remove your existing silent file so you don't have a mix of models from different settings.  

Tue, 2019-09-10 12:05
smlewis

Thanks! Once I have the silent files:

How do I analyse them to find the best 3D structure, given that there are multiple files such as abrelax_1, abrelax_15, etc.? I am following this tutorial for ab initio structure prediction:

https://www.rosettacommons.org/demos/latest/tutorials/denovo_structure_prediction/Denovo_structure_prediction

 

Wed, 2019-09-11 07:44
Danielsebas

Generally with analysis it's good to take a look at previously published papers and see what they have done.

With ab initio there are usually two main approaches to analysis. The first is to make a score-vs-rmsd plot (a "funnel plot") and look for a clear funnel to your native. If you don't have a native for comparison (usually the case), then what you can do is pick a structure as a "mock native" and compute the rmsd to it. If you get a nice funnel to that structure, it's a good indication that Rosetta thinks the structure is native-like. (Generally your mock native will be the lowest energy structure. But sometimes it's good to go through a number of diverse low energy structures to see how the funnel plot changes as you pick different structures. Sometimes the structure with the best funnel plot will not be the lowest energy structure.)
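To make the "pick a mock native" step concrete, here is a small Python sketch that lists the lowest-scoring models from a score file so you can try each one as the reference. It is only an illustration under assumptions: rows start with "SCORE:", the header row ends with "description", and the energy column is called "score" (some score files call it "total_score"). The names read_scores and lowest_energy are invented for this example; the rmsd calculation itself is not shown.

```python
def read_scores(scorefile, column="score"):
    """Return a list of (energy, model_name) pairs from a score file.

    Assumptions: rows start with 'SCORE:', the header row ends with
    'description', and `column` names the energy column.
    A wrong column name raises KeyError.
    """
    header = None
    rows = []
    with open(scorefile) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0] != "SCORE:":
                continue
            if fields[-1] == "description":
                header = fields[1:]  # remember the column names
                continue
            if header is None or len(fields) - 1 != len(header):
                continue  # no header yet, or a truncated row
            record = dict(zip(header, fields[1:]))
            try:
                rows.append((float(record[column]), record["description"]))
            except ValueError:
                continue  # non-numeric energy, e.g. a half-written line
    return rows


def lowest_energy(rows, n=5):
    """Return the n lowest-energy (energy, model_name) pairs."""
    return sorted(rows)[:n]

# Example: candidates to try as mock natives
# for energy, name in lowest_energy(read_scores("score.sc")):
#     print(name, energy)
```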

The other approach is to do structural clustering of all the structures, or a large selection of lowest energy structures. Generally speaking, native-like structures will form larger clusters than non-native structures. Typically the most native-like structure will be the lowest energy structure (or one of the lowest energy structures) in the largest (or one of the largest) clusters.
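The idea of "lowest energy member of the largest cluster" can be sketched in a few lines of Python. This is a toy greedy clustering over a precomputed pairwise rmsd matrix, not the algorithm of any Rosetta application; the matrix, the radius, and the function names are all assumptions for illustration, and it is only practical for a modest selection of models.

```python
def neighbours(rmsd, center, candidates, radius):
    """Candidates within `radius` of `center` (includes center itself)."""
    return [j for j in candidates if rmsd[center][j] <= radius]


def greedy_cluster(rmsd, radius):
    """Greedy clustering on a square, symmetric rmsd matrix.

    Repeatedly takes the unassigned model with the most unassigned
    neighbours within `radius` as a cluster center, so clusters come
    out largest first. Returns a list of lists of model indices.
    """
    unassigned = set(range(len(rmsd)))
    clusters = []
    while unassigned:
        center = max(
            sorted(unassigned),  # sorted for a deterministic tie-break
            key=lambda i: len(neighbours(rmsd, i, unassigned, radius)),
        )
        members = sorted(neighbours(rmsd, center, unassigned, radius))
        clusters.append(members)
        unassigned -= set(members)
    return clusters


def lowest_energy_member(members, energies):
    """Index of the lowest-energy model among `members`."""
    return min(members, key=lambda i: energies[i])

# Toy example: models 0-2 are similar, model 3 is an outlier.
# rmsd = [[0, 1, 1, 10], [1, 0, 1, 10], [1, 1, 0, 10], [10, 10, 10, 0]]
# clusters = greedy_cluster(rmsd, radius=2.0)
# best = lowest_energy_member(clusters[0], [-10.0, -30.0, -20.0, -50.0])
```

Note that in the toy example the overall lowest energy model (index 3) is an outlier, so the pick from the largest cluster is a different model.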

From a purely mechanistic point of view, if you have multiple output silent files, you can combine them into one file with the combine_silent application, or simply by concatenating them together with the Unix cat program. It doesn't really matter if there are redundant structure names; upon read-in Rosetta will give the redundant structures new structure names.
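If you would rather script the concatenation than type it, a minimal Python equivalent of cat looks like the sketch below. The glob pattern and output name in the example call are guesses based on the file names mentioned in this thread, and combine_silent_files is a name invented here, not a Rosetta tool.

```python
import glob
import os
import shutil


def combine_silent_files(pattern, combined_path):
    """Concatenate all files matching `pattern` into `combined_path`.

    Byte-for-byte concatenation, like the Unix cat program. The output
    file itself is excluded in case it also matches the pattern.
    Returns the list of source files that were combined.
    """
    target = os.path.abspath(combined_path)
    sources = sorted(
        path for path in glob.glob(pattern)
        if os.path.abspath(path) != target
    )
    with open(combined_path, "wb") as combined:
        for path in sources:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, combined)
    return sources

# Example call (pattern and output name are assumptions):
# combine_silent_files("abrelax_*", "abrelax_combined.out")
```

The files are joined in plain alphabetical order, which is fine here because the order of models in a silent file does not matter for the analysis.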

Wed, 2019-09-18 10:11
rmoretti