rosetta_mpi error


Hello!

I compiled Rosetta using extras=mpi and the build was successful (no errors). Moreover, I already have Rosetta running without the MPI option.

When I try to run any application (say, abinitio), it pops up the error "AbinitioRelax.mpi.linuxgccrelease: error while loading shared libraries: libcppdb.so: cannot open shared object file: No such file or directory".

Could you please guide me on what to set in LD_LIBRARY_PATH? Should it be something like "rosetta_2017.52.59948_mpi/main/source/bin/SymDock.mpi.linuxgccrelease"? I tried that based on some earlier posts on the Rosetta forum, but it didn't work.

Thanks!

malkeet

Mon, 2018-02-05 10:29
malkeet.singh

This isn't really specific to MPI (in case you're wondering).

What likely happened here is that you either moved the compiled executables after compiling, or you're running them from a directory which is mounted in a different place between compiling and running. (This happens to me - the network drive is mounted in a different location on my local machine versus on the cluster.)

Probably the best fix is to recompile under the same full path you'll be using to run.

Past that, you'll want to find the appropriate directory under the build/external/ directory. This is going to be something like `rosetta_2017.52.59948_mpi/main/source/build/external/release/linux/2.6/64/x86/gcc/5.2/mpi/` -- but it will be different depending on your particular system and compiler. The important things to make sure of are that A) the directory exists, B) it contains libcppdb.so, and C) it corresponds to the compile of the binary you're running (e.g. same compiler, both MPI, etc.). Once you've figured out the external directory, you can add it to your LD_LIBRARY_PATH, and that should fix the issue.
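For example, a minimal sketch (the exact directory below is illustrative; substitute the one you actually find on your system):

    # Put this in your ~/.bashrc (or job script) so it persists across sessions.
    export LD_LIBRARY_PATH=/path/to/rosetta_2017.52.59948_mpi/main/source/build/external/release/linux/2.6/64/x86/gcc/5.2/mpi:$LD_LIBRARY_PATH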

 

Mon, 2018-02-05 15:12
rmoretti

Hi!
Thanks for your reply!

I compiled the Rosetta source again using the "mpi" flag, successfully, and the "**/build/external/release/linux/2.6/64/x86/gcc/5.2/mpi/" directory is present; it also contains the "libcppdb.so" file.

Earlier, when I tried the "--help" flag after any application (say, /path_to_rosetta/Relax.mpi.linuxgccrelease --help), it showed the 'shared library' error. At least that is no longer the issue with the new compilation, and the various options for the Rosetta applications are displayed. (The earlier error might have been due to multiple compilation attempts without first deleting the debris of each failed try.)

I added the path of "**/build/external/release/linux/2.6/64/x86/gcc/5.2/mpi/" to my ".bashrc" file via export LD_LIBRARY_PATH,

and tried to run the relax application using the following command:

" /path to rosetta installation/source/bin/mpiexec -np20 /home/gnss/singhma/rosetta/software/copiled/rosetta_src_2017.52.59948_bundle/main/source/bin/relax.mpi.linuxgccrelease -s 1aki.pdb",

which produced the following error:


mpiexec_genesis: cannot connect to local mpd (/tmp/mpd2.console_singhma); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

 

Could you help me to sort this out? 

 

Second question: if I run relax under MPI with the flags -nstruct 10 and -np 20, how is MPI supposed to work? I mean, how will it be using the processors, and how will it provide the output structures? Will it make the overall process faster?

 

thanks!

 

Wed, 2018-02-07 02:01
malkeet.singh

To successfully run MPI, you need to have an "MPI daemon" running. MPI is the "message passing interface", and how it works is that you launch several different copies of the same program. The programs themselves communicate with the MPI library (which was compiled into Rosetta with the extras=mpi option on the build commandline.) These libraries then know how to communicate with a (single) "MPI daemon" which is running on the same machine. It's this MPI daemon which actually takes care of passing the messages around from one program to the other, and from one machine to the other if you're running across multiple machines.

So you'll need to have the MPI daemon program running on your machine before you can launch an MPI job. Most clusters already have the MPI daemons running on the cluster nodes, so typically you don't have to worry about it, but if you're attempting to do an MPI run on your local machine, or are doing cluster administration yourself, it's something you'll need to be aware of. -- Note that it's important that the MPI daemon is from the same version of MPI that you used to compile Rosetta with. (There are several "flavors" of MPI out there, and each has its own slightly different way of interacting. You need to make sure that the libraries you used and the daemons which are running are all compatible.)
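For an MPICH2-style mpd setup, which is what your error message suggests, the launch sequence is a sketch like this (your cluster's MPI flavor may use a different launcher entirely):

    mpd &         # start the MPI daemon on this host
    mpdtrace      # verify that a daemon is up and reachable
    mpiexec -np 4 /path/to/relax.mpi.linuxgccrelease -s 1aki.pdb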

 

Relax, like most protocols in Rosetta, is "trivially parallel". That is, each output structure is independent of the others. This is how we split things up under MPI. The MPI is used just in the job distribution stage, to coordinate which process gets which output structures. They then each work (independently) on their output structure. This means, though, that there's little to nothing to be gained by over-supplying processors. With -nstruct 10 and -np 20, you'll have only 10 output structures, so 10 processors will be working on structures, and 10 processors will be idle. (Well, assuming only one input structure. If you have 2 input structures, you have 20 output structures total, with enough work for all processes to do.) -- It's not a completely 1-to-1 assignment, though, as depending on settings you'll reserve a few processors for administrative purposes.
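Concretely, a better-matched launch for your case might look like the sketch below (one extra rank to allow for a master process, depending on job distributor settings; use whichever launcher matches your MPI flavor):

    # 1 input x -nstruct 10 = 10 output models; ~10 worker ranks keeps everyone busy
    mpiexec -np 11 relax.mpi.linuxgccrelease -s 1aki.pdb -nstruct 10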

Wed, 2018-02-07 08:35
rmoretti

Hello! 

I would appreciate it if you could answer my previous query; in the meantime, my MPI run worked after replacing mpiexec with mpirun.

I compared relax runs with the same flags using the .static and .mpi versions, and I was surprised to see that MPI took ~3x the time of the .static version.

I was also puzzled to see many more rows written to the score.sc file of the .mpi run compared to the .static run. Could you please explain the difference between the two runs?

Thanks!

Malkeet

Wed, 2018-02-07 03:44
malkeet.singh

The mpiexec/mpirun distinction is part of the "flavor" difference I mentioned -- you need to make sure that the MPI launcher you're using matches the flavor of the MPI daemon and MPI libraries you're compiling with.

I'm not sure why the .mpi version is 3x slower than the .static version -- There's not that much overhead for MPI, and distributing it out among many processors should definitely beat a single processor static run.

The only thing I can think of is that you're not compiling with the correct MPI libraries. If you're compiling with the wrong MPI libraries, they might not be able to communicate effectively, resulting in delays/timeouts and potentially serial-like runs (with no coordination between processes). When you say many lines in the score.sc file, does it look like each launched process wrote its own entry for each output structure? If so, that indicates the MPI run isn't communicating effectively, and each process is acting like a non-MPI run, effectively duplicating effort (and overwriting each other).

I would double check your compilation conditions and environment variables, to make sure that the MPI libraries you're using correspond to the "flavor" of MPI that's being invoked with mpiexec (you might have to talk to your cluster administrator for details on how things are set up -- each cluster is different).
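A couple of quick checks along those lines (a sketch; paths and flags will vary by system):

    which mpirun mpiexec      # confirm which launcher your shell picks up
    mpirun --version          # most launchers report their MPI flavor/version
    ldd /path/to/relax.mpi.linuxgccrelease | grep -i mpi    # MPI libraries the binary actually links against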

Wed, 2018-02-07 08:44
rmoretti

Hello Rosetta developers!

Your last post and some other MPI-related posts on the Rosetta forum helped me understand the concept of MPI. Indeed, my cluster had 'mpiexec' (MPI installations) in multiple locations, so Rosetta had been compiled against the default location, but while running Rosetta applications I was invoking a different 'mpiexec'. I have now declared the specific path of that MPI installation in the site.settings (library_path) file for the Rosetta MPI compilation (a sketch of that edit follows the error below), and started the 'mpd' daemon, which solved the following error:

mpiexec_genesis: cannot connect to local mpd (/tmp/mpd2.console_singhma); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
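For reference, the edit was along these lines (a sketch based on the template shipped in main/source/tools/build/site.settings; /opt/my_mpi is a hypothetical install location, and the exact structure of your template may differ):

    settings = {
        "site" : {
            "prepends" : {
                # point the build at the same MPI installation you will launch with
                "program_path" : ["/opt/my_mpi/bin"],
                "include_path" : ["/opt/my_mpi/include"],
                "library_path" : ["/opt/my_mpi/lib"],
            },
            "appends" : {},
            "overrides" : {},
            "removes" : {},
        }
    }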

 

My Rosetta MPI build is working fine now! No overwriting of final scores, etc. However, I have some specific questions about options and theory:

1. I'm using 'abinitio' to design small peptides, so I'm using the 'fragment_picker' application with the 'quota' protocol.

a. First, SAM is not working (https://compbio.soe.ucsc.edu/SAM_T08/T08-query.html). So I used spider_X and doctored its three-state secondary structure (SS) output file's probability columns (for H, C and E) to make it comparable to the 'psipred' and 'jufo' output files. I guess that's OK as long as the SS prediction software provides probability values for H, C and E. What do you think?

b. I read Gront et al., 2011, but I didn't understand why the contributions in 'quota.def' are so different. I mean, shouldn't they all have the same contributions? Otherwise the results would be biased towards the psipred predictions. What is your opinion?

c. The tutorial assumes an available PDB file, which I don't have, so I had some problems setting up the quota 'weights' file. For instance, deleting 'ProfileScoreL1' and 'PhiPsiSquareWell' let the application run. Aren't they crucial for quota-based weighting of fragments (or for benchmarking)? In my case I used only two features for the three SS prediction methods, i.e. 'SecondarySimilarity' and 'RamaScore'. Is there any way to involve the ProfileScoreL1 and PhiPsiSquareWell terms?

d. In fragment_picker, fragments of different lengths can be generated (the tutorial mentions 3 and 9), e.g. 3, 4, 5, 6, ..., n. I imagine using various fragment lengths could be beneficial instead of only 3 and 9. What is your opinion on this, and what is the rationale for 3 and 9 in the tutorial? Is it arbitrary?

e. If I have more than two sizes of fragments, can I use them under the same flag? I mean, if I have frags.200.3mers, 4mers and 5mers files, can I use them with -in:file:frag9 repeatedly, or do I have to alter the flag from 9 to 3, 4 and 5?

           -in:file:frag3 frags.200.3mers

           -in:file:frag4 (or frag9?) for frags.200.4mers

           -in:file:frag5 (or frag9?) for frags.200.5mers

2. I used -np 15 -nstruct 50 for the AbinitioRelax application; the output PDB files are of two types: S_000000021.pdb and F_000000013.pdb. Although the total number of output PDB structures is the same as declared with -nstruct (50), I couldn't understand the reason for the different naming patterns, i.e. F_ and S_.

I used Calibur for clustering with the command "*/main/source/bin/calibur.mpi.linuxgccrelease -input:pdb_list pdblist -res::chains A -strategy::thres_finder 0". This is the default, and filtering is 'false', i.e. highly dissimilar decoys are already filtered out. The last lines of the output look like the following; however, I can barely work out the number of clusters created, the centroids, and the companions in each cluster, given the generic labels like cluster = 1, center, cluster = 0, etc.

Largest 2 cluster centers: /home/gnss/singhma/rosetta/peptide_design_test/F_00000003.pdb(2), /home/gnss/singhma/rosetta/peptide_design_test/S_00000049.pdb(1). Margin = 50.00000%

cluster = 0; center = /home/gnss/singhma/rosetta/peptide_design_test/F_00000003.pdb; n_decoy_members = 2; members =

cluster = 1; center = /home/gnss/singhma/rosetta/peptide_design_test/S_00000049.pdb; n_decoy_members = 1; members =

Largest 2 cluster centers: /home/gnss/singhma/rosetta/peptide_design_test/F_00000004.pdb(3), /home/gnss/singhma/rosetta/peptide_design_test/S_00000009.pdb(2). Margin = 33.3333%

cluster = 0; center = /home/gnss/singhma/rosetta/peptide_design_test/F_00000004.pdb; n_decoy_members = 3; members =

cluster = 1; center = /home/gnss/singhma/rosetta/peptide_design_test/S_00000009.pdb; n_decoy_members = 2; members =

 

I realize this has become a really long post; I would be incredibly grateful if you could answer the questions above.

Thanks!

Malkeet

 

Tue, 2018-02-13 05:37
malkeet.singh

1a - I believe that normally Rosetta fragment prediction is done with a locally installed version of SAM, rather than the webserver, but that shouldn't be a big deal. While the benchmarking was done with SAM, massaging a different secondary structure prediction into SAM format shouldn't be too much of an issue. The only consideration might be what weight to give it, given that it's not SAM (see point 1b).

1b - Different secondary structure prediction algorithms have different abilities. Not only different overall accuracies, but also different performance on different types of proteins. By having multiple secondary structure prediction programs involved, the hope is that you can overcome deficiencies in one program with the predictions of another. But because the predictions are different, they have different amounts of usefulness in the Rosetta protocols. For the particular benchmark set that the quota protocol was run against, those particular combinations of weights were determined to give the best result. You can certainly play around with the weights if you want, if you think a particular prediction method is more (or less) reliable for your case.

1c - Generally speaking, for structure prediction any input PDB is only needed for calculating benchmarking statistics (rmsd to native) -- it should be possible to run without an actual native PDB. I'm a little unclear on the errors which caused you to remove them, but ProfileScoreL1 needs a sequence profile for input, not a structure. I think this is typically provided with the BLAST *.checkpoint file. PhiPsiSquareWell needs phi/psi predictions. As a "cheat", it can take the phi/psi predictions from a native structure -- which is what the demos are doing. If you don't have a native, you can get a TALOS-style file with the predictions and pass it to `-in:file:talos_phi_psi`. TALOS (I believe) requires NMR chemical shift data. FragmentCrmsd is likewise a "cheat" entry, which explicitly uses structural similarity to a reference structure to help score fragments. If you don't have phi/psi predictions or a reference structure, feel free to omit the PhiPsiSquareWell and FragmentCrmsd entries.
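For orientation, a stripped-down quota weights file might look something like the sketch below (column layout from memory of the fragment picker documentation -- double-check it against the demo files shipped with your release before relying on it):

    # score name           priority  weight  max_allowed  extras
    SecondarySimilarity    350       2.0     -            psipred
    RamaScore              150       6.0     -            psipred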

1d - You can certainly try different length fragments. This is something that's been tested in the Baker lab. However, when they've tested adding more/different fragment lengths, they didn't see any particular improvement over the standard 3 & 9 choice. It seems like 3 & 9 strike a good balance of structural diversity and capturing medium-range interactions. Unless you have reasons to use different lengths, I'd stick with 3 & 9. (For example, in FlexPepDock they tend to use 3 & 5 instead of 3 & 9 - the reason being that peptides are short and 9-mers are either a substantial fraction of the peptide or are too long.)

1e - It really depends on the protocol you're using. I think that in most cases -in:file:frag3 and -in:file:frag9 are interpreted as "short fragments" and "long fragments", respectively -- they don't necessarily need to be actual 3-mers and 9-mers. -- That said, it certainly could be the case that some corners of Rosetta make an assumption about the literal length of the fragments they're using. There's a general mechanism for inputting multiple differing length fragment files (-in:file:frag_files), but whether a particular protocol actually respects that setting is another question. For example, the loop modeling protocols have their own settings (-loops:frag_files and -loops:frag_sizes) that they use in preference to the general ones. -- If you're interested in using non-standard fragment sizes you'll likely need to play around with things to see if the particular protocol you're using supports it.
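As a sketch of the two option styles just mentioned (whether a given protocol honors them is the thing to test):

    # general mechanism (respected by some protocols, ignored by others)
    -in:file:frag_files frags.200.3mers frags.200.4mers frags.200.5mers

    # loop modeling's own flags, used in preference to the general ones
    -loops:frag_sizes 9 5 3
    -loops:frag_files frags.200.9mers frags.200.5mers frags.200.3mers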

2 - The "F" and "S" stand for "Failure" and "Success", respectively. Basically, if the output of the protocol doesn't pass the internal quality filters, it's labeled with the "F" instead of the "S".

 

Wed, 2018-02-21 10:00
rmoretti

Thanks for your reply!

In the case of the F and S models, the F models can be omitted from further analysis, can't they?

Could you please reply to my Calibur-related question as well?

Could you also let me know whether the following is possible with Rosetta?

- Design a peptide whose specific interactions with another protein (a crystal structure, or an as-yet-unknown peptide) are known, so that both form a good complementary fit. In other words, say I have to design an XXHXXSXXX peptide, and I know that its S and H interact with residues A and B of the other protein (a crystal structure, or another unknown peptide to be designed): is it possible to design the peptide so that those intermolecular interactions are preserved along with a good overall complementary fit? Or, in the other case, if the interacting amino acids A and B of a crystal structure are known (the binding site is known), could we design a small peptide that fits into the binding site and makes interactions with A and B (i.e. de novo design of a peptide inside the binding site, ensuring the specific interactions and a good overall fit)?

Thanks!

Malkeet

Sat, 2018-03-03 08:10
malkeet.singh

It depends on what you're interested in, but generally speaking it's probably safe to just ignore the F_ models.

I'm not really that familiar with Calibur output, so I'm not entirely certain, but from what I can interpret, the line:

Largest 2 cluster centers: /home/gnss/singhma/rosetta/peptide_design_test/F_00000003.pdb(2), /home/gnss/singhma/rosetta/peptide_design_test/S_00000049.pdb(1). Margin = 50.00000%

is an overview line. It's telling you that the representatives of the two largest clusters are F_00000003.pdb and S_00000049.pdb. The numbers in parentheses are the number of structures in each cluster (so two for the first cluster, one for the second). Note that this overview line is only going to show you the top two clusters, even if you actually have more clusters.

The other lines like

cluster = 0; center = /home/gnss/singhma/rosetta/peptide_design_test/F_00000003.pdb; n_decoy_members = 2; members =

are summaries of each of the clusters. The cluster number is just an arbitrary index (larger clusters have smaller numbers, though). The "center" is the representative member of the cluster, and "n_decoy_members" is the number of structures in the cluster. The members= bit is supposed to tell you which structures are in the cluster, but it seems that functionality isn't working at the moment, for some reason. Also, it seems the number of clusters reported is currently hard-coded to a maximum of 2.

 

On your last question, it sounds like you're interested in multi-state design. There are two protocols for multistate design in Rosetta. The first is mpi_msd (https://doi.org/10.1371/journal.pone.0020937), which is good if you're interested in fixed-backbone designs, particularly if you're interested in negative design. The other is the RECON protocol (https://doi.org/10.1371/journal.pcbi.1004300), which can't handle negative design, but is able to handle backbone flexibility and other such sampling. (Though the more you deviate from a fixed backbone, the harder the problem gets.)

Mon, 2018-03-05 15:58
rmoretti

Thanks!
 

I'm a little perplexed: I started with 100 total models for clustering, and the results are shown for the top 2 clusters, which together cover only 3-4 models. That means all the other models are unaccounted for. I can't fathom why, or where the other models went. I'd be obliged if you could explain this to me.

 

Thanks!

Malkeet

Wed, 2018-03-07 06:22
malkeet.singh

It looks like it's just a hard-coded limitation of the current implementation of the calibur clustering program in Rosetta.  I'm guessing whoever mocked it up didn't need more than the top two clusters.

That said, having only 3-4 models in the top cluster is indeed rather sparse. You may want to play around with the clustering threshold settings, to see if there's another way to cluster which gives "looser" but bigger clusters.

Another thing is that 100 models for abinitio is *way* undersampling things. The possible space of folded proteins is large. If you only have a few structures scattered in it, chances are that none are going to be close to one another, and any "clusters" you find might just be structures which are similar by random chance. By increasing the number of structures generated, you get a clearer picture of the shape of the underlying landscape, and your top clusters are more likely to exist because they mark a genuinely good region of structure space, rather than by random chance. Even then, any clusters are only going to represent a small fraction of the total amount sampled - you'll just be more confident that they're "real" clusters rather than spurious associations.

Wed, 2018-03-07 08:10
rmoretti

Hello!

If I use mpirun with n processors for the relax application, I get only one score per structure, which means the calculations are no longer duplicated (earlier, I guess my MPI libraries were bad, so when I used -np 10, 10 scores were written to the output file, meaning the calculations were running as independent serial jobs). But when I use ddg_monomer with -np 5 and 5 mutations, I expected the 5 mutations to be distributed to 5 processors, finishing in parallel to give 5 ddG values in the final ddg_prediction.txt file. However, I am getting many lines there. So how is MPI handling the ddg_monomer application?

Is it that, if I use 10 processors, each mutation will be processed 10 times? Or should it run in parallel?

I hope I explained my question!

malkeet

Thu, 2018-03-22 09:14
malkeet.singh

Running with MPI correctly should result in the same number of output structures as if you ran the same command without MPI - it's just that you'll complete it quicker, as you're using more than one processor.

For relax, each input structure will result in  `-nstruct` output models. By default -nstruct is 1, so if you pass relax a single input structure, you'll get a single output model by default. This is true even if you're running under MPI. One processor will work on your one output model, and the rest of the processors will twiddle their thumbs. If you want multiple output structures per input structure, you'll need to increase the -nstruct number accordingly. (Each output structure will be parallelized across different CPUs.)
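For instance (a sketch):

    # default -nstruct 1: one output model; the second rank would sit idle
    mpirun -np 2 relax.mpi.linuxgccrelease -s input.pdb

    # 10 output models, parallelized across the ranks
    mpirun -np 10 relax.mpi.linuxgccrelease -s input.pdb -nstruct 10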

The ddg_monomer app is a little different, though. Unlike relax which uses the standard job distribution system (and thus has MPI support), ddg_monomer uses its own job distribution system. It looks like this custom job distributor is intrinsically single processor -- there's no support for MPI with ddg_monomer, so attempting to run it with MPI isn't going to work all that well.

Sat, 2018-03-31 11:40
rmoretti

Hello Rosetta developers!!

Could you please reply to my previous post?

Thanks!

Malkeet

Sun, 2018-02-18 00:18
malkeet.singh

Hi!

 

If I am using the ddg_monomer application and use -ddg:minimization_scorefunction ref2015, will it use the ref2015 scoring function, or do I need to provide the path to the ref2015.wts file?

Moreover, this application has a flag, -ddg::iterations 50, which corresponds to the number of packer and minimizer runs. I'm not sure whether 50 is the default number of models generated, as mentioned on the ddg_monomer webpage, or whether the number of models generated is correlated with the number of iterations.

As per the webpage: "More precisely, 50 models each of the wild-type and mutant structures should be generated, and the most accurate ddG is taken as the difference between the mean of the top-3-scoring wild type structures and the top-3-scoring point-mutant structures."

I'm confused by the word 'should'. I mean, how can we control the number of models generated? And where can I find the code of the ddg_monomer script?

 

thanks!

Malkeet 

Thu, 2018-03-22 07:15
malkeet.singh

I don't quite know what you mean by the "ddg monomer script", but if you're looking for the code for the ddg_monomer application, it's located at Rosetta/main/source/src/apps/public/ddg/ddg_monomer.cc -- Though most of the "heavy lifting" for the application is done by Rosetta/main/source/src/protocols/ddg/ddGMover.cc

Sat, 2018-03-31 11:43
rmoretti

Hello! 

Thanks for the reply!

 

By the ddg_monomer script, I mean "/bin/ddg_monomer.mpi.linuxgccrelease". (What do you call it, if not a script? Knowing the right term will help me phrase future messages.)

In this application, there is a flag, "-ddg::iterations 50", which corresponds to the number of packer and minimizer runs. Could you confirm whether 50 is the default number of models generated, as mentioned on the ddg_monomer webpage, or whether the number of models generated depends on the number of iterations (i.e. # iterations = # models generated)?

As per the webpage: "More precisely, 50 models each of the wild-type and mutant structures should be generated, and the most accurate ddG is taken as the difference between the mean of the top-3-scoring wild type structures and the top-3-scoring point-mutant structures."

- Is there any new development of the "ddg_monomer" application, either released or ongoing?

Thanks!

Sun, 2018-04-01 13:17
malkeet.singh

Hello!

 

Could you please reply to my previous post?

 

Thanks!
msb

Tue, 2018-04-10 01:16
malkeet.singh