Unrecognised residues when rescoring PDB files generated by enzyme_design under MPI

Hello,

I'm running the enzyme_design application using MPI and in the scorefile that is produced, the 'description' field isn't unique across the processors, so if I run on 10 processors, I get 10 PDB files named mypdb___DE_10.pdb. Firstly, is there a way to ensure that the PDB files have unique names?

As I can't correlate the scores in the scorefile with the individual PDB files, I've renamed all my PDB files to give them unique names and am now trying to use score_jd2 to generate a new scorefile. I'm using the following flags:

/opt/rosetta_bin_linux_2017.08.59291_bundle/main/source/bin/score_jd2.default.macosclangrelease \
-database /opt/rosetta_bin_linux_2017.08.59291_bundle/main/database \
-in:file:l pdb.list \
-out:file:o rescorefile.json \
-scorefile_format json \
-extra_res ../SAM.params \
-extra_res ../SAX.params \
-ignore_unrecognized_res

However this generates the following error:

protocols.jd2.PDBJobInputter: pushed 1m6e_scaffold__DE_98.pdb nstruct index 1
protocols.jd2.PDBJobInputter: pushed 1m6e_scaffold__DE_99.pdb nstruct index 1
protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
protocols.jd2.PDBJobInputter: PDBJobInputter::pose_from_job
protocols.jd2.PDBJobInputter: filling pose from PDB 1m6e_scaffold__DE_1.pdb
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set. Created 546 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.820689 seconds.
core.import_pose.import_pose: File '1m6e_scaffold__DE_1.pdb' automatically determined to be of type PDB
core.chemical.ResidueTypeFinder: No ResidueTypes remain after filtering by ResidueType base name: 'SAX'

ERROR: No match found for unrecognized residue at position 1
Looking for lower-terminal residue with 3-letter code: MET
ERROR:: Exit from: src/core/io/pose_from_sfr/PoseFromSFRBuilder.cc line: 433

So it's falling over on the MET residue first, even though I'm passing the -ignore_unrecognized_res flag, and it also looks like it isn't reading the ligand information that I'm passing with the -extra_res flags.

Rosetta must be able to score these PDB files, as it generated them together with a scorefile, so can anyone tell me what I need to do to recreate the scorefile?

Thanks!

Jens

Category:

Scoring

Post Situation:

Unsolved

Tue, 2018-02-13 03:41

linucks

Unfortunately, the enzyme_design application doesn't actually support MPI. It's on an older JobDistribution system which doesn't batch things out across MPI. As such, there's no coordination between the multiple processes being run; their output names overlap because each process thinks it's the only one running.

But enzyme_design is a "trivially parallel" application anyway: each output structure is completely independent of the others. So what you can do is launch a number of separate (single-threaded) processes, and pass a different value to -out::suffix or -out::prefix for each process to distinguish the output. Alternatively, if you do need MPI distribution, you should be able to replicate the enzyme_design protocol in RosettaScripts. (Take a look at the EnzRepackMinimize mover - https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/Movers-RosettaScripts#ligand-centric-movers_enzyme-design)
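As a rough sketch of the separate-processes approach (not a tested recipe): the function below starts NPROC independent runs in the background and gives each one its own -out::suffix. The binary name is taken from this thread; the "@flags" options file, the "_procN" suffix pattern, the log file names, and the ENZDES_BIN and NPROC variables are all assumptions made for illustration.

```shell
# Assumed defaults; override ENZDES_BIN and NPROC for your installation.
: "${ENZDES_BIN:=enzyme_design.static.linuxgccrelease}"
: "${NPROC:=10}"

# Suffix that keeps the output of process number $1 distinct, e.g. "_proc3".
suffix_for() {
    printf '_proc%d' "$1"
}

# Start NPROC independent single-threaded runs and wait for all of them.
launch_all() {
    i=1
    while [ "$i" -le "$NPROC" ]; do
        # "@flags" is a hypothetical options file holding the usual
        # enzyme_design flags (input, params, resfile, cstfile, ...).
        "$ENZDES_BIN" @flags -out::suffix "$(suffix_for "$i")" \
            > "log$(suffix_for "$i").txt" 2>&1 &
        i=$((i + 1))
    done
    wait
}
```

Calling launch_all from the run directory would then start the runs; since each process gets a different suffix, their output names should no longer collide.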

Regarding the unrecognized residue, it's a bit hard to say from the information you've given. It's apparently failing on the first residue in your PDB. Is that your ligand? Is it a protein residue you're using with constraints? What type should it be? Rosetta is apparently looking for a MET or SAX residue at that position. Is this accurate? What does your SAX.params file look like? In particular, what are the NAME and IO_STRING lines?

If I had to guess off-the-cuff, I think this might be related to a bug that's been fixed in recent weekly releases. Basically, the output file contains HETNAM records. In certain cases (particularly if there are covalent interactions to the residue), Rosetta can get confused by the HETNAM records it outputs. A quick fix is to simply delete the HETNAM lines in the PDB. Rosetta *should* then be able to read the files in, though I'm not sure whether having a user-provided type with the same three-letter code as a standard protein residue might be an issue.
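A minimal sketch of that quick fix, assuming the HETNAM records start at the beginning of the line as in standard PDB formatting. The function name strip_hetnam and the "_nohet" output naming are made up for this example; the original file is left untouched.

```shell
# Write a copy of PDB file $1 to $2 with all HETNAM records removed.
strip_hetnam() {
    sed '/^HETNAM/d' "$1" > "$2"
}

# Example use over a set of design outputs (file pattern is an assumption):
#   for f in *__DE_*.pdb; do
#       strip_hetnam "$f" "${f%.pdb}_nohet.pdb"
#   done
```

The cleaned copies can then be listed in pdb.list in place of the originals.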

Wed, 2018-02-21 12:37

rmoretti

Many thanks for your reply.

I'd actually just logged on to say that I've worked out that I can rescore the PDB files with the enzyme_design application itself, using the -enzdes:enz_score option, so the following works for me:

/opt/rosetta_bin_linux_2017.08.59291_bundle/main/source/bin/enzyme_design.static.linuxgccrelease \
-database /opt/rosetta_bin_linux_2017.08.59291_bundle/main/database \
-out:file:o scorefile.tsv \
-out::overwrite \
-in:file:l pdb.list \
-extra_res_fa SAM.params \
-extra_res_fa SAX.params \
-resfile resfile \
-enzdes:cstfile constraint.cst \
-enzdes:enz_score

Does that look correct to you?

Thanks for the information regarding running enzyme_design under MPI - I'll try using the multiple separate processes as you suggest.

Thanks again,

Jens

Fri, 2018-02-23 07:34

linucks

(Reply to #4)

I don't see any issues off-hand with that command line.

Fri, 2018-02-23 07:58

rmoretti

(Reply to #5)

Great - thank you!

Fri, 2018-02-23 08:08

linucks