
Unrecognised residues when rescoring PDB files generated by enzyme_design under MPI



I'm running the enzyme_design application using MPI and in the scorefile that is produced, the 'description' field isn't unique across the processors, so if I run on 10 processors, I get 10 PDB files named mypdb___DE_10.pdb. Firstly, is there a way to ensure that the PDB files have unique names?

As I can't correlate the scores in the scorefile with the individual PDB files, I've renamed all my PDB files to give them unique names and am now trying to use score_jd2 to generate a new scorefile. I'm using the following flags:

/opt/rosetta_bin_linux_2017.08.59291_bundle/main/source/bin/score_jd2.default.macosclangrelease \
-database /opt/rosetta_bin_linux_2017.08.59291_bundle/main/database \
-in:file:l pdb.list \
-out:file:o rescorefile.json \
-scorefile_format json \
-extra_res ../SAM.params \
-extra_res ../SAX.params \
-ignore_unrecognized_res

However this generates the following error:

protocols.jd2.PDBJobInputter: pushed 1m6e_scaffold__DE_98.pdb nstruct index 1
protocols.jd2.PDBJobInputter: pushed 1m6e_scaffold__DE_99.pdb nstruct index 1
protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
protocols.jd2.PDBJobInputter: PDBJobInputter::pose_from_job
protocols.jd2.PDBJobInputter: filling pose from PDB 1m6e_scaffold__DE_1.pdb
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 546 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.820689 seconds.
core.import_pose.import_pose: File '1m6e_scaffold__DE_1.pdb' automatically determined to be of type PDB
core.chemical.ResidueTypeFinder: No ResidueTypes remain after filtering by ResidueType base name: 'SAX'

ERROR: No match found for unrecognized residue at position 1
Looking for lower-terminal residue with 3-letter code: MET
ERROR:: Exit from: src/core/io/pose_from_sfr/ line: 433

So it's firstly falling over on the MET residue, even though I'm passing the -ignore_unrecognized_res flag, and it also looks like it's not reading the ligand information that I'm passing with the -extra_res flag.

Rosetta must be able to score these PDB files, as it generated them along with a scorefile, so can anyone tell me what I need to do to recreate the scorefile?




Tue, 2018-02-13 03:41

Unfortunately, the enzyme_design application doesn't actually support MPI. It's on an older JobDistribution system which doesn't batch things out across MPI. As such, there's no coordination between the multiple processes being run -- they overlap in names because each process thinks it's the only one running.

But enzyme_design is a "trivially parallel" application anyway. Each output structure is completely independent of the others. So what you can do is launch a number of different (single-threaded) processes, and then pass a different value to -out::suffix or -out::prefix for each process to distinguish the output. Alternatively, if you do need MPI distribution, you should be able to replicate the enzyme_design protocol in RosettaScripts. (Take a look at the EnzRepackMinimize mover -
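
The multiple-process approach described above might be scripted roughly as follows. This is only a sketch under stated assumptions: the binary name in ENZDES, the process count NPROC, the flags.txt options file (assumed to hold the usual enzyme_design flags) and the _procN suffix scheme are all illustrative placeholders, and the suffix flag is spelled as it is in this post.

```shell
#!/usr/bin/env bash
# Sketch: start several independent, single-threaded enzyme_design runs,
# giving each one its own output suffix so file names cannot collide.
# ENZDES, NPROC, flags.txt and the _procN suffix are illustrative assumptions.

ENZDES="${ENZDES:-enzyme_design.static.linuxgccrelease}"
NPROC="${NPROC:-10}"

# Print the command line for process number $1.
enzdes_cmd() {
    printf '%s @flags.txt -out::suffix _proc%s\n' "$ENZDES" "$1"
}

# Launch all processes in the background and wait for them to finish.
launch_all() {
    local i
    for i in $(seq 1 "$NPROC"); do
        # Word splitting of the printed command string is intentional here.
        $(enzdes_cmd "$i") > "enzdes_proc${i}.log" 2>&1 &
    done
    wait
}
```

Running `enzdes_cmd 3` only prints the third command, which makes it easy to check the flags before calling `launch_all` to start the runs.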

Regarding the unrecognized residue, it's a bit hard to say from the information you've given. It's apparently failing on the first residue in your PDB. Is that your ligand? Is it a protein residue you're using with constraints? What type should it be? Rosetta is apparently looking for a MET or SAX residue at that position. Is this accurate? What does your SAX.params file look like? In particular, what are the NAME and IO_STRING lines?
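
One quick way to collect the details asked about above is with grep. This is a minimal sketch; the function names are made up for illustration, and it assumes the params file keeps NAME and IO_STRING at the start of their lines.

```shell
#!/usr/bin/env bash
# Sketch: print the information requested above.
# The function names are hypothetical helpers, not part of Rosetta.

# First coordinate record (ATOM or HETATM) of a PDB file,
# i.e. the first residue that gets read in.
first_residue_record() {
    grep -m 1 -E '^(ATOM|HETATM)' "$1"
}

# NAME and IO_STRING lines of a params file.
params_identity() {
    grep -E '^(NAME|IO_STRING)[[:space:]]' "$1"
}
```

For the files in this thread that would be, for example, `first_residue_record 1m6e_scaffold__DE_1.pdb` and `params_identity ../SAX.params`.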

If I had to guess off-the-cuff, I think this might be related to a bug that's been fixed in recent weekly releases. Basically, the output file contains HETNAM records. In certain cases (particularly if there are covalent interactions to the residue), Rosetta can get confused by the HETNAM records it outputs. A quick fix is to simply delete the HETNAM lines in the PDB - Rosetta *should* be able to read the files in then, though I'm not sure whether having a user-provided type with the same three-letter code as a standard protein residue might be an issue.
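
Deleting the HETNAM lines could be done along these lines. It is a sketch only: the function names and the "nohetnam" output directory are assumptions for illustration, and it writes cleaned copies rather than editing the original files in place.

```shell
#!/usr/bin/env bash
# Sketch: write copies of PDB files with the HETNAM records removed.
# The function names and the "nohetnam" directory are illustrative assumptions.

# Copy PDB file $1 to $2, dropping every line that starts with HETNAM.
strip_hetnam() {
    grep -v '^HETNAM' "$1" > "$2"
}

# Process every file named in a list such as pdb.list (one path per line).
strip_hetnam_list() {
    local list="$1" outdir="${2:-nohetnam}" pdb
    mkdir -p "$outdir"
    while IFS= read -r pdb; do
        [ -n "$pdb" ] || continue   # skip blank lines in the list
        strip_hetnam "$pdb" "$outdir/$(basename "$pdb")"
    done < "$list"
}
```

For example, `strip_hetnam_list pdb.list` would leave the originals untouched and put the cleaned copies in nohetnam/, which can then be listed in a new input file for rescoring.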


Wed, 2018-02-21 12:37

Many thanks for your reply.

I'd actually just logged on to say that I've worked out that I can rescore the PDB files with the enzyme_design application and its -enz_score option, so the following works for me:

/opt/rosetta_bin_linux_2017.08.59291_bundle/main/source/bin/enzyme_design.static.linuxgccrelease \
-database /opt/rosetta_bin_linux_2017.08.59291_bundle/main/database \
-out:file:o scorefile.tsv \
-in:file:l pdb.list \
-extra_res_fa SAM.params \
-extra_res_fa SAX.params \
-resfile resfile \
-enzdes:cstfile constraint.cst

Does that look correct to you?

Thanks for the information regarding running enzyme_design under MPI - I'll try using the multiple separate processes as you suggest.

Thanks again,




Fri, 2018-02-23 07:34

I don't see any issues off-hand with that command line.

Fri, 2018-02-23 07:58

Great - thank you!

Fri, 2018-02-23 08:08