I am trying to use the RosettaRemodel application to insert three residues into a structure, and I want the inserted backbone to adopt a loop conformation. However, I am getting a warning in the output file. Originally there was a residue at this position in the structure, but I manually removed it because it wasn't an amino acid but a small molecule. My overall goal is to delete the molecule from the PDB and insert three amino acids in its place. The pipeline goes something like this: I removed the molecule from the structure, renumbered the structure (so it starts from 1 and is continuously numbered), and then ran RosettaRemodel on the PDB. I kept the chain ID since there is only a single chain.
The blueprint file I used looks like:
57 T .
58 L .
59 V .
60 T .
61 T L PIKAA L
62 L L PIKAA L
0 x L PIKAA T
0 x L PIKAA Y
0 x L PIKAA G
63 V L PIKAA V
64 Q E PIKAA Q
65 C .
66 F .
My flags look like:
-database path_to_database
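For context, a minimal flags file for a run like this might look like the following sketch. The database path and file names are placeholders, not taken from the original post, and only the flags shown in this thread plus the standard input flags are used:

```shell
# Hypothetical flags file for a RosettaRemodel insertion run.
# All paths and file names below are placeholders.
-database path_to_database              # path to the Rosetta database
-s input_structure.pdb                  # renumbered single-chain input structure
-remodel:blueprint insertion.blueprint  # blueprint containing the "0 x" insertion lines
-nstruct 20                             # number of independent output structures
```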
The warning I get is:
protocols::checkpoint: (1) Deleting checkpoints of LoopMover
protocols.forge.remodel.RemodelMover: (1) design_refine: final chainbreak = 0.337903 at 66
protocols.forge.remodel.RemodelMover: (1) WARNING: DESIGN REFINE FAILED TO CLOSE STRUCTURE!!
protocols.forge.remodel.RemodelMover: (1) Remodel poses remaining from original run: 0
protocols.jd2.MPIWorkPoolJobDistributor: (1) Slave Node 1: Finished job successfully! Sending output request to master.
protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 1 with tag 30
I am not sure why I am getting this warning and what exactly it means. Does this mean that the design failed and that the inserted loop region could not be closed? I tried designing residues on either side of the inserted region to help but for some reason the warning persists. Am I doing something incorrectly? Any help would be appreciated.
I think this is (somewhat) expected. Depending on your structure, Remodel won't always be able to close the loop with the inserted residues. It recognizes that what it's trying to do isn't going to work, and throws out that trash model. This is why you typically don't run with just a single output structure. What you want to do is set up a run for a large number of output structures using the -nstruct option, which takes the number of output structures to produce.
Try re-running the protocol with -nstruct set to something like 10 or 20, and look at the number of successful output structures. If more than half the runs give you a successful output (and the output you get looks decent) - say 11 out of 20 structures are successes and only 9 out of 20 are failures - I wouldn't worry too much about the failed runs. Rosetta algorithms are stochastic, and it's not expected that all the results will be good/usable.
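One quick way to check the success rate after such a run is to count the output PDBs; failed trajectories are discarded, so the count versus -nstruct gives the fraction of successes directly. This is a sketch assuming the default JD2 naming with an "input_structure" prefix - yours will match your input file name:

```shell
# Count how many output models a -nstruct run actually produced.
# Compare this number against the value passed to -nstruct.
ls input_structure_*.pdb 2>/dev/null | wc -l
```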
I'd only start to worry if the vast majority of the output structures fail, or if you're not getting any output at all. In those cases, there might be something about your system which isn't working with things.
Another option is to try alternative protocols for pose relax/refinement. You might want to try the flags "-remodel:use_pose_relax" or "-remodel:use_cart_relax" (I'd recommend trying the latter if your version of Rosetta supports it). These use different protocols for the design refine step, so they might close the inserted loop in cases where the default protocol doesn't. -- But be sure to visually check the quality of any output structures, as you might be getting structures which pass the internal quality filter but are still not very physical.
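Concretely, either alternative is selected by adding one flag to the flags file. The flag names are as given above; whether "-remodel:use_cart_relax" is available depends on your Rosetta version, as noted:

```shell
# Option 1: refine with the FastRelax protocol
# (whole-protein, torsion-space relax)
-remodel:use_pose_relax

# Option 2: refine with Cartesian-space minimization instead
# (may not exist in older Rosetta versions)
-remodel:use_cart_relax
```

Use one or the other, not both, for a given run.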
So I tried passing -nstruct 10 with my original protocol, and I still got those warnings. However, when I tried -nstruct 20 with the "-remodel:use_pose_relax" flag, I did not get the warnings. The scores also looked more reasonable; before, I was getting extremely large positive values. A quick visual inspection of a few of the output structures shows that the loop did indeed close. I do have a question: what is the difference between the original refinement protocol and the "-remodel:use_pose_relax" or "-remodel:use_cart_relax" protocols? I did not see the "-remodel:use_cart_relax" flag in the documentation. Also, should I be worried that the original protocol failed for all 10 of the structures?
Thanks for your help.
The difference with/without the flag is that different refinement protocols are invoked. Without any flags, Remodel will use just the standard loop closure procedure (CCD or KIC). If you use the "-remodel:use_pose_relax" flag, it will use the FastRelax protocol instead. Whereas the standard loop closure protocols work almost exclusively on the loop region, the FastRelax protocol should allow movement of the whole protein. In this way, loops that can't be closed due to the odd location of the takeoff and landing residues might still be closed, as FastRelax allows the takeoff and landing residues of the loop to move to a better location, and the rest of the protein to adjust to accommodate.
The "-remodel:use_cart_relax" flag is similar, but instead of doing the full FastRelax protocol, it just does a Cartesian-space minimization (normal minimization and relax in Rosetta operate in torsion space - basically, by just rotating bonds). Cartesian-space minimization is a relatively newer technique in Rosetta, so that may be why it is not in the documentation. Like FastRelax, Cartesian minimization will allow the rest of the protein to move to accommodate the loop, but it will also allow the internals of the loop to flex more (allowing non-ideal bond lengths and angles), which can help to close a loop which doesn't want to be closed with perfectly ideal geometry.
If you're getting good structures with either of the flags, I wouldn't worry too much about not getting good structures without them. It may just be that the loop you're remodeling is slightly strained in the input conformation, and you need that extra "flex" in the rest of the protein to get things into a conformation which allows a well-closed loop.
Thanks for explaining that. I have one last question: what is the difference between the regular Rosetta output structures from the job distributor and the structures output by RosettaRemodel itself? For example, if I use the two flags
in my flag file, I get 25 structures as output, numbered 1.pdb, 2.pdb, ..., 25.pdb. However, if I use the flag "-nstruct 25", I also get 25 structures, but these are named input_structure_0001.pdb, input_structure_0002.pdb, ..., input_structure_0025.pdb. What is the difference between these two (i.e., trajectories vs. nstruct)? The RosettaRemodel documentation says that RosettaRemodel handles its own file I/O and not to use the XXX_0001.pdb file if num_trajectory is greater than 1. So I am confused about how trajectories relate to the output structures, and how the output structures from RosettaRemodel relate to those from the Rosetta job distributor.
The current standard job distributor for Rosetta ("JD2") doesn't allow for any cross-talk between output structure runs. That is, each output structure is created independently, and there is no opportunity for clustering, picking the "best" models, or filtering based on aggregate information about multiple runs. These are the results you get with the -nstruct flag (the *_0001.pdb, *_0002.pdb, ... ones): independent, once-through runs of the protocol.
This has certain drawbacks for Remodel usage, for example if you do want to do clustering or filtering. To accommodate this use case, RosettaRemodel has a simple output processing capability, where it will manage multiple runs and then do clustering/filtering/selecting on those structures to return only a subset. These are the 1.pdb, 2.pdb, etc. files. Note that this processing is rather simple, so it doesn't have things like the MPI support that the JD2 job distributor does.
As I read things, it looks like Remodel should pass back the best structure from its sub-loop to the JD2 job distributor, so I believe that doing "-num_trajectory 1 -nstruct 25" and using the input_structure_0001.pdb, ..., input_structure_0025.pdb files should be more or less equivalent to "-num_trajectory 25 -nstruct 1" and using the 1.pdb, 2.pdb, ..., 25.pdb files - assuming you're not using any of the additional options to do clustering or filtering, etc. (That is, if you do a "-num_trajectory 1 -nstruct 1" run, the two structures should be pretty much the same.)
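In flag-file terms, the two roughly equivalent setups described above would be the following sketch (flag names as used in this thread; file names in the comments are placeholders):

```shell
# Setup A: let JD2 manage 25 independent runs;
# use the input_structure_0001.pdb ... input_structure_0025.pdb files
-num_trajectory 1
-nstruct 25

# Setup B: let Remodel's internal accumulator manage 25 trajectories;
# use the 1.pdb ... 25.pdb files Remodel writes itself
-num_trajectory 25
-nstruct 1
```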
The reason the documentation recommends using the internal Remodel output files instead of the standard JD2 ones is that the internal path allows you to do things like cluster filtering and top-structure selection.
I get it now. Thanks so much!
I have a similar question regarding the Remodel output. Your explanation above makes it clear which structure corresponds to what.
But in my case I would like to know which of the structures have the best scores. The protocol documentation states that once the Accumulator/Clustering is done, due to sorting done internally, the structure with the lowest energy according to score12 is output as XXXX_0001. I presume they are referring here to the JD2 nstruct PDBs, meaning BL_0001.pdb in my case will correspond to the best structure.
What if I use the clustering method - does the numbering still reflect the score ranking, i.e., does ck_001.pdb in my case stand for the best structure?
Finally, in the protocol and above, I was advised to use the structures labelled 1.pdb, 2.pdb, etc. My computer seems to give me ck_001.pdb, etc., instead. I am assuming this is the same thing as 1.pdb. Is my assumption correct?
Thanks in advance