Hi! I am very new to Rosetta, and I am trying to find a way to model the conformation of a 7-9 aa peptide inserted into a protein with a known crystal structure. From reading the documentation, I thought loop modeling / remodeling would be the way to go. But there are quite a few loop modeling methods available (CCD, KIC, NGK, generalized KIC, etc.), and none of them is described as specifically suited to my kind of case (a peptide insertion into a protein with a known structure). Which method should I choose? How can I judge whether one method is better than another?
By the way, do you think modeling of this kind of insertion can be precise (rmsd < 1.5A)? I found a web page showing the results of some loop modeling benchmark runs (https://guybrush.ucsf.edu/benchmarks/benchmarks/loop_modeling), but I'm not sure if my case is comparable to those.
Nobody has benchmarked the "insertion" case specifically.
CCD and KIC/NGK are meant for the general case of protein loop modeling. NGK is generally regarded as "best" and is probably better than plain KIC, but each method will give different results, so it is reasonable to try all of them.
GenKIC is built for broader loop closure problems (if your loop is part of a cyclic peptide, proceeds through a noncanonical protein residue, etc.). It will probably perform at KIC/NGK levels, but it is more than you need here for "normal" protein chemistry.
Rosetta still lacks a reliable simple tool for insertions of this type. You should be able to cobble something together with a RosettaScript that performs the desired insertion immediately followed by loop modeling.
Thanks a lot for the explanation, smlewis! So I guess in my case NGK and CCD will be must-tries.
Currently I'm trying to put the inserted residues in the remodel:blueprint file and use "PIKAA" to specify their identity. Is this a proper alternative to what you suggested ("a RosettaScript that performs the desired insertion immediately followed by loop modeling")?
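As an illustration of that approach, a blueprint with inserted residues might be generated along these lines. This is only a sketch under assumptions: it assumes the Remodel blueprint convention of one line per residue ("index residue build-code"), with "." for untouched residues, "L" for loop residues to rebuild, inserted residues written with index 0 and placeholder identity "x", and the resfile-style `PIKAA` token to fix the amino acid. The function name `make_blueprint` and the `flank` parameter are made up for this example, and the output should be checked against the Remodel documentation before use.

```python
def make_blueprint(sequence, insert_after, insertion, flank=2):
    """Return blueprint lines for inserting `insertion` after the 1-based
    position `insert_after` of `sequence` (one-letter amino acid codes).

    `flank` is a hypothetical knob: how many existing residues on each side
    of the insertion site are also marked for rebuilding.
    """
    lines = []
    for i, aa in enumerate(sequence, start=1):
        # Existing residues close to the insertion site are marked as loop
        # ("L") and kept at their native identity; all others stay fixed (".").
        near_site = insert_after - flank < i <= insert_after + flank
        if near_site:
            lines.append(f"{i} {aa} L PIKAA {aa}")
        else:
            lines.append(f"{i} {aa} .")
        if i == insert_after:
            # Inserted residues: index 0, placeholder identity "x",
            # built as loop, identity fixed with PIKAA.
            lines.extend(f"0 x L PIKAA {new_aa}" for new_aa in insertion)
    return lines


# Toy example: insert "GS" after residue 3 of a 6-residue sequence.
for line in make_blueprint("MKVLAG", insert_after=3, insertion="GS", flank=1):
    print(line)
```

Marking a couple of flanking residues as rebuildable reflects the point that the existing residues next to the insertion usually have to move as well; how many to include is a judgment call.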
Remodel is a great tool for the purpose. I don't know how to use it but I am pretty confident it gets used for this kind of thing.
You might want to mentally frame this as a homology modeling protocol. That's sort of what you're doing. You're attempting to model a protein variant based on the structure of a homolog which is missing a 7-9 aa region.
In fact, the traditional way of homology modeling in Rosetta was threading followed by loop modeling. The newer way is RosettaCM. The big new feature with RosettaCM is multi-template modeling. That is, it can combine structures from multiple homolog structures (or homolog structure fragments). This would be particularly useful to you if that 7-9 aa insertion comes from some other protein which has a structure. It doesn't even necessarily need to be completely the same structure - as long as the region with the 7-9 aa portion and some flanking region matches, you can "steal" the structure of the loop with RosettaCM.
If you don't have a structure with the loop, then there is not necessarily any benefit to using RosettaCM versus loop modeling. (Though the Baker lab has found that - for general homology modeling - RosettaCM with a single template actually does a better job than the traditional loop-modeling-based single-template homology modeling.)
Regarding accuracy, typically we say that loop modeling works best for loops under 10 aa. Your insertion is within that range, but once you start including the already-existing portions of the protein which would be flexible under the insertion, you're starting to push into the questionable region. You may be okay, so long as there's decent internal structure of the loop, or if it makes interactions with the rest of the protein which may stabilize the conformation.
Thanks for the detailed explanation! Sorry for the delayed reply (I thought the forum would send me an email if there were any updates).
Unfortunately, the insertions don't have structures themselves. But given the Baker lab's experience, I would love to give RosettaCM a try. Thanks!
When I'm doing RosettaCM, is there a simple way to fix the majority of the residues so that they have the same structure as the single template? From a prior CryoEM structure of a similar insertion at the same site (but with a totally different insertion peptide sequence), we know that an insertion at this site of the protein won't affect its "global" structure.
- The "CoordinateConstraint" or "LocalCoordinateConstraint" in the constraint file seems to be a relevant option for this problem. Is that correct? If so, how would you write this for a few hundred residues? (Sorry if this sounds too detailed, but I couldn't find an example in this documentation: https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/de/d50/constraint_file.html ).
> how would you write this for a few hundred residues?
Python script. You'll probably need to write the script yourself, though.
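Something along these lines might serve as a starting point. It is a sketch, not a tested protocol: it reads the CA coordinates from the template PDB and writes one constraint line per residue, assuming the constraint file format `CoordinateConstraint atom res ref_atom ref_res x y z HARMONIC x0 sd`. The function names, the choice of CA atoms only, the 1.0 A standard deviation, and the renumbering logic (template residues after the insertion site shifted by the insertion length, with template numbering assumed to match pose numbering) are all assumptions for illustration and should be adapted to your actual numbering.

```python
def target_number(resnum, insert_after, insert_len):
    """Map a template residue number to the numbering of the target, assuming
    `insert_len` residues are inserted right after `insert_after`."""
    return resnum + insert_len if resnum > insert_after else resnum


def coordinate_constraints(pdb_lines, flexible, anchor_resnum,
                           insert_after=0, insert_len=0, chain="A", sd=1.0):
    """Return CoordinateConstraint lines for the CA atoms of one chain.

    pdb_lines     -- iterable of lines from the template PDB file
    flexible      -- set of template residue numbers to leave unconstrained
    anchor_resnum -- template residue used as the reference atom
    """
    anchor = target_number(anchor_resnum, insert_after, insert_len)
    constraints = []
    for line in pdb_lines:
        # Only protein ATOM records; this also skips calcium HETATMs named CA.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        # Keep only the first alternate location, if any are present.
        if line[16] not in (" ", "A"):
            continue
        resnum = int(line[22:26])
        if resnum in flexible or resnum == anchor_resnum:
            continue
        # Fixed-column PDB coordinates (columns 31-54).
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        res = target_number(resnum, insert_after, insert_len)
        constraints.append(
            f"CoordinateConstraint CA {res} CA {anchor} "
            f"{x:.3f} {y:.3f} {z:.3f} HARMONIC 0.0 {sd}"
        )
    return constraints
```

To use it, read the template with `open(...).readlines()`, pass the residues flanking the insertion site in `flexible`, and write the returned lines to a constraint file, one per line. Restraining only CA atoms keeps the file short while still pinning the global fold; adding more backbone atoms or tightening `sd` are the obvious things to tune.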