I'm a postgraduate student, interested in protein structure prediction.
I noticed that the document said:
- Abinitio: max 150 amino acids are considered possible
So I'd like to know how to use Rosetta to predict the structure of a protein with more than 150 amino acids (e.g. 500).
- Is it a good idea to divide it into several domain segments and predict them separately?
- If we do that, how do we determine the domain regions, and how do we assemble them into a full-length structural model?
Any help would be appreciated.
Yes, you're exactly correct. It's likely that the 500 aa protein is not a single cooperatively folding domain. Instead, it's possible that those 500 amino acids are divided up into several independently folding domains. (Or there's one independently folded domain in the midst of intrinsically disordered sections.) You can then predict the domains independently, and then attempt to combine the domain models later.
How best to divide up the sections is a bit of a black art. There are a number of different prediction methods out there. If I recall correctly, the approach used by the Robetta server is called "ginzu". I'm not sure exactly how to run it, though. As this is a strictly sequence-based approach, Rosetta proper isn't really involved. I'd do some literature searches to see what the state of the art is in domain parsing. (Really any approach can be used as a pre-processing step before feeding the split sequences into Rosetta.)
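Once you have domain boundaries from whichever predictor you choose, the splitting itself is simple. Here is a minimal sketch in plain Python, not anything from Rosetta or Robetta: the function names, the record naming scheme, and the boundary positions are all made up for illustration. Boundaries are taken to be the 1-based position of the last residue of each segment.

```python
def split_by_boundaries(sequence, boundaries):
    """Split a sequence at the given 1-based boundary positions.

    Each boundary is the last residue of a segment (hypothetical convention).
    Returns a list of (start, end, subsequence) with 1-based inclusive ranges.
    """
    for b in boundaries:
        if not 0 < b < len(sequence):
            raise ValueError(f"boundary {b} is outside the sequence")
    cuts = [0] + sorted(set(boundaries)) + [len(sequence)]
    segments = []
    for start, end in zip(cuts, cuts[1:]):
        segments.append((start + 1, end, sequence[start:end]))
    return segments


def to_fasta(name, segments, width=60):
    """Format segments as FASTA records named <name>_<start>-<end>."""
    records = []
    for start, end, seg in segments:
        lines = [seg[i:i + width] for i in range(0, len(seg), width)]
        records.append(f">{name}_{start}-{end}\n" + "\n".join(lines))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Toy sequence and boundaries, purely illustrative.
    toy = "ABCDEFGHIJ"
    print(to_fasta("toy", split_by_boundaries(toy, [3, 7])), end="")
```

Each record can then be saved to its own file and used as the input sequence for a separate prediction run.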
By the way, I'd highly recommend doing something like a BLAST search of your protein against the PDB database. You might not have homology to an existing structure across the entire protein, but if you find a particular domain or subsection of the sequence with decent homology to an already solved structure, you can use the more efficient homology modeling protocols in Rosetta to model those domains, leaving you a much smaller segment to model with ab initio methods.
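To see how much is left over for ab initio after the homology search, you can subtract the query ranges covered by hits from the full length. This is a rough sketch under my own assumptions: the function name is hypothetical, the hit ranges are invented numbers rather than real BLAST output, and you would have to pull the query start/end columns out of your own results.

```python
def uncovered_regions(seq_len, hits, min_len=1):
    """Return 1-based inclusive ranges of the query not covered by any hit.

    hits: list of (query_start, query_end) pairs, 1-based inclusive.
    min_len: drop leftover regions shorter than this many residues.
    """
    regions = []
    next_free = 1  # first residue not yet covered by a hit
    for start, end in sorted(hits):
        if start > next_free:
            regions.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= seq_len:
        regions.append((next_free, seq_len))
    return [(s, e) for s, e in regions if e - s + 1 >= min_len]


if __name__ == "__main__":
    # Invented example: a 500-residue query with two overlapping hits.
    print(uncovered_regions(500, [(120, 300), (280, 350)]))
```

Any leftover region longer than about 150 residues would still need further splitting before an ab initio run.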
If this is a eukaryotic protein, I'd also recommend running something like DISOPRED or another intrinsic disorder prediction program. This will give you a better sense of where the secondary structure elements are and where potentially unstructured regions are. If there are significant segments of disorder, you can trim those out prior to predicting the structure, as the disordered regions will probably delineate the independently folding domains. (Assuming something like a "beads on a string" model. If there's a disordered insertion into the middle of a domain it could be hard to tell.)
Hi, rmoretti, thank you for your reply. But I have no idea how to combine the domain models. Could you give me some advice?
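For the trimming step, something like the following sketch would pull out the ordered stretches from per-residue disorder scores. It assumes you have already parsed the predictor output into a list of scores, one per residue, where higher means more disordered. The function name, the 0.5 cutoff, and the 30-residue minimum are my own placeholder choices, not values taken from DISOPRED or Rosetta.

```python
def ordered_segments(scores, threshold=0.5, min_len=30):
    """Find runs of residues predicted to be ordered.

    scores: per-residue disorder scores (higher = more disordered).
    threshold: residues scoring below this count as ordered (assumed cutoff).
    min_len: ignore ordered runs shorter than this many residues.
    Returns 1-based inclusive (start, end) ranges.
    """
    segments = []
    start = None  # start of the ordered run currently being tracked
    for i, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    # Made-up scores: two long ordered runs separated by a disordered linker.
    demo = [0.1] * 40 + [0.9] * 20 + [0.2] * 50 + [0.8] * 5 + [0.1] * 10
    print(ordered_segments(demo))
```

Keep in mind the caveat above: a disordered insertion inside a single domain would split that domain into two ranges here, so the output is a starting point for inspection rather than a final domain assignment.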
It depends a bit on how you believe the independent domains come together, what sort of information you have relating them, and what you want to do with the final model.
Your best bet is if you know/suspect they have a relatively strong interaction with each other. In that case, you can do protein-protein docking with the domain predictions, and see if you can determine the interaction interface between the two. Once you have the two domains placed next to each other, you can use various loop modeling techniques to build the linker between them.
If it's more of a beads-on-a-string type setup, then there probably isn't any such thing as "the" combined structure, and attempting to model it is probably not a worthwhile effort, unless you have some sort of downstream application which needs an arbitrary combination of the two. (And if you do have such a downstream application, the best way to build the combined model will probably depend on what that application needs.)
Hi rmoretti, your suggestion is very helpful.
Thank you very much!