I'm having some problems with two proteins that have 140 and 150 residues, respectively.
So far I have generated only 1000 models (I intend to create 10000 in total), but I'm not getting any clusters at all :/
I read somewhere that the ab initio modelling method works on proteins with fewer than 100 residues. Is that correct?
What can I do to overcome this problem?
Since I don't have any structure to use as a reference for my protein, I was thinking of selecting the lowest-energy structure out of the 10000 models and using it as the starting point for a replica exchange molecular dynamics simulation, in order to explore the potential energy surface of the protein more thoroughly.
What do you think about it?
Do you have any suggestions? Maybe a different method/software?
Best regards to all of you :),
Carlos Navarro Retamal
problem with modelling of protein (140 and 150 residues respectively)
The size limits that are normally quoted apply to predictions made without any additional experimental information. If you have experimental information that can focus the search space, you can certainly extend the size range over which you can be successful. If you have any sort of information about residue contacts, known secondary structure, residue orientation information from NMR, etc., incorporating that information into the simulation will greatly help your prediction. If there are any structures of homologous or closely related proteins, they can be a real help, too.
The reason you fail with larger proteins is mainly that the possible search space increases tremendously with chain length. So it's not too surprising that a small amount of sampling has not yet converged.
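To give a rough feel for how quickly the search space grows, here is a back-of-the-envelope sketch. It assumes a crude Levinthal-style count with a fixed number of backbone states per residue; the value of 3 states and the function name are my own assumptions for illustration, not anything Rosetta uses internally.

```python
import math

def log10_conformations(n_residues, states_per_residue=3):
    """log10 of a crude conformation count: states_per_residue ** n_residues.

    Assumption: every residue independently picks one of a few backbone
    states. Real proteins are far more constrained, so treat this only as
    an illustration of the scaling, not as a real estimate.
    """
    return n_residues * math.log10(states_per_residue)

for n in (100, 140, 150):
    print(f"{n} residues: ~10^{log10_conformations(n):.0f} conformations")

# Orders of magnitude added when going from 100 to 150 residues.
extra = log10_conformations(150) - log10_conformations(100)
print(f"100 -> 150 residues: search space grows by ~10^{extra:.0f}")
```

Under that assumption, each extra residue multiplies the space by a constant factor, so the same 1000 models cover a vastly smaller fraction of it at 140 to 150 residues than at 100.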
I'm not sure that MD is going to help you all that much - you have the same large search space to cover, and MD isn't necessarily the most efficient way of sampling it. It would be good for increasing local sampling around the proposed models - but that presupposes you have a good indication as to which structures are close to the native. Without strong clusters in the ab initio output, that's somewhat questionable - you might even be missing a structure which is close to the native altogether.
If you wanted to increase local sampling, I might instead recommend some of the other coarse-grained sampling techniques from Rosetta, like loop remodeling.
One thing you may want to try is to see if you can get decent clustering if you cluster not on the whole-structure rmsd, but instead on the rmsd of a "core" segment which might be better folded. (This helps if a portion of the protein is folded well while other parts are still unstructured.) You may first want to try clustering by metrics like GDT ("gdtmm") instead of rmsd, as they're less sensitive to flexible loop regions. Consolidating the "core" and then later modeling the other, less well modeled parts on top of that core might be a way of reducing the search space enough.
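As a minimal sketch of the "core" clustering idea, the snippet below computes pairwise rmsd over a chosen subset of residues only and then groups the models. It is not the Rosetta cluster application: it assumes you have already extracted the CA coordinates of all models into a NumPy array of shape (n_models, n_residues, 3), the greedy grouping is a simple stand-in I chose for illustration, and the function names, the core residue indices and the radius are all hypothetical.

```python
import numpy as np

def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Singular values of the covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would be a reflection.
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    msd = ((a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()) / len(a)
    return np.sqrt(max(msd, 0.0))

def pairwise_rmsd(models, residue_idx):
    """Symmetric rmsd matrix using only the residues in residue_idx."""
    sub = models[:, residue_idx, :]
    n = len(sub)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = kabsch_rmsd(sub[i], sub[j])
    return dist

def greedy_cluster(dist, radius):
    """Repeatedly take the model with the most neighbours within radius."""
    unassigned = set(range(len(dist)))
    clusters = []
    while unassigned:
        idx = sorted(unassigned)
        counts = (dist[np.ix_(idx, idx)] <= radius).sum(axis=1)
        center = idx[int(np.argmax(counts))]
        members = [j for j in idx if dist[center, j] <= radius]
        clusters.append((center, members))
        unassigned -= set(members)
    return clusters

# Toy data: 4 models that share a 20-residue core but have random tails.
rng = np.random.default_rng(0)
core_coords = rng.normal(size=(20, 3))
models = np.stack([
    np.concatenate([core_coords, rng.normal(scale=5.0, size=(10, 3))])
    for _ in range(4)
])
core = np.arange(20)  # hypothetical core residue indices
core_clusters = greedy_cluster(pairwise_rmsd(models, core), radius=1.0)
full_clusters = greedy_cluster(pairwise_rmsd(models, np.arange(30)), radius=1.0)
print(len(core_clusters), "cluster(s) on the core,", len(full_clusters), "on the full chain")
```

With real decoys you would choose the core from regions where the models agree on secondary structure, and tune the radius to your system.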