
Rescoring regions of structures


I am working on structure prediction of a protein that is predominantly unfolded, and I have used Rosetta (CS-Rosetta, actually) for the prediction. While my clustering looks good, i.e. conserved features in the protein are clearly visible, I can't get funneling plots because the loop regions ruin the score. The protein seems to have 5 unstructured residues at either end and a 5-residue loop. I want to rescore the structures based only on the regions that are similar, so that when I plot RMSD vs. score I have the best chance of seeing funneling for the folded region of the protein.
I have previously tried predicting shorter regions of the protein, but the same features are not observed; there seems to be an edge effect, where 3-4 residues at the N and C termini are always unstructured.
Is there a simple way to score while excluding some residues (similar to cluster:exclude_res)? Or should I clip my models to only the structured regions and rescore them with score or score_jd2, or will that result in a chain break penalty?
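For the RMSD side of an RMSD-vs-score plot, "excluding residues" has to apply to both the superposition and the RMSD itself; otherwise the floppy tails still drag the fit. Below is a minimal pure-Python sketch of that, not Rosetta's implementation. It assumes you have already extracted one coordinate per residue (e.g. CA atoms) for both models in the same residue order; the function names and the 0-based `keep` index list are hypothetical. It uses Horn's quaternion method, where the largest eigenvalue of a 4x4 matrix built from the cross-covariance gives the optimal-superposition RMSD without constructing the rotation.

```python
import math


def _max_eigenvalue_symmetric(m, max_sweeps=100, rel_tol=1e-30):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    n = len(m)
    a = [row[:] for row in m]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(xyz_a, xyz_b, keep=None):
    """RMSD after optimal rigid superposition, using only the positions in `keep`.

    `keep` is a hypothetical iterable of 0-based residue indices (e.g. the
    structured core); all other positions are ignored for fit and RMSD alike.
    """
    if len(xyz_a) != len(xyz_b):
        raise ValueError("coordinate lists must have the same length")
    idx = range(len(xyz_a)) if keep is None else sorted(set(keep))
    a = [xyz_a[i] for i in idx]
    b = [xyz_b[i] for i in idx]
    n = len(a)
    if n == 0:
        raise ValueError("no positions selected")
    # Centre both selections on their own centroids
    ca = [sum(p[k] for p in a) / n for k in range(3)]
    cb = [sum(p[k] for p in b) / n for k in range(3)]
    a = [[p[k] - ca[k] for k in range(3)] for p in a]
    b = [[p[k] - cb[k] for k in range(3)] for p in b]
    # Cross-covariance S[i][j] = sum over points of a_i * b_j
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's 4x4 symmetric matrix; its largest eigenvalue is the best achievable sum a.(R b)
    big_n = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue_symmetric(big_n)
    ga = sum(x * x for p in a for x in p)
    gb = sum(x * x for p in b for x in p)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))
```

For the protein described above, `keep` would be every index except the five residues at each end and the five-residue loop.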

Mon, 2013-07-22 04:43
erin_cutts

I wouldn't clip the structures and rescore them. If the core of the structure is good but there isn't room to put a loop, the clipped structure wouldn't pick that up, while the full structure would.

I think it's the distance metric, rather than the score metric, that you should turn your attention to. As you're apparently not overly concerned about the structure of the loops, you may want to look into using GDT ( http://en.wikipedia.org/wiki/Global_distance_test ) instead of RMSD as your distance metric. GDT is used in CASP assessment for exactly this reason: RMSD is overly sensitive to loop regions, which aren't that critical to the overall "structure" of the protein. Rosetta programs that deal with structure prediction should have an option that lets you use GDT instead of RMSD as a metric.

For example, the cluster application ( https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/db/... ) takes the flag "-cluster:gdtmm" to cluster by GDT instead of RMSD. (The "MM" bit is a historical artifact of Rosetta.) The score application (but not the score_jd2 application) should output GDT metrics in the scorefile if you give it a native structure with -in:file:native.
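As a rough illustration of why GDT tolerates a few badly placed loop residues, here is a minimal sketch of a GDT_TS-style score: the percentage of residues within 1, 2, 4 and 8 Å of their reference positions, averaged over the four cutoffs. It is a simplification that assumes the CA coordinates are already superposed. Real GDT (as used in CASP) searches over many superpositions to maximise each count, so this single-superposition value can only be lower than or equal to the true GDT_TS. It is not how Rosetta's gdtmm is computed, and the function name is hypothetical.

```python
import math


def gdt_ts_fixed(model, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) for CA coordinates that are ALREADY superposed."""
    if len(model) != len(ref) or not model:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    dists = [math.dist(p, q) for p, q in zip(model, ref)]
    # Number of residues under each distance cutoff
    counts = [sum(d <= c for d in dists) for c in cutoffs]
    return 100.0 * sum(counts) / (len(cutoffs) * len(dists))


# Usage: nine residues 0.5 Å off, one "loop" residue 12 Å off
core = [(float(i), 0.0, 0.0) for i in range(10)]
model = [(x, 0.5, 0.0) for x, _, _ in core[:9]] + [(9.0, 12.0, 0.0)]
rmsd = math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(model, core)) / len(core))
print(round(rmsd, 2), gdt_ts_fixed(model, core))  # 3.82 90.0
```

The single misplaced residue inflates the RMSD from 0.5 Å to about 3.8 Å, while the GDT_TS-style score only drops by the one residue's share.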

Mon, 2013-07-22 10:43
rmoretti

Thanks for your help. The GDT option is very useful to help recognise different fold types, but it doesn't solve the whole problem.

Rosetta has a tendency to put small bits of structure in loop regions, e.g. 3 residues of helix. This is not necessarily wrong, but it results in a wide spread of energy scores, and when the loop and tail regions make up a reasonable fraction of the protein, this prevents any observation of funneling.

I went ahead and clipped the structures to the region that has a well-defined beta sheet, removing the loops and the N- and C-terminal tails. After rescoring the clipped structures and calculating pairwise RMSDs against the cluster centres found using -cluster:exclude_res (excluding the residues that were clipped), I can observe funneling.
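The post doesn't say how the clipping was done. One simple way, assuming standard fixed-column PDB files (residue sequence number in columns 23-26), is to keep only the coordinate records of the residues you want. This is a hypothetical helper, not part of Rosetta: it ignores chain IDs and insertion codes, drops every non-coordinate record, and does nothing about how the new chain ends at the clip points are scored, which is the chain-break question raised earlier in the thread.

```python
def clip_pdb_lines(lines, keep_ranges):
    """Keep only ATOM/HETATM records whose residue number lies in one of keep_ranges.

    keep_ranges: iterable of inclusive (first, last) residue numbers (hypothetical input).
    All other records (REMARK, TER, energy tables, ...) are dropped and a final END
    is appended. Chain IDs and insertion codes (column 27) are ignored.
    """
    ranges = list(keep_ranges)
    kept = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            resseq = int(line[22:26])  # PDB columns 23-26: residue sequence number
            if any(lo <= resseq <= hi for lo, hi in ranges):
                kept.append(line.rstrip("\n"))
    kept.append("END")
    return kept
```

For example, `clip_pdb_lines(open("model.pdb"), [(6, 15)])` (with a hypothetical file name) would return the records for residues 6 to 15 only, ready to be written out and rescored.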

I'm not sure whether this kind of analysis is entirely fair, but since I am looking for structures to feed into further steps where the structure will be treated as flexible or at least semi-flexible, including HADDOCK docking and MD, any unfair treatment at this stage may be resolved or detected when the structures are tested more rigorously. Loop modelling is always challenging, and in this case, without my protein's binding partner, it is probably just guesswork.

Tue, 2013-07-23 03:46
erin_cutts