
How to judge docking success for two proteins without knowledge of native structure


If you are docking two proteins with a global search and have no knowledge of the correct answer (the native structure), how do you judge docking success? When reporting benchmark results, for example, the native (or mock-native) structure is used as the RMSD reference, and the definition of a docking funnel is based on I_rmsd, so the judgement of docking success always rests on some known (native) structure. But suppose you start with just two proteins placed arbitrarily apart. All RMSDs (even against the starting or input structure) are then meaningless, even for a global low-resolution search. Is one restricted to simply taking the best score?

We are generating 30,000 decoys with a global low-resolution search, then using each of those 30,000 decoys as the starting point for a high-resolution local perturbation that creates 1000 decoys per low-res decoy. We are just testing something we wish to do in more detail, and are actually using a complex with a known structure for the test, but we would still like to know what is used to judge success, for comparison with whatever the general/standard protocol is when the answer (native structure) is not known.

Excuse any naivety on my part on the subject.

Thanks in Advance

James Snyder

Fri, 2015-07-10 14:27

One approach is to pick a likely structure and treat it as a mock native. That is, examine the results from the perspective of "if this *were* the native structure, does it look like the docking results were successful?" If the results behave as if the picked structure is native-like, that lends support to the hypothesis that it is native-like. If not, you can pick another structure as a "mock native" and re-examine the results. This is how you can use things like the score-versus-rmsd docking funnel to examine docking success when you don't have a true native for comparison.
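
As a rough illustration of the mock-native check, here is a minimal Python sketch. It assumes you already have, for each decoy, a score and an RMSD to the structure you picked as mock native. The function names, the 4.0 cutoff, the top-5 window and the 3-hit threshold are all placeholders chosen for illustration, not values taken from this thread, so substitute whatever your benchmark protocol uses.

```python
def near_mock_native_count(decoys, rmsd_cutoff=4.0, top_n=5):
    """Count how many of the top_n lowest-scoring decoys lie within
    rmsd_cutoff of the mock native.

    decoys: list of (score, rmsd_to_mock_native) tuples.
    Cutoff and top_n are illustrative assumptions.
    """
    best = sorted(decoys, key=lambda d: d[0])[:top_n]
    return sum(1 for _, rmsd in best if rmsd <= rmsd_cutoff)


def looks_funnel_like(decoys, rmsd_cutoff=4.0, top_n=5, min_hits=3):
    """True if at least min_hits of the top_n decoys sit near the mock native."""
    return near_mock_native_count(decoys, rmsd_cutoff, top_n) >= min_hits


# Toy data: (score, RMSD to the mock native). The mock native itself is the
# first entry (RMSD 0.0), so it counts as one of the hits.
decoys = [
    (-250.1, 0.0), (-249.0, 1.2), (-248.5, 2.0), (-240.0, 15.0),
    (-239.0, 3.1), (-230.0, 22.0), (-228.0, 18.4),
]
print(near_mock_native_count(decoys))  # 4
print(looks_funnel_like(decoys))       # True
```

If the check fails for one candidate, rerun it with the RMSDs recomputed against a different candidate, which is the "pick another mock native" step described above.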

More generally, what you want is a protocol that has been tested on known systems, and whose selection process yields native-like models for those systems, where you know a priori what "native-like" is. You can then apply that same selection protocol to the unknown systems, and hope that the procedure which works for the benchmark systems will also work for the unknown ones. Read some of the other Rosetta protein-protein docking papers and pay attention to how they do the post-analysis/filtering.

Normally, what you're looking for is a model that has a low Rosetta energy. Single-metric selection is sometimes tricky - scoring metrics are not always precise, so you can run into situations where you over-optimize the metric and get non-physical models with really good scores. Often it's good to look for structures which are good by the selected metric, but aren't horrible on other metrics. (For example, look for low scores, but don't accept structures which have bad SASA burial metrics, or too many buried unsatisfied polars, or poor interface shape complementarity. These sorts of metrics are reported by the InterfaceAnalyzer.)
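
A minimal sketch of that "good score, not horrible elsewhere" selection could look like the following. The dictionary keys are modelled on the kind of columns an InterfaceAnalyzer score file contains, but treat both the names and every threshold here as assumptions and check them against your own output and benchmark.

```python
def select_models(models, top_fraction=0.1, min_dsasa=800.0,
                  max_unsat=5, min_sc=0.55):
    """Keep the best-scoring fraction of models, then drop any that fail
    the secondary interface checks. All thresholds are placeholders.

    models: list of dicts with keys "score", "dSASA_int",
            "delta_unsatHbonds" and "sc_value" (assumed names).
    """
    ranked = sorted(models, key=lambda m: m["score"])
    n_keep = max(1, int(len(ranked) * top_fraction))
    return [
        m for m in ranked[:n_keep]
        if m["dSASA_int"] >= min_dsasa           # enough buried surface
        and m["delta_unsatHbonds"] <= max_unsat  # few buried unsatisfied polars
        and m["sc_value"] >= min_sc              # reasonable shape complementarity
    ]


# Toy data: model "a" has the best score but a poor interface.
models = [
    {"name": "a", "score": -300.0, "dSASA_int": 400.0,
     "delta_unsatHbonds": 12, "sc_value": 0.40},
    {"name": "b", "score": -290.0, "dSASA_int": 1500.0,
     "delta_unsatHbonds": 2, "sc_value": 0.68},
    {"name": "c", "score": -250.0, "dSASA_int": 1400.0,
     "delta_unsatHbonds": 1, "sc_value": 0.70},
    {"name": "d", "score": -200.0, "dSASA_int": 900.0,
     "delta_unsatHbonds": 4, "sc_value": 0.60},
]
selected = select_models(models, top_fraction=0.5)
print([m["name"] for m in selected])  # ['b']
```

The score cut is applied first and the other metrics are used only as vetoes, so a model is never promoted just because it looks good on a secondary metric.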

The other thing that people look for is clustering. Generally, native structures sit in a wide, deep minimum of the energy landscape. So you're looking for cases where Rosetta can find a large number of similar structures in low-energy conformations. If you get only a single low-energy structure (or just a few that are structurally close), that indicates a narrow minimum in the energy landscape, and it's unlikely that the structure is native-like, even if it's much lower in energy than other structures. (It could be that it reflects a defect in the Rosetta energy function instead.)
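
To make the clustering idea concrete, here is a small greedy clustering sketch in Python. It assumes you have already taken the low-energy decoys and computed a symmetric pairwise RMSD matrix for them. The function name and the 4.0 radius are illustrative assumptions, and this is a generic greedy scheme, not a description of what any Rosetta clustering application does internally.

```python
def greedy_cluster(scores, rmsd_matrix, radius=4.0):
    """Greedy clustering of decoys.

    The lowest-scoring unassigned decoy seeds a cluster and absorbs every
    unassigned decoy within `radius` of it; repeat until none are left.
    Returns a list of clusters, each a list of decoy indices ordered by score.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    unassigned = set(order)
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue
        members = [j for j in order
                   if j in unassigned and rmsd_matrix[seed][j] <= radius]
        unassigned.difference_update(members)
        clusters.append(members)
    return clusters


# Toy data: decoy 0 has the lowest score but no structural neighbours,
# while decoys 1-4 are all close to one another.
scores = [-310.0, -300.0, -298.0, -297.0, -295.0]
rmsd_matrix = [
    [0.0, 20.0, 21.0, 19.0, 22.0],
    [20.0, 0.0, 1.5, 2.0, 2.5],
    [21.0, 1.5, 0.0, 1.0, 3.0],
    [19.0, 2.0, 1.0, 0.0, 2.2],
    [22.0, 2.5, 3.0, 2.2, 0.0],
]
clusters = greedy_cluster(scores, rmsd_matrix)
print(clusters)                # [[0], [1, 2, 3, 4]]
print(max(clusters, key=len))  # [1, 2, 3, 4]
```

In this toy case the single best-scoring decoy ends up alone, which is the narrow-minimum situation described above, while the larger cluster of slightly higher-energy decoys is the more believable candidate.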

Tue, 2015-07-14 15:26