Cluster gdtmm vs. rmsd


Hi there,

I've been trying to find the definition of the cluster application's -gdtmm flag, and why we might use it over clustering by rmsd (no flag required).
All the examples I've run across use the -gdtmm flag, but again, I'm just not sure how exactly it is clustering.

Then, following that, when the flag -cluster:sort_groups_by_energy is used, how does it sort them? Does it label the first cluster as the one with the lowest average energy for all the structures in the cluster? It almost seems at odds with clustering by RMSD.

Any ideas?

Cheers,
Brett

Thu, 2012-04-19 08:19
brspurri

GDTMM seems to be dark magic used during CASP but nobody seems to know what it is, how it works, or who uses it...

Sun, 2012-06-24 14:24
smlewis

After asking some people in the know, here's my understanding of GDTMM.

It's a variant of the GDT metric (global distance test) used by the CASP structure prediction contest (see Zemla A. "LGA: A method for finding 3D similarities in protein structures." (2003) NAR 31(13):3370-4. http://nar.oxfordjournals.org/content/31/13/3370.long ). The basic algorithm is to iteratively align two structures under a set of different cutoffs, and then tally the fraction of residues which fall within the cutoffs. So you end up with scores in the range of 0.0 to 1.0, with 1.0 being a perfect match.

The benefit of this metric over RMSD is that RMSD is very sensitive to outliers. For example, if you have an almost perfect match, but have an unstructured tail that's 20 Ang different between the two models, those squared 20 Ang deviations are going to dominate the RMSD, even though the differences may be limited to a small fraction of the structure.
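To put rough numbers on the outlier effect, here is a minimal sketch in Python. It assumes the two models are already superimposed and that we only have per-residue distances. The 100-residue model, the 0.5 Ang core deviation and the 10-residue tail are made-up values for illustration, not output from Rosetta.

```python
import math

def rmsd(distances):
    """Root-mean-square deviation from per-residue distances (in Ang)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Hypothetical model: 90 well-matched residues plus a 10-residue floppy tail.
core = [0.5] * 90    # each core residue is 0.5 Ang from its partner
tail = [20.0] * 10   # each tail residue is 20 Ang from its partner

print("core only:   %.2f Ang" % rmsd(core))
print("core + tail: %.2f Ang" % rmsd(core + tail))
```

With these numbers the core alone gives 0.5 Ang, while adding the tail pushes the RMSD above 6 Ang, even though 90% of the residues are unchanged.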

For CASP, there are typically two variants used: GDT-TS ("total score"), which is the standard metric (I think the cutoffs are 1, 2, 4 and 8 Ang), and GDT-HA ("high accuracy"; cutoffs are something like 0.5, 1, 2 and 4 Ang).
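As a rough illustration of how the cutoffs turn into a score, here is a simplified Python sketch. It takes the cutoff values quoted above at face value and assumes a single fixed superposition with known per-residue distances. The real GDT procedure searches for a superposition that maximizes the count under each cutoff, so this is not the LGA or Rosetta implementation. The function name and the example distances are hypothetical.

```python
def gdt(distances, cutoffs):
    """Average, over cutoffs, of the fraction of residues within each cutoff.

    Simplification: uses one fixed set of per-residue distances (in Ang)
    instead of re-optimizing the superposition for every cutoff.
    """
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)   # cutoffs as quoted in this post
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)   # cutoffs as quoted in this post

# Hypothetical model: 90 residues within 0.5 Ang, 10-residue tail 20 Ang off.
distances = [0.5] * 90 + [20.0] * 10

print("GDT-TS-like: %.2f" % gdt(distances, GDT_TS_CUTOFFS))
print("GDT-HA-like: %.2f" % gdt(distances, GDT_HA_CUTOFFS))
```

For this made-up model both scores come out at about 0.9: the tail costs exactly the 10% of residues it covers, rather than dominating the result the way it does with RMSD.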

GDTMM is a Baker-lab specific metric, where the MAMMOTH alignment algorithm (MM = MAMMOTH) is used for the superposition (it should match GDT-TS in all other respects, though). You can get *slight* differences in GDT metrics based on which alignment algorithm you use.

Mon, 2012-06-25 12:17
rmoretti