What's the difference between recovery and sequence identity?
Hi all,
When reading the Rosetta paper, I came across the terms "residue recovery rate" and "sequence identity". What is the difference between these terms? Thank you!

Thu, 2012-04-26 05:46
Lindsay

Sequence recovery means, "If you allow Rosetta to choose the sequence, what fraction of the time does it choose the native sequence?"

Rotamer recovery further asks, "How accurately does Rosetta predict the rotamer chi angles?"
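
To make the rotamer question concrete, here is a minimal Python sketch of one way such a score could be computed. It is an illustration, not Rosetta's own definition: the function names are hypothetical, and the 20-degree tolerance is an assumed cutoff. A residue counts as recovered when every predicted chi angle falls within the tolerance of the native one.

```python
from typing import Sequence


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def rotamer_recovered(native_chis: Sequence[float],
                      predicted_chis: Sequence[float],
                      tolerance: float = 20.0) -> bool:
    """True if every chi angle is within `tolerance` degrees of the native.

    The 20-degree default is an assumption made for this sketch.
    Residues with no chi angles (e.g. Gly) count as recovered.
    """
    if len(native_chis) != len(predicted_chis):
        return False
    return all(angle_diff(n, p) <= tolerance
               for n, p in zip(native_chis, predicted_chis))


def rotamer_recovery(native: Sequence[Sequence[float]],
                     predicted: Sequence[Sequence[float]],
                     tolerance: float = 20.0) -> float:
    """Fraction of residues whose chi angles are all recovered."""
    if len(native) != len(predicted) or not native:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(rotamer_recovered(n, p, tolerance)
               for n, p in zip(native, predicted))
    return hits / len(native)
```

Note that angles are compared on a circle, so 179 and -179 degrees differ by 2 degrees, not 358.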

Thu, 2012-04-26 06:29
smlewis

But for sequence identity (say, designing 100 sequences for each protein in the dataset and taking the average), my understanding is that sequence recovery should equal sequence identity.
Do you mean the whole sequence? If so, Rosetta would somehow have about a 30% chance of designing a sequence with 100% identity to the native? I don't think so...
I think it should be counted per residue, in which case it is the same as sequence identity. For a designed sequence, the residue recovery is about 30-40%, so the number of correctly designed residues = 30-40% * length = sequence identity * length. Am I right?

Thank you

Thu, 2012-04-26 07:08
Lindsay

Sequence recovery and sequence identity are synonyms as far as I know. Your interpretation of 30% is correct.
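
As a minimal sketch of that per-residue interpretation, assuming fixed-backbone design so that the designed and native sequences have the same length and need no alignment (the function names below are hypothetical, not part of Rosetta):

```python
from typing import Sequence


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


def mean_recovery(native: str, designs: Sequence[str]) -> float:
    """Average per-residue recovery over several designs of one native."""
    if not designs:
        raise ValueError("need at least one designed sequence")
    return sum(sequence_recovery(native, d) for d in designs) / len(designs)
```

For a 10-residue native where a design keeps 3 native residues, `sequence_recovery` returns 0.3, so the number of correctly designed residues is 0.3 * 10 = 3.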

There is a big difference lurking in some papers (Rosetta and non-Rosetta alike) that has to do with preprocessing of the data before sequence recovery is calculated. Rosetta's sequence recovery on a Rosetta-produced structure (say, a relaxed crystal structure) is commonly 10 or 20 percentage points better than its recovery on a direct-from-PDB input. Another issue is that whenever you see a paper comparing multiple design methods via sequence recovery, it's fair to bet that the authors of each original method could coax better recovery out of their protocol (for example, in Rosetta's case, with the -ex# extra rotamers flags) than is seen in the comparison paper.

Thu, 2012-04-26 07:18
smlewis