When can scores reported by Rosetta be compared?
For example, it is OK to compare two scores obtained for two variants of the same protein. Is it still OK to compare scores between two completely different structures that have the same length?
Rosetta scoring is only truly effective for ranking a series of models of the same length (but not necessarily the same sequence). In other words, it is best for comparing the models produced within a single run when nstruct is greater than 1.
It is never safe to compare raw scores between structures of different lengths. The score magnitude tends to increase with size. So, a pose of 100 residues and a pose of 200 residues might have scores of -236 and -452, respectively. You can normalize this by number of residues to iron out some of the problems.
It is sort-of-okay to compare scores of models, even if they are different lengths and different folds, so long as they have had the same freedoms applied. In other words, if both have been fully relaxed and minimized, then comparison of their scores (normalized by length) will carry SOME information. Be wary.
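To make the length normalization concrete, here is a minimal Python sketch that divides a total score by the residue count, using the two example poses mentioned above. The function name `per_residue_score` and the pose labels are made up for illustration; they are not part of Rosetta.

```python
def per_residue_score(total_score: float, n_residues: int) -> float:
    """Return the total score divided by the number of residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return total_score / n_residues


# (total score, number of residues) for the two example poses from the thread.
models = {
    "pose_100": (-236.0, 100),
    "pose_200": (-452.0, 200),
}

for name, (score, length) in models.items():
    normalized = per_residue_score(score, length)
    print(f"{name}: raw={score:.1f}  per_residue={normalized:.2f}")
```

The raw scores differ by almost a factor of two, but per residue they come out at about -2.36 and -2.26. That is a much smaller gap, and it should still be read with the caveats above in mind.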
I have a related question, referring to the RosettaDock Webinar FAQ:
> Q: Is there a way of comparing affinities between ligands in Rosetta?
> The raw docking score doesn't do a very good job at predicting affinities and is not appropriate
> for comparisons of different ligands. Rosetta -interface mode does predict delta delta G of
> binding between various ligands. However, be aware that calculating small changes in binding
> affinities is a very hard problem and the predicted changes in energies will not give a 100%
> correlation (typically ~30-60% correlation) to experimental affinities.
Is your answer the same kind of idea: that docking program outputs *are* in fact approximations, and that a graph comparing predicted affinities against experiment should be treated carefully? Also, what did -interface mode refer to originally?
I'm not clear on what -interface mode was. It was a mode in Rosetta 2.x; I used it briefly in fall 2006 and haven't touched it since. I think it was for calculating ddGs of binding for interface mutations at protein-protein interfaces; so by "different ligands" they mean essentially the same structure with small sequence changes, not different binding partners. There is a paper or two on ddGs in Rosetta3 being prepared by the community.
It's perfectly reasonable to make a graph of predicted versus experimental ddGs, but your correlation will top out at 0.6-0.7, depending on a variety of factors. (We are of course working towards unity!)
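If you want a number to go with that graph, a correlation coefficient is easy to compute by hand. The sketch below assumes the Pearson coefficient is the measure meant here, which the thread does not actually specify. The ddG values are made-up placeholders, not Rosetta output or experimental measurements, and `pearson_r` is just an illustrative helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: substitute your own predicted and measured ddGs.
predicted_ddg = [0.5, 1.2, -0.3, 2.0, 0.1]
experimental_ddg = [0.8, 0.9, 0.2, 1.5, -0.4]

r = pearson_r(predicted_ddg, experimental_ddg)
print(f"Pearson r = {r:.2f}")
```

With only a handful of mutations the coefficient is very noisy, so it is worth plotting the points as well rather than quoting r alone.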
All Rosetta outputs are approximations because our energy function isn't reality's energy function.
So, presumably, to do the same kind of thing in Rosetta3, I would run docking or refinement on a set of these mutations (or adapt the alanine scan tutorial) and modify some of the "-ddG" and "-scoring" options? If I wanted to maximise the correlation between predicted and experimental ddGs, rather than the accuracy of the reported pose, which set of scoring weights should I use?
The weights for optimizing ddG correlation haven't been reported yet - they're what's in those papers. Try contacting Liz Kellogg in the Baker lab.