Dear Rosetta Users,
I have three questions regarding the RosettaLigand application.
1) As far as I understand, the total_score is a measure of the overall quality of the models generated by the docking run: the lower the score, the better the model.
In my limited experience I've always had negative score values, but for a particular protein target I'm always getting positive values (even with different pockets in the same structure).
Must the total_score always be negative?
Does a positive score mean that the interaction is unlikely?
2) I'd like to study a protein-ligand interaction about which I have no information.
The only thing I know is that the protein has various binding pockets that could accommodate the ligand with the same probability.
I've docked the ligand in each of the pockets and I want to compare the results of each individual docking run.
Is it reasonable to identify the best pocket by comparing the energy (interface_delta_X) vs. total_score plots?
3) To obtain Rosetta's best prediction for the ligand-docking experiment, I've read about different approaches.
For instance, in Combs et al. 2013, the top 10% of models by energy (interface_delta_X) is considered. In other cases the top 10% of models by total_score is selected. Then, regardless of the parameter considered, the top 10% of models is clustered and the best-energy model(s) from the largest cluster(s) are taken as the putative binding mode(s).
Is there a reason to prefer the top 10% energy models over the top 10% score models, or vice versa?
Moreover, I'd like to know if someone is aware of a more recent (and perhaps 'standardized') protocol to select the best model.
Thanks in advance!
The total score is indeed the main measure that's optimized during docking, but it includes not just the protein-ligand interaction but also the internal energy of the protein. That is often what causes positive values: if there's some poor interaction within the protein, it can give a positive energy that doesn't change on docking (because docking mostly doesn't change the internal protein interactions). That said, you may want to pre-optimize your apo state (e.g. as in Nivon et al.) to get rid of those poor-scoring regions. While the docking process mostly doesn't change the protein, there is some protein flexibility in a limited region around the ligand, and if this radius happens to include the bad regions of the protein, you can get false signals during docking, as the program will interpret the improvement in the protein-protein interactions as resulting from a good ligand-protein interaction.
While we optimize total score, it is often a bad indicator of near-native conformations. It's potentially good for throwing out really bad models, but there's too much noise from the protein-protein interactions to rank the best structures. Instead, you typically look specifically at the protein-ligand binding energy, which is what's reported as interface_delta_X in the scorefile. There the absolute value is more meaningful, and something that's scored as positive is something Rosetta definitely doesn't think will bind. (That's rare, though. Typically you'll get some sort of negative binding energy, even if it's very weak.)
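As a rough illustration of ranking by interface_delta_X, here is a minimal Python sketch. It assumes the usual whitespace-delimited scorefile layout in which the header and data rows start with "SCORE:" and the header row names the columns. The function names and the sample values are made up for illustration and are not part of Rosetta.

```python
def parse_scorefile(text):
    """Parse whitespace-delimited score lines into a list of dicts.

    Assumption: rows start with 'SCORE:' and the first such row is the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other lines
        fields = line.split()[1:]
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue  # skip malformed rows
        row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # non-numeric column, e.g. description
        rows.append(row)
    return rows


def rank_by_interface(rows, column="interface_delta_X"):
    """Sort models from most to least favorable binding energy."""
    return sorted(rows, key=lambda r: r[column])


# Made-up values, for illustration only.
EXAMPLE = """SEQUENCE:
SCORE: total_score interface_delta_X description
SCORE: 152.3 -14.2 model_0001
SCORE: 160.1 1.3 model_0003
SCORE: 149.8 -9.7 model_0002
"""

ranked = rank_by_interface(parse_scorefile(EXAMPLE))
for r in ranked:
    # A positive interface energy means Rosetta predicts no binding.
    note = " (positive: predicted not to bind)" if r["interface_delta_X"] > 0 else ""
    print(r["description"], r["interface_delta_X"], note)
```

Note that the total_score values in this toy example are all positive while the ranking by interface_delta_X is still meaningful, which is the situation described above.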
For 2), the interface_delta_X should be comparable across different pockets for the same ligand. So you should be able to pool the docking results from the different pockets with the same ligand and do the post-analysis as you would with a large number of docking results to a single pocket. (I wouldn't necessarily rely solely on interface_delta_X to pick out the best binding pocket, but it can help you narrow things down to the few top-ranking candidates.)
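A minimal sketch of that pooling step, in Python, might look like the following. The pocket labels, the dictionary layout, and the numbers are hypothetical. The only idea taken from the text above is that models from all pockets are ranked together by interface_delta_X.

```python
from collections import Counter


def pool_models(runs):
    """Merge per-pocket model lists into one list, tagging each model with its pocket."""
    pooled = []
    for pocket, models in runs.items():
        for m in models:
            pooled.append({**m, "pocket": pocket})
    return pooled


def pocket_summary(pooled, top_n=3, column="interface_delta_X"):
    """Count how many of the top_n pooled models come from each pocket."""
    best = sorted(pooled, key=lambda m: m[column])[:top_n]
    return Counter(m["pocket"] for m in best)


# Made-up values, for illustration only.
runs = {
    "pocket_A": [{"interface_delta_X": -12.1}, {"interface_delta_X": -11.4}],
    "pocket_B": [{"interface_delta_X": -8.0}, {"interface_delta_X": -13.0}],
    "pocket_C": [{"interface_delta_X": -5.2}, {"interface_delta_X": -4.9}],
}

summary = pocket_summary(pool_models(runs), top_n=3)
print(summary.most_common())
```

A summary like this only narrows the field. A pocket that contributes a single outlier model to the top of the list deserves a closer look before it is treated as the winner.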
For 3), there's probably not as much difference between the procedures as it sounds. The top 10% by total score cutoff is simply to throw out those structures which score poorly by total score (as mentioned above, this is basically the limit of what you would want to use the total score for, due to potential noise). With the recent Transform-based docking procedure (versus the older Translate/Rotate approach), for most docking runs it's rare that a structure scoring well by interface_delta_X is also going to have a really bad total score. Often you can just skip the first total score cutoff and go straight to interface_delta_X ranking, but there's certainly no harm in filtering by total_score first. Filtering by top 10% on interface_delta_X is mainly just a convenience, either to limit the number of structures going into clustering or to make sure you're only clustering the top-scoring structures (for clustering algorithms which don't also look at the scores and treat all input models as equally weighted).
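To make the filtering order concrete, here is a small Python sketch of an optional total_score cutoff followed by an interface_delta_X cutoff. The function names, the `percent` parameter, and the synthetic data are assumptions for illustration. This is not a standardized protocol, and the clustering step itself is left out.

```python
def top_percent(models, column, percent=10):
    """Keep the lowest-scoring `percent` of models by `column` (at least one)."""
    keep = max(1, (len(models) * percent + 99) // 100)  # integer ceiling
    return sorted(models, key=lambda m: m[column])[:keep]


def select_for_clustering(models, percent=10, prefilter_total=True):
    """Optionally drop poor total_score models, then keep the best interface energies."""
    pool = top_percent(models, "total_score", percent) if prefilter_total else models
    return top_percent(pool, "interface_delta_X", percent)


# Synthetic, deterministic data, for illustration only.
models = [
    {
        "description": f"model_{i:04d}",
        "total_score": float(i),
        "interface_delta_X": float((i * 37) % 100) - 50.0,
    }
    for i in range(100)
]

with_prefilter = select_for_clustering(models)
without_prefilter = select_for_clustering(models, prefilter_total=False)
print(len(with_prefilter), len(without_prefilter))
```

Applying both cutoffs at 10% leaves only 1% of the models, so when the total_score filter is kept it may make sense to loosen one of the two percentages to leave enough structures for clustering.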
Note that you wouldn't necessarily want to take either the cluster size or the interface_delta_X scores as the single deciding factor. Generally it would be recommended to look at a selection of the best-scoring models and use whatever domain-specific knowledge you have, as well as your biochemist's intuition about protein-ligand binding, to see if the proposed binding modes make sense.