This may be a really dumb question, but I am struggling to make sense of some data, so any input is welcome.
I was under the dogmatic impression that ref2015 is calibrated to approximate kcal/mol (and not kJ/mol); 1 kcal/mol is about 4.2 kJ/mol (4.184).
However, playing around with two curated experimental ∆∆G mutation datasets, the Bundell O2567 dataset (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00591) and the Frenz et al. 2020 dataset (https://www.frontiersin.org/articles/10.3389/fbioe.2020.558247/full), I found the PyRosetta-calculated values to be roughly 4.2 times greater (median) than the correctly signed empirical values from these reduced ProTherm datasets. If I correct by this factor, my absolute errors drop as follows:
- ref2015: 1.71 (median) ± 1.1 (MAD) ==> 0.81 ± 0.55
- ref2015_cart: 1.80 ± 1.2 ==> 0.81 ± 0.54
- beta_2016: 2.3 ± 1.2 ==> 0.80 ± 0.53
This happens regardless of radius of minimisation and number of FastRelax cycles and any other mods. I know the Mean Absolute Error is the canonically reported value, especially by the ML folk, but there are some nasty outliers and the distribution is not a normal one, so I am opting for median. Mean gives a slightly higher value than median. The fudge scale that lowers the median of the absolute errors the most is basically the same as the median/mean ratio of the calculated over experimental.
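To make the comparison concrete, here is a minimal sketch of how the median absolute error and the "fudge scale" can be computed. The numbers are made up purely for illustration (they are not taken from either dataset), and the function names `median_abs_error` and `median_ratio` are my own hypothetical helpers, not part of PyRosetta.

```python
from statistics import median


def median_abs_error(calculated, experimental, scale=1.0):
    """Median of |calculated / scale - experimental| over all mutations."""
    return median(abs(c / scale - e) for c, e in zip(calculated, experimental))


def median_ratio(calculated, experimental):
    """Median of calculated / experimental, skipping zero experimental values."""
    return median(c / e for c, e in zip(calculated, experimental) if e != 0)


# Made-up values for illustration only (NOT from the datasets above).
experimental = [0.5, 1.0, 1.5, 2.0, -0.5]  # kcal/mol
calculated = [2.0, 4.4, 6.0, 8.8, -2.5]    # Rosetta energy units (REU)

scale = median_ratio(calculated, experimental)
raw_error = median_abs_error(calculated, experimental)
scaled_error = median_abs_error(calculated, experimental, scale)
print(f"scale={scale:.2f} raw={raw_error:.2f} scaled={scaled_error:.2f}")
```

Dividing by the median ratio is only one way to pick the scale; a least-squares fit would weight the outliers more heavily.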
This can be the result of one of three things:
- My protocol is rubbish and it's just a coincidence: I do have a Pearson rho of only 0.4, and rho is scale and shift invariant but outlier sensitive. So this is possible.
- Bundell's dataset has been erroneously converted twice: I do not have the ProthermDB dataset, because I have not sent off the signed copyright MOU thingy, so I have not checked this.
- ref2015 is actually calibrated to kJ/mol: I made a pose in PyRosetta with two waters, relaxed it and got -1.65 REU (2.9 Å D->A), which is consistent with a kilocalorie value. Likewise, the values of the ref fudge factors for the residues are not crazy high, which one would expect if these were somehow calibrated to kJ/mol:
|         |   talaris2014 |   ref2015 |   beta_nov16 |
|:--------|--------------:|----------:|-------------:|
| weight  |             1 |         1 |            1 |
| ref_ALA |      0.773742 |   1.32468 |       2.3386 |
| ref_ARG |      -0.32436 |  -0.09474 |       -1.281 |
| ref_ASN |      -1.19118 |  -1.34026 |    -0.873554 |
| ref_ASP |      -1.63002 |  -2.14574 |      -2.2837 |
| ref_CYS |      0.443793 |   3.25479 |       3.2718 |
| ref_GLN |      -1.51717 |  -1.45095 |      -1.0644 |
| ref_GLU |      -1.96094 |  -2.72453 |      -2.5358 |
| ref_GLY |      0.173326 |   0.79816 |       1.2108 |
| ref_HIS |      0.388298 |  -0.30065 |     0.134426 |
| ref_ILE |        1.0806 |   2.30374 |       1.0317 |
| ref_LEU |      0.761128 |   1.66147 |     0.729516 |
| ref_LYS |     -0.358574 |  -0.71458 |      -1.6738 |
| ref_MET |      0.249477 |   1.65735 |       1.2334 |
| ref_PHE |       0.61937 |   1.21829 |       1.4028 |
| ref_PRO |     -0.250485 |  -1.64321 |      -5.1227 |
| ref_SER |      0.165383 |  -0.28969 |      -1.1772 |
| ref_THR |       0.20134 |   1.15175 |       -1.425 |
| ref_TRP |       1.23413 |   2.26099 |        3.035 |
| ref_TYR |      0.162496 |   0.58223 |     0.964136 |
| ref_VAL |      0.979644 |   2.64269 |        2.085 |
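On the first point in the list (rho being scale and shift invariant but outlier sensitive), here is a minimal sketch that checks both properties. The data are made up for illustration, and `pearson` is a hand-rolled hypothetical helper rather than a library call.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only.
experimental = [0.5, 1.0, 1.5, 2.0, -0.5]
calculated = [2.0, 4.4, 6.0, 8.8, -2.5]

r_raw = pearson(calculated, experimental)
# Rescaling and shifting the calculated values leaves the correlation unchanged.
r_rescaled = pearson([c / 4.2 + 1.0 for c in calculated], experimental)
# A single badly predicted mutation changes it a lot.
r_outlier = pearson(calculated + [30.0], experimental + [-1.0])
print(f"raw={r_raw:.3f} rescaled={r_rescaled:.3f} with_outlier={r_outlier:.3f}")
```

So a scale error on its own cannot explain a low rho; the correlation says nothing about units either way.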
So I just wanted confirmation that the units are definitely calibrated to kcal/mol and not kJ/mol; as I said, a rho of 0.4 is rubbish...
The ref2015 terms were scaled to be nominally kcal/mol. That doesn't mean that any results you get from a Rosetta simulation will match up 1:1 with experimental results calculated in kcal/mol.
This is (obliquely) mentioned in the supplemental material of the original ref2015 paper (https://dx.doi.org/10.1021/acs.jctc.6b00819) (section Mutational ∆∆G calculation):
> Scaling factors are introduced to fit the overall scale of estimated values to actual experimental free energies measured in kcal/mole. A least-squares fit determined a scaling factor for talaris2014 of 1.0/1.84 and for opt-nov15 of 1.0/2.94.
Note that that scaling factor is for that specific protocol. Depending on how you're simulating things, it may or may not apply for your protocol. You will likely (still) need to create a calibration curve if you wish to translate the relative Rosetta energies into comparable-with-experiment kcal/mol values.
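As a rough illustration of what such a calibration curve could look like, here is a minimal sketch that fits a straight line from Rosetta ∆∆G values to experimental ones by ordinary least squares. The data are made up, the names `fit_line` and `calibrate` are hypothetical, and a linear form with an intercept is an assumption; the fit in the ref2015 supplement may have been set up differently.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Made-up calibration set for illustration only.
rosetta_ddg = [1.0, 3.0, 5.0, 7.0]       # REU, from your own protocol
experimental_ddg = [0.5, 1.5, 2.5, 3.5]  # kcal/mol

slope, intercept = fit_line(rosetta_ddg, experimental_ddg)


def calibrate(reu):
    """Map a Rosetta ddG (REU) onto the fitted kcal/mol scale."""
    return slope * reu + intercept


print(f"slope={slope:.3f} intercept={intercept:.3f} scaling=1.0/{1 / slope:.2f}")
```

The fitted slope plays the role of the protocol-specific scaling factor, so it would need refitting whenever the protocol or score function changes, ideally on mutations that are not also used to report the final error.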
A factor of 4.2 and any association to kJ/mol is entirely coincidental.
That is fantastic and completely explains my problem. Thank you so much.