First of all, thank you for the replies to my other post; they have helped me make progress with Rosetta.
I have generated structures using enzyme design, but I am struggling to understand the energy scores.
To start, I prepared my input enzyme using rpk_min and used the structure with the lowest score for redesigning the binding interface to a ligand that would not normally bind. A score file is generated containing the energy terms, and the same energy terms are also written at the end of the minimised structure's PDB file. The total scores are the same, which is expected. However, this is not the case with the output of the design.
Based on the energy score in the generated PDB, there is a significant change in energy from the input, which is not the case for the energy score in score.out.
Why are these output energy values different, and which one should I use to compare and select structures? Also, what is the unit of these energy terms?
Input score for minimised structure: total score -369.617
# All scores below are weighted scores, not raw scores.
label fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih atom_pair_constraint coordinate_constraint angle_constraint dihedral_constraint rama omega fa_dun p_aa_pp ref chainbreak total
weights 0.8 0.4 0.6 0.004 0.416667 1 0.8 2 2 1.3 1.3 0.5 2 5 5 1 1 1 1 0.2 0.5 0.4 0.5 1 1 NA
pose -613.684 50.322 282.618 1.19148 0 1.79555 -28.591 -74.5601 -42.9051 -17.2338 -23.5157 0 0 0 0 0 0 0 0 -13.5164 41.381 117.599 -16.2572 -34.26 0 -369.617
Values from score.sc file: total score -369.617
SCORE: total_score angle_constraint atom_pair_constraint chainbreak coordinate_constraint dihedral_constraint dslf_ca_dih dslf_cs_ang dslf_ss_dih dslf_ss_dst fa_atr fa_dun fa_elec fa_intra_rep fa_pair fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb omega p_aa_pp pro_close rama ref description
SCORE: -369.617 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -613.684 117.599 0.000 1.191 -28.591 50.322 282.618 -17.234 -42.905 -23.516 -74.560 41.381 -16.257 1.796 -13.516 -34.260 input_only_enz_0001
After running enzyme design, I get structures with energy terms at the end of each file and a score file that has energy terms for all generated structures.
But the energy terms and values in score.out and in the generated PDBs do not match. I use the values in score.out to select structures, but I do not know why they do not match, and I have no reference for judging whether the redesign has improved the enzyme's energy score. The output data are below.
Score from generated pdb: total score = -47.1571
# All scores below are weighted scores, not raw scores.
label fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 atom_pair_constraint coordinate_constraint angle_constraint dihedral_constraint rama omega fa_dun p_aa_pp ref chainbreak res_type_constraint total
weights 0.8 0.44 0.75 0.004 0.7 1 1.17 1.17 1.17 1.1 1 1 1 1 1 0.2 0.5 0.56 0.32 1 1 1 NA
pose -602.881 58.1044 361.274 1.17461 -60.6706 0.265734 -42.6404 -25.431 -17.1365 -24.332 0 23.4042 0 12.7602 118.854 -12.4155 34.6583 151.93 -12.9141 -11.1606 0 0 -47.1571
This is followed by the energy terms for each residue, and the table ends with:
EY1_connectP1_158 -5.10689 0.333896 6.98374 0.0652896 -0.408197 0 0 0 -0.182143 -1.66857 0 11.7021 0 6.38009 59.4271 0 0 0 0 0 0 0 77.5264
EY1 is the ligand.
Scores from score.out file: total score = -202.18
total_score fa_rep hbond_sc all_cst tot_pstat_pm tot_nlpstat_pm tot_burunsat_pm tot_hbond_pm tot_NLconts_pm tot_nlsurfaceE_pm tot_total_charge tot_total_pos_charges tot_total_neg_charges tot_seq_recovery SR_1 SR_1_total_score SR_1_fa_rep SR_1_hbond_sc SR_1_all_cst SR_1_hbond_pm SR_1_burunsat_pm SR_1_pstat_pm SR_1_nlpstat_pm SR_2 SR_2_total_score SR_2_fa_rep SR_2_hbond_sc SR_2_all_cst SR_2_hbond_pm SR_2_burunsat_pm SR_2_pstat_pm SR_2_nlpstat_pm SR_3 SR_3_total_score SR_3_fa_rep SR_3_hbond_sc SR_3_all_cst SR_3_hbond_pm SR_3_burunsat_pm SR_3_pstat_pm SR_3_nlpstat_pm SR_4 SR_4_total_score SR_4_fa_rep SR_4_hbond_sc SR_4_all_cst SR_4_hbond_pm SR_4_burunsat_pm SR_4_pstat_pm SR_4_nlpstat_pm SR_5 SR_5_total_score SR_5_fa_rep SR_5_hbond_sc SR_5_all_cst SR_5_hbond_pm SR_5_burunsat_pm SR_5_pstat_pm SR_5_nlpstat_pm SR_6 SR_6_total_score SR_6_fa_rep SR_6_hbond_sc SR_6_all_cst SR_6_hbond_pm SR_6_burunsat_pm SR_6_pstat_pm SR_6_nlpstat_pm SR_7 SR_7_total_score SR_7_fa_rep SR_7_hbond_sc SR_7_all_cst SR_7_interf_E_1_2 SR_7_dsasa_1_2 SR_7_hbond_pm SR_7_burunsat_pm description
-202.18 58.10 -24.33 155.02 0.64 0.68 37.00 147.00 32.00 3.27 -1.00 20.00 21.00 0.32 12.00 -0.91 0.56 0.00 0.02 2.00 0.00 0.78 0.86 18.00 -1.94 0.18 -0.79 7.83 4.00 1.00 0.76 0.84 18.00 -1.94 0.18 -0.79 7.83 4.00 1.00 0.76 0.84 18.00 -1.94 0.18 -0.79 7.83 4.00 1.00 0.76 0.84 19.00 -0.92 0.65 -0.38 64.79 2.00 1.00 0.85 0.79 129.00 -0.58 0.12 -0.80 4.87 3.00 0.00 0.72 0.76 158.00 -0.86 0.33 -1.67 77.51 -1.73 0.62 5.00 3.00 input_docked_ligand__DE_253
Below is my flags file:
-packing:soft_rep_design
#-packing:linmem_ig 10
#-enzdes::final_repack_without_ligand
Looking forward to your assistance.
It looks like there's an energy function mismatch between your repacking step and your enzdes step. (Take a look at the weights line in your PDBs and observe that they're different.)
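To see the mismatch concretely, the short Python sketch below parses the "label" and "weights" rows quoted earlier in this thread (the rpk_min input and one enzdes design) and lists which terms exist in only one score function and which weights differ. It is a hypothetical helper written for this post, not part of Rosetta; it only assumes the whitespace-separated layout shown in those PDB tables.

```python
from typing import Dict

# Rows copied from the two PDBs in this thread: the rpk_min input and one enzdes design.
INPUT_LABEL = ("label fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close fa_pair "
               "hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang "
               "dslf_ss_dih dslf_ca_dih atom_pair_constraint coordinate_constraint "
               "angle_constraint dihedral_constraint rama omega fa_dun p_aa_pp ref "
               "chainbreak total")
INPUT_WEIGHTS = ("weights 0.8 0.4 0.6 0.004 0.416667 1 0.8 2 2 1.3 1.3 0.5 2 5 5 "
                 "1 1 1 1 0.2 0.5 0.4 0.5 1 1 NA")
DESIGN_LABEL = ("label fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close hbond_sr_bb "
                "hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 atom_pair_constraint "
                "coordinate_constraint angle_constraint dihedral_constraint rama omega "
                "fa_dun p_aa_pp ref chainbreak res_type_constraint total")
DESIGN_WEIGHTS = ("weights 0.8 0.44 0.75 0.004 0.7 1 1.17 1.17 1.17 1.1 1 1 1 1 1 "
                  "0.2 0.5 0.56 0.32 1 1 1 NA")


def parse_weights(label_line: str, weights_line: str) -> Dict[str, float]:
    """Map each score term to its weight, dropping the leading tag and the 'total' column."""
    names = label_line.split()[1:]
    values = weights_line.split()[1:]
    if len(names) != len(values):
        raise ValueError("label and weights rows have different lengths")
    # The filter runs before float(), so the 'NA' under 'total' is never converted.
    return {n: float(v) for n, v in zip(names, values) if n != "total"}


def compare_weights(a: Dict[str, float], b: Dict[str, float]):
    """Return (terms only in a, terms only in b, {term: (weight_a, weight_b)} where weights differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {t: (a[t], b[t]) for t in sorted(set(a) & set(b)) if a[t] != b[t]}
    return only_a, only_b, changed


only_in, only_de, changed = compare_weights(
    parse_weights(INPUT_LABEL, INPUT_WEIGHTS),
    parse_weights(DESIGN_LABEL, DESIGN_WEIGHTS),
)
print("only in input scorefunction: ", only_in)
print("only in design scorefunction:", only_de)
for term, (w_in, w_de) in changed.items():
    print(f"{term}: {w_in} -> {w_de}")
```

Run against these two rows, it reports fa_pair and the four dslf_* disulfide terms only in the input function, dslf_fa13 and res_type_constraint only in the design function, and different weights for fa_rep, fa_sol, fa_elec, the four hbond terms, fa_dun and p_aa_pp.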
For your minimized structure it looks like you're using the "ligand.wts" scorefunction, as internally modified by the rpk_min application. For your enzdes runs you're using the talaris2013 scorefunction for the PDB output. In the enzdes output file, the total_score is the total score *without* the constraint energies, whereas in the PDB file, the score is with the constraint energies.
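The constraint explanation can be checked with the numbers posted above. This Python sketch (a hypothetical helper, assuming every constraint term's name ends in "_constraint") sums the constraint columns of the designed PDB's pose row and subtracts them from the PDB total. The constraint sum comes out at about 155.02, matching all_cst in score.out, and the remainder at about -202.18, matching total_score; the unconstrained terms that score.out also reports (fa_rep 58.10, hbond_sc -24.33) agree directly.

```python
from typing import Dict, Tuple

# Pose row from the end of the designed PDB quoted above (label and pose lines only).
LABEL = ("label fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close hbond_sr_bb "
         "hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 atom_pair_constraint "
         "coordinate_constraint angle_constraint dihedral_constraint rama omega "
         "fa_dun p_aa_pp ref chainbreak res_type_constraint total")
POSE = ("pose -602.881 58.1044 361.274 1.17461 -60.6706 0.265734 -42.6404 -25.431 "
        "-17.1365 -24.332 0 23.4042 0 12.7602 118.854 -12.4155 34.6583 151.93 "
        "-12.9141 -11.1606 0 0 -47.1571")


def parse_pose_row(label_line: str, pose_line: str) -> Dict[str, float]:
    """Map each term name (including 'total') to its weighted value in the pose row."""
    names = label_line.split()[1:]                       # drop the leading 'label'
    values = [float(v) for v in pose_line.split()[1:]]   # drop the leading 'pose'
    if len(names) != len(values):
        raise ValueError("label and pose rows have different lengths")
    return dict(zip(names, values))


def split_constraints(terms: Dict[str, float]) -> Tuple[float, float]:
    """Return (total without constraint terms, sum of constraint terms).

    Assumption: every constraint term's name ends in '_constraint'.
    """
    cst = sum(v for name, v in terms.items() if name.endswith("_constraint"))
    return terms["total"] - cst, cst


terms = parse_pose_row(LABEL, POSE)
no_cst, cst = split_constraints(terms)
print(f"PDB total (with constraints): {terms['total']:.2f}")
print(f"sum of constraint terms:      {cst:.2f}")     # score.out reports all_cst 155.02
print(f"total without constraints:    {no_cst:.2f}")  # score.out reports total_score -202.18
```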
As you're using a weekly release, I'd recommend just using the talaris2013 weights for your design. If you're going to use the rpk_min application, use "-score:weights talaris2013_cst". But actually, I'd recommend using the constrained relax protocol of Nivon et al. instead (http://www.ncbi.nlm.nih.gov/pubmed/23565140, https://www.rosettacommons.org/docs/latest/preparing-structures.html). It should work better for design applications, as it also allows backbone movement.
Regarding interpreting scores, the scores aren't in any (externally) consistent unit. They should be internally consistent within the same protocol and same weights/scorefunction, but don't correspond to an external value like kcal/mol. The unit is often referred to as an "REU" ("Rosetta Energy Unit"), but that unit is not necessarily consistent between scorefunctions, settings and protocols.
For enzyme design purposes, I'd recommend focusing on the values in the score.out file. It's not the case that only a single value is used; look at all of them. The key in design is not so much identifying the structures which are best, but instead identifying and throwing out those designs which are bad. Filter out structures which have bad ligand scores, bad constraint scores, bad total scores, etc., depending on which parameters you think are critical (see https://www.rosettacommons.org/docs/latest/enzyme-design.html for a description of the terms). Normally I set filters based on relative thresholds (e.g. "better than the median" or "top 10%") and filter out structures that fail at least one of those thresholds. Then I rank based on the metric that's most critical to the design (usually ligand binding energy, but it could be constraint geometry if that's what you're after). Then I go down the list, examining each structure to build a set of diverse designs which score well and don't have any obvious problems when examined visually. Test them in the lab, realize I missed a critical feature, then go back to the computer to re-design/re-filter.
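A minimal Python sketch of the "better than the median, then rank" filter described above, assuming score.out is a whitespace-separated table with one header line (as in the excerpt earlier in the thread). The column names come from that header; treating SR_7_interf_E_1_2 as the ligand binding energy is an assumption specific to this run, where SR_7 is residue 158, the EY1 ligand. The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def read_score_table(text: str) -> List[Dict[str, str]]:
    """Parse a whitespace-separated score table; the first non-empty line is the header.

    A leading 'SCORE:' token is dropped, repeated header lines are skipped, and rows
    whose column count differs from the header are ignored.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "SCORE:":
            tokens = tokens[1:]
        if not tokens:
            continue
        if not header:
            header = tokens
        elif tokens != header and len(tokens) == len(header):
            rows.append(dict(zip(header, tokens)))
    return rows


def median_filter_then_rank(rows: List[Dict[str, str]],
                            lower_is_better: Sequence[str],
                            rank_by: str) -> List[Dict[str, str]]:
    """Keep designs at or below the median for every metric, then sort by rank_by.

    Every metric is treated as lower-is-better (true for total_score, all_cst and
    interface energies); negate any column where higher is better before calling.
    """
    cutoffs = {m: statistics.median(float(r[m]) for r in rows) for m in lower_is_better}
    survivors = [r for r in rows
                 if all(float(r[m]) <= cutoffs[m] for m in lower_is_better)]
    return sorted(survivors, key=lambda r: float(r[rank_by]))
```

For example, passing ["total_score", "all_cst", "SR_7_interf_E_1_2"] as the metrics and "SR_7_interf_E_1_2" as rank_by keeps designs that are at or below the median on all three and orders them by interface energy. The ranked list is the starting point for visual inspection, not a final answer.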
Thanks very much for your swift reply. That is very encouraging.
Actually, I realised that my constraint energies were high, ranging from 134 to 600 across 250 structures. I set 7 blocks of constraints to enforce specific interactions that I know are present in the mechanism and that hold the phosphate in position. This is why my all_cst score is high. I was wondering if this is a bad idea, given that the paper on de novo enzyme design recommended a constraint score of <1.
On this basis, I sorted by total score, binding energy and constraint score, took the top 20% in each category, and then picked the structures that appeared in all three. This gave me 3 structures. I also noticed very slight variation in the constraint and total scores within the top 20%, while the variation in binding energy was much wider. Hence I also took the top 10% of structures by binding energy, which gave me another 4 structures.
Let me know what you make of this selection approach.
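For comparison, the intersection-of-top-fractions selection described above can be sketched in Python as follows. The helpers are hypothetical; each row is assumed to be a dict with a "description" key plus numeric, lower-is-better metrics. With 250 designs and a fraction of 0.2, each metric contributes 50 candidates, and requiring membership in all three sets can leave very few designs, which is consistent with ending up with only 3.

```python
from typing import List, Sequence, Set


def top_fraction(rows: List[dict], metric: str, fraction: float) -> Set[str]:
    """Descriptions of the best `fraction` of designs for `metric` (lower is better).

    The cut size is rounded to the nearest integer, with at least one design kept.
    """
    n = max(1, round(len(rows) * fraction))
    best = sorted(rows, key=lambda r: r[metric])[:n]
    return {r["description"] for r in best}


def in_top_of_all(rows: List[dict], metrics: Sequence[str], fraction: float) -> Set[str]:
    """Designs that fall in the top `fraction` for every metric (set intersection).

    At least one metric must be given.
    """
    sets = [top_fraction(rows, m, fraction) for m in metrics]
    return set.intersection(*sets)
```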
100+ constraint energies are very high. I'd recommend opening up one of the structures in your favorite structure viewing program (e.g. PyMol) and using the measurement tool to look at the actual geometries of the interactions which you are constraining, rather than just the constraint energies. By looking at the geometries of "real world" examples, you can get a better sense of whether you're over-constraining your system. High constraint energies normally mean that Rosetta cannot simultaneously satisfy all the constraints you have put in. Usually that means you're being too severe in your definition of constraints: either you have interactions that you don't really need/want, or the window on the range of allowable geometries is too narrow. Take a look at the structures, see how the actual geometries match up with the range permitted by your constraint files, and see if and how you should adjust the settings on your constraints.
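The measurements suggested above can also be scripted. This Python sketch computes distances, angles and dihedrals from Cartesian coordinates (e.g. the x, y, z columns of ATOM/HETATM lines in the output PDB) and checks a value against an ideal ± tolerance window. The function names are hypothetical, and the window check is a simplification for illustration: how each field in your enzdes constraint file defines the allowed range should be checked against the enzyme design documentation linked above.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def _dot(a: Vec, b: Vec): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def _norm(a: Vec): return math.sqrt(_dot(a, a))
def _cross(a: Vec, b: Vec):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def distance(a: Vec, b: Vec) -> float:
    """Distance between two atoms, in the same units as the coordinates (angstroms in a PDB)."""
    return _norm(_sub(a, b))


def angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Dihedral a-b-c-d in degrees, between -180 and 180."""
    b0, b1, b2 = _sub(a, b), _sub(c, b), _sub(d, c)
    n1 = _norm(b1)
    b1 = (b1[0] / n1, b1[1] / n1, b1[2] / n1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def within_window(value: float, ideal: float, tolerance: float, periodic: bool = False) -> bool:
    """True if value lies within ideal +/- tolerance; periodic=True wraps degree differences."""
    diff = value - ideal
    if periodic:
        diff = (diff + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance
```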
I wouldn't take structures with 100+ constraint energies just because they're the lowest constraint energy structures. Instead, I'd reevaluate what I'm looking for and either re-run the protocol with a more realistic, loosened geometry range, or decide that I really do want that restricted a range of constraint geometries and look at other techniques (different starting structures, perhaps) which might be able to give me the tight geometries I need.
Your overall approach (once you've fixed the constraint issue) sounds reasonable, but I'd recommend looking for more than just 3-7 structures. Instead, I'd typically shoot for a few dozen structures (depending on how many I want to test in the end), which I would then examine one by one in something like PyMol to check whether things look good or whether there is something obviously wrong. Normally, even in good-scoring structures there are things I can identify visually as artifacts and aberrations. I would then either throw these structures out manually or go back and adjust my filtering procedure to eliminate such defects.
I have now prepared my structures using fast relax instead of rpk_min, and I get the error below, which I think is linked to this change.
What do I need to change to get the design process to run properly? It runs on my Mac without any issues, but not on the cluster, which is Linux-based.
ERROR: Unable to open weights/patch file. None of (./)/Applications/rosetta_src_2015.05.57576_bundle/main/database/scoring/weights/ligand.wts or (./)/Applicatio
gand.wts or /Applications/rosetta_src_2015.05.57576_bundle/main/database/scoring
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 3066
ng const&, int, std::string const&, int)+0x107) [0x2b5db423a087]
hts_file(std::string, std::string)+0x710) [0x2b5db21d7310]
tion::add_weights_from_file(std::string const&)+0x55) [0x2b5db21d7795]
_docking::LigandBaseProtocol::make_tweaked_scorefxn(std::string const&, bool, bool, bool)+0x181) [0x2b5dac29d6f1]
I think I found the problem: I had forgotten to change the database path in the flags file when moving from the Mac to the cluster.
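A cheap guard against this kind of Mac-to-cluster slip is to check the flags file before submitting jobs. The Python sketch below is a hypothetical pre-flight helper: it collects the values of "-database" (or its long form, assumed here to be "-in:path:database") from a flags file, ignoring anything after "#" as in the flags file earlier in the thread, and reports any database path under which scoring/weights/ligand.wts does not exist on the current machine. On the cluster, the Mac path from the error above (/Applications/rosetta_src_2015.05.57576_bundle/main/database) would be flagged. If no database option is present, Rosetta may still find the database by other means, so treat that message as a prompt to check rather than an error.

```python
import os
import shlex
from typing import List

# Option names assumed here: the short form and its long form.
DB_OPTIONS = ("-database", "-in:path:database")


def database_paths(flags_text: str) -> List[str]:
    """Return every path given to a database option in a flags file."""
    paths = []
    for raw in flags_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # text after '#' is treated as a comment
        if not line:
            continue
        tokens = shlex.split(line)
        for i, tok in enumerate(tokens[:-1]):  # the option must be followed by a value
            if tok in DB_OPTIONS:
                paths.append(tokens[i + 1])
    return paths


def check_database(flags_text: str, weights_file: str = "ligand.wts") -> List[str]:
    """List reasons the weights file might not be found from this machine."""
    paths = database_paths(flags_text)
    if not paths:
        return ["no database option in flags file"]
    problems = []
    for db in paths:
        wts = os.path.join(db, "scoring", "weights", weights_file)
        if not os.path.isfile(wts):
            problems.append(f"not found on this machine: {wts}")
    return problems
```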
Hi, thanks for your comment.
I have relaxed the constraints, and the constraint energy has dropped to around 10.