Here's the source code of mutation_script.xml, part of the demo calculate_protein_protein_ddg included in Rosetta (see https://www.rosettacommons.org/demos/latest/public/calculate_protein_protein_ddg/README). I deleted the comments to make it shorter:
<dock_design> <SCOREFXNS> </SCOREFXNS> <FILTERS> <Ddg name=dg_wt threshold=1000 repeats=3 jump=3/> <Ddg name=dg_mut threshold=1000 repeats=3 jump=3/> </FILTERS> <TASKOPERATIONS> <RestrictToRepacking name=repack_only /> <ReadResfile name=resfile filename=UbcH7-A98W.resfile/> </TASKOPERATIONS> <MOVERS> FastRelax name=relax/> <PackRotamersMover name=pack task_operations=repack_only/> <PackRotamersMover name=mut_and_pack task_operations=resfile/> </MOVERS> <PROTOCOLS> Add mover_name=relax/> <Add mover_name=pack/> <Add filter_name=dg_wt/> <Add mover_name=mut_and_pack/> <Add filter_name=dg_mut/> </PROTOCOLS> </dock_design>
You can see that the FastRelax mover is commented out (it's missing the opening <). This is exactly like this on the original mutation_script.xml file included in Rosetta.
Is this a typo or is there a reason for this? Note that I don't care so much for computation time. I want to obtain results as accurate as possible. So what is the recommended protocol here? Use FastRelax or not?
Noting that uncommenting the FastRelax makes the protocol MUCH more slower.
Two more questions:
1) Why is the threshold=1000 on <Ddg .../> so high?
2) I am running the demo with -nstruct 1000, to generate 1000 pdb structures. For many of them, the ddG = 0 (exactly zero!). Is this likely? I'd expect the ddG to be small, if the mutation has a negligible effect. But a value that is exactly zero seems unlikely. Is this normal?
I'm guessing that the fact that the relax slows down the run tremendously is the reason why it's commented out.
Note that the relax is the very first thing in the protocol - this means that you're pre-optimizing the structure prior to the ddG calculation - the relax is not being applied during the ddG calculation proper. If you already have a decently optimized structure (e.g. if you already ran relax on the strucutre, or if it's coming from some other Rosetta protocol which does a relax), then the initial relax is likely superflous. Instead, you can get away with just repacking the structure.
The other issue is that you typically don't do a single ddG calculation - normally it's done against a number of different mutants. If you include the relax as part of the ddG script, you re-relax the structure for each independent mutations. It's much more CPU efficient to do the relax once, as a preparation step, rather than doing it for every mutation.
Regarding the threshold, the reason is that you don't really want the ddG filters to act as "filters" - you want to run the ddG protocol, send the results to the tracer and continue. Setting the threshold that high accomplishes that. If the threshold was lower, you might actually trigger it and stop the run prior to outputing the mutated structure. (If you set `confidence=0`, the the filters wouldn't be run at all at the point in the protocol, only being run at the very end of the protocol, both on the mutated structure.)
How are you calculating the ddG values? If you are pulling the values from the score file, those values are both from the exact same (mutated) structure (the filters being run at the end of the protocol), so it's reasonable that the difference between the two are zero - if the two converge to exactly the same packed rotamers, then there may be no difference between the structures, leading to an exactly zero difference.
Thanks for the reply.
How are you calculating the ddG values?
I am doing what the README.md (of the demo) says:
The output file will be a repacked and mutated PDB file with the dg_wt and
dg_mut lines at the end of the file. Subtration of these two numbers yields
the change in binding energy.
So I am looking at the pdb output files. The final two lines contain ddG scores, for example, here are the last two lines of one .pdb output file from a run I did using the above script:
You see that the numbers are exactly the same.
Looking forward to your reply.
Yeah ... that's not going to work. -- I'm not sure why they're doing it that way.
As I mentioned, the values in the scorefile (and by extension the end of the PDB) are calculated based on the structure at the *end* of the protocol. So both ythe dg_mut and dg_wt values are being calculated on the same structure ... hence the reason they're the same.
If you have a recent weekly release, then you can use the FilterReportAsPoseExtraScoresMover (https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/FilterReportAsPoseExtraScoresMover) . You would make two new movers, one for each filter, and then in the PROTOCOLS section use those movers, rather than the filters. This should do what you expect to do.
Thanks for the bug report - I'll fix the demo accordingly.
Thanks. When you fix the demo can you let me know here, or send me the updated version?
I don't understand why the filters are run on the same mutated structure. In the <PROTOCOLS> ... </PROTOCOLS> section, I place one Ddg filter *before* the mutation, and another *after* the mutation. Why are they both run after the mutation?
It's due to a quirk in the way RosettaScripts filters work.
The *filtering* portion of the Filter works at the time you place it in the PROTOCOLS section. That is, if you pass/fail based on the result of the filters, you do so at the point of time the filters are placed in the PROTOCOLS block.
However, the *(re-)scoring* portion of filters for the scorefile (or end of the PDB) work based on the structure at the *end* of the protocol. The philosophy is that the scores in the scorefile (or PDB file) are for the final structure, so the filter-related scores should also be for the final structure. - I admit it's confusing. It's probably not the way I would have done it.
Thanks. It's clearer now.
@rmoretti Based on your comments, I modified my protocol as follows. I would like to know your opinion. Thanks.
I relaxed the crystal pdb containing the complex with FastRelax. I generated 100 decoys and selected the one with lowest score. Then I feed this structure to the following script to compute the dg of a mutated structure:
I also apply this script to the wild-type (by using a resfile that only allows the native residue).
This way I obtain two distinct PDBs, the wild-type and the mutant, each with its dG. Substracting the dG's gives the predicted ddG of mutation.
Sounds good to me!