You are here

ΔΔG Calculation of a Mutation with PyRosetta

14 posts / 0 new
Last post
ΔΔG Calculation of a Mutation with PyRosetta
#1

Hi everyone,

We are trying to replicate the experiment by Alford et al. (2017) in our lab to calculate the ΔΔG of T193V mutation for the RT-RH-derived peptide bound to HIV-1 protease (PDB entry 1kjg). However, we have been getting values that are different from the reported ΔΔG of –4.95 kcal/mol and it’s likely due to errors in our code.

We are using PyRosetta to perform the calculations, but the code provided by the authors in the supplementary information is based on  C++, so we were wondering if we got the Python-based code correctly.

Below is the protocol of the calculation provided by Alford et al. (2017), following which is our Python code:

 

ΔΔG of Mutation. The coordinate file for 1kjg was downloaded from the Protein Data Bank and cleaned to remove any non-canonical amino acids. The PDB was refined with fast relax constrained to native coordinates using Cartesian-space refinement and the REF2015 energy function using the following command line:

relax.linuxgccrelease –s 1kgj.pdb –use_input_sc –constrain_relax_to_start_coords –ignore_unrecognized_res –nstruct 1000 –relax:coord_constrain_to_sidechains –relax:ramp_constraints false –relax:cartesian –relax:min_type lbfgs_armijo_nonmonotone

After refinement, the lowest scoring model was used to generate five structures of the native conformation and five structures of the T193V mutated conformation using a Cartesian version of Rosetta’s ddg protocol.

cartesian_ddg.linuxgccrelease –s 1kgj_refined_lowest.pdb –ddg:mut_file $MUT_FILE –ddg:iterations 5 –optimization:default -max_cycles 200 –bbnbr 1 –relax:min_type lbfgs_armijo_nonmonotone –fa_max_dis 9.0

The energies were averaged for each ensemble of five structures. The ΔΔG was then calculated as thedifference between the average energy of the mutated ensemble and the average energy of the native ensemble.

 

Our Python code:

This is the first run of PyRosetta to fast relax to start coordinates and save a .pdb file for the second run.

from pyrosetta.rosetta import *
from pyrosetta import *
from pyrosetta.toolbox import *
init(extra_options = "-beta_nov16_cart -in:file:s 1kjg.clean.pdb -use_input_sc -constrain_relax_to_start_coords -ignore_unrecognized_res -relax:coord_constrain_sidechains -relax:ramp_constraints false -relax:cartesian -relax:min_type lbfgs_armijo_nonmonotone")
pose = Pose()
pose = pose_from_pdb("1kjg.clean.pdb")
scorefxn = ScoreFunction()
scorefxn = get_fa_scorefxn()
scorefxn(pose)
mutate_residue(pose,201,'V')
relax = pyrosetta.rosetta.protocols.relax.FastRelax()
relax.constrain_relax_to_start_coords(True)
relax.coord_constrain_sidechains(True)
relax.ramp_down_constraints(False)
relax.cartesian(True)
relax.min_type("lbfgs_armijo_nonmonotone")
relax.set_scorefxn(scorefxn)
relax.apply(pose)
pose.dump_pdb("1kjg.T193V.pdb")
print(scorefxn(pose))

 

Here we loaded the produced .pdb file and obtained energy values, which were used to calculate the ΔΔG of mutation T193V.

from pyrosetta.rosetta import *
from pyrosetta import *
from pyrosetta.toolbox import *
init(extra_options = "-beta_nov16_cart -in:file:s 1kjg.T193V.pdb -ddg:mut_file -ddg:iterations 5 -max_cycles 200 -relax:min_type lbfgs_armijo_nonmonotone -fa_max_dis 9.0")
pose = Pose()
pose = pose_from_pdb("1kjg.T193V.pdb")
scorefxn = ScoreFunction()
scorefxn = get_fa_scorefxn()
relax = pyrosetta.rosetta.protocols.relax.FastRelax()
relax.constrain_relax_to_start_coords(True)
relax.coord_constrain_sidechains(True)
relax.ramp_down_constraints(False)
relax.cartesian(True)
relax.min_type("lbfgs_armijo_nonmonotone")
relax.set_scorefxn(scorefxn)
scorefxn(pose)

 

Could someone please let us know if we're doing this correctly or if we need to modify our PyRosetta code? We are aware that we're using the beta_nov16 Cartesian function, whereas the authors used the REF2015 Cartesian scoring function, but we were hoping to see improved ΔΔG prediction that matches the experiementally determined value of -1.11 kcal/mol more closely.

Category: 
Post Situation: 
Fri, 2018-03-09 01:44
akaashvenkat

.

Fri, 2018-03-09 03:28
ac.research

Additionally, is it correct to run the ddg and relax protocols separately to obtain the ΔΔG of mutation or should they be combined into one run with PyRosetta?

Sat, 2018-03-10 18:14
akaashvenkat

I contacted the Primary Author on the paper - you are using the -beta_nov16 scorefunction (which has also not been published) on a protocol on which the ddG values were calcuated with REF2015 (the current default).

Mon, 2018-03-12 12:48
jadolfbr

As for the ddG prediction being closer to the experimentally determined value with -beta_nov16, I think you answered your own question here...

Mon, 2018-03-12 12:51
jadolfbr

Using the -beta_nov16 scoring function with Cartesian refinement, we observed a slighlty closer ΔΔG prediction to experimental value (-4.29 kcal/mol) compared with the reported value of -4.95 kcal/mol with only the first run of the Python code (i.e. without the -ddg:mut_file flag), so would this function be a better predictor of ΔΔG of mutation for other proteins as well?

Also, with the optimized parameters of the -beta_nov16_cart function, wouldn't it be more efficient in sampling the global minimum as compared with the -beta_nov15 scoring function with Cartesian refinement utilized by Alford et al. (https://www.biorxiv.org/content/biorxiv/suppl/2017/02/07/106054.DC1/106054-1.pdf)?

Mon, 2018-03-12 16:03
Sazhnyev

You are asking a question here that we don't know the answer to.  Perhaps try it with a few other proteins?  Benchmark.  You could even contact Rebecca to collaborate perhaps.  

 

Again, the beta score function has not been published yet. It is unclear if it is quicker or better.  Without data how do we know?  Perhaps the creator of the scorefunction could tell you - but being a developer of Rosetta, I have yet to hear much about it. Perhaps I missed a paper?  I don't know, but personally I've just been using REF2015 as it was heavily benchmarked for many different tasks in Rosetta.     

Mon, 2018-03-12 20:26
jadolfbr

Ok, will do - thank you for all your input.

Tue, 2018-03-13 15:55
Sazhnyev

Hi!

I am also trying to use the PyRosetta to do the ddG calculation, may I ask did you figure out how to use PyRosetta to do the cartesian_ddG eventually?

XD

Fri, 2022-01-07 01:11
CKS2001

I had a somewhat similar confusion for ∆∆G, wherein the values were scaled — https://www.rosettacommons.org/node/11294
Namely, whereas the strength of a hydrogen bond in a test system does seem to be 1-2 kcal/mol as expected, the ∆∆G values for mutations need to be scaled.
I was confused as my computed values were something like 4.2 fold greater than they were meant, possiblt indicating a classic kJ/mol vs kcal/mol mix-up. I was told this is not the case and just coincidence, but the data needs to be scaled as seen in the papers as is normal. For my application my mean absolute error dropped down to around a kcal/mol (with nasty outliers removed). This was true for both ref2015 and beta2016.

Mon, 2022-01-10 06:57
matteoferla

Hi,

May I ask which platform did you use? PyRosetta or Rosetta? I have some confusion while dealing with the ddG calculation in PyRosetta

Best

Mon, 2022-01-10 08:16
CKS2001

Me? PyRosetta, but for a different ball game: speed.
My benchmarking was for assessing ∆∆G calculations on unminimised models with a strong time constraint (> 60 s) for my app, venus.cmd.ox.ac.uk: for this a neighbourhood of 8–15Å was FastRelax minimised for 1-3 cycles before and after the change. Cartesian was too slow and the improvement negligible, so I am using vanilla ref2015 scaled by some factor.
My problems with the ∆∆G calculations were destabilising outliers that caused backbone distortions, especially on a different segment of the protein so I resorted to using median instead of mean for the "mean absolute error" metric and all the data made sense —if a mutation is destabilising, I don't care if it's +10 kcal/mol or +100 kcal/mol, but it throws off the mean. Machine learning ∆∆G calculators don't have hard-to-calculate outlier problems hence why papers comparing methods seem to disagree and handwave it away on different subsets of the ProTherm database.

Tue, 2022-01-11 02:28
matteoferla

Hi matteoferla,

Thank you so much for your reply! I tried your app VENUS and it is so efficient and quick to give feedback on ∆∆G for a specific mutation, so cool!

I am new to pyrosetta and so sorry that I have no idea about your problem. BTW, may I ask you a question related to the PyRosetta ∆∆G calculation? 

Following the cartesian_∆∆G Protocol in Rosetta, which is claimed to have a Pearson correlation of 0.74, I tried to transform the calculation into the PyRosetta version. My logic is stated in the following:

  1.  Prepare a cleaned PDB file and apply cartesian fast_relax with the function of ref2015_cart, record the score in function of ref2015 and denote it as S0
  2.  Mutate the pose with the function mutate_residue, then apply MinMover and Monte Carlo trial mover to minimize the energy, denote the score at this stage as S1
  3.  Calculate the ∆∆G as S1-S0.
    from pyrosetta import *
    from pyrosetta.rosetta import *
    from pyrosetta.toolbox import *
    from pyrosetta.teaching import *
    init("-ignore_unrecognized_res 1 -ex1 -ex2 -flip_HNQ -relax:cartesian -nstruct 200 -crystal_refine -optimization:default_max_cycles 200")
    
    
    #Parameter set up#
    testPose= Pose()
    testPose = pose_from_pdb("1BNI.clean.pdb")
    scorefxnDDG=get_fa_scorefxn()
    
    #Firstly relax the structure for later modificiation#
    from pyrosetta.rosetta.protocols.relax import FastRelax
    scorefxnRelax = pyrosetta.create_score_function("ref2015_cart")
    relax = pyrosetta.rosetta.protocols.relax.FastRelax()
    relax.constrain_relax_to_start_coords(True)
    relax.coord_constrain_sidechains(True)
    relax.ramp_down_constraints(False)
    relax.cartesian(True)
    relax.min_type("dfpmin_armijo_nonmonotone")
    relax.min_type("lbfgs_armijo_nonmonotone")#for non-Cartesian scorefunctions use'dfpmin_armijo_nonmonotone'
    relax.set_scorefxn(scorefxnRelax)
    
    
    
    relax.apply(testPose)
    s0=scorefxnDDG(testPose) #Record the energy score after cartesian_relax
    
    #Energy minimization
    min_mover = MinMover() #define a Mover in type of MinMover
    mm=MoveMap()
    mm.set_bb_true_range(28,36)
    min_mover.movemap(mm)
    min_mover.score_function(scorefxnDDG)
    #min_mover.min_type("dfpmin")
    min_mover.tolerance(0.01)
    print(min_mover)
    ddG=[]
    def minimize_Energy(pose):
        #Minimization#
        min_mover.apply(pose)
    
        #Trial_mover define#
        kT=1
        mc=MonteCarlo(pose,scorefxnDDG,kT)
        mc.boltzmann(pose)
        mc.recover_low(pose)
    
        trial_mover = TrialMover(min_mover,mc)
        #Monte Carlo#
        for i in range (100):
            trial_mover.apply(pose)
        
        return
    
    #Point mutation#
    AA=['G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T']
    mp=Pose()
    for i in AA:
        mp.assign(testPose)
        mutate_residue(mp,52,i)
        relax.apply(mp)#relax after minimization
        minimize_Energy(mp)
        
        s1=scorefxnDDG(mp)
        dg=s1-s0
        ddG.append(dg)
    
    
    
    
    #Output#
    
    print("The energy after relax: ",s0)
    print("The ddG: ")
    print(ddG)
    
    

     

However, I found that according to this logic, the ∆∆G I got are always a positive value (I fixed the residue position and perform all 21 types of mutation). Do you know which steps in my protocol are wrong, or may you give me some suggestions to calculate the ∆∆G in PyRosetta?

Thank you so much for your help! 

Yours sincerely

Tue, 2022-01-11 03:08
CKS2001

Your code looks like it would work fine. 

Gibbs free energy is a potential, wherein negative is good. Say there's a magical being that shrunk itself down and has a yet to be folded string of a protein, as soon as they let it go and the protein will fold by itself releasing energy (no magic required), because in the energy funnel representation of the protein fold the protein is rolling down towards the minima. Most mutations are bad, so must mutations are expected to be positive.

My aforementioned problem was that the difference between mutant and folded is a value that is scaled relative to kcal/mol. Applied to your case the last line would be:

calorie_ddG = ddG/2.94
print(f"The ddG: {calorie_ddG} kcal/mol")

Wed, 2022-01-12 01:20
matteoferla