I am trying to refine predicted structures from the AlphaFold and SWISS-MODEL databases. The protocol I followed says I need to eliminate steric clashes and refine the predicted structure first (before docking or other analysis).
So I relaxed the protein in torsion space with constraints. Here is the command I used:
relax.mpi.linuxgccrelease -s 5B0U.pdb -relax:constrain_relax_to_start_coords -ramp_constraints false -relax:coord_constrain_sidechains -nstruct 20 -ex1 -ex2 -use_input_sc -flip_HNQ -no_optH false
But when I used Structure Assessment from SWISS-MODEL to evaluate the relax results (I chose the lowest-scoring one), I found that the clash score of every protein increased significantly (17.6 to 207, or 0.46 to 174), and the overall MolProbity score also increased (lower is better).
So I am confused by these results, and do not know whether these proteins got better or worse after relaxing with Rosetta. Or should I just trust the Rosetta results and follow the protocol?
Thanks in advance for any responses on this question,
A number of different things might be happening. Here are some ideas:
1) Clashes going up could be a difference of opinion between Rosetta (the fa_rep term) and the other tools you are using about what a clash is. Generally MolProbity and Rosetta agree well, so it's probably not this.
2) The Rosetta scorefunction (and MolProbity) consider MANY local geometric terms beyond the nonlocal clash terms, for example adherence to the Ramachandran plot, rotamericity of the sidechains, etc. A model that has a very low clash score because it has terrible backbone torsions may behave like this: as Rosetta tries to fix the torsions, it may allow clashes. For example, if you start with 1000 energy units' worth of bad torsions and end with 500 energy units of bad torsions and 300 units of clashes, then 1000 > 500 + 300, so the model is better on the whole, although worse on clashes specifically. The overall MolProbity score also considers this, but it may be interesting to break out the scores by term to figure out what is increasing and decreasing.
3) (hypothesis deleted, -use_input_sc precludes it)
4) from Frank DiMaio: "huge structures (500+ residues) with poor initial geometry sometimes minimize poorly. A cartesian relax or constrained relax would work better in these cases". Interpreting this: you've already tried a constrained relax. Try a dualspace or cartesian relax (I think the flag is -dualspace...unsure). Try removing the constraints you DO have to see what happens (the flags with constraint in the name).
5) I'd look at the model motion in general. With these constraints you should see little motion; if you see any regions with large motion it's a flag that Rosetta thinks something is wrong there.
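One rough way to look at model motion, as suggested in point 5, is to compare C-alpha positions residue by residue between the input and the relaxed PDB. The sketch below is only an illustration: the function names and the 1.0 Å cutoff are made up for this example, it reads standard fixed-column ATOM records, and it assumes both files keep the same residue numbering and coordinate frame (no superposition is done).

```python
import math

def ca_coords(pdb_text):
    """Map (chain, residue number, insertion code) -> C-alpha coordinates."""
    coords = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]), line[26])
            if key not in coords:  # keep only the first alternate location
                coords[key] = (
                    float(line[30:38]),
                    float(line[38:46]),
                    float(line[46:54]),
                )
    return coords

def ca_displacements(before_text, after_text):
    """Per-residue C-alpha displacement (in Angstrom) for shared residues."""
    before = ca_coords(before_text)
    after = ca_coords(after_text)
    return {k: math.dist(before[k], after[k]) for k in before if k in after}

def flag_moved(displacements, cutoff=1.0):
    """Residues that moved more than `cutoff` Angstrom (arbitrary cutoff)."""
    return sorted(k for k, d in displacements.items() if d > cutoff)
```

To use it, read both files with `open(path).read()`, pass the two strings to `ca_displacements`, and inspect whatever `flag_moved` returns. Under coordinate constraints most residues should barely move, so anything flagged is worth a look in a viewer.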
Thank you for your reply, I will try dualspace and cartesian relax first :)
I tried dualspace and cartesian relax on some proteins, and still got worse MolProbity and clash scores (though the cartesian results showed a slight improvement).
I also tried other methods to refine the predicted structures, such as OpenMM with constraints or Force Field X, and they actually improved the MolProbity and clash scores for some proteins (those with an original MolProbity score > 2), but the Rosetta method did not.
Which result should I trust? Consider:
So I can only trust the Rosetta relax result, right? I noticed that in many cases Rosetta relax improves the other metrics but not the clash score, and ends up with a worse MolProbity score. Does that indicate that MolProbity and Rosetta relax have different philosophies on protein optimization?
I would say that MolProbity and Rosetta have similar philosophies, but as a case of convergent evolution rather than shared ancestry. For example, they both have rotamericity calculations, but they use different rotamer libraries for the calculation. MolProbity counts clashes as atom contacts within certain distances; Rosetta just smoothly penalizes atom placement with an LJ potential. MolProbity is tuned for detecting errors in crystallographic models; Rosetta is tuned for making computational models look like existing proteins in the PDB.
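To make the "counting versus smooth penalty" difference concrete, here is a toy sketch. It is not MolProbity's or Rosetta's actual code: the radii, the well depth, and the function names are illustrative assumptions. The threshold counter mimics the idea behind a clash score (MolProbity counts overlaps of roughly 0.4 Å or more), and the smooth term is the repulsive side of a generic 12-6 Lennard-Jones potential.

```python
def overlap(distance, r1, r2):
    """How far two atoms' van der Waals spheres interpenetrate (Angstrom)."""
    return (r1 + r2) - distance

def count_clashes(pairs, threshold=0.4):
    """Threshold view: a pair either counts as a clash or it does not."""
    return sum(1 for d, r1, r2 in pairs if overlap(d, r1, r2) >= threshold)

def lj_repulsion(distance, r1, r2, well_depth=0.1):
    """Smooth view: repulsive side of a 12-6 LJ potential, shifted to be
    zero at the energy minimum (taken here as r1 + r2). Toy parameters."""
    r_min = r1 + r2
    if distance >= r_min:
        return 0.0
    ratio = r_min / distance
    # Equals well_depth * (ratio**6 - 1)**2, which is never negative.
    return well_depth * (ratio**12 - 2 * ratio**6 + 1)

# Three hypothetical carbon-carbon pairs as (distance, radius, radius).
pairs = [(3.4, 1.7, 1.7), (3.1, 1.7, 1.7), (2.9, 1.7, 1.7)]
n_clashes = count_clashes(pairs)
total_penalty = sum(lj_repulsion(d, r1, r2) for d, r1, r2 in pairs)
```

In this toy, the pair at 3.1 Å adds nothing to the clash count but still pays a repulsive penalty, while the pair at 2.9 Å counts as one clash however bad it gets. A minimizer working on the smooth term can therefore trade many small overlaps against other energy terms in ways a threshold count scores very differently.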
When I am in a situation where different tools have different preferences, I switch to asking what my use case is going to be. If I am proceeding into Rosetta design, I am going to use the Rosetta relax model because it is in the right scorefunction for design, even if MolProbity likes it a bit less. If I am doing something like molecular replacement or insertion into a CryoEM model (outside of Rosetta), I might make the opposite choice, because MolProbity is closer to those tools. Both tools report problems at the residue level, as per-residue scores (Rosetta) or violation reports (MolProbity); another thing to do is to look at where the model is bad by looking for those violations. If you know you care about a certain part of the model, use the model that is cleaner in that particular region. Very few structures of any size are correct everywhere.
If you are moving on to cartesian ddg, I'd look at the per-residue results to see if you have problems in the areas you want to work on later, and then use the Rosetta results unless I find something really weird.