I am trying to design a protein using the backrub application. I have two questions.
1. Is nstruct 100 enough to sample such a problem? (I am only trying to design 15-20 out of 560 amino acids)
2. From the backrub output, what is the best way to minimize the decoys before scoring them, so that I can compare the designs with the wild type?
I and others in the Kortemme lab have been using backrub for similar use cases, so we should have some insight that can help you.
100 structures should be more than sufficient for your use case. We typically find that past 30-50 structures, you're not adding much diversity to the generated sequence profile.
For sampling, it is also important to consider how many backrub trials to run. 10,000 trials per structure would be the minimum to try, but you could also consider going up to 20,000 or more, depending on how flexible the region you are sampling is.
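Putting those two settings together, a backrub run might use a flags file along these lines (an untested sketch; the input PDB and resfile names are placeholders, and you should verify the flags against the backrub documentation for your Rosetta version):

```
# backrub flags -- hedged sketch; verify against your Rosetta build
-s input.pdb               # starting structure (hypothetical filename)
-resfile design.resfile    # restrict design to your 15-20 positions (hypothetical filename)
-nstruct 100               # number of output structures
-backrub:ntrials 10000     # Monte Carlo trials per structure
```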
As for how to minimize the decoys, minimizing each decoy (with pairwise coordinate constraints) until the minimization converges should drive the decoys into local minima, decreasing the amount of noise when you compare wild-type and design sequence profiles. The minimize_with_cst command line application (part of the ddg_monomer protocol) can do this, or I can send you a Rosetta script for it if you are using the scripting interface.
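For reference, a minimize_with_cst invocation might look roughly like the following (a sketch based on the ddg_monomer documentation; the flag values are assumptions to adapt, and the executable suffix depends on your build):

```
# Hedged sketch -- verify flags against the ddg_monomer documentation
# for your Rosetta version. decoy_list.txt is a hypothetical file
# listing the backrub decoy PDBs, one per line.
minimize_with_cst.linuxgccrelease \
    -in:file:l decoy_list.txt \
    -in:file:fullatom \
    -ignore_unrecognized_res \
    -ddg::constraint_weight 1.0 \
    -ddg::sc_min_only false \
    -ddg::out_pdb_prefix min_cst
```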
Thank you for the reply.
As for the number of trials, we are currently using 10,000. We would also like to experiment with a higher value and see the results. When you said "how flexible the region you are sampling is", I assume you meant the flexibility in designing the sequence. Does that mean I can get more diverse sequence profiles with a larger value of ntrials?
Could you send me an example script that might help me? You can send it to firstname.lastname@example.org (or attach a txt file in the reply, whichever works for you).
When I was referring to "how flexible the region you are sampling is", I meant the inherent flexibility of the region of the protein being sampled, e.g. whether it is a loop or an alpha helix. But yes, a more flexible backbone region could also result in a more diverse sequence profile.
The number of backrub trials required should depend on the amount of underlying flexibility, as more trials are probably needed to fully sample a more flexible region. But if ntrials is too high, the simulations tend to diverge too far from the true minima, so it really is about getting the number in the right range to get more accurate predictions. There is a fair amount of leeway on this, and the best way to see if you're in the right range is probably to look at the decoys and profiles to see if they seem reasonable.
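One quick sanity check along those lines is to compare score distributions across decoy sets. A minimal sketch for pulling total_score out of a Rosetta scorefile, assuming the standard layout where lines start with "SCORE:" and the first such line is the column header (the example data below is made up):

```python
# Hedged helper: summarize total_score from a Rosetta scorefile.
# Assumes the standard scorefile layout: lines prefixed "SCORE:",
# with the first SCORE: line giving the column names.
from statistics import mean

def total_scores(scorefile_text):
    scores = []
    header = None
    for line in scorefile_text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()
        if header is None:
            header = fields
            idx = header.index("total_score")
            continue
        scores.append(float(fields[idx]))
    return scores

# Made-up example scorefile text, just to show the expected format.
example = """SEQUENCE:
SCORE: total_score fa_atr description
SCORE: -1200.5 -800.1 wt_0001
SCORE: -1195.2 -798.3 wt_0002
"""
print(total_scores(example))                    # [-1200.5, -1195.2]
print(round(mean(total_scores(example)), 2))    # -1197.85
```

Comparing the mean and spread of total_score between the wild-type and design decoy sets (after minimization) is a cheap first check before digging into the sequence profiles themselves.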
The backrub ddG integration test does something similar to the script you're looking for. I modified its XML script (but didn't test it); see attached. The original can be found at:
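For anyone following along without the attachment, the core of a constrain-then-minimize protocol looks roughly like this (my own untested sketch; the mover names and parameters are from the RosettaScripts interface as I remember it, so double-check them against the documentation before using):

```xml
<ROSETTASCRIPTS>
    <SCOREFXNS>
        <!-- score function with a nonzero constraint weight, so the
             added constraints actually contribute during minimization -->
        <ScoreFunction name="sfxn_cst" weights="talaris2014">
            <Reweight scoretype="atom_pair_constraint" weight="1.0"/>
        </ScoreFunction>
    </SCOREFXNS>
    <MOVERS>
        <!-- pairwise CA-CA distance constraints to the input conformation -->
        <AddConstraintsToCurrentConformationMover name="addcst"
            use_distance_cst="1" CA_only="1" max_distance="9.0"
            coord_dev="0.5" bound_width="0.0" min_seq_sep="0"/>
        <!-- minimize backbone and side chains until convergence -->
        <MinMover name="minimize" scorefxn="sfxn_cst" bb="1" chi="1"
            type="lbfgs_armijo_nonmonotone" tolerance="0.000001"
            max_iter="5000"/>
    </MOVERS>
    <PROTOCOLS>
        <Add mover_name="addcst"/>
        <Add mover_name="minimize"/>
    </PROTOCOLS>
</ROSETTASCRIPTS>
```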
Thanks for posting the backrub script - I'm thinking of doing something similar and this was helpful to look at. I noticed that for the mover "addcst", you've specified a cst_weight of 0.0. I'm curious, wouldn't this essentially remove the constraints by making all calculated constraints 0, or does this parameter work in a different way?