I'm trying to compare another method to Rosetta's side chain packing. My first pass is just using default options for the PackMover, and the results I'm getting out of Rosetta are pretty bad by comparison. Are there good rule of thumb settings for establishing a baseline over a set of proteins?
I am setting up a PackMover in C++ code, as opposed to running the command line fixbb application. Looking through the fixbb code I see the following options, some of which aren't mentioned in the documentation:
option.add( minimize_sidechains, "Do minimization of side chains after rotamer packing").def(false);
option.add( min_pack, "Pack and minimize sidechains simultaneously").def(false);
option.add( stochastic_pack, "Pack using a continuous sidechains rotamer library").def(false);
These all suggest a continuous optimization after rotamer packing. Do they differ in any way? Any explanation would be helpful.
I turned on *all* of the -ex flags (e.g. 1, 2, 3 and aromatics) which gives a considerable drop in energy for most proteins I've tried. I also enabled minimize_sidechains, which doesn't seem to help a ton. Anything else I should try?
Alright, first some background. The packer in Rosetta is a Monte Carlo Simulated Annealing protocol which attempts to find the lowest energy sidechain conformation from a set of sidechain rotamers on a fixed backbone. There are internal settings which adapt the number of trials and the temperature scheme to the size of the problem - more positions or more rotamers = more sampling. The defaults should be set up so that they're reasonable for most applications.
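To make the Monte Carlo simulated annealing idea concrete, here is a minimal toy sketch. It is not Rosetta's packer: the `ToyProblem` energy, the geometric cooling schedule, and the fixed step count are all assumptions made up for illustration, whereas the real packer adapts its trial count and temperature scheme to the problem size.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy energy: a one-body term per (position, rotamer) plus a
// penalty when two adjacent positions pick the same rotamer index.
struct ToyProblem {
    std::vector<std::vector<double>> one_body;  // one_body[pos][rot]
    double clash_penalty;
};

double total_energy(const ToyProblem& p, const std::vector<std::size_t>& state) {
    double e = 0.0;
    for (std::size_t i = 0; i < state.size(); ++i) {
        e += p.one_body[i][state[i]];
        if (i + 1 < state.size() && state[i] == state[i + 1]) e += p.clash_penalty;
    }
    return e;
}

// Simulated annealing over discrete rotamer assignments.
// Returns the lowest-energy assignment seen during the run.
std::vector<std::size_t> anneal(const ToyProblem& p, unsigned seed, int steps = 20000,
                                double t_start = 10.0, double t_end = 0.05) {
    if (p.one_body.empty()) return {};
    std::mt19937 rng(seed);
    std::vector<std::size_t> state(p.one_body.size(), 0);  // start from rotamer 0 everywhere
    std::vector<std::size_t> best = state;
    double e = total_energy(p, state);
    double best_e = e;
    std::uniform_int_distribution<std::size_t> pick_pos(0, state.size() - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (int s = 0; s < steps; ++s) {
        // Geometric cooling from t_start down to t_end (an assumed schedule).
        const double t = t_start * std::pow(t_end / t_start, static_cast<double>(s) / steps);
        const std::size_t pos = pick_pos(rng);
        std::uniform_int_distribution<std::size_t> pick_rot(0, p.one_body[pos].size() - 1);
        const std::size_t old_rot = state[pos];
        state[pos] = pick_rot(rng);  // propose a single rotamer substitution
        const double e_new = total_energy(p, state);
        // Metropolis criterion: always accept downhill, sometimes accept uphill.
        if (e_new <= e || unit(rng) < std::exp(-(e_new - e) / t)) {
            e = e_new;
            if (e < best_e) { best_e = e; best = state; }
        } else {
            state[pos] = old_rot;  // reject: restore the previous rotamer
        }
    }
    return best;
}
```

Calling `anneal(problem, 42u)` on a small problem returns a rotamer index per position. The point to notice is that the search only ever chooses among the rotamers it was handed, which is why the rotamer set matters so much in what follows.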
The first question to ask is by which criteria are you saying the results are "pretty bad"? The Rosetta packer is going off of the Rosetta scorefunction, so the lowest energy under some other energy function might not match up with the lowest energy under the Rosetta energy function. (The ranking of conformations may invert under different scorefunctions.)
The next point is that the packer is going to be selecting from a limited set of rotamers. It's pretty good at finding the minimum energy combination from that limited set of sidechain conformations, but a procedure which allows for "off-rotamer" sampling might find a conformation which is even lower, as it can pick sidechain conformations which aren't found in that initial set of rotamers.
That's what the -ex flags are trying to emulate. What they do is expand the set of input rotamers. The Dunbrack rotamer library, which Rosetta uses, contains both a rotamer center and also a distribution width (standard deviation) for the ensemble of sidechains within that rotamer well. Without any -ex flags, the packer is just using the rotamer centers. The -ex flags will add in additional rotamers which represent (by default) +/- 1 sd from the center. -ex1 means you do the expansion for the first chi angle, -ex2 for the second chi angle, etc. -ex1aro means to do the expansion for the first chi on aromatic residues only, etc.
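As a rough sketch of what that expansion means, the snippet below turns one library rotamer into discrete samples. The `ChiWell` struct and `expand_rotamer` function are hypothetical names for illustration, not Rosetta code, and it only models the default +/- 1 sd case described above.

```cpp
#include <cstddef>
#include <vector>

// One chi angle of a library rotamer: well center and standard deviation (degrees).
struct ChiWell {
    double mean;
    double sd;
};

// expand[i] == true mimics an -ex flag on chi i: that chi contributes
// mean - sd, mean, mean + sd. Otherwise only the mean is used.
// Returns every combination of chi samples.
std::vector<std::vector<double>> expand_rotamer(const std::vector<ChiWell>& chis,
                                                const std::vector<bool>& expand) {
    std::vector<std::vector<double>> out(1);  // start with one empty partial rotamer
    for (std::size_t i = 0; i < chis.size(); ++i) {
        std::vector<double> samples{chis[i].mean};
        if (i < expand.size() && expand[i]) {
            samples = {chis[i].mean - chis[i].sd, chis[i].mean, chis[i].mean + chis[i].sd};
        }
        std::vector<std::vector<double>> next;
        for (const auto& partial : out) {
            for (double s : samples) {
                std::vector<double> r = partial;
                r.push_back(s);
                next.push_back(r);
            }
        }
        out = next;
    }
    return out;
}
```

For a two-chi rotamer, expanding only chi1 gives 3 samples and expanding both chi1 and chi2 gives 9, so the rotamer count (and the packing cost) grows multiplicatively with each extra flag.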
The +/- 1 sd is only the default, though. If you want more sampling, you can use the "level" sub-option to raise it. See https://www.rosettacommons.org/docs/latest/packing-options.html for details on the settings. By default, -ex expansion isn't done for surface exposed residues. What counts as "surface exposed" is based on the number of neighbors (residues with Cα atoms within 10 Å) the residue has. You can make this less stringent with the -extrachi_cutoff option.
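The neighbor-count test can be pictured with the sketch below. Everything in it is an assumption for illustration: the `Point` type, the function names, whether a residue counts itself (here it does not), and the `>=` comparison. The cutoff is left as a parameter rather than guessing at Rosetta's actual default.

```cpp
#include <cstddef>
#include <vector>

struct Point {
    double x, y, z;
};

// Count other residues whose representative atom lies within `radius` of residue i.
std::size_t neighbor_count(const std::vector<Point>& atoms, std::size_t i,
                           double radius = 10.0) {
    std::size_t n = 0;
    for (std::size_t j = 0; j < atoms.size(); ++j) {
        if (j == i) continue;  // assumption: a residue is not its own neighbor
        const double dx = atoms[j].x - atoms[i].x;
        const double dy = atoms[j].y - atoms[i].y;
        const double dz = atoms[j].z - atoms[i].z;
        if (dx * dx + dy * dy + dz * dz <= radius * radius) ++n;
    }
    return n;
}

// A residue gets extra rotamers only if it looks buried enough.
bool gets_extra_rotamers(const std::vector<Point>& atoms, std::size_t i,
                         std::size_t cutoff) {
    return neighbor_count(atoms, i) >= cutoff;
}
```

Lowering `cutoff` in this picture is the analogue of relaxing -extrachi_cutoff: more residues qualify as buried and so get the expanded rotamer set.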
Another option is to include more rotamers. By default, Rosetta throws out rotamers with extremely low probability. You can change this with the options -dunbrack_prob_buried and -dunbrack_prob_nonburied. Each takes a fraction (between 0.0 and 1.0) giving how much of the Dunbrack rotamer set (by cumulative probability) to keep. Crank them to 1.0, and you'll use all of the Dunbrack rotamers in your packing.
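A cumulative-probability cutoff can be sketched like this. It is a guess at the general idea rather than Rosetta's exact rule: `keep_by_cumulative_probability` is a hypothetical helper, and the precise handling at the boundary (whether the rotamer that crosses the threshold is kept) is an assumption.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Keep the most probable rotamers until their summed probability reaches
// `fraction`. Input: rotamer probabilities for one residue (summing to ~1).
// Output: the probabilities of the rotamers that survive, most probable first.
std::vector<double> keep_by_cumulative_probability(std::vector<double> probs,
                                                   double fraction) {
    std::sort(probs.begin(), probs.end(), std::greater<double>());
    std::vector<double> kept;
    double cumulative = 0.0;
    for (double p : probs) {
        if (cumulative >= fraction) break;  // enough probability mass already kept
        kept.push_back(p);
        cumulative += p;
    }
    return kept;
}
```

With probabilities {0.5, 0.3, 0.15, 0.05}, a fraction of 0.75 keeps the top two rotamers, while 1.0 keeps all four, which is the "crank them to 1.0" case.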
All of those settings still only do rotameric sampling. If you really want to sample off-rotamer, you may want to add in minimization to your protocol. There are several ways to do this. The first is the min-packer. For every Monte Carlo trial, it minimizes the rotamer substitution in the current context before evaluating the energy. This allows much more thorough off-rotamer sampling, at the cost of greater time expenditure. (Though tests indicate that with the min-packer you probably don't need the -ex settings to get comparable results, as the minimization covers that.)
The min-packer differs from -minimize_sidechains in that minimize_sidechains only does the minimization *after* the entire MC-SA process finishes. This is faster than the min-packer, but may miss combinatorial effects where minimization of one sidechain may allow enough room to place in a different rotamer at a different position.
-stochastic_pack (renamed to -off_rotamer_pack in the weekly releases within the past year or so) sits somewhere between the two: it doesn't do minimization, but instead of using fixed samples it draws a random sample from the continuous range of +/- 1 sd (non-adjustable) around the rotamer center.
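The contrast with fixed -ex samples can be sketched as follows. The uniform distribution is an assumption (the description above only says the sample is random within +/- 1 sd), `sample_chi` is a hypothetical name, and angle wrapping at +/- 180 degrees is ignored for brevity.

```cpp
#include <random>

// Draw one chi value from the continuous range [mean - sd, mean + sd).
// Requires sd > 0. The fixed -ex samples would instead be exactly
// mean - sd, mean, and mean + sd.
double sample_chi(double mean, double sd, std::mt19937& rng) {
    std::uniform_real_distribution<double> within_well(mean - sd, mean + sd);
    return within_well(rng);
}
```

Each call gives a different chi inside the well, so repeated Monte Carlo trials on the same rotamer explore the well continuously instead of revisiting the same three values.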
Another thing you may want to be cognizant of is *which* Dunbrack rotamer library you're using. Rosetta3.5 and before (as well as the weekly releases with the -restore_pre_talaris_2013_behavior flag) use the 2002 Dunbrack library. The weekly releases use the newer (and better) 2010 Dunbrack rotamer library. A different rotamer library will have different rotamers present, and thus will change how the packer does sampling.
A final note is that there are some flags which can control how the packer behaves. Some of them (like "-linmem_ig 10") solely affect performance, and shouldn't change how the packer behaves. Others (most notably "-multi_cool_annealer 10") will change how the packer behaves. These are often most critical in design, or other situations where you have a highly combinatorial problem. Most protein core (non-design) packing runs are rather convergent with the default parameters, so there's typically marginal utility in changing the details of the packer settings. Changing the rotamer sets used, or using a different sampling scheme (like the min-packer or the off-rotamer/stochastic packer), typically yields better results.
The standard in Rosetta is to use the regular packer with defaults plus the -ex1 and -ex2 options. Note, though, that this is normally in the context of a larger protocol, which might include backbone minimization, rigid body sampling, and loop remodeling steps, or in the context of redesign. There are time tradeoffs in the choice. Doing more extensive sampling would get better results for the packing step, but would slow down the entire protocol, and wouldn't necessarily improve the overall end result. The typical settings are "good enough" for the multi-phase protocols that Rosetta is normally used for. Repacking just to repack a fixed sequence on a fixed backbone doesn't normally come up as a (non-benchmarking) problem.
By the way, what's the method which is giving you the apparently better results? If you could provide details on how it works, I might be able to suggest alterations to the Rosetta protocol which would make it more comparable to the other protocol. One major issue to work out is whether the alternate protocol allows backbone movement. Packing by itself in Rosetta will not move the backbone, but moving it can greatly change the energy you get. This includes preparation steps which pre-minimize the structure: small structural changes (with insignificant atom movement in the backbone) can greatly affect the Rosetta energy (http://www.ncbi.nlm.nih.gov/pubmed/23565140).