When interpreting an ensemble of decoys made with relax or backrub, which structures should be discarded and which ones retained? I get a lot of conflicting advice, ranging from “keep them all” to “take the top 10%”. Is there a best practice regarding this?
Also, should the structures be clustered and only the centroids considered?
Any advice would be of help.
It really depends on what you're trying to accomplish.
Typically, "keep them all" only applies if you're using relax & backrub to generate diversity going into something like design. You keep the large variety of structures generated to broaden your diversity, and hope that the design protocol will fix up any structures the stochastic procedure left in bad shape. After your design protocol, there's a filtering step which throws out bad structures, including those that relax/backrub messed up and the design protocol wasn't able to fix.
If you don't have excess computational resources to throw at the problem, you may wish to reduce the number of input structures, even if you're looking for diversity. The first way to do this is to throw out the really bad structures, on the assumption that any time spent on them in the downstream protocol will be wasted. The cutoff for "really bad" depends on your particular protocol and input structure, and it could encompass 90% of the structures from relax, leaving only the top 10%. It really depends on what the relaxed/backrubbed structures look like, how many structures you generated, and how many you want to take through to the next stage.
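As a rough illustration of the two ways of drawing the "really bad" line (an absolute score cutoff, or keeping a top fraction such as 10%), here is a minimal Python sketch. It assumes you have already pulled (decoy name, total score) pairs out of your score file, and that lower scores are better, as with Rosetta energies. The function name `filter_decoys` and its parameters are hypothetical, not part of Rosetta.

```python
from typing import List, Optional, Tuple


def filter_decoys(
    decoys: List[Tuple[str, float]],
    keep_fraction: Optional[float] = None,
    max_score: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: drop the 'really bad' decoys before the next stage.

    decoys: (name, total_score) pairs; lower score is assumed to be better.
    keep_fraction: if given, keep at most this best fraction of the whole
        ensemble (e.g. 0.10 for the top 10%).
    max_score: if given, discard every decoy scoring above this cutoff.
    """
    # Sort best (lowest score) first.
    ranked = sorted(decoys, key=lambda d: d[1])
    if max_score is not None:
        # Absolute cutoff: anything worse than max_score is "really bad".
        ranked = [d for d in ranked if d[1] <= max_score]
    if keep_fraction is not None:
        # Fraction is taken relative to the full ensemble, not the survivors.
        n_keep = max(1, int(round(len(decoys) * keep_fraction)))
        ranked = ranked[:n_keep]
    return ranked


if __name__ == "__main__":
    # Toy scores for illustration only, not real Rosetta output.
    toy_scores = [-300, -310, -250, -305, -200, -290, -315, -280, -260, -270]
    toy = [(f"decoy_{i}", float(s)) for i, s in enumerate(toy_scores)]
    print(filter_decoys(toy, keep_fraction=0.1))
    print(filter_decoys(toy, max_score=-300.0))
```

Which cutoff is appropriate is exactly the protocol-dependent judgment described above; the code only applies whatever threshold you settle on.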
Another way to reduce the number of structures taken forward is clustering. Basically, you say that it isn't worth taking two similar structures forward, because they'll both produce more-or-less the same results in the next stage (typically because there's a local diversification step in the next stage). The number of clusters and the cluster radius depend on what the downstream protocol is and how much computational power you have. You want to reduce the number of structures to a manageable number, but don't want to reduce your diversity past the point where the sampling of the downstream stage can't compensate.
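To make the idea concrete, here is a minimal sketch of one simple clustering scheme, energy-ordered greedy ("leader") clustering, where each cluster is represented by its best-scoring member. This is a swapped-in illustrative technique, not necessarily what any Rosetta clustering tool does. It assumes a pairwise RMSD matrix has already been computed elsewhere; the function name `greedy_cluster` and its inputs are hypothetical.

```python
from typing import Dict, List, Sequence


def greedy_cluster(
    scores: Sequence[float],
    rmsd: Sequence[Sequence[float]],
    radius: float,
) -> Dict[int, List[int]]:
    """Hypothetical energy-ordered leader clustering.

    scores[i]: total score of decoy i (lower assumed better).
    rmsd[i][j]: pairwise RMSD between decoys i and j (assumed precomputed
        and symmetric).
    radius: cluster radius, in the same units as rmsd.
    Returns {center_index: [member indices]}; each center is the
    best-scoring member of its cluster.
    """
    # Visit decoys from best to worst score, so centers are low-energy ones.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    clusters: Dict[int, List[int]] = {}
    for i in order:
        # Dict keys iterate in insertion order, i.e. best-scoring center first.
        for center in clusters:
            if rmsd[i][center] <= radius:
                clusters[center].append(i)
                break
        else:
            # No existing center is close enough: i starts a new cluster.
            clusters[i] = [i]
    return clusters


if __name__ == "__main__":
    # Toy data for illustration only.
    scores = [-300.0, -310.0, -250.0, -305.0]
    rmsd = [
        [0.0, 0.5, 4.0, 3.0],
        [0.5, 0.0, 4.2, 3.1],
        [4.0, 4.2, 0.0, 0.8],
        [3.0, 3.1, 0.8, 0.0],
    ]
    print(greedy_cluster(scores, rmsd, radius=1.0))
```

Taking only the dictionary keys (the cluster centers) forward gives one representative per cluster. A larger radius gives fewer, broader clusters, which is the trade-off between manageable numbers and retained diversity mentioned above.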
That's assuming you want the diversity. Sometimes you're not looking for diversity from relax/backrub, but for optimization. In that case, you should only take forward the very best energy structures, but not necessarily just the lowest energy structure, as the Rosetta energy isn't necessarily precise. You may want a number of other structures which are also close to the lowest energy structure in energy, but are structurally distinct from it. Again, what counts as "close" and "structurally distinct" depends on the next stage and how much refinement it does. (There's no sense taking forward a structure if you're just going to throw out the results from it later on.)
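For the optimization case, a minimal sketch of "the lowest energy structure plus near-best but structurally distinct ones" might look like the following. It assumes precomputed scores (lower is better) and a pairwise RMSD matrix; `energy_window` stands in for "close" and `min_rmsd` for "structurally distinct". The function and parameter names are hypothetical, and suitable values depend on your downstream stage.

```python
from typing import List, Sequence


def pick_near_best_distinct(
    scores: Sequence[float],
    rmsd: Sequence[Sequence[float]],
    energy_window: float,
    min_rmsd: float,
) -> List[int]:
    """Hypothetical helper: pick the lowest-energy decoy plus others that are
    close to it in energy but structurally distinct from everything picked.

    energy_window: how far above the best score a decoy may be and still
        count as "close".
    min_rmsd: minimum RMSD to every already-picked decoy for a decoy to
        count as "structurally distinct".
    """
    if not scores:
        return []
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    best = scores[order[0]]
    picked: List[int] = []
    for i in order:
        if scores[i] - best > energy_window:
            # Sorted order: every remaining decoy is outside the window too.
            break
        # Keep i only if it differs enough from all decoys already picked.
        if all(rmsd[i][j] >= min_rmsd for j in picked):
            picked.append(i)
    return picked


if __name__ == "__main__":
    # Toy data for illustration only.
    scores = [-300.0, -310.0, -250.0, -305.0]
    rmsd = [
        [0.0, 0.5, 4.0, 3.0],
        [0.5, 0.0, 4.2, 3.1],
        [4.0, 4.2, 0.0, 0.8],
        [3.0, 3.1, 0.8, 0.0],
    ]
    print(pick_near_best_distinct(scores, rmsd, energy_window=10.0, min_rmsd=2.0))
```

The lowest-energy decoy is always picked first, so with a zero energy window this reduces to "take only the best one"; widening the window hedges against the imprecision of the energy function.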