Hi Rosetta users,
I was trying to recapitulate a loop conformation in a protein using RosettaRemodel (Rosetta 3.5). The loop was taken out, so in the blueprint file I specified "0 x L PIKAA [native aa]". I turned on the clustering functionality in RosettaRemodel to focus only on the unique structures that score well in the first stage. From my understanding, it should generate several cluster centers based on the specified cluster radius, each being the lowest-energy structure within its cluster. However, I only got ONE cluster center after the clustering, so at most one final refined structure will be generated, no matter how many were generated in the building stage (-num_trajectories and -save_top were both 100 in my case). The cluster radius was 0.2. So I am wondering whether I am giving a wrong cluster radius, or whether there is something else I didn't notice. And how is it different from the "standalone" clustering application? Thanks.
The clustering in RosettaRemodel uses the same clustering code that the clustering application uses, although it does all-against-all clustering for all relevant poses, not just the top 400 like the clustering application does. It also doesn't necessarily obey all the options in the same way.
This may be the issue you're seeing. Cluster radius is one of the options that varies between the two. Remodel has its own cluster radius option: -remodel:cluster_radius rather than -cluster:radius for the standalone application. By default it's set to -1, which does auto-determination of the cluster radius, which may end up being bigger than you want/expected.
Edit: You probably also want to make sure you're not setting -remodel:cluster_on_entire_pose. It defaults to false, meaning you'll compute the rmsd only on the loop regions. If it gets set to true, you'll compute the rmsd over the whole pose, potentially washing out the differences between the conformations.
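To illustrate the "washing out" effect with numbers, here is a minimal Python sketch. It is not Rosetta code: the loop length, pose length, and deviations are made up, and it assumes both conformations sit in the same frame so that only the loop atoms differ.

```python
import math

def rmsd(deviations):
    """Root-mean-square of per-atom deviations (in Angstroms)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

# Hypothetical numbers: a 10-residue loop whose atoms each moved 2.0 A,
# embedded in a 200-residue pose whose other 190 residues did not move.
loop_dev = [2.0] * 10
whole_dev = loop_dev + [0.0] * 190

loop_rmsd = rmsd(loop_dev)    # computed over the loop only
whole_rmsd = rmsd(whole_dev)  # same movement, averaged over the whole pose

print(f"loop-only rmsd:  {loop_rmsd:.3f}")
print(f"whole-pose rmsd: {whole_rmsd:.3f}")
```

Under these assumptions the whole-pose value is the loop-only value scaled by sqrt(10/200), so two loops that are clearly distinct on their own can fall inside the same cluster radius once the rest of the pose is included.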
Thanks for your reply. The weird thing I observed was that there are "protocols.cluster: Redistributing groups ... 0 cluster centers" lines in the log file if I left cluster_radius at its default value (which is -1). What bothered me was that I got 0 clusters, and in this case no further refinement protocols would follow the building stage, since there was no input structure in the accumulator. So I was wondering if there is an upper or lower limit in the auto-detection of the cluster radius.
As I read things, when a cluster radius is auto-determined, it's set at 1.1 times the median rmsd between the initial 400 seed clustering structures. That's for rmsd clustering; for GDT-based clustering, it's similarly based on the median GDT (with a different modifier), clamped to be within 0.5 and 0.9.
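In rough Python terms, the rmsd case described above would look something like the sketch below. This is only my reading restated as code, not the Rosetta implementation: the function name, its arguments, and the toy 1-D "structures" are all hypothetical, and the GDT variant is left out because its modifier isn't stated here.

```python
from itertools import combinations
from statistics import median

def auto_cluster_radius(rmsd_fn, structures, seed_size=400, factor=1.1):
    """Return factor * median pairwise rmsd over the first seed_size structures.

    rmsd_fn is any callable giving the rmsd between two structures.
    The 400 and 1.1 defaults come from the description above, not from
    a check against the Rosetta source.
    """
    seeds = structures[:seed_size]
    if len(seeds) < 2:
        raise ValueError("need at least two structures to compute a median rmsd")
    pairwise = [rmsd_fn(a, b) for a, b in combinations(seeds, 2)]
    return factor * median(pairwise)

# Toy stand-in: each "structure" is one number and "rmsd" is the absolute
# difference, so the pairwise values are 1.0, 3.0 and 2.0 (median 2.0).
toy_structures = [0.0, 1.0, 3.0]
radius = auto_cluster_radius(lambda a, b: abs(a - b), toy_structures)
print(radius)
```

The point of the sketch is that the radius scales with how spread out the seed structures are, so a diverse first-stage pool gives a large automatic radius.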
I don't think there's an intrinsic limit. The actual value used should be printed to the log, though, in a line looking like:
protocols.cluster: Clustering of NNNNN structures with radius RRRRRRR (auto)
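If you want to pull that value out of a long log, a small Python helper along these lines could do it. It assumes the line follows the template above exactly, with an integer count and a plain decimal radius; the function name and the sample line are made up for illustration and are not real Rosetta output.

```python
import re

# Pattern built from the template line above; the real formatting of the
# numbers in a Rosetta log is an assumption here.
CLUSTER_LINE = re.compile(
    r"protocols\.cluster: Clustering of (\d+) structures with radius "
    r"([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)(\s*\(auto\))?"
)

def parse_cluster_radius(line):
    """Return (n_structures, radius, is_auto), or None if the line doesn't match."""
    match = CLUSTER_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), match.group(3) is not None

# Made-up sample line that follows the template.
sample = "protocols.cluster: Clustering of 100 structures with radius 3.41 (auto)"
print(parse_cluster_radius(sample))
```

Running it over every line of the log and keeping the non-None results would show which radius was actually used in each clustering pass.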