I've got 10,000 output structures from the RosettaCM protocol and would like to cluster them for analysis. I've been having issues running Calibur (separate post) and noticed that an energy_based_clustering program was published recently.
From a quick look at both, I see that they find clusters in different ways. Is there a significant difference in what each is suited for, and if so, which one would be more appropriate for homology modelling?
The biggest difference is in philosophy. (And which to choose would depend on what you're interested in doing with the cluster centers downstream.)
Calibur clustering is purely structure-based. It doesn't take energy into account at all. The cluster centers are those structures which are structurally closest to the average structure of the cluster. It'll give you structural representatives.
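To make "structurally closest to the average structure" concrete, here is a minimal sketch in Python. It is not Calibur's actual code. It assumes the members of one cluster are already superimposed and stored as an (M, N, 3) NumPy array, and the function name closest_to_average is made up for illustration.

```python
import numpy as np

def closest_to_average(coords):
    """Return the index of the cluster member nearest the average structure.

    coords: array of shape (M, N, 3), M pre-superimposed structures with
    N atoms each (assumption: superposition was done beforehand).
    """
    coords = np.asarray(coords, dtype=float)
    mean = coords.mean(axis=0)  # average structure, shape (N, 3)
    # RMSD of each member to the average structure
    dev = np.sqrt(((coords - mean) ** 2).sum(axis=2).mean(axis=1))
    return int(np.argmin(dev))
```

Note that energy never appears here: the representative is chosen on geometry alone.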
The energy-based clustering is more of a filtering mechanism. It prefers the low-energy structures, and then filters out structures which are "practically identical" to those low-energy structures. The cluster centers are low-energy structures that are "different enough" from each other.
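The filtering idea can be sketched as a greedy pass over the structures in order of increasing energy. This is only an illustration of the philosophy described above, not the energy_based_clustering application itself. It assumes pre-superimposed coordinates, and the names rmsd, energy_ordered_centers, and radius are hypothetical.

```python
import numpy as np

def rmsd(a, b):
    """Coordinate RMSD between two (N, 3) arrays (no superposition is done)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def energy_ordered_centers(coords, energies, radius):
    """Greedy filter: walk structures from lowest to highest energy and keep
    one only if it is farther than `radius` from every center kept so far."""
    order = np.argsort(energies)  # lowest energy first
    centers = []
    for i in order:
        if all(rmsd(coords[i], coords[c]) > radius for c in centers):
            centers.append(int(i))
    return centers
```

Every returned center is the lowest-energy structure within its own neighborhood, and any two centers are more than radius apart, which is the "low energy but different enough" behavior.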
So which to use depends on why you want to cluster. Often the reason for clustering is to reduce redundancy in the output. (There's not really a point in taking forward two structures which are 0.2 Å RMSD from each other. One is representative enough of that structural region.) In that sort of case, you're probably better off with the energy-based approach, as it takes forward the "best" structure (by Rosetta energy), but reduces the structural ensemble to a point where the remaining structures are "different enough" from each other.
Calibur is more suited for a standard final-analysis clustering approach, where getting the structural representative (independent of score) is the goal. (But if you're then going to analyze the energy of the cluster centers, you still might want the energy-based one.) Calibur is also a more conventional clustering approach (whereas energy-based clustering is not a common clustering technique), so reviewers are likely to be more familiar with it.