1) I have some doubts about clustering. I have modeled a protein and refined its loop. How exactly will clustering help me in the further steps, and what output will it produce that I can use later? Also, do I need to apply the relax_a_large_protein option to the final model (my protein contains 290 amino acids), or can I use the final model directly for docking studies?
2) In the protein-ligand docking protocols, unlike in other software, there is no protein preparation step where we can minimize the whole protein and add hydrogen atoms. Is it fine to skip these steps and move on, or should I use other software for this purpose?
Thank you in advance for all the suggestions.
1) Typical clustering uses the RMSD over the C-alpha atoms of the whole protein. If the loop is the only thing moving, clustering will split the multiple models that Rosetta spits out into different groups based on the position and conformation of the loop. This helps you reduce the number of structures you carry forward to the next stages while preserving structural diversity (e.g. if you keep the few top-scoring structures in each cluster).
Also, you can get a sense of how "wide" the landscape is in the vicinity of a given model. If a structure is the only one in its cluster, and other clusters have several dozen members each, that conformation could be a Rosetta fluke and might not be worth pursuing further.
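To make the idea concrete, here is a minimal sketch of score-ordered greedy clustering on C-alpha RMSD. It is not Rosetta's own cluster application, just an illustration in Python with NumPy. The function names (kabsch_rmsd, cluster_models, pick_representatives, singletons) and the radius and keep parameters are hypothetical, and the sketch assumes you have already extracted one (N, 3) array of C-alpha coordinates and one total score per model (lower score is better).

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = ((a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def cluster_models(coords, scores, radius):
    """Greedy clustering: visit models from best to worst score; each model
    joins the first cluster whose center is within `radius`, otherwise it
    starts a new cluster. Members end up sorted by score."""
    clusters = []
    for i in np.argsort(scores):
        for members in clusters:
            if kabsch_rmsd(coords[i], coords[members[0]]) <= radius:
                members.append(int(i))
                break
        else:
            clusters.append([int(i)])
    return clusters


def pick_representatives(clusters, keep=3):
    """Keep the `keep` best-scoring models of every cluster."""
    return [members[:keep] for members in clusters]


def singletons(clusters):
    """Models that are alone in their cluster (possible flukes)."""
    return [members[0] for members in clusters if len(members) == 1]
```

The indices returned by pick_representatives are the models you would carry forward; the ones returned by singletons are candidates to inspect before deciding whether to drop them. The radius is something you would have to tune for your own loop length.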
2) Rosetta automatically adds hydrogens (and other missing atoms) with standard geometries to all structures it reads in. It can also optimize hydrogen placement in the input context with the command-line flag "-no_optH false", and optimize crystallographically ambiguous heavy-atom placements with the flag "-flipHNQ".
For ligand docking there should be a preparation step where you repack the entire protein, to avoid artifacts from the local repacking that happens during docking. The standalone application uses the ligand_rpkmin program to do this, but you can do something similar with RosettaScripts.
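As a rough illustration of how those pieces fit together, the sketch below only assembles a command line for the repacking step; it does not run anything. The helper name build_prepack_command is hypothetical, the executable name depends on your build (Rosetta binaries usually carry a platform suffix), and using "-s" for the input structure is my assumption about your setup. A real run will likely need further options, such as the ligand parameter file.

```python
def build_prepack_command(executable, input_pdb, optimize_h=True, flip_hnq=True):
    """Return an argument list for a repacking run (hypothetical helper)."""
    cmd = [executable, "-s", input_pdb]  # "-s": input structure (assumed here)
    if optimize_h:
        cmd += ["-no_optH", "false"]  # optimize hydrogen placement on input
    if flip_hnq:
        cmd.append("-flipHNQ")  # consider ambiguous His/Asn/Gln placements
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(" ".join(build_prepack_command("ligand_rpkmin", "model.pdb")))
```

Printing the command first, rather than launching it directly, makes it easy to check the flags against the documentation for your Rosetta version.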
I'm a little biased, but if you're looking for a more general protein preparation protocol, the all-atom relax protocol of Nivon et al. (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0059004 https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/dd/...) is probably what you want. It removes most of the incompatibilities between the input structure and the Rosetta score function, while keeping the structure close to the input coordinates.