I am trying to run a global docking search on two proteins on the university computing cluster sampling 100K structures.
docking_protocol.mpi.linuxgccrelease -database /home/aaj30/rosetta_src_2015.39.58186_bundle/main/database -s xyz.pdb -use_input_sc -ex1 -ex2aro -nstruct 100000 -randomize1 -randomize2 -dock_pert 5 10 -spin
I was wondering if it is possible to introduce a filter such that the docking algorithm samples all 100K structures but writes only the top 0.1%, sorted by I_sc or total_score, to the output. Alternatively, is it possible to give a cutoff value so that poses that do not meet this cutoff are rejected?
Could this also be achieved using rosetta_scripts?
Currently, only a very few protocols allow you to filter results based on aggregate statistics. Rosetta tends to treat each output structure in isolation, so it's hard to say "only keep the top 0.1% of the structures." It's much easier if you can specify an absolute cutoff, e.g. "only keep structures with total scores better than -456.8 REU". So if you're interested in keeping only the top X%, you might want to do a short test run, figure out what the cutoff level would be to give you just the top X%, and then use that absolute cutoff in the filtering.
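As a rough sketch of the "short test run, then pick a cutoff" step, something like the following Python could estimate the cutoff from the test run's score file. It assumes the usual score file layout, where data lines start with "SCORE:", the first such line holds the column names, and lower scores are better. The function names here are made up for illustration and are not part of Rosetta.

```python
import math


def read_score_column(lines, column="total_score"):
    """Collect one column from score-file lines (assumed 'SCORE:' layout)."""
    header = None
    index = None
    values = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: lines and anything else
        fields = line.split()[1:]
        if header is None:
            # The first SCORE: line is assumed to hold the column names.
            header = fields
            index = header.index(column)
            continue
        if fields == header:
            continue  # header repeated, e.g. in concatenated score files
        values.append(float(fields[index]))
    return values


def cutoff_for_top_fraction(scores, fraction):
    """Return the score of the worst structure still inside the top fraction.

    Lower scores are treated as better, so the top fraction is the
    lowest-scoring part of the sorted list.
    """
    if not scores:
        raise ValueError("no scores given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[keep - 1]


if __name__ == "__main__":
    # Tiny made-up example; a real test run would be read from its score file.
    example = [
        "SEQUENCE: ",
        "SCORE: total_score I_sc description",
        "SCORE: -100.5 -3.2 xyz_0001",
        "SCORE: -90.0 -7.1 xyz_0002",
        "SCORE: -95.0 -5.0 xyz_0003",
        "SCORE: -80.0 -1.0 xyz_0004",
    ]
    i_sc = read_score_column(example, column="I_sc")
    print(cutoff_for_top_fraction(i_sc, 0.5))
```

Keep in mind that the test run has to be large enough for the tail to be meaningful: estimating a top 0.1% cutoff from a few hundred structures will be very noisy, so the resulting number is only a starting point for the absolute threshold.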
For the docking_protocol application, you can use the -output_score_filter option to set the total score cutoff value. Filtering by interface score is more difficult, as the cutoff is fixed at 0 REU.
If you're using RosettaScripts, things are more flexible. You can apply any of the Filters after your DockingProtocol call to pass/fail the structure. (But again, all of those filters are set up for absolute thresholds, rather than output-based thresholds.) I'd recommend the Ddg filter if you're looking for interface score.
Thanks for your response. I was able to do it successfully by setting up absolute thresholds in RosettaScripts, using a CompoundStatement filter that passes structures based on the Ddg and Sasa filters.