I am referring to the constraint file instructions provided here:
(1) Does anyone know how to pass constraints to Rosetta 3.0, assuming the constraint files have been prepared in the format described at the URL above?
In particular, I'd like to use the "docking_protocol" application provided with Rosetta. However, the command-line help (e.g. ./docking_protocol.linuxgccdebug -help) doesn't list any option for specifying a constraint file.
(2) I have also tried various command-line flags, such as -constraints filename and -cst_file filename, and none of them had any effect. That assumes the "docking_protocol" application accepts constraints at all; if it doesn't, which other Rosetta 3.0 applications do? Abinitio comes to mind. For reference, I am trying to specify atom pair distances as constraints.
(3) The link also advises updating several functions, e.g. "add_constraints_from_cmdline()". Which source files should these changes go into? I'm sorry if these are trivial questions, but I'm really quite at a loss.
Any help is appreciated, and thank you for your time.
Did you ever figure out how to do constraints with Protein Docking?
I have two PDBs, A and B, and I'm not sure how to indicate that a residue/atom in A should be a minimum distance from one in B.
Respond to email@example.com
As far as I can tell, the primary docking protocol does not read constraints internally. (All I did was search for the word "constraint" in the docking code.) The documentation's description of constraints (add from cmdline, etc.) is basically all that needs to be done to make the docking protocols respect constraints. Look for where the ScoreFunction is initialized in the docking application (either in the application itself or in src/protocols/docking) and add the constraints from there.
Another option is to use the DockDesign parser: it respects command-line constraints and can do regular docking.
I think abinitio does take constraints.
I started looking through the code yesterday to figure out how the residue numbers in an AtomPair constraint are interpreted. As a Java programmer, I have a built-in defense mechanism against tweaking C++ code, but reading the code does help fill in what is missing from the docs. My first thought was that it was the residue number as given in the PDB file, which would mean also giving the chain ID, and that does not appear to be possible in the atom pair constraint code. It could be as simple as counting residues from 1 to N in the file and using that index. It would be nice if AtomPair supported the numbering used in the PDB file.
I was also thinking of playing around with the dock_pert option: I would write scripts that explore all the rotations and offsets, so that instead of nstruct = 1000 I would use nstruct = 1 and submit 1000 low-res jobs to the cluster. The only problem is that the offset appears to be a single delta move value for both X and Y, whereas I would need to give a deltaX and a deltaY plus a rotation of the PDB structure to explore the possible arrangements for a known interface. Of course, I am spending too much time trying to minimize the computational task when I should just submit everything to the cluster as random and let it brute-force the search.
I will probably try to submit 100 jobs, each doing random docking with nstruct=10, to get 1000 decoys, and then move on to the high-res option for the best-scoring structures, assuming they match the interface.
With regards to AtomPairConstraints and PDB numbering:
Almost nothing inside rosetta runs on PDB numbering. Nearly everything runs on 1-N numbering, where N is the total number of residues in the pose. This is because the Pose stores its residues in a vector indexed from 1. (You'll notice vector1 everywhere instead of the C++ STL vector class; this is because the developers are mostly biological scientists, not computer scientists, and don't like indexing from 0.)
The only time you will EVER use PDB numbering in rosetta is if you see a function with pdb in the name. There is a class inside the pose which keeps track of pose versus PDB numbering and lets you convert between them. The resfile, for example, takes advantage of this.
The constraints framework does not use the PDB numbering options. The likely reason is that the developers who wrote the constraint code use silent-file input, not PDB input, and so don't think about PDB numbering at all. It would be possible to tweak the code to support this, but it's just as easy to renumber PDBs from 1. (There's a script, chains.pl, that does this; I think it ships with the releases.)
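If you would rather keep the original numbering than renumber, the pose index can in principle be recovered by counting residues in file order, following the 1-N scheme described above. The Python sketch below is hypothetical (pdb_to_pose_numbering is not a Rosetta function): it assumes every residue in the ATOM records is loaded into the pose in the order it appears and that HETATM records (waters, ligands) are not, which you should check against how your build actually reads the file.

```python
def pdb_to_pose_numbering(pdb_text):
    """Map PDB residue IDs to 1-based pose indices (hypothetical helper).

    Keys are (chain, resSeq, iCode) tuples; values count distinct residues
    in the order they first appear in ATOM records, starting at 1.
    Assumption: HETATM records are skipped and every ATOM residue is kept.
    """
    mapping = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, etc.
        chain = line[21:22]           # column 22
        res_seq = int(line[22:26])    # columns 23-26
        icode = line[26:27].strip()   # column 27, usually blank
        key = (chain, res_seq, icode)
        if key not in mapping:
            mapping[key] = len(mapping) + 1  # 1-based, like vector1
    return mapping
```

With the resulting dictionary, mapping[("B", 57, "")] would give the pose index for residue 57 of chain B, so the PDB file itself never has to be renumbered.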
With regards to dock_pert: I'm reasonably certain that the values you give here are maxima, not fixed settings. In other words, a value of 8 will also allow 7, 6, 5, etc., so you may not need to try lots of different perturbations; just leave it at your maximum and run more structures. I'm not sure I understand your question, so I may not be addressing it well.
After looking at the code, I figured it was a simple index. The problem with renumbering is that it makes it harder to refer back to the original PDB structure, so I worked out the indices manually. As you said earlier, docking in 3.0 does not appear to consider constraints, which is a big negative. Is there a way to formally request features, or to see what is being worked on for the next release, in some sort of formal bug-tracking system?
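Once the pose indices are known, writing the constraints themselves is mostly string formatting. A minimal Python sketch, assuming the AtomPair line layout as it is usually written in the constraint-file documentation (AtomPair atom1 res1 atom2 res2 FUNC params) and using a HARMONIC function (x0, sd) purely as an example; the helper name and the residue table are hypothetical. The constraint documentation lists other function types that may fit a one-sided "minimum distance" bound better than HARMONIC.

```python
def atom_pair_constraint(atom1, res1, atom2, res2, x0, sd, index=None):
    """Format one AtomPair constraint line with a HARMONIC function.

    res1/res2 are pose indices (1..N). If `index` is given, they are instead
    looked up in it, e.g. as (chain, resSeq) keys mapped to pose indices.
    """
    if index is not None:
        res1, res2 = index[res1], index[res2]
    return (f"AtomPair {atom1} {res1} {atom2} {res2} "
            f"HARMONIC {float(x0)} {float(sd)}")

# Hypothetical example: chain A residue 57 and chain B residue 3 (PDB
# numbering) looked up in a hand-built PDB -> pose index table.
pose_index = {("A", 57): 12, ("B", 3): 201}
lines = [atom_pair_constraint("CB", ("A", 57), "CB", ("B", 3), 8.0, 1.0,
                              index=pose_index)]
print("\n".join(lines))  # prints: AtomPair CB 12 CB 201 HARMONIC 8.0 1.0
```

Keeping the PDB-to-pose table separate from the formatting means the constraint file can be regenerated whenever the input structure changes, without hand-editing indices.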
Thanks for clarifying that the dock_pert values are probably maximum ranges; I assume the nstruct structures would then be spread evenly across that range. That makes it hard to run in parallel, assuming nstruct has to be large to cover all possible docking positions.
I just finished a 10,000-structure run with -randomize1 -randomize2 and looked at a couple of the lowest-energy structures; they are not even close to what should be the docking interface. I am probably going to write some code that goes through the low-resolution run, finds the models whose predicted structure satisfies a set of distance constraints, and then work from those models.
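A filter like the one described can be sketched in a few lines of Python. This is a hypothetical post-processing script, not part of Rosetta: it assumes the decoy PDBs keep the chain IDs and residue numbers of the input structure, and it treats each constraint as a maximum distance between two named atoms.

```python
import math
from pathlib import Path

def atom_coords(pdb_text):
    """Return {(chain, resSeq, atom_name): (x, y, z)} for ATOM/HETATM lines."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21:22], int(line[22:26]), line[12:16].strip())
            coords[key] = (float(line[30:38]), float(line[38:46]),
                           float(line[46:54]))
    return coords

def satisfies(pdb_text, constraints):
    """True if every (atom_a, atom_b, max_dist) pair is within max_dist.

    atom_a and atom_b are (chain, resSeq, atom_name) keys; a missing atom
    counts as a failure.
    """
    coords = atom_coords(pdb_text)
    for atom_a, atom_b, max_dist in constraints:
        if atom_a not in coords or atom_b not in coords:
            return False
        if math.dist(coords[atom_a], coords[atom_b]) > max_dist:
            return False
    return True

def passing_decoys(pdb_paths, constraints):
    """Keep only the decoy files whose structures satisfy all constraints."""
    return [p for p in pdb_paths if satisfies(Path(p).read_text(), constraints)]
```

For example, passing_decoys(sorted(Path("lowres_out").glob("*.pdb")), [(("A", 57, "CB"), ("B", 3, "CB"), 8.0)]) would list the decoys in a (hypothetical) lowres_out directory whose two CB atoms lie within 8 Å of each other; those are the candidates to carry forward to high resolution.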
I started up 20 jobs with -nstruct 500 each and I assume that rosetta is using a good random seed so that the 20 jobs are independently random.
It's a good idea to use the flags -constant_seed and -jran ###, where ### is some large number (I use 6 or 7 digits). This "sets" the random number seed for rosetta's internal (and robust!) RNG. If you do not supply a seed, rosetta queries your OS for a random number to use as its seed: this may be useful or not depending on your OS. I personally always set the seed manually but many people do not.
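Combining the fixed-seed advice with the job splits mentioned in this thread (20 jobs of -nstruct 500, or 100 jobs of nstruct 10), a short Python script can emit one command line per job, each with its own seed. This is only a sketch: the binary name and the -randomize1, -randomize2, -nstruct, -constant_seed and -jran flags come from this thread, while job_commands and INPUT_FLAGS are hypothetical, and the rest of a real docking command line (input structure and so on) is not shown.

```python
import random

# Binary name and flags are taken from this thread; INPUT_FLAGS is a
# hypothetical placeholder for whatever other options your run needs.
BINARY = "./docking_protocol.linuxgccdebug"
INPUT_FLAGS = ["-randomize1", "-randomize2"]

def job_commands(n_jobs, nstruct_per_job, rng=None):
    """Build one command line per cluster job, each with its own fixed seed."""
    rng = rng or random.SystemRandom()
    # sample() draws without replacement, so no two jobs share a seed;
    # the range gives 7-digit seeds as suggested above.
    seeds = rng.sample(range(1_000_000, 10_000_000), n_jobs)
    return [
        [BINARY, *INPUT_FLAGS, "-nstruct", str(nstruct_per_job),
         "-constant_seed", "-jran", str(seed)]
        for seed in seeds
    ]

if __name__ == "__main__":
    for cmd in job_commands(20, 500):
        print(" ".join(cmd))
```

Recording the printed command lines alongside the output also means any individual job can be rerun later with exactly the same seed.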
There is no formal feature-request system or bug tracker. (There may be one associated with the rosetta@home project and/or the FoldIt game, but I think those are mostly for the UIs, not the protein code.) I'll pass this request along.
I asked around; it looks like the next version of the docking protocol, released in 3.1, will respect AtomPairConstraints (atom-to-atom distance constraints). Site constraints like those in Rosetta++ will not be supported (yet).