Speeding up docking to a large complex

31 posts / 0 new

Top

Greetings,

I would like to dock a small protein (59 amino acids plus one 140 atom ligand) to a large trimeric complex (358 amino acids *per monomer*, plus eight 140-atom ligands). The total number of atoms in the system is nearly 21,000.
When I run a small test of the system using the following flags (we use Torque for queuing jobs) and generate 10 decoys per instantiation, each decoy takes between 1800 and 2400 seconds to complete (30-40 minutes).
command:
/pathto/bin/docking_protocol.linuxgccrelease
flags:
-in:path /pathto/rosetta_database/
-in:file:s ../$input.pdb
-extra_res_fa /pathto/BCL.fa.params
-extra_res_cen /pathto/BCL.cen.params
-packing:ex1
-packing:ex1aro
-packing:ex2
-packing:ex2aro
-docking:partners "ABCDEF_GH"
-out:nstruct $NUM
-out:file:fullatom true
-out:user_tag $PBS_ARRAYID
-out:prefix Proc$PBS_ARRAYID
-out:file:silent Proc$PBS_ARRAYID-silent.out
-out:file:silent_struct_type binary
-out:file:scorefile Proc$PBS_ARRAYID.score

I have experimental evidence which suggests a particular region of the trimeric complex (let's call it the "top"...) participates in binding this small protein (though the site itself is not precisely known) and note that the trimeric complex itself is relatively stable.
Is there some way to fix atoms or *not* scan rotamers for the ligands on the "inside" and amino acids making up the "underside" of the complex to somehow speed up the process?
Ideally, I'd only like to sample conformations of the small protein, it's ligand, and the top surface of the trimeric complex.

Would the liberal use of site constraints (as mentioned near the end of: http://www.rosettacommons.org/content/docking-protein-symmetrical-complex) help at all?

Note that I have already successfully pre-packed the complex (that took only 300 seconds...) and am starting from that structure for the test case above.

Any thoughts and advice are welcome!

Thanks,
Sándor

Post Situation:

Unsolved

Thu, 2012-05-10 20:50

skovacs

Top

I do not think packing is your slow step here. I'm pretty sure docking autodetects the interface and only packs that.

If you think you are docking at the "top", then if you use some combination of constraints and a manually assembled starting structure, you should be able to cut away a huge portion of the trimer (basically everything not near the binding site). It will ruin certain scorefunction terms (because of the ridiculous pretend-bonds where you've cut protein away) but it will reduce the number of atoms in the system.

Give Rosetta a starting structure with the ligand somewhere along the "top". You may need to play with the perturbation flags to get complete sampling along the top without getting too much sampling.

The simplest way I can think of to keep your ligand in the box you want is to combine CoordinateConstraint on some CA near its center of mass with BoundedFunc, to define some box that the center of mass has to stay in. I think that defines a sphere, so maybe you'll need several overlapping "allowed" coordinate centers combined via AmbiguousConstraint.

Delete any residues you think are sufficently far from the interface that you won't need them. Be sure your PDB input does NOT have TER where the deletions occur. Rosetta will assign garbage backbone terms to those positions (like rama), but they'll be fixed garbage, so they'll be equal for all structures.

I have no idea if this will speed you up enough to be worth the trouble.

Thu, 2012-05-10 21:08

smlewis

(Reply to #3)

Top

Interesting...

I had thought about something like this at first, but was unsure if it was considered a "sound practice"--any thought as to how "much" of the original complex I ought to include?

Since it's the "top" of the the complex, my first approach would be to take a horizontal slice through it, say 15 angstroms below the binding interface, and discard all the residues below.
I would then gradually move this cutoff downward (or maybe upward?) until I reached a point where the electrostatic contribution to the score didn't change...

Does this approach make sense?

Fri, 2012-05-11 06:55

skovacs

(Reply to #4)

Top

It's not a standard practice. The standard practice is "MORE POWER" - throw more computational resources at it. I think it will work but I've never tried.

15 Angstroms sounds reasonable. I think our longest scorefunction terms are in the 7-8 angstrom range. I'd say you want at least two residue layers under the interface to prevent packing incompatible with the full structure. Unless you turn it on, the standard score functions do not use explicit electrostatics, so you won't see that effect of range. I would suggest picking one range (15 sounds ok), docking, then grafting together the docked conformation plus the original bottom half of the interface and rescoring. If the scores get hugely worse then you can try a different range for the initial step.

Fri, 2012-05-11 07:07

smlewis

(Reply to #5)

Top

Reducing the system from 21K atoms to 12K atoms resulted in the times for a single decoy dropping from 30-40 minutes to around 4-5 minutes.
I suspect that not having to read in 2000 rotamers for 25 spatially different (but chemically identical) pigments contributed to the significant speedup.

The scores obtained from the reduced system were similar in "spread" or "distribution" to those obtained from the complete system, but only half the magnitude--are all interfaces present in a system (i.e. interfaces between *all* chains) scored?
Or does using the -docking:partners flag mean that only the interface between the two *sets* of chains listed is scored?

Thanks!
Sándor

Sun, 2012-05-13 20:50

skovacs

(Reply to #6)

Top

The score terms themselves ignore the interfaces, they're just scoring protein structure and don't know or care where the interfaces are. You've roughly halved the number of atoms, so getting half the score makes sense.

If you are referring to a reported binding energy, (which is not part of the scorefunction, but is calculated using the scorefunction), that shouldn't have changed much - it will be reporting only on the -partners defined interface.

Mon, 2012-05-14 07:08

smlewis

(Reply to #7)

Top

A few quick clarifications, and then I believe we can call this matter resolved.

When you mention a "reported binding energy" are you referring to the quantity called I_sc?

If so, then yes, I obtain similar results for both the full system and the reduced system.
Therefore, would "Potential Energy of Binding" be an acceptable interpretation of this I_sc quantity?

If not, how would such a binding energy be obtained?

Finally--and this is more of an aside--the units for such a binding energy would be reported as REU (Rosetta Energy Units) instead of kcal/mol, correct?
If so, is there a standard conversion between the two available? I have seen conversion factors ranging from 0.5-2.0 in the published literature, so at least it seems to be within one order of magnitude...

Thanks,
Sándor

Mon, 2012-05-14 08:36

skovacs

(Reply to #8)

Top

Assuming I'm following the code right, I_sc (that's capital i, not lowercase L) is a binding energy calculated as:

I_sc = bound_pose - unbound_pose

Where the unbound is generated by a 1000 Angstrom translation of the mobile jump. Because all the atoms near the interface are present in the full and reduced systems, this quantity should not be affected by the mass deletion.

You can also calculate binding energy (in the same fashion) using the InterfaceAnalyzer code.

Units are REU. REU do not convert to physical units. It is possible to balance the scorefunction in such a way that Rosetta can be used to estimate ddG's of mutation using kcal/mol units, but a scorefunction balanced for that purpose is useless for most other purposes. "0.5 to 2" is correct, but the key is that the exact conversion isn't the same for all systems and isn't knowable anyway.

The value of the scorefunction by itself is useful only for determining if the structure has huge flaws or not; positive scores generally indicate clashes and unfeasible structures.

The value of the scorefunction normalized by the number of residues is useful for determining structure quality; generally "good" scores are around -2 REU per residue.

Ranking the _relative_ value of the score for different conformations of the same system is the real purpose of the scorefunction: determining which of many conformations is most likely.

Tue, 2012-05-15 10:34

smlewis

(Reply to #9)

Top

Thanks for the clarification and explanation--that's particularly helpful!

Another quick question: I came across the flags -norepack1 and -norepack2 when re-reading the docking_protocol manual.
In practice, I observed that using one of these essentially fixes the atomic positions to their input configuration (i.e. the chains do not repack (move) during either the global or local searches).

Since I am using -docking_partners and have multiple chains specified, is there a similar option which would allow the repacking of *SOME* of the chains (say, the protein), while leaving others fixed (say, the ligands), instead of all or none?

Thanks!
Sándor

Wed, 2012-05-16 18:00

skovacs

(Reply to #10)

#10

Top

I think you're asking this: If -partners says ABCDEF_GH, you want to repack only ABCH, but not DEFG. I don't think there's a way to do that from command line. You could probably manage it from RosettaScripts. To my understanding the docking code is fully integrated into RosettaScripts so you can reassemble the existing protocol as a script (maybe it's already floating around as an example, I don't know) but tweak the input packing parameters (TaskOperations).

Thu, 2012-05-17 07:00

smlewis

(Reply to #11)

#11

Top

Yes; that's precisely what I'm hoping to do.

I found the documentation for RosettaScripts as part of the 3.3 release manual: http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/Rose...
In the documentation, a repository of existing scripts (examples) is mentioned at https://svn.rosettacommons.org/trac/browser/trunk/mini/demo/rosetta_scri... but it's prompting me for a developer's username and password for access (which I do not have).

As for the input packing parameters, there is one called "prevent repacking" which I believe I could use to keep the appropriate ligands fixed:
http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/Task...

It seems that I will need to specify the number of each of the individual residues (not just the chain names) that are to be held fixed, but this *should* allow me to accomplish the same result.

If I could see an example of the working docking protocol in RosettaScripts, I can try modifying it.
Could I be given access to this folder? Alternatively, can the docking example (.xml file) be posted to this thread?

Thanks,
Sándor

Thu, 2012-05-17 13:07

skovacs

(Reply to #12)

#12

Top

What was under that link should now be found under the rosetta/rosetta_demos/rosetta_scripts directory in the 3.3 and 3.4 releases. Unfortunately, the docking example is rather bare-bones. It simply uses the Docking mover, rather than breaking it out into steps. There's a number of options you can change, so you might be able to do a little bit finer control by making many Docking movers, with different stages turned off in each, and then compositing those movers in the appropriate order in the protocols block.

Regarding packing parameters, task operations take a little bit to get your head around, but once you do they're a somewhat flexible way of specifying residues. Note you don't have to explicitly list residues with prevent repacking - you can do tricks with the OperateOnCertainResidues operation. For example, to prevent repacking on chain "C", you can do the following:

[OperateOnCertainResidues name=nopackC]
[PreventRepackingRLT/]
[ChainIs chain=C/]
[/OperateOnCertainResidues]

(NOTE: Steven edited this post to replace the angle brackets with square brackets to make the forum not eat it - use angle brackets)

Thu, 2012-05-17 14:07

rmoretti

(Reply to #13)

#13

Top

Hi guys,

I've had a chance to play around a little bit with TaskOperations and have some more questions.

When I define
[OperateOnCertainResidues name=NoPackNonProt]
[PreventRepackingRLT/]
[ResidueLacksProperty property=PROTEIN/]
[/OperateOnCertainResidues]
with
[Docking name=dock1 fullatom=1 local_refine=0 score_high=score12 task_operations=NoPackNotProt/]
It ought to fix all the ligands in place, correct?

This is essentially the same as
[OperateOnCertainResidues name=NoPackB]
[PreventRepackingRLT/]
[ChainIs chain=B/]
[/OperateOnCertainResidues]
[OperateOnCertainResidues name=NoPackD]
[PreventRepackingRLT/]
[ChainIs chain=D/]
[/OperateOnCertainResidues]
[OperateOnCertainResidues name=NoPackF]
[PreventRepackingRLT/]
[ChainIs chain=F/]
[/OperateOnCertainResidues]
with
[Docking name=dock1 fullatom=1 local_refine=0 score_high=score12 task_operations=NoPackB,NoPackD,NoPackF/]
Which should only allow the ligand on the movable docking partner to scan and repack.

In either case, I am able to get the protocol to run.

However, the docking now takes *significantly* longer and I noticed the following lines in the .log file (which are different from when I used conventional docking and -norepack1)
protocols.docking.DockMCMCycle: in DockMCMCycle.apply
core.pack.task: Packer task: initialize from command line()
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
... a total of 25 times, one for each instance of the ligand in the entire structure.

The same lines reappear often in the log as follows (though never all 25 in a row together):

core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.pack_rotamers: built 1753 rotamers at 100 positions.
core.pack.pack_rotamers: IG: 2009676 bytes
protocols.docking.DockMCMCycle: in DockMCMCycle.apply
core.pack.task: Packer task: initialize from command line()
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
protocols.docking.DockMCMCycle: in DockMCMCycle.apply

Ultimately:
protocols.jd2.JobDistributor: Proc1FMOandCsmA_J_0001 reported success in 1114 seconds

When I compare this result with one obtained using regular docking and the -norepack1 flag, this current protocol takes roughly ten times longer.

When I take a look at the output structures, I *almost* see the behavior I expect:
The ligands inside the stationary protein partner all remained fixed (possibly because they were already prepacked), although now the protein sidechains near the binding interface are able to shift (instead of being fixed).
However, the ligands on top of the protein and on the mobile partner must've repacked because they adopted a different conformation for each pose generated, therefore I'm concerned that the OperateOnCertainResidues is not having the intended effect of fixing the ligands.

Next, are the flags I normally use for docking_protocol read in when I use rosetta_scripts? (I used the identical command except for changing docking_protocol.linuxgccrelease to rosetta_scripts.linuxgccrelease and adding the -parser:protocol flag)
If so, (and I think this is the case because the I/O formats looks the same) I should be okay...
If not, how do I make sure I am still doing the docking-specific tasks correctly, such as partner docking, adding the extra protein sidechains (-ex1, -ex1aro, etc.), and any additional constraints?

Also, I have a suspicion that the (re)loading of this rotamer library during a generation of a single decoy is the significant contribution to the slowdown--is there another flag that I missed somewhere along the way?

Thanks,
Sándor

Thu, 2012-05-24 13:50

skovacs

(Reply to #14)

#14

Top

You're asking great questions but unfortunately none of them are questions to which I know the answers. I've sent this along to some of the docking folks for further comment.

On packing - you can of course check that "BCL" does not define PROTEIN in its params file, which would ruin your prevention scheme. You could also try hard-coding the residue numbers of the ligands you don't want packed.

Do you want only some of these ligands repacked and not others? If you don't want them to repack, why not just fail to provide the rotamer library...?

"Next, are the flags I normally use for docking_protocol read in when I use rosetta_scripts? " -- They are certainly supposed to be but I cannot guarantee that they are. It looks to me from the code that it is.

Fri, 2012-05-25 13:09

smlewis

(Reply to #15)

#15

Top

Hi, coming in late to this but Steven suggested I try to help out. Those are really good questions. Everything you are doing looks right, however given the results you are seeing it is likely that the TaskOperations are not being read in correctly. I have never used [OperateOnCertainResidues/] so I'm not sure that it works. I ran into a similar problem a while back with a different TaskOperation and wound up making another Docking interface called: [DockingProtocol/]. It is a lot closer to the actual docking_protocol.linuxgccrelase version, you can find the documentation on it here: http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Move...

The same command line flags you use for the docking_protocol.[exe] should work in rosetta_scripts as well. The in most cases the options are read by the individual protocols and thus will work the same in either context.

Hopefully this gives you some things to try but I don't see anything in particular that you are doing wrong.

Fri, 2012-05-25 13:37

stranges

(Reply to #16)

#16

Top

Greetings!

I've had a chance to play around with a few more of the TaskOperations in RosettaScripts when using Rosetta 3.3 including [OperateOnCertainResidues][PreventRepackingRLT] (where I specify either lacking the property "PROTEIN" or list their individual chains) and the direct [PreventRepacking] (where I list the 24 individual residues) but did not meet with much success. Unfortunately, none of these TaskOperations seemed to (actually) prevent the repacking of the ligands (and therefore loading of the ligand rotamer files), making the docking take prohibitively long.

I have upgraded to Rosetta 3.4 in order to try and use the new [DockingProtocol] Mover and [PreventResiduesFromRepacking] TaskOperation present in the newest RosettaScripts as suggested. [PreventResiduesFromRepacking] didn't work, but I now obtain the following NEW error with [PreventRepacking]:

ERROR: ReturnSidechainMover used with poses of different sequence; aborting
ERROR:: Exit from: src/protocols/simple_moves/ReturnSidechainMover.cc line: 81

Here is the .xml script I tried using in Rosetta 3.3:
This protocol will simply do low-resolution followed by high-resolution docking. It will also report the binding energy (ddg) and buried-surface area (sasa) in the score file.
[dock_design]
[SCOREFXNS]
[/SCOREFXNS]
[FILTERS]
[/FILTERS]
[TASKOPERATIONS]
[PreventRepacking name=FixBCLs resnum=359,360,361,362,363,364,365,366,725,726,727,728,729,730,731.732,1091,1092,1093,1094,1095,1096,1097,1098/]
[/TASKOPERATIONS]
[MOVERS]
[Docking name=dock1 fullatom=1 local_refine=0 score_high=score12 task_operations=FixBCLs/] make fullatom=0 to do low-resolution only docking. Change local_refine=1 to do only high-resolution docking.
[ConstraintSetMover name=topconstraint cst_file=constraints.txt/]
[/MOVERS]
[APPLY_TO_POSE]
[/APPLY_TO_POSE]
[PROTOCOLS]
[Add mover_name=dock1/]
[Add mover_name=topconstraint/]
[/PROTOCOLS]
[/dock_design]

And here is the new .xml script I am using in Rosetta 3.4:
[dock_design]
[SCOREFXNS]
[/SCOREFXNS]
[FILTERS]
[/FILTERS]
[TASKOPERATIONS]
[PreventRepacking name=FixBCLs resnum=359,360,361,362,363,364,365,366,725,726,727,728,729,730,731.732,1091,1092,1093,1094,1095,1096,1097,1098/]
[/TASKOPERATIONS]
[MOVERS]
[DockingProtocol name=dock1 docking_local_refine=0 docking_score_high=score12 partners="ABCDEF_GH" task_operations=FixBCLs/]
[ConstraintSetMover name=topconstraint cst_file=constraints.txt/]
[/MOVERS]
[APPLY_TO_POSE]
[/APPLY_TO_POSE]
[PROTOCOLS]
[Add mover_name=dock1/]
[Add mover_name=topconstraint/]
[/PROTOCOLS]
[/dock_design]

Along with the following command line options to run rosetta_scripts (in either version):
/path-to-executable/rosetta_scripts.linuxgccrelease
-in:path /path-to-folder/rosetta_database/
-in:file:s ../$input.pdb
-extra_res_fa /path-to-file/BCL.fa.params
-extra_res_cen /path-to-file/BCL.cen.params
-packing:ex1
-packing:ex1aro
-packing:ex2
-packing:ex2aro
-docking:partners "ABCDEF_GH"
-use_input_sc
-out:nstruct $NUM
-out:file:fullatom true
-out:user_tag $PBS_ARRAYID
-out:prefix Proc$PBS_ARRAYID
-out:file:silent Proc$PBS_ARRAYID-silent.out
-out:file:silent_struct_type binary
-out:file:scorefile Proc$PBS_ARRAYID.score
-parser:protocol path-to-file/dockingscript.xml
> $stem-Proc$PBS_ARRAYID.log

As an aside, am I using the constraint mover correctly? I only have a single line in the file: AtomPair CA 139 CA 1147 BOUNDED 0 100 0.50
I am trying to restrict the global search area to the top of the larger docking partner (rather than the sides, or bottom), and AtomPair seemed to be the simplest to try and use.
Steven mentioned the use of "Coordinate Constraints" in one of the posts above, but what are the three numbers after the atom name but before the function definition, i.e.

From the RosettaScripts UserGuide: ConstraintSetMover

Adds constraints to the pose using the constraints' read-from-file functionality.

[ConstraintSetMover name=(&string) cst_file=(&string)/]
cst_file: the file containing the constraint data. e.g.,:

...
CoordinateConstraint CA 1 CA 380 27.514 34.934 50.283 HARMONIC 0 1
CoordinateConstraint CA 1 CA 381 24.211 36.849 50.154 HARMONIC 0 1
...

I welcome your additional thoughts and suggestions!

Cheers,
Sándor

Thu, 2012-06-07 17:31

skovacs

(Reply to #17)

#17

Top

Bounded constraint: BOUNDED takes 5 arguments. I think in this situation it will work just fine, but if you had more than one line, it would "eat" the next line of input as the 'tag' for BOUNDED. http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/de/d...

Coordinate Constraint: "///@details one line definition "Coordinates atom1 res1 atom2 res2 function_type function_definition""

Atom 1 is a moving atom whose coordinates you wish to constrain.

Atom 2 is a nonmoving atom; its purpose is just to allow the scorefunction to detect the movement of atom 1 relative to 2. The scorefunction doesn't see an atom at the coordinate point and doesn't automatically watch for relative movement.

The coordinates are what Atom 1 is compared to. The three numbers are the coordinates XYZ.

ReturnSidechainMover: Well, Rosetta thinks the sequence is changing. Throw a command line flag in for preventing design (packing:repack_only) and see if that helps. I assume the centroid/fullatom ligands have identical names?

Fri, 2012-06-08 06:41

smlewis

(Reply to #18)

#18

Top

So, based on your response, I ought to use the following line as my constraint (the ligand is centered at the origin to start):

CoordinateConstraint CA 1147 CA 139 0 0 0 BOUNDED 0 100 0.50

Which, in turn, would allow this atom of the movable docking partner to explore a sphere centered at the origin with radius 100Å?

Unfortunately, adding the flag for preventing design had no effect; I still obtain the error:

ERROR: ReturnSidechainMover used with poses of different sequence; aborting
ERROR:: Exit from: src/protocols/simple_moves/ReturnSidechainMover.cc line: 81

The centroid and full-atom ligands have identical names.

Additionally, I noticed that Rosetta 3.4 now tries to repack ALL the residues (even though they were pre-packed in 3.3), complaining about missing atoms:
core.pack.pack_missing_sidechains: packing residue number 1 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 2 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 3 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 4 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 5 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 6 because of missing atom number 5 atom name CB
etc.

I suspect that this is having an effect...

Fri, 2012-06-08 07:11

skovacs

(Reply to #19)

#19

Top

CoordinateConstraint CA 1147 CA 139 0 0 0 BOUNDED 0 100 0.50

Use "CoordinateConstraint CA 1147 CA 139 0 0 0 BOUNDED 0 100 0.50 0.5 TAG". 0.5 is discretionary, but the value 0.5 will provide a continuous derivative in minimization. TAG is, well, a string tag, but you want it to avoid triggering a bug in the constraint reader.

This will give zero score within a 100 angstrom radius sphere. The score will rise first parabolically and then linearly outside that sphere (Try plotting the BOUNDED function to see what I mean).

This "packing residue number X" with CB missing means that Rosetta is reading in a centroid (or simply sidechainless) pose when it expects a fullatom pose. Which are you passing in? Note that this is not the same type of repacking about which you were concerned before; this is occuring during PDB file loading.

Fri, 2012-06-08 07:16

smlewis

(Reply to #20)

#20

Top

Glad I caught you this morning. ;)
Thanks for clarifying the constraint definitions.

The pdb file I am using as a starting structure is indeed fullatom, so if a centroid pose is expected, then the ReturnSidechainMover error I obtained makes sense...

Here is a little more of the log file:
protocols.rosetta_scripts.ParsedProtocol: =======================BEGIN MOVER DockingProtocol=======================
{
protocols.docking.DockingProtocol: Danger Will Robinson! Native is an impostor!
protocols.docking.DockingProtocol: Setting docking foldtree
protocols.docking.DockingProtocol: old fold tree: FOLD_TREE EDGE 1 358 -1 EDGE 1 359 1 EDGE 1 360 2 EDGE 1 361 3 EDGE 1 362 4 EDGE 1 363 5 EDGE 1 364 6 EDGE 1 365 7 EDGE 1 366 8 EDGE 1 367 9 EDGE 367 724 -1 EDGE 1 725 10 EDGE 1 726 11 EDGE 1 727 12 EDGE 1 728 13 EDGE 1 729 14 EDGE 1 730 15 EDGE 1$
protocols.docking.DockingProtocol:
protocols.docking.DockingProtocol: new fold tree: FOLD_TREE EDGE 1 358 -1 EDGE 358 359 6 EDGE 359 366 -1 EDGE 366 367 5 EDGE 367 553 -1 EDGE 553 1158 1 EDGE 553 724 -1 EDGE 724 725 4 EDGE 725 732 -1 EDGE 732 733 3 EDGE 733 1090 -1 EDGE 1090 1091 2 EDGE 1091 1098 -1 EDGE 1158 1157 7 EDGE 1157 1099 -1
protocols.docking.DockingProtocol:
protocols.docking.DockingProtocol: ////////////////////////////////////////////////////////////////////////////////
protocols.docking.DockingProtocol: /// Rosetta 3 Docking Protocol ///
protocols.docking.DockingProtocol: /// ///
protocols.docking.DockingProtocol: /// Dockable Jumps: 1 ///
protocols.docking.DockingProtocol: /// Low Resolution Docking Protocol: on ///
protocols.docking.DockingProtocol: /// High Resolution Docking Protocol: on ///
protocols.docking.DockingProtocol: /// Low Resolution Filter: on ///
protocols.docking.DockingProtocol: /// High Resolution Filter: on ///
protocols.docking.DockingProtocol: ////////////////////////////////////////////////////////////////////////////////
core.chemical.ResidueTypeSet: Finished initializing centroid residue type set. Created 2125 residue types
protocols.docking.DockingProtocol: FOLD_TREE EDGE 1 358 -1 EDGE 358 359 6 EDGE 359 366 -1 EDGE 366 367 5 EDGE 367 553 -1 EDGE 553 1158 1 EDGE 553 724 -1 EDGE 724 725 4 EDGE 725 732 -1 EDGE 732 733 3 EDGE 733 1090 -1 EDGE 1090 1091 2 EDGE 1091 1098 -1 EDGE 1158 1157 7 EDGE 1157 1099 -1
protocols.docking.DockingProtocol:
protocols.docking.DockingLowRes: in DockingLowRes.apply
protocols.docking.DockingLowRes: ////////////////////////////////////////////////////////////////////////////////
protocols.docking.DockingLowRes: /// Docking Low Res Protocol ///
protocols.docking.DockingLowRes: /// ///
protocols.docking.DockingLowRes: /// Centroid Inner Cycles: 50 ///
protocols.docking.DockingLowRes: /// Centroid Outer Cycles: 10 ///
protocols.docking.DockingLowRes: /// Ensemble 1: off ///
protocols.docking.DockingLowRes: /// Ensemble 2: off ///
protocols.docking.DockingLowRes: /// Scorefunction: ///
protocols.docking.DockingLowRes: ScoreFunction::show():
weights: (interchain_pair 1) (interchain_vdw 1) (interchain_env 1) (interchain_contact 2)
energy_method_options: EnergyMethodOptions::show: etable_type: FA_STANDARD_DEFAULT
EnergyMethodOptions::show: unfolded_energies_type: UNFOLDED_SCORE12
protocols.docking.DockingLowRes: EnergyMethodOptions::show: atom_vdw_atom_type_set_name: centroid
protocols.docking.DockingLowRes: EnergyMethodOptions::show: exclude_protein_protein_hack_elec: false
protocols.docking.DockingLowRes: EnergyMethodOptions::show: exclude_monomer_hack_elec: false
protocols.docking.DockingLowRes: EnergyMethodOptions::show: hackelec_max_dis: 5.5
protocols.docking.DockingLowRes: EnergyMethodOptions::show: hackelec_min_dis: 1.5
protocols.docking.DockingLowRes: EnergyMethodOptions::show: hackelec_die: 10
protocols.docking.DockingLowRes: EnergyMethodOptions::show: hackelec_no_dis_dep_die: false
protocols.docking.DockingLowRes: EnergyMethodOptions::show: exclude_DNA_DNA: true
protocols.docking.DockingLowRes: EnergyMethodOptions::show: cst_max_seq_sep: 18446744073709551615
protocols.docking.DockingLowRes: EnergyMethodOptions::show: bond_angle_central_atoms_to_score:
protocols.docking.DockingLowRes: EnergyMethodOptions::show: bond_angle_residue_type_param_set: none
protocols.docking.DockingLowRes: HBondOptions::show: exclude_DNA_DNA: true
protocols.docking.DockingLowRes: HBondOptions::show: include_intra_res_RNA_: false
protocols.docking.DockingLowRes: HBondOptions::show: exclude_self_hbonds: true
protocols.docking.DockingLowRes: HBondOptions::show: use_hb_env_dep: true
protocols.docking.DockingLowRes: HBondOptions::show: use_hb_env_dep_DNA: true
protocols.docking.DockingLowRes: HBondOptions::show: smooth_hb_env_dep: true
protocols.docking.DockingLowRes: HBondOptions::show: bb_donor_acceptor_check: true
protocols.docking.DockingLowRes: HBondOptions::show: decompose_bb_hb_into_pair_energies: false
protocols.docking.DockingLowRes: HBondOptions::show: params_database_tag_: standard_params
protocols.docking.DockingLowRes: HBondOptions::show: use_incorrect_deriv_: false
protocols.docking.DockingLowRes: HBondOptions::show: use_sp2_chi_penalty_: false
protocols.docking.DockingLowRes: HBondOptions::show: Mbhbond: false
protocols.docking.DockingLowRes:

protocols.docking.DockingLowRes: ////////////////////////////////////////////////////////////////////////////////
protocols.docking.DockingLowRes: ::::::::::::::::::Centroid Rigid Body Adaptive:::::::::::::::::::
protocols.TrialMover: Acceptance rate: 0.32
protocols.TrialMover: Acceptance rate: 0.38
protocols.TrialMover: Acceptance rate: 0.32
protocols.TrialMover: Acceptance rate: 0.08
protocols.TrialMover: Acceptance rate: 0.04
protocols.TrialMover: Acceptance rate: 0
protocols.TrialMover: Acceptance rate: 0
protocols.TrialMover: Acceptance rate: 0.08
protocols.TrialMover: Acceptance rate: 0.06
protocols.TrialMover: Acceptance rate: 0.04
core.pack.task: Packer task: initialize from command line()
core.pack.pack_missing_sidechains: packing residue number 1 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 2 because of missing atom number 5 atom name CB
etc. etc. etc.
core.pack.pack_missing_sidechains: packing residue number 1158 because of missing atom number 1 atom name C37
core.pack.dunbrack: Dunbrack library took 0.01 seconds to load from binary
core.pack.dunbrack.SingleLigandRotamerLibrary: Read in 2000 rotamers for BCL!
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
etc. etc. etc.
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.pack_rotamers: built 48321 rotamers at 1034 positions.
core.pack.pack_rotamers: IG: 157861740 bytes

Which then terminates with the same ReturnSidechainMover error as before.
Could this be interpreted as the initial conversion to centroid is not happening correctly before the low resolution dock is attempted?

Fri, 2012-06-08 07:34

skovacs

(Reply to #21)

#21

Top

Upon further investigation, I think it appears to be an issue with the high resolution protocol...

When I split up the two docking commands as follows,
[MOVERS]
[DockingProtocol name=dock1 low_res_protocol_only=1 partners="ABCDEF_GH" task_operations=FixBCLs /]
[ConstraintSetMover name=topconstraint cst_file= "/home/sak2/Rosetta/constraints2.txt" /]
[DockingProtocol name=dock2 docking_local_refine=1 partners="ABCDEF_GH" task_operations=FixBCLs /]
[/MOVERS]
[PROTOCOLS]
[Add mover_name=dock1 /]
[Add mover_name=topconstraint /]
[Add mover_name=dock2 /]
[/PROTOCOLS]

The low resolution (centroid) command completes successfully, the distance constraint *seems* to be applied, and then the high resolution (which should be fullatom, correct?) issues the pack_missing_sidechains warning for each residue due to a missing atom--either CB or C37 (in the case of the ligand).
It is able to complete a structure, but because certain ligands were repacked instead of held fixed, there are now horrible clashes and scores in the positive thousands.

Is there a way to ensure that we're back in fullatom before the refining step begins?

Fri, 2012-06-08 08:41

skovacs

(Reply to #22)

#22

Top

I have someone more experienced with docking than I taking a look at this, but we're out well beyond what we can do via forum.

One relatively easy diagnostic: If you delete your ligands from the input, does everything work again? (Can we be sure they are the source of the issues?)

Also, I think this is the thread where you were clipping the complex to delete large numbers of unmoving residues to speed computation - are all those broken chains being registered as single chains, or does the output list them all as separate chains?

Fri, 2012-06-08 11:37

smlewis

#23

Top

I'm not surprised that the above example fails. You should only have a centroid level pose after dock1 that is never converted to a full atom before dock2. Unfortunately that doesn't really help much as you are seeing the same problem when you run the full protocol otherwise.
For debugging purposes it is nice to throw in DumpPDB statements every so often on your protocol to see what's going on where. A look at the PDB at the different points in a protocol can tell you a lot. Look at the documentation on DumpPDB: http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Move...
You could one after dock1 above to see what your PDB looks like at this point.

I don't know of anyone who has does protein docking with a ligand, so there might be a problem at that level. Not sure off hand the best way to investigate that. I realize the ligand is the whole point of this, but could you try the above steps without it to see if the problem still exists. It should help us figure out a way forward.

Fri, 2012-06-08 11:58

stranges

(Reply to #24)

#24

Top

Addendum to the above:
Have you tried running a simple protocol on your input structure just to make sure it runs. Leave out the docking steps and just run PackRotamers: http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Move...
You could give it a simple task operation such as PreventRepacking
http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Task...

Fri, 2012-06-08 12:09

stranges

(Reply to #25)

#25

Top

Greetings,

Simple Rosetta 3.4 protocols seem to work on the input structure (read: generate output without error), however, I obtained four non-identical output structures (each had a different number of ATOMS!) when running the [PackRotamers] mover with the task operation [PreventRepacking], so I'm unsure as to how to interpret this result...

I also went ahead and ran both the conventional bin/docking_protocol.linuxgccrelease and /bin.rosetta_scripts.linuxgccrelease with [docking_protocol] mover in Rosetta 3.4 on the full system, on the system without the offending ligand, on systems with just the single ligand on the movable partner, and on systems without ANY ligands. I have not cut away large parts of the protein because I have yet to get it to run successfully with the entire protein.
When I use [dumppdb] to check the format between dock1 (low res) and dock2 (high res), I do obtain a centroid .pdb file.

Furthermore, I obtain the missing sidechains message in ALL cases (i.e. it doesn't matter if the ligands are present/absent) after completing the low resolution (centroid) search at the start of the high resolution (fullatom) local docking search:

protocols.docking.DockingLowRes:

If I run docking_protocol.linuxgccrelease in Rosetta 3.3 on any of input structures (with/without ligands), I do not obtain this error and am able to obtain plausible (read: negative scoring) decoys.
Therefore, it seems that there is a bug when returning to fullatom mode from centroid mode when docking in Rosetta 3.4
It doesn't seem that the reverse mapping is being applied at all, causing ALL sidechains to repack and the ensuing horrible clashes (I obtained positive scores in each case, all with very high fa_rep values)

I've attached the no-ligand input file I used to obtain this message (added .txt to the filetype to allow uploading) in order to replicate the results.

Sándor

Mon, 2012-06-11 10:17

skovacs

#26

Top

Greetings,
Is there a solution for this problem? I have the same issue!
Rosetta 3.4 fails with the same Error when running protein protein docking
>>>
Additionally, I noticed that Rosetta 3.4 now tries to repack ALL the residues (even though they were pre-packed in 3.3), complaining about missing atoms:
core.pack.pack_missing_sidechains: packing residue number 1 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 2 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 3 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 4 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 5 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 6 because of missing atom number 5 atom name CB
>>>>>

I do not use any fancy flags, so far.
rosetta_source/bin/docking_protocol.linuxgccrelease --database /path to/Rosetta/rosetta_database -s /home/path/to/input.pdb -ex1 -ex1aro -nstruct 100 -partners ABCD_E -out:path:pdb /path/to/results/folder/ -out:path:score /path/to/results/folder

I would be very grateful for any help!

Daniel

Sun, 2012-12-09 19:37

dati

(Reply to #27)

#27

Top

The "missing atom name CB" indicates to me that there may be an issue with your input structure - is it a backbone only structure, or is there occupancy only on the backbone atoms? Did it come from the wwPDB/RCSB, or was it generated by other means? If it came directly from Rosetta 3.3 full atom mode output, there shouldn't be repacking when read into Rosetta3.4, although if you did any manipulation of the structure (or if it was centroid-mode output) that might be an issue.

Regarding the problem in the rest of the thread, there were a number of things discussed. What's specifically your problem? Is it simply that you think docking is taking too long? If so, how long is it taking per structure, and what's the size of the protein being docked and the size of their docked interface?

Mon, 2012-12-10 10:36

rmoretti

(Reply to #28)

#28

Top

Hello,
sorry for being not specific enough. I'm doing protein docking (total 463 Res). Since my ligand contains Hyp I used the procedure discribed here to get the docking running: http://www.rosettacommons.org/content/creating-centroid-patches-proline-... . When I now try to run docking I get the following error:
core.pack.pack_missing_sidechains: packing residue number 1 because of missing atom number 5 atom name CB
core.pack.pack_missing_sidechains: packing residue number 2 because of missing atom number 5 atom name CB
...
...
...
core.pack.pack_missing_sidechains: packing residue number 462 because of missing atom number 6 atom name CB
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.pack_rotamers: built 4724 rotamers at 421 positions.
core.pack.pack_rotamers: IG: 3175572 bytes

ERROR: ReturnSidechainMover used with poses of different sequence; aborting
ERROR:: Exit from: src/protocols/simple_moves/ReturnSidechainMover.cc line: 81

The input structure was generated with the prepack protokoll and is a full atom pdb. If I mutate Hyp to Pro the docking goes well and there are no errors. I think the Hyp causes the troubles when Rosetta does the high resolution refinement?! I do not want to speed my docking up as discussed above. I just have a similar error meassage as discribed at post # 17.
flags i used:
flags="-s $inputpath/$inputfile -ex1 -ex1aro -partners ABCD_E -dock_pert 3 8 -nstruct 10 -randomize2 -out:path:pdb $outpath -out:path:score $outpath"

I would be very grateful for any help!
Best Daniel

Sun, 2012-12-16 20:11

dati

(Reply to #29)

#29

Top

Let's carefully check your hydroxyproline patches. First, make sure the hydroxyproline patches are active in both the centroid and fa_standard patches.txt files. rosetta_database/chemical/residue_type_sets/[fa_standard | centroid]/patches.txt should have the hydroxyproline patches commented-in (no # sign). Next, check the patches themselves and make sure the IO STRING lines match each other and match your input PDB (HYP as the three-letter code in the default case; P as the one-letter code).

The error suggests to me that Rosetta recognizes HYP in both states but thinks it has a different name in fullatom vs. centroid.

If the HYP is not at your interface...just let it be proline and don't sweat it.

Mon, 2012-12-17 07:10

smlewis

(Reply to #30)

#30

Top

Hi smlewis

Thank you so much for these helpfull hints. There was a typo in one of the patch files, which caused a wrong name for HYP. Docking runs fine now!
Thanks again!

Best Daniel

Mon, 2012-12-17 21:05

dati

(Reply to #31)

#31

Top

You're quite welcome. I'm just glad it's an easy-to-fix error, lots of the questions we get here are unresolvable...

Tue, 2012-12-18 08:18

smlewis

Search form

You are here

Speeding up docking to a large complex