P-P docking with ambiguous constraints

Hi, I have a question about protein docking with constraints.

I want to run a low-resolution docking of two proteins, each around 450 residues, to understand how the dimer forms, based on distances from a cross-linking experiment.
I have around 300 distances that I want to use as constraints. I know they cannot all be satisfied at the same time, which gives rise to multiple binding modes, so I hope to find poses that satisfy the maximum number of my cross-link distances. Each distance also comes with a confidence score. For example (first column = first residue; second column = second residue; third column = the distance, which is not fixed but should be < 25 Å; last column = the confidence, which Rosetta should use as a weight so that cross-links with a larger number are preferred):
311,335,<25,16
311,353,<25,1
311,358,<25,31
333,358,<25,1
335,353,<25,3

Reading the threads, I found out that it is possible to use AmbiguousConstraints:

AmbiguousConstraint
AtomPairConstraint Atom1 Res1 Atom2 Res2 BOUNDED lb ub sd rswitch Experimental_constraint
AtomPairConstraint Atom1 Res1 Atom2 Res2 CONSTANTFUNC value
END_AMBIGUOUS

Do you think it's a good idea to have 300 ambiguous constraints?
Can anyone point me to the way to do it?
I would really like some more information about constraints with weights.

So far I have only obtained a prepacked version of my protein (the solved structure I am using is a dimer, so I am trying to find alternative dimer forms of this protein based on the cross-linking experiment).

Thanks so very much!

Fri, 2013-11-08 11:38
Anouk

First off, a terminology disclaimer: When Rosetta talks about "constraints", most of the time it's really talking about what most other computational approaches call "restraints". That is to say, Rosetta "constraints" don't prohibit sampling outside the "constraint", but instead attach an energy penalty to the "constraint" violation - the lower the penalty the more satisfied the "constraint" is.

As far as Rosetta is concerned, it's perfectly fine with 300 ambiguous constraints - writing such a constraint file by hand might not be recommended from a tedium point of view, but Rosetta will be fine with reading it in. (If you have the list like you showed above, it should be easy to script a conversion, though.)

AmbiguousConstraints might be what you're after. To use them, you need to be able to break your constraints into groups where you can say "I either want this constraint to be satisfied, or I want that constraint to be satisfied, whichever gives me a better score". Each of those groups then gets its own ambiguous constraint.

That might not be what you want, though. I get the sense that you don't have any subgroupings of constraints, but instead want to satisfy as many of them as possible at the same time. In that case you don't need to use ambiguous constraints - you can just apply all the constraints plain. Rosetta will attempt to maximize the number of constraints it satisfies, and will simply accumulate the energy penalty for the ones it can't. If you set up the associated functions appropriately, the constraints you want will be the ones favored.

For your situation, I'd recommend using an AtomPairConstraint to constrain distances, with the BOUNDED function. You can set the lower bound to 0 and the upper bound to 25 so that the penalty is zero in the range of 0 to 25 angstroms. You can then adjust the sd value per constraint: the penalty outside the bounds scales with the violation divided by sd, so the constraints with the most preference should get the smallest sd value, making them the ones Rosetta tries hardest to satisfy.
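A minimal sketch of such a conversion script, using the five example cross-links from the first post. Several things here are assumptions rather than anything stated in this thread: the use of CA atoms, the rswitch value of 0.5, the "xlink" tag, and that the residue numbers already match the numbering Rosetta will see for the two-chain structure. The sd weighting (maximum confidence divided by the cross-link's confidence) is the scheme proposed further down the thread; as far as I understand the BOUNDED function, a smaller sd means a steeper penalty, so the most confident cross-link ends up with the strongest constraint.

```python
# Sketch: turn cross-link records into AtomPair/BOUNDED constraint lines.
# Assumptions (not from the thread): CA atoms, rswitch = 0.5, the "xlink" tag,
# and residue numbers that already match Rosetta's numbering.

def parse_crosslink(line):
    """Parse 'res1,res2,<25,confidence' into (res1, res2, upper_bound, confidence)."""
    res1, res2, bound, conf = [field.strip() for field in line.split(",")]
    return int(res1), int(res2), float(bound.lstrip("<")), float(conf)


def to_cst_lines(records, atom="CA", lower=0.0, rswitch=0.5, tag="xlink"):
    """Return one AtomPair BOUNDED line per cross-link.

    sd = max(confidence) / confidence, so the most confident cross-link
    gets sd = 1.0 and less confident ones get a larger (weaker) sd.
    """
    max_conf = max(conf for _, _, _, conf in records)
    lines = []
    for res1, res2, upper, conf in records:
        sd = max_conf / conf
        lines.append(
            f"AtomPair {atom} {res1} {atom} {res2} "
            f"BOUNDED {lower:.1f} {upper:.1f} {sd:.2f} {rswitch} {tag}"
        )
    return lines


raw_records = ["311,335,<25,16", "311,353,<25,1", "311,358,<25,31",
               "333,358,<25,1", "335,353,<25,3"]
cst_lines = to_cst_lines([parse_crosslink(line) for line in raw_records])
print("\n".join(cst_lines))
```

The printed lines can be redirected into a constraint file. Cross-links are usually measured between specific side-chain atoms, so the atom name and the upper bound are worth revisiting for a real run.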

This might not quite get you where you want to go, though. In this scheme, if a constraint can't be satisfied, Rosetta will still do its best to move the structure as close as possible to where it could be satisfied. This may compress the protein unnecessarily. There are a couple of approaches to get around this. If you have a sense of what fraction of the constraints will likely be invalid, you can wrap everything in a single KofNConstraint. That's like an AmbiguousConstraint, but it allows you to specify that you want the best 250 or 170 of your 300 constraints to be evaluated, and the worst 50 or 130 to be ignored. The other possibility is to do what the AmbiguousConstraint example is doing: you set up a BOUNDED constraint with the main penalty, and then you add a CONSTANTFUNC or a LINEAR_PENALTY function to cap how severe the penalty gets (and how steep the slope is), minimizing the effect of violating the constraint.
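To make the capping idea concrete, here is a small numerical sketch. The functional form in bounded_penalty is my reading of Rosetta's BOUNDED function (zero inside the bounds, quadratic just outside, linear further out), and taking the minimum of the two penalties is my understanding of how an AmbiguousConstraint picks its best member; treat both as assumptions and check them against the Rosetta constraint documentation. The function names, the sd of 2.0, and the cap of 5.0 are purely illustrative.

```python
# Sketch: a BOUNDED-style penalty and the effect of capping it with a constant.
# The formula is an assumed reading of Rosetta's BOUNDED function, not a copy
# of the Rosetta source.

def bounded_penalty(x, lb, ub, sd, rswitch=0.5):
    """Zero inside [lb, ub]; quadratic just outside; linear far above ub."""
    if x < lb:
        delta = (lb - x) / sd
    elif x > ub:
        delta = (x - ub) / sd
    else:
        return 0.0
    if x > ub + rswitch * sd:
        # Linear continuation that matches the quadratic at delta == rswitch.
        return 2.0 * rswitch * delta - rswitch ** 2
    return delta ** 2


def capped_penalty(x, lb, ub, sd, cap):
    """Ambiguous-style scoring: the lower of the BOUNDED penalty and a constant."""
    return min(bounded_penalty(x, lb, ub, sd), cap)


for distance in (20.0, 26.0, 40.0, 80.0):
    uncapped = bounded_penalty(distance, 0.0, 25.0, 2.0)
    capped = capped_penalty(distance, 0.0, 25.0, 2.0, cap=5.0)
    print(f"{distance:5.1f} A  uncapped={uncapped:6.2f}  capped={capped:6.2f}")
```

With the cap in place, a cross-link that is hopelessly violated contributes the same fixed penalty at 40 Å as at 80 Å, so it no longer pulls the two proteins together.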

Fri, 2013-11-08 15:46
rmoretti

Hi Rocco,
Could you please provide an example showing how to wrap the constraints in a single KofNConstraint so that the top 50% of the constraints are satisfied?

constraints.cst:
-----------------
Angle CB 8 SG 8 ZN 32 HARMONIC 1.95 0.35
AtomPair NE2 13 V3 32 HARMONIC 0.0 0.2
Angle CD2 13 NE2 13 ZN 32 HARMONIC 2.09 0.35
Dihedral CG 13 CD2 13 NE2 13 ZN 32 CIRCULARHARMONIC 3.14 0.35
---------------

Thanks

Mon, 2013-11-18 05:14
nawsad

Just like the ambiguous constraints, you simply wrap them in a KofNConstraint block, but with a parameter to take the number of constraints to satisfy.

-----------------
KofNConstraint 2
Angle CB 8 SG 8 ZN 32 HARMONIC 1.95 0.35
AtomPair NE2 13 V3 32 HARMONIC 0.0 0.2
Angle CD2 13 NE2 13 ZN 32 HARMONIC 2.09 0.35
Dihedral CG 13 CD2 13 NE2 13 ZN 32 CIRCULARHARMONIC 3.14 0.35
END KofNConstraint
-----------------

This will attempt to satisfy two of the four constraints in the block, whichever two can be best satisfied.
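For a longer constraint list, the wrapping can be scripted. The following is a rough sketch only: wrap_k_of_n and its fraction parameter are hypothetical helper names, not part of Rosetta, and the block syntax simply mirrors the example above.

```python
# Sketch: wrap existing constraint lines in a single KofNConstraint block.
# wrap_k_of_n is a hypothetical helper; k is the requested fraction of the
# constraints, rounded down, but never less than 1.

def wrap_k_of_n(cst_lines, fraction=0.5):
    """Return the constraint lines wrapped in one KofNConstraint block."""
    lines = [line.strip() for line in cst_lines if line.strip()]
    k = max(1, int(len(lines) * fraction))
    return [f"KofNConstraint {k}", *lines, "END KofNConstraint"]


constraints = [
    "Angle CB 8 SG 8 ZN 32 HARMONIC 1.95 0.35",
    "AtomPair NE2 13 V3 32 HARMONIC 0.0 0.2",
    "Angle CD2 13 NE2 13 ZN 32 HARMONIC 2.09 0.35",
    "Dihedral CG 13 CD2 13 NE2 13 ZN 32 CIRCULARHARMONIC 3.14 0.35",
]
block = wrap_k_of_n(constraints, fraction=0.5)
print("\n".join(block))
```

Changing fraction lets you try different guesses for how many of the constraints are compatible with a single binding mode.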

Mon, 2013-11-18 07:38
rmoretti

Dear rmoretti!
Thanks a lot for clearing things up for me :)

In your experience, should I run low-resolution docking first and then take the result and run full atom, or is it wise to go directly to the full-atom procedure? (I am running it on my MacBook Pro, 2.3 GHz Intel Core i7, 8 GB, so I don't have any idea about the run time. Will it be an hour/day/century? :)

I would appreciate it so much if you could take an expert look at the following:

============ flag file===========

-in:file:s my.pdb
-dock_pert 3 8
-ex1
-ex2aro
-out:file:fullatom
-out:file:o score
-constraints:cst_fa_file constraint.cst
-out:path:pdb /blabla/bla

=============constraint.cst============

for each constraint {
AtomPairConstraint Atom1 Res1 Atom2 Res2 BOUNDED lb ub sd rswitch

where
lb=20;
ub=25;
sd=Max(confidence score) / confidence score for given cross-link
}

Is there any limit, let's say, for the force? With the way I define the sd, it can perhaps get to quite big numbers.

Thanks a lot! and more!

Wed, 2013-11-13 07:07
Anouk

If you know where your protein is going to be docked, you can often skip the low resolution stage, and go straight to the high resolution refinement. You would want to go back to the low resolution stage if you want to sample different binding sites, or if you want to change the docked orientation of the two proteins. High resolution docking will slightly sample different rigid body orientations, but it doesn't do large movement searches like the low resolution stage does.

Regarding run time, it depends on the protein system you're using. A larger number of residues and a large interface will slow things down. As a rough guess, you're probably looking at 25-50 models per processor hour, so on a single processor a run of 1000 structures will likely take around 20-40 hours. But again, that can vary widely based on the proteins you're docking and your computer system.

Regarding sd, there's no limit to how big the constraint values can get. Generally, you'll want to try to balance the influence of the constraints with the influence of the score function. You're trying to reach a point where the effect of the constraints is strong enough to override the imperfections in the score function, while not being so large that it completely swamps the score function. This goes doubly in your case, where you have constraints which will never be simultaneously satisfied. I'd try doing a couple of test runs, scaling all the sd's by a uniform multiplier, and seeing how that affects the results. You want a level where the constraints stay satisfied, but not so much that the protein is pulled apart by them. BTW, an easy way to do the uniform multiplier is by adjusting the weight of the appropriate constraint score term in the score function. Many protocols observe the -cst_weight option, which takes the constraint weight/scaling multiplier.
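If you would rather rescale the sd values in the file itself for those test runs, a sketch along these lines could do it. It assumes the field order "BOUNDED lb ub sd ..." from the template quoted in the first post; scale_bounded_sd is a hypothetical helper name. Note that, as far as I understand the BOUNDED function, a larger sd makes a constraint weaker, so a multiplier above 1 loosens all the constraints.

```python
# Sketch: scale the sd field of BOUNDED constraint lines by one multiplier.
# Assumes the field order "BOUNDED lb ub sd ..."; other lines pass through.

def scale_bounded_sd(cst_line, multiplier):
    """Return cst_line with its BOUNDED sd multiplied by `multiplier`."""
    fields = cst_line.split()
    if "BOUNDED" not in fields:
        return cst_line  # leave other function types untouched
    sd_index = fields.index("BOUNDED") + 3  # skip lb and ub to reach sd
    if sd_index >= len(fields):
        return cst_line  # malformed line: no sd field to scale
    fields[sd_index] = f"{float(fields[sd_index]) * multiplier:.2f}"
    return " ".join(fields)


original = "AtomPair CA 311 CA 358 BOUNDED 0.0 25.0 1.00 0.5 xlink"
print(scale_bounded_sd(original, 2.0))
```

Running the same docking setup with a few different multipliers gives a quick feel for where the constraints start to dominate the score function.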

Fri, 2013-11-15 06:37
rmoretti