protein interface design for multi-chain complex


I want to use the XML script design_script.xml in demos/design_raf_rac_interface/ to design the interface of a complex structure.

The syntax of "ProteinInterfaceDesign" in the example XML typically looks like:
<ProteinInterfaceDesign name=design repack_chain1=1 repack_chain2=1 design_chain1=1 design_chain2=0 interface_distance_cutoff=8/>

Does the attribute "repack_chain1" mean "to repack sidechains of residues in first chain"?

My complex structure has 4 chains: the first two chains form partner A and the last two chains form partner B. I want to design the interface residues on partner B but not on A. How do I assign such attributes for chains 3 and 4? Is there something like "design_chain4=1"?

Is there a way to make "ProteinInterfaceDesign" work only on a specific set of residues rather than the whole interface, for example by loading some sort of .res file?


Sun, 2013-05-05 23:18

A lot of the protein-protein interface design code is written on the assumption of two chains separated by a single rigid body jump.

Here "chain1" and "chain2" simply mean "those residues before the jump" and "those residues after the jump". For a 4-chain protein, you have three jumps, one between (at least in the before/after sequence sense) A & B, the second between B & C, and the third between C & D. If you want to do designs between the AB complex and the CD complex, you need to specify the "jump=2" parameter for ProteinInterfaceDesign to tell it to work off of jump 2 instead of the default jump 1.

Then you would do "repack_chain1=1 repack_chain2=1 design_chain1=0 design_chain2=1" to design the last two chains and not the first two chains. Keep in mind that TaskOperations are strictly restrictive. If you have one thing saying to design the residue and one that's telling it to not pack, the not packing behavior will dominate over the design. (Because the "don't repack" is more restrictive behavior than "design it" behavior.)
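To make the "most restrictive wins" rule concrete, here is a toy model in plain Python. It is only a sketch of the logic described above: the `Level` enum and `combine` function are names made up for this illustration and are not part of Rosetta or PyRosetta.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordering of packer freedom, from most to least restrictive."""
    NO_PACK = 0   # residue is left untouched
    REPACK = 1    # rotamer may change, amino acid identity is fixed
    DESIGN = 2    # amino acid identity may change as well


def combine(*requests):
    """Toy rule: when several operations act on one residue, the most restrictive wins."""
    return min(requests)


# One operation asks for design, another says "do not pack": the residue stays fixed.
print(combine(Level.DESIGN, Level.NO_PACK).name)
# Design plus "repack only" leaves the residue repackable but not designable.
print(combine(Level.DESIGN, Level.REPACK).name)
```

The practical consequence is that adding more operations can only shrink what the packer is allowed to do, never widen it.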

Mon, 2013-05-06 12:58

Thank you very much.
So is it reasonable to activate "repack" for both chain partners in design?

Sat, 2013-05-11 21:11

Depends on what you want to accomplish. On one hand, the fixed-sequence side of the interface certainly moves in the test tube, so allowing movement will more closely represent reality, and allow for induced-fit-type binding. On the other hand, Rosetta's energy function doesn't match reality precisely, so the more flexibility you allow, the more chance you have to find one of the false minima. Also, if you think thermodynamically, an induced-fit mode would result in lower binding energy, as you "lose" the energy you need to move the structure from its apo state to the bound conformation - though if you play it right, the rearrangement could more than make up for it. A final consideration is that packing is the method by which we change the position of hydrogens on things like serine and histidine, so if you don't repack those residues, you may not get good hydrogen bonding patterns unless you're lucky to begin with.

Generally, people will allow repacking in the interface of the fixed-sequence partner (but not secondary or distal shells), but keep the backbone conformation fixed. They then look at the resulting designs and make sure the structural changes are reasonable and aren't too severe.

Sun, 2013-05-12 13:57

Much appreciated!

Sun, 2013-05-12 18:42

A follow-up question on the "jump".
A protein with 4 chains (in order: A, B, C, D) has an original fold tree like:
FOLD_TREE EDGE 1 110 -1 EDGE 1 111 1 EDGE 111 216 -1 EDGE 1 217 2 EDGE 217 432 -1 EDGE 1 433 3 EDGE 433 650 -1

I want to dock AB against CD.
If I use the <Docking /> mover, should I use "jumps=2"?

If I use the <DockingProtocol /> mover, should I use "partners=AB_CD" (AB is fixed, CD rotates)?

Wed, 2013-05-15 06:57

Interpreting the fold tree:

The first two numbers for each edge are the two residues (pose numbered) that are connected by the edge. The third number tells you which type of edge you have. -1 means a polymeric edge - that is, the residues between the two endpoints are connected in a polymer. Positive numbers indicate jumps, being the jump number of the edge.

So in your fold tree you have four segments (likely chains, but if your foldtree has been manipulated, it could be one or more chains being handled as if they were four chains).

A 110 residue segment from residue 1 to 110 (likely chain A)
a 106 residue segment from residue 111 to 216 (chain B), attached to residue 1 by jump #1
a 216 residue segment from residue 217 to 432 (chain C), attached to residue 1 by jump #2
a 218 residue segment from residue 433 to 650 (chain D), attached to residue 1 by jump #3
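If you want to check this kind of reading mechanically, the fold tree string can be split up with a few lines of plain Python. This is a minimal sketch that only handles the simple "EDGE start stop label" form shown in this thread; `parse_fold_tree` and `summarize` are helper names invented here, not Rosetta functions.

```python
def parse_fold_tree(text):
    """Parse a 'FOLD_TREE EDGE a b label ...' string into (start, stop, label) tuples."""
    tokens = text.split()
    if not tokens or tokens[0] != "FOLD_TREE":
        raise ValueError("expected the string to start with FOLD_TREE")
    body = tokens[1:]
    if len(body) % 4 != 0:
        raise ValueError("each edge needs the form: EDGE start stop label")
    edges = []
    for i in range(0, len(body), 4):
        keyword, start, stop, label = body[i:i + 4]
        if keyword != "EDGE":
            raise ValueError(f"expected EDGE, got {keyword!r}")
        edges.append((int(start), int(stop), int(label)))
    return edges


def summarize(edges):
    """Split edges into polymer segments (label -1) and jumps (positive label)."""
    segments = [(start, stop) for start, stop, label in edges if label == -1]
    jumps = {label: (start, stop) for start, stop, label in edges if label > 0}
    return segments, jumps


TREE = ("FOLD_TREE EDGE 1 110 -1 EDGE 1 111 1 EDGE 111 216 -1 "
        "EDGE 1 217 2 EDGE 217 432 -1 EDGE 1 433 3 EDGE 433 650 -1")

segments, jumps = summarize(parse_fold_tree(TREE))
for start, stop in segments:
    print(f"segment {start}-{stop}: {abs(stop - start) + 1} residues")
for number, (start, stop) in sorted(jumps.items()):
    print(f"jump {number}: residue {start} -> residue {stop}")
```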

If you just moved jump 2, it looks like you'd move chain C but not chain D. What you'd want to do is use either "jumps=2,3" or "partners=AB_CD" - I don't have enough experience with the various docking movers to know whether there is a difference between the two designations/movers.
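The "which residues move with which jump" reasoning can be sketched in plain Python as well. The function below is a hypothetical helper written for this post (not a Rosetta call): it walks the edges and collects everything that hangs off the far side of a jump, assuming every edge starts at a residue that is an endpoint of an earlier edge, as in the tree above.

```python
def residues_moved_by_jump(edges, jump_number):
    """Return the set of residues downstream of a jump in a list of (start, stop, label) edges."""
    jump = next((edge for edge in edges if edge[2] == jump_number), None)
    if jump is None:
        raise ValueError(f"no jump numbered {jump_number}")
    moved = {jump[1]}  # the residue on the far side of the jump moves first
    changed = True
    while changed:  # repeat until no edge adds anything new
        changed = False
        for start, stop, label in edges:
            if label == jump_number or start not in moved:
                continue
            if label == -1:  # polymer edge: every residue between the endpoints follows
                low, high = sorted((start, stop))
                new = set(range(low, high + 1))
            else:  # another jump hanging off a moved residue
                new = {stop}
            if not new <= moved:
                moved |= new
                changed = True
    return moved


# The original fold tree from the question, written as (start, stop, label) tuples.
original = [(1, 110, -1), (1, 111, 1), (111, 216, -1),
            (1, 217, 2), (217, 432, -1), (1, 433, 3), (433, 650, -1)]

only_jump2 = residues_moved_by_jump(original, 2)
both_jumps = only_jump2 | residues_moved_by_jump(original, 3)
print(min(only_jump2), max(only_jump2))  # extent of what jump 2 alone moves
print(min(both_jumps), max(both_jumps))  # extent of what jumps 2 and 3 move together
```

With this tree, jump 2 on its own only carries residues 217-432, which is why both jumps 2 and 3 are needed to move residues 217-650 together.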

Wed, 2013-05-15 11:33

Thank you for helping me to understand the Fold Tree concept. I am digesting it ...

I am using PyRosetta to print out the fold tree interactively.
After loading the 4-chain PDB structure (chain A: 110 res., chain B: 106 res., chain H: 216 res., chain L: 218 res.),
the fold tree is printed in the IPython shell:

In [11]: print pose.fold_tree()
-------> print(pose.fold_tree())
FOLD_TREE EDGE 1 110 -1 EDGE 1 111 1 EDGE 111 216 -1 EDGE 1 217 2 EDGE 217 432 -1 EDGE 1 433 3 EDGE 433 650 -1

There are three jumps attached to the first residue: 1-111, 1-217, 1-433.

Then I set up a new fold tree for docking two rigid bodies with:

In [16]: rosetta.setup_foldtree(pose, "AB_HL", rosetta.Vector1([1]))
In [17]: print pose.fold_tree()
-------> print(pose.fold_tree())
FOLD_TREE EDGE 1 75 -1 EDGE 75 476 1 EDGE 476 650 -1 EDGE 75 110 -1 EDGE 110 111 2 EDGE 111 216 -1 EDGE 476 433 -1 EDGE 433 432 3 EDGE 432 217 -1

The "setup_foldtree" code was borrowed from a demo script on the PyRosetta web site.
After setting the partners to "AB_HL", the fold tree changes.
The 3rd parameter of "setup_foldtree" is a vector of "movable_jumps". The initial pose has 3 jumps. If the vector is set to [2,3], the output fold tree is the same as above:

In [21]: rosetta.setup_foldtree(pose, "AB_HL", rosetta.Vector1([2,3]))
In [22]: print pose.fold_tree()
-------> print(pose.fold_tree())
FOLD_TREE EDGE 1 75 -1 EDGE 75 476 1 EDGE 476 650 -1 EDGE 75 110 -1 EDGE 110 111 2 EDGE 111 216 -1 EDGE 476 433 -1 EDGE 433 432 3 EDGE 432 217 -1

Sun, 2013-05-19 03:12

Sometimes with docking the default foldtree isn't ideal - connecting the N-terminus to the C-terminus can result in lever arm effects if you allow backbone movement. Depending on how things are set up, a small backbone change can cause a big swing in the docked partner. Instead, people change the foldtree such that the two docked partners are connected by their closest approach, minimizing lever arm effects.

That's what's happening here. The setup_foldtree() function notices that residue 75 (on chain A) is very close to residue 476 (on chain L) so it connects the HL complex with the AB complex through a jump (jump #1) through those residues. It then builds the rest of the protein off of those, with jump 2 connecting chains A and B, and jump 3 connecting L and H. If you were to dock with this foldtree, movement of jump 1 would be sufficient to move the rigid body orientation of the HL complex, where H and L are kept as a rigid block (assuming no backbone movements and no change in jump 3).
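To put a rough number on the lever arm effect: if a backbone change rotates everything downstream by a small angle, a point at distance r from the rotation axis moves by the chord length 2 * r * sin(angle / 2). The short calculation below uses made-up distances and a made-up 5 degree change purely for illustration; `swing_distance` is a name invented for this sketch.

```python
import math


def swing_distance(lever_arm, angle_degrees):
    """Chord length traced by a point `lever_arm` away from the rotation axis."""
    return 2.0 * lever_arm * math.sin(math.radians(angle_degrees) / 2.0)


# Hypothetical numbers: the same 5 degree change seen from a short and a long lever arm.
for arm in (5.0, 60.0):
    print(f"{arm:5.1f} A arm, 5 degree change -> {swing_distance(arm, 5.0):.2f} A shift")
```

The shift grows linearly with the lever arm, which is the motivation for placing the jump at the closest approach of the two partners rather than between chain termini.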

What you pass as the third argument doesn't matter because it is an *output* argument. (This is a C++ convention to get around the fact that it doesn't have multiple return values like Python does with tuple packing/unpacking.) One of the very first things the function does (when used with the string-based partner specification - other calling conventions are different) is to clear the contents of the vector. After the function returns, the passed vector should contain the values of the movable jumps determined from the string specification.
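As a plain-Python analogy of the output-argument pattern (this is not the PyRosetta function, just a toy stand-in with an invented name, and the value it writes is an assumption made for the example):

```python
def setup_partners_toy(partners, movable_jumps):
    """Toy stand-in for an output argument: ignores what is in the list and overwrites it."""
    movable_jumps.clear()      # whatever the caller put in is discarded first
    movable_jumps.append(1)    # assumed here: one jump separates the two partner groups


first = [1]
second = [2, 3]
setup_partners_toy("AB_HL", first)
setup_partners_toy("AB_HL", second)
print(first, second)  # both lists end up with the same contents
```

The input contents never influence the result, which matches what you observed when passing [1] versus [2,3].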

Sun, 2013-05-19 11:31

P.S. There are several functions in the core::kinematics namespace (core.kinematics module for PyRosetta) to help with FoldTree visualization. core.kinematics.visualize_fold_tree(FoldTree) is probably the easiest to use.

Mon, 2013-05-20 11:02

I've been coding Rosetta either in PyRosetta or C++ for a few years and I'm STILL digesting it.

Sun, 2013-05-19 21:04

The attached XML script was modified from design_script.xml in demos/design_raf_rac_interface/ .
I used this XML script to design the interface between the antigen (chains A and B) and the Fab (chains H and L). A docking protocol with local refinement was used to dock the two partners. Backbone minimization in "MinMover" was turned off because I don't want the backbone to undergo large conformational changes. The start structure was minimized with constraints to its original coordinates. Is there any way to keep the constraints during minimization after docking?
After harvesting 2000 structures from the simulation, I found that the output structures deviate only very slightly from the start structure, even though several residues at the interface were changed. I don't know if these output structures are what I want.

Sun, 2013-05-19 21:43

I'm not quite sure which constraints you're talking about regarding "minimized with constraints". If you're talking about using the relax application and its autogenerated constraints, they aren't yet available through RosettaScripts. If you're not doing backbone minimization, though, I'm not sure they would help you much, as they would either do nothing or else over-constrain the system. Other constraint systems may vary. If you have generated a separate constraint file, there is a way to load that into the pose. Take a look at the ConstraintSetMover mover - but be aware that upon application it will knock out any pre-existing constraints on the pose.

When you say "very very small deviation", what sort of changes do you actually see? The XML file you're using is only doing a local refinement and design. I wouldn't expect big changes in the HL rigid body position, although I would expect some small movements. I'd also expect to see mutations and rotamer rearrangements on the HL complex side - though that would depend highly on the resfile you're using. I'd also expect to see all the sidechains slightly displaced from their starting locations (though not necessarily by a great amount, especially if the input structure was minimized with constraints).

Mon, 2013-05-20 11:14

My start structure was prepared by "minimize_with_cst" in src/apps/public/ddg/.

I am looking at some minimization movers; the typical one is "MinMover". MinMover uses chi=0/1 and bb=0/1 to control whether sidechains or the backbone are minimized, and jump="" to set the jumps to minimize over. I have several questions:

1) Is there any way to restrict minimization to a set of residues, like the selection operations in PyMOL, e.g. minimize the bb and sc of residues within 6A of residue 18?
2) If jump is not set, does it mean all residues are minimized?
3) If jump is set, does it mean residues to the right of the jump are minimized while those on the left are kept fixed?

Some movers do not have a "jump" attribute, e.g. MinPackMover. How do such movers recognize jumps?

When I run docking on a 4-chain complex and then minimize it, I use something like:
<DockingProtocol name=dock1 docking_local_refine=1 dock_min=0 partners="AB_HL" />
<MinMover name=min3 jump="2,3" chi=1 bb=0 tolerance=0.005/>

In the log file, during docking, the fold tree is changed:
protocols.docking.DockingProtocol: Setting docking foldtree
protocols.docking.DockingProtocol: old fold tree: FOLD_TREE EDGE 1 110 -1 EDGE 1 111 1 EDGE 111 216 -1 EDGE 1 217 2 EDGE 217 432 -1 EDGE 1 433 3 EDGE 433 650 -1
protocols.docking.DockingProtocol: new fold tree: FOLD_TREE EDGE 1 75 -1 EDGE 75 476 1 EDGE 476 650 -1 EDGE 75 110 -1 EDGE 110 111 2 EDGE 111 216 -1 EDGE 476 433 -1 EDGE 433 432 3 EDGE 432 217 -1

I want to know whether the new fold tree is only in effect during the docking stage. In the later minimization, does the MinMover still use the original fold tree even if the pose conformation has changed? I care a lot about this because the jumps used in docking are not the same as those in MinMover.

Tue, 2013-05-21 00:12

You can control which residues to minimize and which ones not to with a movemap specification. The details on how to do it are in the FastRelax mover documentation, but the MoveMap subtag will work with the MinMover as well.

The jumps setting only controls the minimization of the jumps proper - that is, the rigid body orientation represented by the jump. For the MinMover, if jump isn't set, the rigid body jumps aren't allowed to minimize (but any backbone and sidechain torsions you allow to move will be minimized, including those that are after the jump in the foldtree).

If you want finer, property-based control of which residues get minimized and which ones don't, the TaskAwareMinMover may be what you want. This allows you to use taskoperations to define minimized residues, rather than a movemap.
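For the "residues within 6A of residue 18" part of the question, the selection itself is simple to work out outside Rosetta. The sketch below is plain Python with made-up CA coordinates, and `residues_within` is a name invented for this example; it only shows how such a residue list could be computed before you write it into whatever movemap or task specification you end up using.

```python
import math


def residues_within(coords, center, cutoff):
    """Residue numbers whose point lies within `cutoff` of residue `center` (inclusive)."""
    origin = coords[center]
    return sorted(res for res, xyz in coords.items()
                  if math.dist(origin, xyz) <= cutoff)


# Hypothetical CA coordinates in Angstroms, invented for illustration only.
ca_coords = {
    17: (-3.8, 0.0, 0.0),
    18: (0.0, 0.0, 0.0),
    19: (3.8, 0.0, 0.0),
    40: (0.0, 5.5, 0.0),
    90: (12.0, 3.0, 1.0),
}

selected = residues_within(ca_coords, center=18, cutoff=6.0)
print(selected)
```

Note that a CA-based cutoff is only one possible definition of "within 6A"; an any-atom distance would select a somewhat larger set.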

The MinPackMover is something different from the MinMover. It's like the PackRotamersMover, but instead of evaluating just the on-rotamer (or ex1/ex2/etc. exploded rotamer) positions, it does sidechain minimization prior to evaluation. You don't need to specify jumps or backbone, as the MinPackMover will never change those.

Once you change a FoldTree, it stays the FoldTree of the pose until you change it again. So assuming the FoldTree isn't reset by DockingProtocol before it exits, the new foldtree should be the one the MinMover sees. If you want, you can use the simple_ft option of the AtomTree mover to reset the foldtree to a simpler default.

Tue, 2013-05-21 11:23

Very clear comments! Thank you.
I am just wondering why some components use "jump" (e.g. "RestrictToInterfaceVector") and others use "rb_jump" (e.g. "RestrictToInterface"). From your comments, I take it that "jump" and "rb_jump" mean the same thing - the rigid-body jump.

Tue, 2013-05-21 18:33

It's historical accident, mainly. The various components of RosettaScripts have been put together by a number of different people over a period of time. Naming conventions weren't ever explicitly designed, they just sort of emerged organically, and we still don't have any official naming convention list, so people are free to use different names if they think they should. It's best to refer to the documentation for each component to see what each option does. (As even identically named options might not do exactly the same thing in two different components.)

Wed, 2013-05-22 10:31