I have a protein complex structure with two chains, chain A and chain B.
I want to design chain A. Can RosettaDesign design chain A in the context of the complex? I think designing chain A alone, without chain B, would lose the information that the complex structure provides. Can anyone tell me how to do fixed backbone design for chain A within a complex structure? Thank you!
You can do fixed bb design with as many chains as you want. Just use a resfile to restrict design to the residues you want to change.
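As a rough illustration of the resfile idea, the sketch below writes one that keeps everything at NATRO by default (native rotamer, no repacking) and opens a range of chain A residues to all amino acids with ALLAA. The function name `make_resfile` and the 1-100 residue range are assumptions made for the example, not something taken from the poster's files.

```python
def make_resfile(design_chain, first_res, last_res, default="NATRO"):
    """Return resfile text that designs one chain and leaves the rest at `default`."""
    # Commands before "start" apply to every residue not listed explicitly.
    lines = [default, "start"]
    for resnum in range(first_res, last_res + 1):
        # Body lines are: <PDB residue number> <chain ID> <command>
        lines.append(f"{resnum} {design_chain} ALLAA")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical numbering: chain A spans PDB residues 1-100.
    print(make_resfile("A", 1, 100), end="")
```

The resulting file would be passed to fixbb with -resfile, together with the full two-chain PDB given with -s, so chain B stays in the structure but is not touched by the packer.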
I think the resfile tells Rosetta which chain to design. But I also want Rosetta to know that chain A is part of a complex, so that the fixbb design is restrained by the complex structure. I can think of three setups:
1. Design chain A alone (100 residues).
2. Design chain A within the complex structure, with a resfile that tells Rosetta to design only chain A.
3. Modify the chain identifiers in the PDB so that chains A and B are merged and renumbered as a single chain A, and design only the residues from the original chain A.
Will the results from 1 and 2 be different? I believe the result from 3 will be different from 1.
The results of 2 will definitely be different from those of 1, especially in the AB interface. In the apo-A context, Rosetta will see the interface residues of A as surface residues, so you may get them designed to things like ARG and LYS, whereas in a holo-AB context ARG and LYS would clash with the backbone of B, resulting in designs to things like ALA, VAL and THR instead.
Regarding the difference between 2 & 3, if you renumber/rename chain B to be part of chain A, the results will be almost the same as if you treated A & B as separate chains. To the packer (the portion of the code doing the amino acid assignment), the two are identical. The only difference will arise if you allow rigid body degrees of freedom between the A and B chains. By treating them as separate chains, you can more freely sample the rigid body orientation of the complex, resulting in slight changes of orientation that may have effects on the design. (The chains move outside the amino acid assignment code, but the chain movement results in different orientations in the two cases, which causes the packer to assign different amino acids.) But that's only if you use a protocol that allows for rigid body movement sampling or minimization. If I recall correctly, the standard fixbb design protocol doesn't do any rigid body sampling/minimization by default, so in that case the results for 2 & 3 should be identical. (Up to the normal caveats about Rosetta stochastic protocols.)
Thank you very much.
So I should design chain A in the complex structure (no matter whether by approach 2 or 3) rather than as a single chain A, to benefit from the structure. Am I correct? The holo structure carries different information for some residues in chain A, especially at the dimer interface.
Right. If you're interested in having your designed proteins preserve complex formation, I would design against the model of the complex, especially if you're allowing the interface residues to change. (BTW, I'd recommend using approach 2 if at all possible. It tends to make your life easier in the long run. Only use approach 3 if the protocol you're using doesn't work with multiple chains.)
The place where you can get away with omitting portions of the protein is where they're well separated from the regions you're designing/moving. The standard Rosetta score function only extends out to a certain distance (about 6-7 Angstroms), so if you have a large enough complex, you can omit portions that lie farther than that from anything you're changing, as the portions of the protein which are being changed won't be aware of their existence.
Including extra portions won't hurt, however, except perhaps on memory footprint grounds, so when in doubt, include it.
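To get a feel for which partner residues fall inside such a shell, something like the following could be used. It is only a sketch: it reads ATOM records by their fixed PDB columns, measures atom-atom distances from the current coordinates, and the function names are made up for this example. Because redesigned side chains can reach further than the ones currently in the model, the cutoff passed in should probably be padded well beyond the 6-7 Angstrom figure.

```python
import math


def parse_atoms(pdb_text):
    """Yield (chain, resseq, (x, y, z)) for each ATOM record, using fixed PDB columns."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chain = line[21]
            resseq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield chain, resseq, xyz


def residues_within(pdb_text, design_chain, cutoff):
    """Return {(chain, resseq)} outside design_chain with any atom within cutoff of it."""
    atoms = list(parse_atoms(pdb_text))
    design_xyz = [xyz for chain, _, xyz in atoms if chain == design_chain]
    near = set()
    for chain, resseq, xyz in atoms:
        if chain == design_chain or (chain, resseq) in near:
            continue  # skip the designed chain and residues already selected
        if any(math.dist(xyz, d) <= cutoff for d in design_xyz):
            near.add((chain, resseq))
    return near
```

Residues that are not returned are candidates for omission, though as noted above, leaving them in costs little.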
I have tried approach 2 and approach 3.
I found the scores are very similar except for fa_atr. One possible reason I can think of is that there is a huge gap in the structure when I label both chains as chain A, because the C-terminus of chain A and the N-terminus of chain B are very far from each other.
SCORE: total_score dslf_ca_dih dslf_cs_ang dslf_ss_dih dslf_ss_dst fa_atr fa_dun fa_intra_rep fa_pair fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb omega p_aa_pp pro_close rama ref description
SCORE: -28.556 0.000 0.000 0.000 0.000 -830.376 282.954 2.184 -26.133 262.789 413.709 -28.118 -46.740 -16.885 -35.754 38.456 -19.372 2.527 7.692 -35.490 onechain_0001.pdb
SCORE: 91.966 0.000 0.000 0.000 0.000 -814.488 365.201 2.373 -21.721 325.371 410.351 -26.804 -47.136 -14.873 -35.754 3.334 -18.529 3.296 4.953 -43.610 separate_chain_0001.pdb
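To compare the two rows term by term, a small parser like the one below may help. It is a sketch that assumes the first SCORE: line holds the column names and that the last column is the description. The function names are illustrative. On the two rows posted here, fa_dun, fa_rep and omega each differ by more than fa_atr does.

```python
SCORE_TEXT = """\
SCORE: total_score dslf_ca_dih dslf_cs_ang dslf_ss_dih dslf_ss_dst fa_atr fa_dun fa_intra_rep fa_pair fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb omega p_aa_pp pro_close rama ref description
SCORE: -28.556 0.000 0.000 0.000 0.000 -830.376 282.954 2.184 -26.133 262.789 413.709 -28.118 -46.740 -16.885 -35.754 38.456 -19.372 2.527 7.692 -35.490 onechain_0001.pdb
SCORE: 91.966 0.000 0.000 0.000 0.000 -814.488 365.201 2.373 -21.721 325.371 410.351 -26.804 -47.136 -14.873 -35.754 3.334 -18.529 3.296 4.953 -43.610 separate_chain_0001.pdb
"""


def parse_scores(text):
    """Map each description to {term: value} from 'SCORE:' lines."""
    header = None
    rows = {}
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        record = dict(zip(header, fields))
        name = record.pop("description")
        rows[name] = {term: float(value) for term, value in record.items()}
    return rows


def term_differences(rows, a, b):
    """Return (term, rows[b] - rows[a]) pairs, largest absolute difference first."""
    diffs = [(term, rows[b][term] - rows[a][term]) for term in rows[a]]
    return sorted(diffs, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    rows = parse_scores(SCORE_TEXT)
    for term, delta in term_differences(rows, "onechain_0001.pdb", "separate_chain_0001.pdb"):
        print(f"{term:>14s} {delta:+9.3f}")
```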
By the way, what do you mean by "Including extra portions"? Thanks
I have designed thousands of sequences, but the sequences produced by approach 2 all seem to be the same.
I ran fixbb.linuxgccrelease @flag, and the flag file is:
-mute core.io core.scoring core.conformation
I also attach the resfile and pdb files. Could you please have a look? Thanks!
When you say "are the same", do you mean that the interaction helix is coming out all alanine like the input structure, or do you mean that the sequence differs from the input structure, but is identical for all thousands of sequences which were designed?
If it's the latter, it may be due to the fact that the packer can be highly convergent when given a sufficiently constrained system. Here you have a system where everything is fixed except for the 18 or so amino acids on the interface helix. While there is some combinatorial variation that may exist with interactions between adjacent amino acids, that combinatorial variation is limited, and it's likely that the packer is able to sufficiently sample all the possibilities such that it's giving you the one sequence that best represents the limited input design you've given it.
If you're looking for variation, there are several ways to get it. The first is to allow more variability: rather than using NATRO on the non-designed interface residues, use NATAA. This will allow the surface residues to move in response to the design, introducing more variability. Another possibility is to allow for backbone flexibility. Each run of the packer happens in a fixed backbone context, so frequently what people will do is to iterate rounds of fixed backbone packing/design with backbone flexibility (e.g. all atom minimization), which will give you different sequences due to subtle variations in backbone arrangement. A final possibility is to flip that conception: create a library of slightly different backbones/interfaces through something like relax, and then do your very limited fixed backbone redesign in that library of starting structures. This should give you different output structures based on slight variations on your input structure.
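For the first suggestion, the change to the resfile is small. The helper below is a hypothetical sketch (it is not part of Rosetta) that swaps every NATRO command for NATAA, whether it appears in the header or on a per-residue line, and leaves comment lines alone. It normalizes whitespace between tokens as a side effect.

```python
def natro_to_nataa(resfile_text):
    """Return resfile text with every NATRO command replaced by NATAA."""
    out = []
    for line in resfile_text.splitlines():
        if line.lstrip().startswith("#"):
            out.append(line)  # leave comment lines untouched
            continue
        # NATAA keeps the native amino acid but lets its rotamer be repacked.
        tokens = ["NATAA" if tok == "NATRO" else tok for tok in line.split()]
        out.append(" ".join(tokens))
    return "\n".join(out) + "\n"
```

If only some residues should be repacked, editing those lines by hand is probably clearer than a blanket replacement.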
I am sorry that I didn't state the problem clearly. The designed sequences are all the same. However, if I design the 18 residues with approach 3 (merging the two chains into one chain and renumbering the residues), I get sequences with high diversity.
That's strange. I'm not quite sure why that would be the case. I can certainly recapitulate the issue with Rosetta 3.5, but I'm not seeing a similar issue with the recent weekly release version. For the more recent releases, in both cases you get practically the same sequences for most runs, with just a few variations at one or two positions. (This behavior is what I would expect from the system as it is currently set up.)
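One way to put a number on "practically the same sequences" is to tally, position by position, which residues show up across the designs. The sketch below assumes the designed sequences have already been pulled out of the output PDBs as equal-length one-letter strings. The function name is illustrative.

```python
from collections import Counter


def position_variability(sequences):
    """Return {position_index: Counter of residues} for positions where designs disagree."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all sequences must have the same length")
    variable = {}
    # zip(*sequences) walks the alignment column by column.
    for i, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        if len(counts) > 1:
            variable[i] = counts
    return variable
```

An empty result means every run converged on the same sequence. A handful of entries, each with only two or three residue types, would match the behavior described for the recent releases.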
I am using Rosetta 3.5. Maybe I should use the latest version. Thank you!