After doing ab initio structure prediction on the individual domains of my target protein I was trying to assess the quaternary structure of the dimer using the SymDock protocol. The results I get are, firstly, never for a full-length protein and, secondly, the amino acid numbering is confused; trying to extract pdbs from the silent out file results in an error message (core.io.silent: ERROR: add_chain_ending() invalid chain ending 226
ERROR:: Exit from: src/core/io/silent/ProteinSilentStruct.tmpl.hh line: 671).
The construct I am dealing with looks like this: --[domain A]-----[domain B]
I would like to know whether I am using a correct protocol or how one would use Rosetta to model a dimeric, but multi-domain protein?
This is my run file:
Residue Pair Jump File:
jump_def: 61 262 261 261
aa: ASN FMN
cst_atoms: CA CB CG C4 C5 O7
jump_atoms: N CA C C4 C5 O7
Symmetry Definition File:
E = 2*VRT0001 + 1*(VRT0001:VRT0002)
start -1,0,0 0,1,0 0,0,0
rot Rz 2
connect_virtual JUMP1 VRT0001 VRT0002
set_dof BASEJUMP x(50) angle_x(0:360) angle_y(0:360) angle_z(0:360)
Thank you very much in advance for any advise.
PS: I just tried 3.3 but to that does not change anything.
I can't address all the questions, but the "never full-length protein" issue might be caused by problems with the input PDB. Do you have residues in the input that are either missing a backbone heavy atom (N, CA, C, O) and/or have zero-occupancy backbone atoms?
I checked the pdbs but they seem okay. What comes to my mind in this respect is, that domain A binds a cofactor that is also present in the pdb file. The structure was generated using a Rosetta ab initio protocol, though. I could try without the cofactor, since it is rigid body docking. I will try that and post again.
Removing the cofactor from the pdb and fasta sequence file does not help.
There might be something else, though, that might point us into the right direction. When running rosetta the following problem occurs:
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 114 already in centroid residue_type_set
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 115 already in centroid residue_type_set
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 116 already in centroid residue_type_set
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 117 already in centroid residue_type_set
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 118 already in centroid residue_type_set
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 119 already in centroid residue_type_set
core.util.switchresiduetypeset: core::util::switch_to_residue_type_set: residue 120 already in centroid residue_type_set
This ranges from residues 114 to 226 and occurs again after the program switches to full atom mode; i.e. error messages of the type "residue 119 already in fa_standard residue_type_set" are displayed. Also for residues 114 to 226. The numbering is also a bit weird since domain A spans aa 24-127 and domain B 149 to 261 (input fasta sequence is for aa 1-261).
I attached a log file of the run. To me it seems as if first a run for domain A is performed (also with different residue numbering issues) and than a second docking for domain B.
I've asked Ingemar for comment.
Looking briefly through the SymDock documentation: http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_..., "# The input to the symmetric docking protocol is a single monomeric structure. If you are running a refinement of an assembly or a dock_pert the rigid body orientation of the input monomer must match with the input symmetry definition to correctly regenerate the full assembly. Use the make_symmdef_file script to get a matching input structure and symmetry defintion file. ".
I don't think this is what you have, since you seem to be using in:file:l to pass two PDBs. Try doing regular docking to get the inherently asymmetric dimer complex, then do symdock on the dimer to get the tetramer. You may need to fake the chain letters on the dimer to make Rosetta pretend it's a monomer (with one awful bond geometry).
One thing to keep in mind is that PDB numbering is not (currently) preserved by a trip through a silent file. You structures will be renumbered according to "pose numbering", where each chain gets a sequential A,B,C ..., and the residues will be renumbered starting at one.
Even when PDB numbering is preserved in the output, many of the tracer outputs print information in the no-chain pose number format. For example, when it says "residue 119" it's not probably not referring to either residue A 119 or B 119 from the input PDBs, but to the 119th residue in the pose, which, if I understand your input files correctly, would be chain A, residue 96 of the input structure (chain A, residue 24 being "1", chain A, resi 127 being "104", chain B, resi 149 being "105", chain B resi 261 being "217"). (If residues A 1-A 23 don't have coordinates, they won't be counted as present in the pose.)
You say you have provided a fasta of 261 amino acids, but it sounds like Rosetta only has the coordinates for 217 of them. I know nothing of the symmetry or docking code, but I might suggest trying to matching the fasta input with the structure input. Unfortunately, it's often easiest just to renumber your structures to correspond with Rosetta's "pose numbering" format. Often this can be accomplished with the score application and the -out:output/-out:pdb and -out:file:renumber_pdb flags.
I think the solution to this problem has been hinted at in the previous answer.
First, the two domains would have to be treated as one asymmetric unit. Use in:file:s with a pdb containing A+B without any chainbreak in between. You can still use the chain identifier. You may have to predict the interaction between A and B first. That would have to be done with assymetric docking.
Second, the symmetric system has quite complicated jump structure. SymDock cannot currently deal with residue pair jump files.
Third, previous versions of SymDock does not have a relax step in the end. I think that the 3.3 version has it but it is not documented to my knowledge. If its in there I think the flag to trigger it would be -docking:kick_relax (or -docking:full_relax, the name has changed at some stage). I can't rule out that it was in 3.2 so give it a try. If you don't have access to this feature you would have to run symmetric relax as a standalone app.
Thank you all for your replies.
Maybe SymDock is not the way to go for me then. What I expect (from experimental evidence) is a structure like this in the end:
--[domain A]---[domain B]
--[domain A]---[domain B]
So there are no interactions between domain A and B, neither in a monomer nor in the oligomer. The linker is actually a helix that I expect to adopt a coiled-coil-type structure in the oligomer. So, I cannot dock A to B to get a structure of the monomer and use this for further oligomeric structure prediction.
I tried the "fold and dock" protocol but this seems not to be able to handle residue pair jumps at all (quits with segmentation fault), which I need for ab initio prediction of domain A since it binds a cofactor.
What would you suggest?
I could "build" a monomer pdb file since I did ab initio calculations for all four parts of a monomer. But this way I am not sure whether the relative domain orientations would be flexible enough during full-length docking to provide an unbiased answer. I have an idea of the interface in both domains but, depending on how the linkers interact, domain B could be significantly rotated with respect to domain A.
Since it doesn't look like you're anticipating any A-B interactions, couldn't you do the A-A and B-B dockings separately, and then combine them afterwards?
I think that is definitely an option I will try. But I am not sure whether the result will be really meaningful if I just put the structures together afterwards.
Any suggestions how to model the interactions of the linker adequately?
I am still thinking about modelling the full-length protein in one go.
Would it be hard to implement the residue pair jump algorithm into "foldndock"?
Or, alternatively, is it possible to use foldndock with one rigid domain while modeling the second?
PS: BTW, "kick_relax" was the way to use relax with symdock.
I contacted Ingemar off the list. This is what he answered:
The docking protocol (assymmetric and symmetric) would not be able to handle all the steps required by itself without significant code modification. You are going to have to divide it into separate steps:
1) Make the domainA-linker-domainB model(s).
2) Symmetrically dock the model(s)
1) Separately dock domainA and domainB
2) From a set of decent looking models from self-docking with domain A and domain B find combinations of dimers of A and B that would be able to connect with a linker.
3) Place the domains in such a manner that a linker can connect A to B.
4) Connect the linker to the proteins using loop modeling to close the covalent links between A, linker and B.
The majority of these steps are not docking related. In particular, I know no trivial way of connecting A +linker+B (which is needed in both scenarios described above) in a single protocol.