protein docking question

52 posts / 0 new

Topic locked

Top

Hi, I've a question about protein docking. In the manual for protein docking, there is another application for preparing structures for docking: docking prepack protocol. This prepack protocol can generate input structures from bound protein-protein complex. So how can I generate input structures from unbound protein components (for example, if protein A and B form a complex, and I have the structures for A alone and B alone)? Right now, I just do a fast relax for A and B, and then combine them into the same pdb file, then use docking prepack to generate the input structure. Does this make sense? Also, for some complex structures, after prepack, I got positive scores. I suppose this is normal since the interface sidechain rotamers are not optimized. Is this correct? Also, sometimes when I run docking from unbound components of the complex, I get this error message: ERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 628 fold_tree nres: 639 When I comment out the native complex structure, there is no more error. So I checked the native complex pdb and the separate component pdbs and found some residues are missing in either the native complex pdb or the component pdbs. Is there a way to add these missing residues back to the pdb so I can use the native complex pdb to make calculation directly? Thanks for help.

Category:

Docking

Post Situation:

Unsolved

Tue, 2012-07-24 13:19

Top

"Right now, I just do a fast relax for A and B, and then combine them into the same pdb file, then use docking prepack to generate the input structure. Does this make sense?"

You're doing it right.

"Also, for some complex structures, after prepack, I got positive scores. I suppose this is normal since the interface sidechain rotamers are not optimized. Is this correct?"

Prepacking should not cause scores to increase, and your relaxed inputs should have negative scores - why are the scores positive? Can you see which energies term is causing the problem in the energies table? Generally, for don't-have-a-starting-complex docking, you can just do fastrelax without prepacking. fastrelax will pack the entire protein anyway.

"Is there a way to add these missing residues back to the pdb so I can use the native complex pdb to make calculation directly?"
It depends on when and how they went missing. Usually residues disappear because they do not have all four backbone heavy atoms (N CA C O) defined with nonzero occupancy. You can get rid of the extra residues in whichever PDB is larger, or you can re-do the earlier steps for the smaller PDB to get them back into the larger PDB. This calculation is only an RMSD calculation, which is meaningless because your starting structure is random anyway (unless I've misunderstood you), so just leaving the native flag out is preferred.

Tue, 2012-07-24 13:59

smlewis

(Reply to #3)

Top

If the prepacking application is the one I think it is, I believe the protocol is to separate the two proteins, repack each of them in an apo context, and then shove the proteins back together without any additional repacking. Depending on how the sidechains repack, this could cause huge positive scores if you happen to get clashes between sidechains across the interface.

As I understand it, part of the goal of this procedure is to minimize the native bias in the docking run, so you don't start with the "docked" conformation of the sidechains (thus making it much easier to recover the docked sidechain conformations).

Of course, if you're prepacking in the docked/holo context, Steven's comments are correct, and you shouldn't expect a higher score or clashes in the output structure.

Tue, 2012-07-24 15:24

rmoretti

(Reply to #4)

Top

Thank you very much for help. Now I'm much more clear. Another question about ligand dock. Based on protein dock, I assume I can do the same (get rid of the extra residues in whichever PDB is larger, or re-do the earlier steps for the smaller PDB to get them back into the larger PDB) to make use of the native pdb for RMSD calculations, right? Also, when I do ligand dock, I just copy the ligand to a new pdb file, use babel to convert it to a sdf file, then use the python script to generate params file. Since I do not have openeye omega access, I do not generate different conformers and assign charges for the ligand. However, when I use the python script to generate params file from sdf file, it also gives me the following warning no matter which ligand it is:
WARNING: structure contains double bonds but no aromatic bonds
Aromatic bonds must be identified explicitly --
alternating single/double bonds (Kekule structure) won't cut it.
This warning does not apply to you if your molecule really isn't aromatic.
Total naive charge -6.975, desired charge 0.000, offsetting all atoms by 0.170
WARNING: fragment 1 has 41 total atoms including H; protein residues have 7 - 24 (DNA: 33)
WARNING: fragment 1 has 41 non-H atoms; protein residues have 4 - 14 (DNA: 22)
WARNING: fragment 1 has 12 rotatable bonds; protein residues have 0 - 4
Average 41.0 atoms (41.0 non-H atoms) per fragment
(Proteins average 15.5 atoms (7.8 non-H atoms) per residue)
Sometimes my ligand do have aromatic ring. So how can I specify aromatic bond explicitly?
Also, can I dock two ligands at the same time by specifying params file for each ligand?
One more similar question as in protein dock, when I do cross ligand dock (say ligand A binds to protein C,and ligand B also binds to protein C, and I want to dock B into C from the pdb of A with C), I just copy the pdb file for B from the pdb of B with C into the pdb of A and C. Is this correct?

Wed, 2012-07-25 09:50

(Reply to #5)

Top

"Sometimes my ligand do have aromatic ring. So how can I specify aromatic bond explicitly?"

This is probably a Babel question. You need to edit the file you input into molfile_to_params.py to specify aromaticity. I've asked someone knowledgeable for inputs.

Also, can I dock two ligands at the same time by specifying params file for each ligand?

You can have two ligands present, yes. Actually moving both simultaneously is possible but not straightforward.

"One more similar question as in protein dock, when I do cross ligand dock (say ligand A binds to protein C,and ligand B also binds to protein C, and I want to dock B into C from the pdb of A with C), I just copy the pdb file for B from the pdb of B with C into the pdb of A and C. Is this correct?"

Yes

Wed, 2012-07-25 10:11

smlewis

(Reply to #6)

Top

How to specify aromatic bonds depends on what sort of input file you have.

If it's a mol2 file format, you'd specify the bond order (fourth column) as 'ar' in the bond block (columns two and three are the atom numbers involved in the bond). I've found that most molecular modeling programs (particularly OpenBabel) will already type most aromatic systems as such when saved as a mol2 file.

For mol/sdf files it's a bit harder, as the sdf specification doesn't actually have an aromatic bond type specification for structures, so most modeling programs won't output aromatic bonds for mol files. It *does* have an aromatic type for queries, which we (ab)use for aromatic specification. So what you would need to do is open the mol/sdf file in a text editor, and change the bond order (third entry in the bond block) to "4" for every bond in the aromatic system. (Note the first two numbers are the atom numbers for the bond.)

Regarding multiple ligands in docking, if you want to actively dock a single ligand in the presence of a non-protein cofactor, you just need to specify the additional params file (and include the coordinates in the input PDB). If you want to simultaneously dock (actively perturb the rigid body location) of two ligands, the ligand_dock executable won't do it for you (it's limited to a single ligand.) To do multiligand docking, you would need to use RosettaScripts. I believe there should be examples of multiligand docking in the demos directory and/or one or more of the various example sets (may be a separate download).

Regarding the cross docking, you're essentially correct, with some caveats. First, molfile_to_params has a tendancy to rename atoms, so you should use the PDB file that it outputs, rather than the original PDB lines from B. (This also applies to self docking - you should replace the original coordinate lines in the input PDB with the ones output by molfile_to_params, otherwise you might run into ugly atom naming issues.) Secondly, you need to be aware of coordinates. By default, the ligand_dock application will perturb the ligand in the general region it starts in. When you copy B from a B/C complex to an A/C complex, the binding pockets of the two C's won't necessarily match, and you won't be docking where you think you should be. Either move the ligand into the rough area of the binding pocket, or explicitly specify the coordinates of the binding pocket.

Wed, 2012-07-25 10:53

rmoretti

Top

Thank you guys for your help. The prepack is the one that separating the two protein, repacking and putting them back together.
I attached an example pdb after prepacking from the complex 1ACB.pdb. The score of this prepacked pdb is 501.922, and it seems the lj-repulsive and solvation are high. I used these flags for prepack: -s 1ACB.pdb -docking:partners E_I -docking::sc_min -nstruct 1. Is there anything I did wrong?

Tue, 2012-07-24 15:47

(Reply to #8)

Top

" Is there anything I did wrong?"

Not that I see, assuming rmoretti's explanation above is correct.

Tue, 2012-07-24 17:16

smlewis

Top

One more question, if one protein in the complex has multiple chains (say A, B, C, D), and the other protein has only one chain (say E), if chain E only interacts with chain D, will there be a difference between docking ABCD_E and docking D_E by removing all the other chains in the first protein? It seems if I remove some chains from a multiple chain protein, the score after relaxation will be higher.
How about docking a symmetric complex? Say I have a protein (chain A-chain A dimer), and its interacting partner (chain B-chain B dimer), if docking them, can I just dock one chain A with one chain B or should I use symmetric docking? How about a heterodimer?

Tue, 2012-07-24 15:59

(Reply to #10)

#10

Top

"will there be a difference between docking ABCD_E and docking D_E by removing all the other chains in the first protein?"

A) it will run faster, possibly a lot faster; B) if you are doing global docking you can get lots of impossible conformations where E drifts to where ABC normally is (not true for minor/perturbational docking without the randomize flags)

" It seems if I remove some chains from a multiple chain protein, the score after relaxation will be higher."
Don't do relaxation of D in the absence of ABC, it will relax to the wrong conformation. The score will be a lot higher because you've lost a lot of protein; generally Rosetta scores are -2 units per residue for good structures, so if you're 100 residues short you'll lose 200 points in energy.

"How about docking a symmetric complex? Say I have a protein (chain A-chain A dimer), and its interacting partner (chain B-chain B dimer), if docking them, can I just dock one chain A with one chain B or should I use symmetric docking? How about a heterodimer?"
This depends on the symmetry of the system. If multiple interfaces are exactly identical, Rosetta's symmetric docking modes are a good bet - it docks on one subinterface but propagates the changes to all interfaces so you get a complete complex output.

Tue, 2012-07-24 17:19

smlewis

(Reply to #11)

#11

Top

So I tried using rosetta script for protein docking. The script I used is attached and the flag I used is:
-s 1ACB_0001.pdb
-nstruct 1000
-ex1
-ex2aro
-parser:protocol protein_dock.script
But when I checked the output structures, I found that they were redesigned. But in the script, I use design=0 so I think there should be no redesign. What's the problem?
Moreover, I'm not so clear about how the script recognize which chain is receptor, which chain is ligand, and which chain to randomize. If I use docking_protocol for dock, I can control these options by -partners E_I -randomize2 -spin. So how does the script control these?

Thu, 2012-07-26 08:40

(Reply to #12)

#12

Top

"But when I checked the output structures, I found that they were redesigned. But in the script, I use design=0 so I think there should be no redesign. What's the problem?"

Either add a RepackOnly operation to the script, or add -packing:repack_only to the command line.

"Moreover, I'm not so clear about how the script recognize which chain is receptor, which chain is ligand"

It doesn't.

"I can control these options by -partners E_I -randomize2 -spin. So how does the script control these?"

I believe you can still use those options (I am not 100% sure). Give it a shot.

Thu, 2012-07-26 08:44

smlewis

(Reply to #13)

#13

Top

The design issue likely isn't the docking mover (so the design=0 is probably working as you expected) - it's the PackRotamersMover.

You don't specify any task operations for the PackRotamersMover, so it'll use a default packing task. And the default packing task is "everything repacks and everything designs". If you want to have the PackRotamersMover only do repacking, you have to specify a task operation which would restrict the packer task to only do repacking. As Steven indicates, try adding a RestrictToRepacking task operation to the "task_operations=" tag of the PackRotamersMover definition.

Thu, 2012-07-26 09:10

rmoretti

(Reply to #14)

#14

Top

Thanks a lot, after I modified the PackRotamersMover, it works. Another question about ligand dock, in previous posts, you mentioned that for cross docking, "When you copy B from a B/C complex to an A/C complex, the binding pockets of the two C's won't necessarily match, and you won't be docking where you think you should be. Either move the ligand into the rough area of the binding pocket, or explicitly specify the coordinates of the binding pocket." How should I do this? Using a constraint file?
Right now, what I do is: first I superimpose the two pdb (A/C and B/C) and find that A and B use the same pocket in C, then I use the ligand center coordinates of A (generated when using molfile_to_params script) as the start_from option to cross dock B into C from A/C complex, and I briefly calculate the radius of B, and use this radius as the perturbation distance in docking_pert option. In this way, I think it can sample the correct pocket in the A/C complex.
Does this make sense? If not, what's the correct way to specify the binding pockets for cross docking?

Thu, 2012-07-26 12:43

(Reply to #15)

#15

Top

Typically the way I do the ligand moving is with PyMol - either using the align feature or the pair fitting tool to translate the item I want to move onto whatever reference frame I want them to be in. I then save the translated coordinates and cut and paste the PDBs to put things together how I want. The way you describe doing it sounds reasonable.

Thu, 2012-07-26 16:12

rmoretti

(Reply to #16)

#16

Top

Thank you guys for your help.
So I used script for ligand dock. The script I used is attached, and the flags are:
-in:file:s 1AQ1_input.pdb
-in:file:extra_res_fa STU.params
-in:file:native 1AQ1.pdb
-nstruct 1000
-out:file:silent 1AQ1_silent.out
-packing:use_input_sc
-packing:ex1
-packing:ex2
-packing:extrachi_cutoff 1
-parser:protocol ligand_dock.script
However, when I run the script, it always gave me the following error:
Tag::read - parse error, printing backtrace.

Tag::read - parse error - file:istream line:5 column:1 -
Tag::read - parse error - file:istream line:5 column:1 - ^

Tag::read - parse error - file:istream line:8 column:1 -
Tag::read - parse error - file:istream line:8 column:1 - ^

Tag::read - parse error - file:istream line:9 column:1 -
Tag::read - parse error - file:istream line:9 column:1 - ^

Tag::read - parse error - file:istream line:14 column:1 -
Tag::read - parse error - file:istream line:14 column:1 - ^

Tag::read - parse error - file:istream line:19 column:1 -
Tag::read - parse error - file:istream line:19 column:1 - ^

Tag::read - parse error - file:istream line:23 column:1 -
Tag::read - parse error - file:istream line:23 column:1 - ^

Tag::read - parse error - file:istream line:26 column:1 -
Tag::read - parse error - file:istream line:26 column:1 - ^

Tag::read - parse error - file:istream line:37 column:1 -
Tag::read - parse error - file:istream line:37 column:1 - ^

Tag::read - parse error - file:istream line:41 column:1 -
Tag::read - parse error - file:istream line:41 column:1 - ^

Tag::read - parse error - file:istream line:42 column:1 -
Tag::read - parse error - file:istream line:42 column:1 - ^

Tag::read - parse error - file:istream line:46 column:1 -
Tag::read - parse error - file:istream line:46 column:1 - ^

Tag::read - parse error - file:istream line:48 column:1 - somewhere in file

Tag::read - parse error - file:istream line:47 column:1 -
Tag::read - parse error - file:istream line:47 column:1 - ^

Tag::read - parse error - file:istream line:1 column:1 -
Tag::read - parse error - file:istream line:1 column:1 - ^

ERROR: false
ERROR:: Exit from: src/utility/tag/Tag.cc line: 387
I assume this means some format in the script is not correct? But I compared this script with other scripts I use, and I didn't see format difference. What's the problem here (sorry for this dumb error)?

Fri, 2012-07-27 07:56

#17

Top

There is an error in your ligand_dock.txt file. Try this one that I am attaching should get you started. However there are still some problems with your LIGAND_AREAS & INTERFACE_BUILDERS, try matching the cases. I don't have the time to fully investigate this at the moment. I simply changed the outer XML tags to and added the closing to the end. You had forgotten to use the "/" in the final in your script.

Fri, 2012-07-27 08:23

stranges

(Reply to #18)

#18

Top

Thank you very much for your help. I tried the attached script, but it gave me the following error:
ERROR: Could not find ligand_areas and name docking_sidechain in Datamap
This is strange because I found the record for ligand_areas here:
http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/Rose...
Also, in this record, it also says "all_atom_mode should not be used with add_nbr_radius". So I guess in the script, I should set one to true, one to false, but not both to true?

Mon, 2012-07-30 12:05

(Reply to #19)

#19

Top

The docking_sidechain error may be due to a typographical error - it looks capitalized in one place and lowercase in the other.

For the latter - yes, probably so. I can't find a copy of that error message in the code so I can't look it up, but it seems pretty clear.

Mon, 2012-07-30 16:44

smlewis

(Reply to #20)

#20

Top

Hi, so I spent several days to fix the problems in the ligand dock script. But it still could not run. I've attached the script and the error message. It seems the error is a file format error, and I tried many times (by backspace, by rewriting the relevant lines), but I can't fix it. Your help will be greatly appreciated!

Also, I tried to use constraint file to filter protein-protein dock decoys. From the cocrystal structure, I calculated the distance (3.5A) between CE in resi 192 on one chain and CD2 in resi 45 on the other chain, and used the following constraint file and flags:
AtomPair CE 192E CD2 45I HARMONIC 3.5 0.1

-s 1ACB_0001.pdb
-native 1ACB.pdb
-partners E_I
-dock_pert 3 8
-docking:sc_min
-nstruct 1
-use_input_sc
-ex1
-ex2
-constraints:cst_file 1acb.cst
However, when I run docking, I got this output:
protocols.jd2.FileSystemJobDistributor: Too many retries (max_retry_job = 10)
protocols.jd2.JobDistributor: 1ACB_0001_0001 reported failure and will retry
protocols.jd2.JobDistributor: no more batches to process...
protocols.jd2.JobDistributor: 1 jobs considered, 11 jobs attempted in 44 seconds
I guess this is because rosetta can not find the conformation that satisfied the constraint, which means something is wrong with the constraint I used.
So what's the problem? How can I make a correct constraint?
I'm also a little confused about constraint file format. What is the function type used for? I can not find any explanation here:
http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/de/d...
Is it used as a probability distribution?
Thank you for your help!

Mon, 2012-08-06 09:15

(Reply to #21)

#21

Top

RosettaScripts protocols file are XML (well, XML-ish), and as such use the same tag opening/closing conventions as XML.

In XML, there are two ways of closing a tag. The first, which allows for sub-tags, is to repeat the tag, but with a '/' as the first charachter. e.g.

The other is to include the '/' as the last charachter in a single tag.

In the included script you're mixing the two. While <ligand_soft_rep weights=ligand_soft_rep/> will work if it's by itself, as soon as you want to include the Reweight subtag, you have to remove the ending '/'. That's why Rosetta is expecting the </ligand_soft_rep> to close the <SCOREFXNS> tag - because the ligand_soft_rep tag has already been closed.

Mon, 2012-08-06 10:44

rmoretti

(Reply to #22)

#22

Top

Thank you so much. Finally I got the script to run correctly. One more question, when I used the script of ligand dock, the silent output is binary, and I put several lines here:

SEQUENCE: MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIVPSTAIREISLLKELNHPNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHSHRVLHRDLKPQNLLINTEGAIKLADFGLEVVTLWYRAPEILLGCKYYSTAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSFPKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRLZ
SCORE: score fa_atr fa_rep fa_sol fa_intra_rep pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih rama omega fa_dun p_aa_pp ref rms maxsub time description
SCORE: -444.938 -1066.189 124.973 479.981 2.605 2.431 -20.108 -66.889 -50.757 -31.785 -19.506 0.000 0.000 0.000 0.000 14.382 37.567 209.149 -20.611 -40.180 0.182 277.000 44.000 1AQ1_input_0001
REMARK BINARY SILENTFILE
FOLD_TREE EDGE 1 277 -1 EDGE 1 278 1 1AQ1_input_0001
RT -0.180351 0.00540914 0.983587 0.979044 0.097149 0.178984 -0.0945864 0.995255 -0.0228167 10.8857 -6.40673 -14.2276 1AQ1_input_0001
ANNOTATED_SEQUENCE: M[MET_p:NtermProteinFull]ENFQKVEK[LYS_p:protein_cutpoint_lower]I[ILE_p:protein_cutpoint_upper]GEGTYGVVYKARNKLTGEVVALKKI[ILE_p:protein_cutpoint_lower]V[VAL_p:protein_cutpoint_upper]PSTAIREISLLKELNHPNI[ILE_p:protein_cutpoint_lower]V[VAL_p:protein_cutpoint_upper]KLLDVIHTENKLYLVFEFLH[HIS_D_p:protein_cutpoint_lower]Q[GLN_p:protein_cutpoint_upper]DLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHSH[HIS_D]RVLHRDLKPQ[GLN_p:protein_cutpoint_lower]N[ASN_p:protein_cutpoint_upper]LLINTEGAIKL[LEU_p:protein_cutpoint_lower]A[ALA_p:protein_cutpoint_upper]DFGLEVVTLWYRAPEILLGCKYYSTAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSFPKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL[LEU_p:CtermProteinFull]Z[STU] 1AQ1_input_0001
CHAIN_ENDINGS 277 1AQ1_input_0001
LdTCZBfnvAI0d+eHwwKHWB3NJ3HERLyGw35bUBDVjyHUf/EKwCB2QB757rHUzMTKwYljZBnBBvHka8yDw78deBnYQrHk8S3GwwKHhBz0NhHka8SDwQ14dBDW5VHkAr8Dwb8ycBb6GBI0gALGw/pmXB34FEIUf/0Fwc9oZBLFOCIET3kJwGZbSBHue4Hk/UjEwZQAXBvxLoHUy25Cw99zaB3MzyH0hWEAwKxggBbPKyHU14lHwEtIdBb78nH0coVJwwfqfBnBBPHUPK8BwueUdBblDUH0coFIwsyBaBbktXHErcICw 1AQ1_input_0001
LaR2WBjqx2H0isNMwGEYVB77nzHUep7Ow8n6QBzKH7HUGE4PwShrOBnDt6HkTiBRwDCMaBDqG0HEXPaQw2jieBra8rHkR2GQwQglhBblDtHUepDRwxgghBblD0H0Jx4Rwf/UjB3hWmH0fqBRw9owZB76R8HkAr8LwbnvTBfTirHkU46OwCB2bB3NJ8HUKcVQwhArYB3MzyHkU4aRwhrHdBnDtjH0OfHQw4P1fB3jCuHEZ7MOw 1AQ1_input_0001
LAAgPBPIQBI0IbHOwzMTLBHtIFIUwKXOwy25FB7P1CIkuJhNwpbyBBj/UFIkEDCOwsHlMBvcIKIk9owMwY6GRBvzXNIUNe5Nwb8SSBb6GNIUHaJQwtIbTBLwqQIksdOMwDXvRBHCMBIkqxbMw14FLBvHlGIEEYNQwe++MBPc9JIEpwlKw99zIBjWEMI0fqRNwUNeWBb78SIkBB2MwmuJSB325QIkQgVKw 1AQ1_input_0001
L8S3FBD/p8HE/pGMwzh2ABHtI4HUc9ILwHaR/A3hWtHkvfaMwwfKDBLv0mHUNe5Mwnv/ABzgA3HkBBGIwqGvAB33vAIEW5QFw8SXFBHYlDI0IbHEw5lu3APy2CIk++pDwrcIFBPHaIIEtIbBwDXP3A7QrHIUKc9Aw9oQABb7cKIU0is/v4PVJBPy24H025rLwYlD7AzeU9HklDtLw0isEBHtIzHkAr8GwcSM7Af8SyH0Gv0GwIFOJBns9BI0jCXFwamZwAzfqAIUdTiEwzMzIB34lKIk/UjAwsyhvArbSJIUepb/v6mEABnYQOI0isd7v 1AQ1_input_0001
LiBB1AnDtrHkU46MwKc9wAPIwhHUdTCOwGZ7pAf8ScHEDC8LwdTihAz1jgHEFuOLw14lqAHsyjHU5QTQwIwKxAnsdoHE99bRwc9oqAzgAqHEZ7sSw2jChALFunHkR2uSwktzvAHZ7tHkSMwTw4P1vAjArxH0kYgMwrco3AHuecHkuJhOwEtIkALaRpH01jCQwgVOnAXMIcHEFumQwKc94Af8SlHkmZuRw+TNyAHXPwHUKc9QwU46rAjrHvHkCXnUwmZm3AHCsvHkqxrTw 1AQ1_input_0001
LFuetAz0NTHUjX6KwnvfnADBWNHkDt4IwlYQaAvz3IHk/UDKwEtIbA/RhCH0FZ7Lwb8SuAPdTEHEoacHwdTi3ArEDIHUPK8DwDCs9A77n+G057nBwe+eDBnCXCHEueU8vEtoHBrEDIHkBBWBw/Uj0A76RQH0LdjLw+TNlArGvSH01jiGwPKcxArGv/Gk++ZJw789oAHXP/G0NJGFwb8S0AT14MH0bSsAwBW58AX46MHk8SXGwVjXABTLy5GEtI7EwgVO4A7P15GUepb+v9oQFBLEY7GkYQg4v8n6BBLFuHHE8na0vEtoKB7kYKHUktz9vPf/FBvyhOHUwKHDwOJGJBvHFDHU5QLEw 1AQ1_input_0001
LXPKIA/naMH0z3/Iw14lm/CAAJHk789JwKxgQ/OIw+G0colIw2Off+S0i3GU5Q7Jwf/UD+aktRHkpbiJwShrn/syhNH01jSKwhAr8+C/pbHEXPKLwoa8IAjArRH0257Gwbnvn/GZ7GHkQgFMwOJGE+eoFUH0eU4GwtIb//EsyTHUgV+Jw2Ofv/scoGHEnEDJwgqxn/IwKLH0P1YMwhrHl+4luhH03P1Kwc9oA/qZmZHEsyRNw+TNu/+oweHERLiKw 1AQ1_input_0001
L5QLa/+9T+GUy25DwW5QD/akt0GUep7AwBrcg/alD1Gknvf0vShrj/Kbn9Gk789jvrcod/Qf/yG00NJBwBW54/cnv7GUy258vnEDUAViB6Gknvf+v9owZAltzyG057HCwXPKgAlsdAHklDt6v4P1o/GCsEHUy25Bw9owh/SLytGk++pCw99Tl/0hWrGkZmZ+vShrn/4OfyGULyWFwzhWw/gqxDHEsyh/vhrHx/cR26GUjX6wv 1AQ1_input_0001
LY6mo/S0irG0257nvzhW0/WjXqGUgVO5PWO0y+6P1rG057HAQgqxD/04llGkQgV/PYo42+239zGUd2iDQ6n/L/IkP2G0qqHHQjsdEAzLdfG0kYQ7P7RhLALbndGU3kYDQ67nYA/naTGUjX6DQSMIcADrcQG0573IQbnvkAzfqGGUjXKJQ67nm/a78kGU46R1vFDCGAbmZwGUGEY7PMdTSA/S3eGEZ78zPPKcx/amZZGkTiB5Px1j6/GDCdGkoF5FQ/pGVAjpbkGEXPqEQFDCkAzfqUGEUN+BQSMIQAb78MGEBW5BQx1jMAPJGPGECsyJQxgAiA3NJXGk8S3JQXktlAvd+EG057HLQ789rAnX6HGUJGUIQdoFhAzLdAG0z3PIQ 1AQ1_input_0001
L4NRA9wHysGEvHwCQhnIp/YQ+uGEPihFQ4HZk/AByuGUjfyJQXUrCA1rbuGkHeJLQoT/B+u5DkGEzVk/PcHq7/ox45GkEu2DQJwmUA5aM7GkXtAGQK5Oe/U86CHUYx0FQH0cgABKREHkDKdDQkizO/W6ByGkLNGDQHAAAA5qhoG0+/2EQGe49/4wM6GUiFB/PDwXTAFok8GkqsJJQZK/cA1erzGEy2SFQZqrs/QAlKHEmPnEQteyt82kDCHUdpGEQAx0Z/0fnCH08jFJQKecoA5LsEH0d8LFQEWFhAV84CHUZcU+PWFkYAxKyLHEk6KEQ 1AQ1_input_0001
LeGct9O9kuGkZa0KQ20Nc+qzztGEGUuNQMJZ1/WIQtGED/rOQVyFKA/6LwGUNPWNQiFua/2x7uGELrqJQqEki+cqmmGkbLbOQ/FhY+cPz0GUgjpOQ 1AQ1_input_0001
LNh/5/yhIpGUX3jQQMdCSAHCfnGUlFKRQDSwRArOitGUUOfSQOPDEA3DXsGUreWTQHf9WAnombGUgIXRQu5qXAPXFVGEE6EQQIW3fAHW9JG0EFVQQ5nTiAbAgHGEdXdRQ21ihALLNEGUEiwOQN7Pf/urFnG0ncFRQ1BXeAzO5qGU30gQQxLkKAXnaYGEwEBSQYfNjAXDIbGkRM3RQ9qjgAXzzYG0r2kOQun+GAvusUGkzQiPQ 1AQ1_input_0001
LrmbgALVA1GU5njSQUMShAz007GUkMuTQaQZoAnCv2GE9jxUQYFztA7ekuGERbpUQjkplALM/1GEa5yRQRMqSAvBG9GUl7IUQ2brkArefDHEnLbTQ 1AQ1_input_0001
LAv7oA3Bd8G0w34VQDo2vAfQb5G0GKAXQLMe7Ajqc4GkBEkWQW9lABH3JxG0FDAXQ/xIvAL3iBHU+AFYQtKKkAXDBDHEz8QYQdTc1APWn9GE+krYQOeZkAjaHDHEs29VQYmntAPBhxGUxdXXQ09YyAzXLJHk6p0XQgrPhALX/IHkfjBYQ8z10ALkdDHUPVFZQ30z9ArNl8GUmriYQYkLyA3b/1GkzU2YQ 1AQ1_input_0001
LDMX/ArGR/GE/dqVQhODFBjWW/GEEiJVQTlYGB7H66Gk1nxTQqrFLBTyD4GkU2kTQut5GB7oPLHESbCVQh3FHBj8HPH0ZVeWQF+wLBbK9NH0duMXQHvkCB/gNTH0jTFXQkm+LBfGFRH0dvQYQzUvCBn3WWH0/BNYQ1MdHB/yTVH09ajYQpXoHBHgiYH02hNZQQsF6AbXyEH0jQVVQKMsHBPAA7Gkq10VQbl+DBH4ZPHEDvaUQUgxKBXePLH0GYiUQf1QPBD4iKHkgRtWQvNz9Abw5THUkDgWQaAqPBrKKQH0QCjYQRpV+APEnZHkPRcYQJDFLBTWgXHEjcaZQ 1AQ1_input_0001
LxnmCBnWO7GU202SQp6vDB/GO3GkDwgRQDa48AjLe2GEhNxQQN4s0Ansz5GkaxURQX+29ATwK+G0yxDTQeyoFBbvYvGUcslRQLZdGB3mv8G0QuARQ 1AQ1_input_0001
Llwc9Ab4MyGUP8HPQ7YvzAnnMxG0IbiNQGXxwA7cK8GU2RcMQiDw3AP3DDHUjcHMQqIE1AvJOpGEOtNLQeTirAzk2pGkC7UJQJeX2A7Y5dGkovRMQPjOCBzMnvGUjkWOQRQAtArsluGEgtwOQFRX8AX2IrGEswHKQSomsArfKkGU4/YHQC87qAXt6xGkh2gIQrNPkAzT7nGUrnaKQozS3AvJXYGUr9mKQ8AUvArksbGU60bNQ7tb9A3sfdGUsXjNQ 1AQ1_input_0001
LHKemAnKn9GU/J+LQujeiAX4iHH0691KQPurZA3ZHFHUvAOIQhN5NArkS9GEgrBIQ7LdVA7EVNHk6JwMQ8yHMAfzNXH0vjaLQCRZgAbPXQHEJeVPQtvPhAHWs3G0wxZMQT7ApATB5MHEb5SKQxZjIA/13HHUdrSNQ70MBAzENbHE9tyMQoKcDAPM3UHUtlnJQz1AZADbrcHU+J4KQA0mVA74bUHUtKVQQ5EFnA3pqVHUzm4OQs9XjAvOKJHEiBKQQ 1AQ1_input_0001
LxZldA38yLHUE3aEQc9HUATleKH0umZ+Pr8/GAb1QUHEEfH7PCGaMAjrKdHkvoB+PshyiAnTQIHEyqN0PSr/pAXI9+GkkS84Pb2HyAjCHAHEkGNAQG+poADLD1GEnOQwPdVu4AfGp3GkXAhBQVfNvAbvhsGU8JX1PdXP3Abt2tGkz/I+PR/y9AfNZlGEtqWAQGJyjArQBSHkWcEFQLtYKAPzPDH0rwP+PriknALRhPH0lZwyPaeldA3boGHUpZvjvL1PzAXW3HHEWqHCQX9SiAnTD0GUD1CmvvxE/Avjq4GUhRdEQlaBuAPMzkG0hXQrP49E8AD2+eG0t718P 1AQ1_input_0001
L3myu/KFgSH0yPewPDxiP/OXIbHUbl6gvCsyx/iArcHEtIb7vjvY9/CMoUHUvvSAwXqFO/UaWYHk2Jatv+S0t/YfJgHEHGy6v7oJNANRKcH0fNt9vWU5XA1chjHU7z4CwaismAV7hfHkEVfEwDKsm/ew1KHU65lqPk+3O/iFOiHU983vP50bn/ASKYH0YQswPLc8O/INYQHE8rx1vDaeV/sNDhHEUDBBwWbXw/Ef2nHUpgg1vPfyWARrybHk8V40vebtLAxjKUH0AijAwUNrNAZCEkHEi7ZGwkkfZABrarHUvGFBwMt+pApRokHU7PKHwKdcrANNCfH01kOBw0b9lAN9NYHkYvKGw 1AQ1_input_0001
LiWk9/S0imHkU469vwKHMA33PoHkvfqDw0isFAjAryHEtIbGwoa8y/2254HUc9IEwQ14hAnCXoHEXPKCwT3k0/mX6sHEFue5vtIbJAnDthH0isdGwDCsmAf9opHkCXvFwnEDkALFugHULyWAw3kYjAHsyuHUc9o+v 1AQ1_input_0001
L9owNAT2O0HErcoJwv0NJArED+HkBBGLwsyheAfSsBIUktzLwEYlmAf8S+HEue0MwHaR2/iVO8HERLiNwVjXq/muJDIUJGEPw/Uj3+iVOCIkArsQwUNep9eTCHI0w1bRwaR2O/MeJHIU3kgSwHv0p/Y5wCIEZ78SwqxLd/ktzLIk78FTwmuJXALbnuHEqGfKwtIb//+TtBI0yh2JwepbS/OdT4HkQg1Mwpw1DArw12H0jC3OwktzDAvyBFI01jyPwFDCU/KE4FIkArsNwR2O/+MHaAIksdWQwEYlb/qED/H00NZRwIFuu+Wj3KIU46RRwFuek/ovf+H0d+eSw03P9/457CIU3kwTw46RB/kuJPI04luSwZ78x/YQAMIUNe5Tw 1AQ1_input_0001
Lpw1fAzJxGI03PFLw3kYpAHDCKIUjXqLwUNepALFuKIEoasOwLHaiA7lONIE/p2Pwf/UpA7kYPIEFuOKwhArzAX4aSIksdeKwPKc5ATK8RI0HFeMwIFu1A76xVIUdTiIwLHaTATKcIIEaRGKwmZmwAT14HI0w1DLw3kYmAjVuPIkqxLIwZQgjA77HRIkmZmLwjsd8A757XIkVOkIw/UjwAfSMWIk/UDGw 1AQ1_input_0001
LiBBxA3iMIIE7RBQwTiByArbyHIU0icRwXktxAzJxMIEShTSw6mEuAf9IMIUf/cTwjsd8Aj/0EIEVjvRwDCs+Ans9DIUgVOTwiBhEBvz3AIEDCcTwqxrFBPJGAIUO06Uw99zKBvIb6H00NJVwsyh2APHaGIk8S3OwW5QrAz1jFI0yh2RwPf/7Ans9AI0IbPRwjsdBBjALHI0LdTRwpbS/A/S3HIE+TtTw2jC4AfnvBIk++pTwc9IEBPc95HkrH9SwRLyHB34FDI0yh+SwfU4FB/RBEIEW5YVwLHaCB77n7HULyWVwepbLBz1j5HkDtIWwrcoKBHtIzHE+TtUwaR2NBD/p+Hk8SvUw 1AQ1_input_0001
LhAr1ADBWRIkX62RwUNe1AHu+VI0eUwSwBrctAThLaIE99TSwiBBpA34FdIEBWJTw4PVAB/8dYIksd2SwDXvEBfSsUIEUNWTw67HKB3jiXIEZ7UTwamZDBzJxSIUGEwUwCsy4AnYwRIUep7Qw67nyAPIwUIU2OvTwjs9AB75bZIU5QzRwMdTAB3NJcIEDCcTw03vEBz0NRIEDCsSw5lONBLv0UIEShrTwO0CLBf8yYIkqxTSwyL9JBHDCbI0yh+TwFDiGBvHFQIksdGVwwKHDBfSMWI0gAbVw5QL/A3NpQIEqGvUw 1AQ1_input_0001
Lf/UrArvfaIUc9ARwnEDkADWZeI0MzcQwCBWRA7PVcIEnETQwjsdCAzLdfIkrHFQwCsynAfSMgIEpwFOw4P1mAfolbIk++ZMwcSMzAfnPiI0MzMOwiBBvAXj3XI0fqZQwY6mjAT14hIkmZGRwsyhiA/SXjIkSMYNwGZ7oAzggcI0IbnKw0is1ATffjI0xLNMwrcozA3NplIE99jPwyLd4AX3EfIEXP6Ow 1AQ1_input_0001
Le++NAb6GXIEqGXQw++py/67nUIECsSQwFDCc/OzMTIEoa8NwHaRW9IFOQI0z3/Nw/pGbArbyUIEpwdQwbnvz/WO0QIU6m0QwQgVW/6lOXIUc9wQw 1AQ1_input_0001
LzhWs/+TNVIUy2pLw+TNO/Oc9TIEVjHJwYlDN/ms9NIUMIgIw03Px/aktKI0LdTJw4P1s/WN+WIEmupFwCsyl/CAAdIkR2uFwpw17/iArfIkX6GBwnvfIADr8cIkCXP8vhAr0/2hWkIECsSAw/pGDAXMoXIEBWpLwEtIr+ks9UIEAAQJw2OfHAXMIWIEtI7FwgVOg/i/UVIUep7BwTiBB+GtIeIkuJxFwvJx0/mYQeIEFuuIw 1AQ1_input_0001
LhArs+oFZMIEv0NGwBW5A/Mc9GIkvfKEw6mED+syhHIknvf8vHaRG/4PVLIUc9o2v1NJ+/cnvEIUktzEwaR2AABrc+HERLSCwW5QFA9RhEIkBBWJw+TNi/scIPIknvfFwHv0N+eSMEIEEYFGwHaRKAlsdHIk8S3Cw/pGRARKc7H0fqxCw1NJ6/sd++HU6mE8vf/Ur/kBB5H04lOEwyLdVAZ78CIUdTiJwueU0/sI7BIUjXaKwY6mEAFDiIIU0iMKw 1AQ1_input_0001
LzhWU/OyWEI0P144vgqxr/W3kEI0isdmP++pu/ud+9HUktz1P14lm/24l1HUktznPoa8MA325GIUxgAjPU46NAXihMIEmuJtvmZmbArbSDIkArcyvmZml/KbnBI0NJG+vAAAQ/+RBHI00NJ2P2OfTAzgAHI0xLd5PjsdeAPH6NIUPKctvHaREALFOPI0gArkPbnvHAvdeMIUPKc7vHv0lAHu+EIk8S3yvoFZVAXN+CI0yhW9v5QLcAf9o+Hk8S3kv 1AQ1_input_0001
LIwK7/CW59HUdTiAQgqxBArbS0HUwKnDQJW+ZAbcL0H0evnEQ1LQhAf0f7H036LHQi6Cq/GAH0H08eZIQfU4+/CVjCIUzMTCQNQw4/OmYtHkrIUBQzCIy/mz7sHUd5hJQ/Ref+iYF0Hkeo4HQl2zx/yUK7HElKlJQ 1AQ1_input_0001
LQDjiAHRMsHEE3fCQwwNuAnm8rHkAFIDQXeEyAnuSjH0dF5GQrA/vA7d1ZHUnRFGQGlBzA/PkqH04XF7Pt2O/ArllpH0k6S6PIIQCBz89zHUya09Ps77ABn33mHEVXfiv6kzdArcQmHUk2cAQ+66wAX6azH0yd6EQQFgwAznGyHEsTC3PMHWvA747jH00zj1PPP0ABXx6iHU/JY/PtalGB7zLzHk4XO9P6AKBBfSs1HEwiCDQug6ABj7e6HEGkr4PwVQFBvGGmHEPa6ov+K9+A/lOtH0+nD1vO3C+AfFWfH0Vr9tv 1AQ1_input_0001
LfR03ADZemHkUGiJQTjv7Az8geHEaxeLQL81DBXplcH0AmILQg/BHBbsJkHUoCCLQtfa5AjcHiHENIWOQkbc7AD6lZHEKyOQQsJF3ADb/cHU7glRQCdX3A/KzTHkaUlSQYDPyA3utWH0EC4TQ6Wk5APHSuHkkL0JQYY13ADG2WHEPdILQyWCxADEikH0lYaOQeie+AHcCpH09hvOQRlBCB7GSYHEvnTQQHqj3AzRNSHkLU1PQKM2uA/GsfH0KMdRQrf77AzfljHUKj9RQGUs/ATjpRHExfwSQlxKzArnANH0lQJSQqtlyAXnaQH0UcgUQEagqAv/sYHErttTQIDI2A/HAdHEjFSUQ 1AQ1_input_0001
LUxWFBDEhSHEO3ALQSp5KB7QTPH0IehKQXpNNBz/oKHUPFHNQrYaLBLEZCHkb0GOQE9PLBX4KHHk6pPIQMH6QBTJfDHUt6LHQUS6QBjEA7GEPs0CQBviWBPqA3GUsCrBQZwkWBP/RuG0/3P7PyviCBX1zMHk9aRLQWzRNBnBSWH0/p+JQ7TfJBDl6KHUHz9EQzz1IB/ESAHEyC1IQA1sSBDYNAHkqCaJQSBOTBPxYKHEGH3FQfSNPBzMe+GksXZ+PuUbOB7oU0GkfgIEQS3QYB3HF0GE/YZFQn75YBbDu9G0iCNAQP+XaB7o3rGkfs85POr+UBbj/wGUoKqwP3xYUBvlDoGUOx99P 1AQ1_input_0001
L8j7QBXPbQHkDaTOQJebTBX9nNHETNaQQIf1XB7tSFHkrgNQQElubBneHIH0XMCPQLXEXBza47GkkIzQQpn9aBfqPzGE5DsQQymyVB7JjXHkcsGRQOukRB7xYgH0faNRQeM7XBjYoUHkIyeSQZBtMBH2PdHkVBBSQERDSBrXOXHkleXNQ5YjQB3q0JH0Q4ERQVFCZBfquaHkVmfQQcKXQBndciHEJgMQQKkfTBjjSnHEFKrRQh4kZBzWvbHknU9SQLwCbB/omOHkWaYSQxfrUBLcdRH0Z8FTQBj5JB7y6jHUf+CSQdY6NBfwMbHU7DCTQ3vxKBftXWHk0ZjRQ 1AQ1_input_0001

When I tried to use extract atomtree to extract pdb from it, it gave me this warning:
extract_atomtree_diffs.main: Finished all 0 structures in 1 seconds.
Warning: No structures processed. Existing output files may have been skipped, did you mean to delete them?

But when I used the ligand docking executable to do ligand docking, I got a normal silent file and I can use extract atomtree to extract pdb. Also, if I changed the output in the script to out:pdb, I got normal pdb with script running.

Is there something wrong? How can I extract this binary silent file? I found a previous post talking about extracting pdb from a binary file. I tried the flags in that post, but it didn't work.
http://www.rosettacommons.org/content/how-get-multiple-chains-pdb-fold-a...

Wed, 2012-08-08 07:27

(Reply to #23)

#23

Top

Can you be more specific about which flags you've tried?

Wed, 2012-08-08 08:24

smlewis

(Reply to #24)

#24

Top

These are the flags I used:
-s 1AQ1_input.pdb
-in:file:extra_res_fa STU.params
-in:file:native 1AQ1.pdb
-nstruct 1000
-out::file:silent 1AQ1_silent.out
-packing:ex1
-packing:ex2
-packing:extrachi_cutoff 1
-parser:protocol ligand_dock.script
And the script I used is attached.

Wed, 2012-08-08 08:29

(Reply to #25)

#25

Top

Sorry, I meant, what flags you've tried for the _extractions_?

Wed, 2012-08-08 08:29

smlewis

(Reply to #26)

#26

Top

First I tried these:
-extra_res_fa STU.params
-in:file:silent 1AQ1_silent.out

Then I tried:
-extra_res_fa STU.params
-in:file:silent 1AQ1_silent.out
-out:pdb
-in:file:silent_struct_type binary

Wed, 2012-08-08 08:35

(Reply to #27)

#27

Top

-extra_res_fa STU.params
-in:file:silent 1AQ1_silent.out
-out:pdb
-in:file:silent_struct_type binary

As arguments to score_jd2 is basically what I use for extraction; I'm surprised it doesn't work. Is it crashing, or finishing without doing anything? What if you use -in:file:tags and give it a single tag argument (1AQ1_input_0001) - does it get that one extracted correctly?

Wed, 2012-08-08 13:06

smlewis

(Reply to #28)

#28

Top

I'm an idiot so I thought the silent output using script was still an atomtree format. So I used extract atomtree instead of score_jd2, which didn't work. Both Rocco and you found the mistake and now it works fine. Thank you guys very much.
Now I tried to dock two ligands at the same time using script. I named one ligand chain X and the other ligand chain Y, changed the chain ID in both pdb and params files output by mol2toparams script. Then I combined the two pdbs with the receptor pdb, and rewrote the script(create ligand_areas, interface_builders and movemap_builders separately for each ligand, and tranlate and rotate each ligand separately). I attached the new script. Then I got this error:
ERROR: unrecognized mm_atom_type_name Y
ERROR:: Exit from: src/core/chemical/MMAtomTypeSet.hh line: 80
I checked the input and didn't find any atom named Y, but from the source code, it is caused by an atom named Y, right?
How did this happen?

Thu, 2012-08-09 14:35

(Reply to #29)

#29

Top

It is upset about an mm_atom_type being Y, not an atom's name being Y. This does not mean Y appears as an atom name in the PDB (which you checked). It probably means you have a Y as an mm atom type in the params file (Y.params, or whatever).

Check at the top of the params file; a few lines down it starts a list of atoms with four columns (here for alanine):

NAME ALA
IO_STRING ALA A
TYPE POLYMER #residue type
AA ALA
ATOM N Nbb NH1 -0.47
ATOM CA CAbb CT1 0.07
ATOM C CObb C 0.51

One of your ATOM lines probably has a Y as the third (possibly second) atom type entry (4th or 3rd column). Change that back to X and see if it works. X is a special character for user-created ligands in the 4th column to mark "I don't know the MM atom type", so you probably accidentally changed some X to Y when you changed the ligand name from X to Y.

Thu, 2012-08-09 19:43

smlewis

(Reply to #30)

#30

Top

Interesting. I did change the 4th column to Y on purpose for I thought I would use X for one ligand, and Y for the other ligand. After I changed it back to X, it worked. But I'm more confused. Can only X be used for user-defined ligands? Then How does Rosetta know that there are two ligand but both of them use X (in the same chain X)? I tried to use the script of docking one ligand to dock two ligands, I just add another start from coordinates to the other ligand and kept all the other line in the script the same. But the structures generated were not correct: either the two ligands were together in the same pocket or one ligand did not change at all (the two ligand were in different pockets in the crystal structure). How can I dock two ligands at the same time correctly (perturbing both)?

Thu, 2012-08-09 21:07

(Reply to #31)

#31

Top

I found this paper "Rosetta Ligand docking with flexible XML protocols" by Meiler group talking a bit about simultaneous ligand docking. It says "By specifying multiple ligand areas, multiple ligand docking is enabled". This is consistent with my initial thoughts: define a ligand areas, a interface builder, and a movermap builder for each ligand. However, if only X can be used for ligands, how can I define different ligand areas for different ligands like in the definitaion below?

Thu, 2012-08-09 21:36

(Reply to #32)

#32

Top

X is treated as a special character in the mm atom type fields (that fourth column). The letter in that field has no relationship to what you choose to name your residue's type. You could change _residue_ type X to be "blahblah", leaving the mm atom type column at X, and it would work fine. The Xes in the fourth column for the Y residue type behave the same way; they aren't related in any way to X the residue type, they're just X the special mm atom type.

Fri, 2012-08-10 07:19

smlewis

(Reply to #33)

#33

Top

I think it's important to draw out the distinctions between the various things that could be called "X".

First off, there's the chain. The chain designator is only going to be found in the PDB, in the chain designations of an XML script, and other places which use PDB numbering for residues (like resfiles). It is not going to be found in any form in a residue type params file, ligand or otherwise.

Secondly, there's the residue type name. Usually you wouldn't want to have the three letter name be a single "X" (or rather " X", " X ", or " X"), so you'd never see that in a PDB file, and many of the other locations where you'd want to specify a residue type. There is, however, a single letter residue type specifier, which typically is "X" for ligands. This is used only in locations where you represent the structure as a string of one-letter codes, like the SEQUENCE lines of silent files. It's also found as the third entry in the IO_STRING line of a residue type params file.

The third X being discussed is in the fourth column of the ATOM lines in the params file. This X has nothing to do with either the chain nor the residue type name. Instead it's the designation of the atom type for that atom in molecular mechanics-related contexts in Rosetta. As these molecular mechanics atom types are not commonly used, and molfile_to_params.py is stupid/lazy, it doesn't bother to figure out those atom types, and instead just puts "X" ("unknown") in that column.

A fourth thing may be an atom name. These would be found in the second column of the ATOM lines of a params file, and as the atom name field (third column) of a PDB file. It would also be found elsewhere (like later in a params file) where you would want to refer to a specific atom. This, again, is a completely separate designation from a chain, a single letter residue designation, or an atom *type* name.

So, if you have multiple ligands, you don't need to touch the params file at all to change the chain designation. You can leave the MMtype, the single letter code, and any atom names as "X" and things will work as intended.

But when doing multiple ligand docking, you do want change the *chain* of the second ligand to be something other than X. Again, to do this don't touch the params file, but instead change the chain field (fifth column) of the PDB file. You would then want to make sure that any chain reference in your XML file that refers to the second ligand was correctly labled as Y (or whatever chain you want them to be).

Short answer: *chains* gets changed to Y's, all other X's stay as they are.

Fri, 2012-08-10 15:18

rmoretti

(Reply to #34)

#34

Top

Thank you for your patient and clear explanation, Rocco. Now I can run multiple ligand docking correctly.

So I'm trying to substitute a Tyr residue with other residues in a protein. I allowed the original position to pick up any aa except Tyr and Cys, and also allowed any residues with an atom within 6A designable, and allowed any residues with an atom within 8A but farther than 6A repackable, and run 5 cycles of design (packrotamer mover), minimization (min mover), and relax (frelax mover).

(1)In this case, some residues with H atom in 6A are also designable. Is this necessary or do I just redesign residues with Ca atom within 6A (just like enzyme design)? Will the two have a big difference? Is 6A far enough to include all residues interacting with the residue I want to substitute?
(2)About min mover, when I use it for backbone minimization, the working principle from my understanding is first to do a small perturbation, then to do a gradient-based minimization. So I think movers for backbone movement (Shear, Small) are already included in min mover, right? So there is no need to specify backbone movement before min mover? And if I want to do a flexible backbone design, I can perturb backbone using backbone movement movers before design, right? Also, is frelax mover necessary here?
(3)The target protein is large (about 600 residues). From other posts, it is recommended not to do a relax for large proteins. But the score of the protein is high (about 10000), which means there are some steric clashes. If without relax, the design step will only remove the clashes to achieve a lower score, right?

Mon, 2012-08-13 08:18

(Reply to #35)

#35

Top

(1) I usually use 10 A in "neighbor atoms" (usually CB) for deciding what should pack around a change. I think you are safe letting things design if they have just an H within 6 Angstroms, but you are equally safe not to. You are going to need to make judgment calls and use your biophysical intuition to decide if those mutations make sense anyway, and you'll be able to see when those further mutations are useless versus when they are important. You can always re-run the simulation with a resfile that forces the sequence you want to examine.

(2) No, MinMover does only minimization. Generally when the Rosetta community refers to minimization they mean only gradient-based minimization; any Monte Carlo steps to introduce diversity beforehand (like Small/Shear moves) are not part of minimization. MinMover is just doing gradient minimization of the backbone into a local minimum. If you want _sampling_ of the backbone, you generally need to to relatively fancy things, but if you just want to let the backbone relax a bit into your designed sequence, minimization alone is a good time/accuracy tradeoff.

(3) I think fast relax is marginally less sensitive to large proteins than old relax was, but it will still be very slow at that size. Try it at -nstruct 5 and see how long it takes versus your CPU budget. Doing repacking alone on the entire protein (don't forget to disable design) will almost always lower the score of a crystal structure, especially one with clashes. I'd suggest "fixbb -ndruns 10 -nstruct 1 -packing:repack_only -ex1 -ex2 -ex_cutoff 0 -use_input_sc -resfile a_resfile_that_locks_down_the_ligand_site" or similar. This should take less than 30 minutes on your desktop.

Mon, 2012-08-13 08:28

smlewis

(Reply to #36)

#36

Top

Thanks for your fast reply, Smith. So for (3), you mean I can prepare the structure for Rosetta input either by fast relax or by repack without design? For fast relax, I can generate structures with RMSD less than 0.01A from the crystal structure using cst file output by sidechain_cst.py. But I don't know RMSD ranges of structures generated with repack.
Also, are there any scripts that can output residues positions with atoms within certain distance from atoms in a specific residue in 3.3 or 3.4? Right now, the way I use to know this information is using pymol, which is slow.

Mon, 2012-08-13 09:12

(Reply to #37)

#37

Top

"you mean I can prepare the structure for Rosetta input either by fast relax or by repack without design?"

You can lower its score, which implies relaxation into the Rosetta energy function, by using either relax or repacking. Fastrelax = repacking + minimization. Fastrelax will do a better job, but just repacking is much, much, much faster because it is only a subset of fastrelax.

Repacking will generate backbone RMSDs of zero; it doesn't alter the backbone.

Rosetta contains code to calculate neighbors, but there isn't an application for it that I know of. You might be able to get to it with PyRosetta or RosettaScripts. InterfaceAnalyzer will print residues at an interface; you could hack an extra residue into the PDB at your site of interest and get InterfaceAnalyzer to tell you the "interface" with that made-up added residue as a quick-and-dirty workaround. Few or none of the commonly-used neighbor detection schemes for this level of work (including InterfaceAnalyzer) do atom-to-atom calculations as you describe, it's usually CA-CA or CB-CB distances instead.

Mon, 2012-08-13 09:22

smlewis

(Reply to #38)

#38

Top

The other thing relax does, in addition to the minimization, is to cycle the repulsive energies to collapse and then re-expand the structure to get better packing. While minimization definitely adds some time, it's these cycles of repack and minimization with varying fa_rep that really increase the runtime of relax versus repacking.

I should note that relax with all-heavy atom constraints is rather convergent. You don't need to do the hundreds-of-output-structures-then-pick-the-best-one thing that's normally done with a regular unconstrained relax. A few structures (e.g. Steven's -nstruct 5) should be more than sufficient. In practice, I typically only run relax with all heavy atom constraints for a single output structure, as that's usually "good enough" to get reasonable starting scores.

Regarding distances, one thing you should keep in mind is that the commonly used pairwise interactions in Rosetta (especially those used in score12) all fall to zero outside of about 6 Ang. Residues with all atoms further than 6 Ang apart will not interact with each other energetically. However, they may interact indirectly. That is, residue A will interact energetically with residue B, causing it to repack/move, and the change in position of residue B will cause a change in energy of residue C, which may cause it to want to repack .... In such a fashion the whole protein may be involved. Typically you set the cutoffs such that you redesign all the residues which are in energetic contact with your ligand/interface/etc, and then repack all the residues which are in energetic contact with the residues you're designing - though there are alterations depending on how aggressive you want to be.

Mon, 2012-08-13 11:16

rmoretti

(Reply to #39)

#39

Top

So when I did relax with constraints for one target, I got this error, which I did not find similar cases from previous post:
[ERROR] Exception caught by JobDistributor for job 1L8T_0001Atom CG 2
Does this mean some residues missing CG atom? (But I thought Rosetta can fill up the missed atoms, right? So this error seems strange.)

Tue, 2012-08-14 14:27

(Reply to #40)

#40

Top

I'm not sure I've ever seen that either. It does indeed look like it's trying to find the CG atom of residue 2 (pose numbering). My guess is that it's not so much that the CG atom is missing (Rosetta will fill in atoms which are supposed to be there but aren't in the input), but rather the residue type at that position doesn't have a CG atom in the first place (e.g. it's something like Ala or Gly).

Are you using the automatically generated constraints, or are you providing your own constraint file? If the latter, my guess would be that you have a bad constraint definition involving residue 2, atom CG - possibly because of pose numbering/PDB numbering confusions.

If that's not it, I'd need more information (e.g. the full commandline and the complete tracer output) to debug.

Tue, 2012-08-14 18:25

rmoretti

(Reply to #41)

#41

Top

I used the generated constraints from sidechain_cst_3.py, and you are correct that resi 2 ia ala. But even after I removed CG constraints (CoordinateConstraint CG 2 CA 2 55.265 -10.151 -258.024 BOUNDED 0 0.1 0.5 0.5 tag) for that resi, it still gave me errors. I've attached the constraint file generated, and the script always_constrained_relax_script is in the package, the flags I used are:
-in:file:s 1L8T.pdb
-in:file:fullatom
-constrain_relax_to_start_coords
-relax:script always_constrained_relax_script
-constraints:cst_fa_file 1L8T_sc.cst
-nstruct 1
-linmem_ig 10
I've attached the run log, too.
The version I used is 3.3, and I copies these scripts from 3.4 and put them in the same folder as in 3.4. The pdb is also attached, in which I removed all hetatm.

Fri, 2012-08-17 09:17

(Reply to #42)

#42

Top

One more question, I'd like to replace a short b-strand in my target. Is there a way to use rosetta to search the whole pdb to look for sequences with similar structures? I found a relevant reference "High-resolution design of a protein loop", in which they did similar work for loops. But I don't understand how they searched the whole pdb for sequences with similar structures?

Fri, 2012-08-17 09:30

(Reply to #43)

#43

Top

First off, if you have loosely/unrelated questions, it's probably better to start a new thread than continuing the current one. We're getting a little squashed here, and quite far afield of "protein docking question".

That said, it looks like there really isn't an off-the-shelf solution in Rosetta. After talking to a few people, it sounds like your best bet is to use a downloaded version of TMAlign (http://zhanglab.ccmb.med.umich.edu/TM-align/) to scan a PDB of your fragment (one-by-one) against the set of PDBs you're interested in. TMAlign will output a score and a sequence alignment for each structure, showing you the best match. You can then collect all the ones that pass whatever threshold you think works best.

Fri, 2012-08-17 17:11

rmoretti

(Reply to #44)

#44

Top

This sounds a lot like LoopHash, which I don't think is public yet.

The loop design paper you're talking about was done extremely laboriously in Rosetta++ with a lot of manual intervention. It could be repeated, yes, but you'd need many months or years to learn the code well enough to do it.

Sat, 2012-08-18 18:22

smlewis

(Reply to #45)

#45

Top

First off, we should be clear to distinguish between pose numbering and PDB numbering. You residue 2A (chain A residue 2) is Ala, but as you don't have a 1A residue, it isn't going to be pose number 2. The constraint file format you're using is based on pose numbers, not PDB numbers, so it's looking at residue with a pose number of 2, not a PDB number of 2A.

That's actually what you're running up against. If you take a close look at the input PDB, you'll notice the occupancy of atoms for residues 2A to 4A (and some others) are zero - which causes Rosetta to discard those atoms. If there's an insufficient number of atoms left after discarding, the whole residue is thrown out. I don't know why the relax application is not flagging it, but if you pass the input structure through e.g. the score_jd2 application, you'll get a number of warnings like

core.io.pdb.file_data: PDB reader is ignoring atom N in residue 2 A. Pass flag -ignore_zero_occupancy false to change this behavior

The constraint generation script is less intelligent than the Rosetta PDB reader, so it isn't aware of the zero occupancy issues, so it generates a constraint for PDB residue 3A, labeling it as pose residue 2, whereas Rosetta is dumping all the residues below 5A, so Rosetta is labeling PDB residue 6A as pose number 2, with associated numbering shifts throughout the protein.

There are two possible fixes. The first is to try passing the "-ignore_zero_occupancy false" flag to relax application, so it builds those residues with zero occupancy labeled atoms. The other is to pre-trim the PDB, removing all the zero occupancy atoms, and then regenerating the constraint file.

Fri, 2012-08-17 16:56

rmoretti

(Reply to #46)

#46

Top

Thank you guys for your great help, Rocco and Smith!

Tue, 2012-08-21 09:13

(Reply to #47)

#47

Top

As you indicate, that's a binary silent file, rather than an atom tree diff silent file.

Most of the ligand docking stand-alone applications are built around atom tree diff format files, although in 3.4 they should be able to handle binary silent files with the standard jd2 flags as well (although for backward compatibility you might have to be explicit with things like "-in:file:silent_struct_type binary") - though in 3.3 they absolutely won't recognize binary silent files.

That said, if you just want to extract structures from a binary file, you don't need to use the extract_atomtree_diffs application. The binary silent file that's output by RosettaScripts ligand docking is a regular Rosetta binary silent file. You can use any extraction program, such as extract_pdbs. Personally, I tend to use score_jd2, with the -in:file:silent, -in:file:tags, and -out:pdb flags. (-out:pdb is score_jd2's equivalent to score's -output flag.

Wed, 2012-08-08 11:25

rmoretti

#48

Top

I am performing protein-peptide docking and want to prepack the starting structure without touching few specific residues which I am very certain about not prepacking.
Can you suggest a way to do this.
Thanks.

Mon, 2013-03-04 09:52

nawsad

(Reply to #49)

#49

Top

fixbb -ex1 -ex2 -use_input_sc -packing::repack_only -resfile resfile -ndruns 10 -nstruct 1 -s my.pdb

resfile contains:

NATAA
start
1 A NATRO
2 A NATRO

except instead of 1, 2, 3... just the positions that you do NOT want repacked.

Generally, it's better to start a new thread for a new project. Thanks!

Tue, 2013-03-05 06:43

smlewis

(Reply to #50)

#50

Top

Thanks a lot.
It worked as I wanted.
I will keep your last point in my mind..

Fri, 2013-03-08 14:09

nawsad

#51

Top

Hi Everyone, I was wondering if anyone knows how to set path for saving output pdb files in Rosetta protein docking. I was using following command line to run protein docking, didn't found any saved output pdb files.

/path to/rosetta_2014.35.57232_bundle/main/source/bin/rosetta_scripts.default.linuxgccdebug @docking.options -parser:protocol docking_full.xml -database /path to/database/ -out:suffix _full -nstruct 50 > docking_full.log -restore_pre_talaris_2013_behavior

Docking.option file

-docking # the docking option group
-partners AB_HL # set rigid body docking partners
-dock_pert 3 8 # set coarse perturbation paramaters (degrees and angstroms)
-dock_mcm_trans_magnitude 0.1 # refinement translational perturbation
-dock_mcm_rot_magnitude 5.0 # refinement rotational perturbation
-s 3gbm_HA_3gbn_Ab.pdb # input model
-run:max_retry_job 10 # if the mover fails, retry 50 times
-use_input_sc # add the side chains from the input pdb to the rotamer library
-ex1 # increase rotamer bins to include mean +- 1 standard deviation
-ex2 # increase rotamer bins to include mean +- 2 standard deviations
-corrections:score:use_bicubic_interpolation true # use bicubic interpolation of the phi/psi bins
-out # out option group
-file # out:file option group
-scorefile docking.fasc # the name of the model score file
-silent docking.silent # the name of the model silent file
-silent_print_all_score_headers true # print score headers for each model in the silent file

Docking.full.xml file

<dock_design>
<SCOREFXNS>
</SCOREFXNS>
<TASKOPERATIONS>
<InitializeFromCommandline name=ifcl/>
<RestrictToRepacking name=rtr />
<RestrictToInterfaceVector name=rtiv chain1_num=1,2 chain2_num=3,4 CB_dist_cutoff=10.0 nearby_atom_cutoff=5.5 vector_angle_cutoff=75 vector_dist_cutoff=9.0 />
<PreventResiduesFromRepacking name=prfrp residues=11,41,345 />
</TASKOPERATIONS>
<FILTERS>
</FILTERS>
<MOVERS>
MINIMIZATION MOVERS
These movers are part of the minimization protocol put together in the minimize_iface ParsedProtocol
<PackRotamersMover name=soft_rp scorefxn=soft_rep task_operations=rtiv,rtr,ifcl,prfrp />
<PackRotamersMover name=rp scorefxn=score12 task_operations=rtiv,rtr,ifcl,prfrp />

This minimization protocol is adapted from Whitehead, Fleishman, ... , Baker. See the XML scripts available in the Supp Mat.
<ParsedProtocol name=minimize_interface mode=sequence >
<Add mover=soft_rp />
<Add mover=soft_min_sc />
<Add mover=soft_rp />
<Add mover=soft_min />
<Add mover=soft_rp />
<Add mover=min_sc />
<Add mover=soft_rp />
<Add mover=min />
<Add mover=rp />
<Add mover=min_sc />
<Add mover=rp />
<Add mover=min />
</ParsedProtocol>

DOCKING MOVERS
<Docking name=dock_low score_low=score_docking_low score_high=score12 fullatom=0 local_refine=0 optimize_fold_tree=1 conserve_foldtree=0 ignore_default_docking_task=0 design=0 task_operations=ifcl,prfrp jumps=1/>
<Docking name=dock_high score_low=score_docking_low score_high=score12 fullatom=1 local_refine=1 optimize_fold_tree=1 conserve_foldtree=0 design=0 task_operations=ifcl,prfrp jumps=1/>

<SaveAndRetrieveSidechains name=srsc allsc=0 /> Speeds the move from centroid to full atom mode
</MOVERS>
<APPLY_TO_POSE>
</APPLY_TO_POSE>
<PROTOCOLS>
<Add mover=dock_low/>
<Add mover=srsc />
<Add mover=dock_high />
<Add mover=minimize_interface />
</PROTOCOLS>
</dock_design>

Fri, 2015-06-26 12:48

khadkab@mcmaster.ca

Pages

Topic locked

Search form

You are here

protein docking question

Pages