Hey all, I've two questions, which I didn't found relevant post in the forum.
1. One of my protein has selenocysteine residues in it. How can I deal with it? I don't want to simply remove it since it's near to the active site. I assume I can just replace the atom name "SE" in the pdb with "S" and change "HETATM" to "ATM", but Se and S aren't the same size, which may cause different bond length?
2. One of my enzyme doesn't have substrate(galactose) molecule in its crystal structure, and I want to redesign it to bind to another substrate(glucose). Can I just first dock the original substrate into the enzyme and then put the new substrate into the same pocket and do a design?
"1. One of my protein has selenocysteine residues in it. How can I deal with it? I don't want to simply remove it since it's near to the active site. I assume I can just replace the atom name "SE" in the pdb with "S" and change "HETATM" to "ATM", but Se and S aren't the same size, which may cause different bond length?"
This will be fine. Rosetta will replace the bond length with the proper one if the residue is ever repacked.
"2. One of my enzyme doesn't have substrate(galactose) molecule in its crystal structure, and I want to redesign it to bind to another substrate(glucose). Can I just first dock the original substrate into the enzyme and then put the new substrate into the same pocket and do a design?"
"Just" is too small a word, but what you are describing could work. If you don't know where the galactose pocket is, then knowing you have the right pocket first (experimental validation) would be worthwhile before trying to mutate the pocket to bind glucose instead. I can't promise that it will work but it's a reasonable thing to try.
Another option for placing ligands is to find another crystal structure (perhaps of a homolog) which does have the substrate present, and then align the ligand-containing complex with the protein you want to design (I would probably do all of this under PyMol), and once you have a reasonable overlay, copy the ligand position from the aligned complex to the desired complex. You could follow this up with some ligand docking/repack+minimize/relax to adjust the ligand position, if you want.
A third option is to define the interaction geometry to the ligand, and then use Rosetta's matcher (with the single input structure and a very restrictive position specification file) to find ligand placement(s) which is consistent with the proposed interaction geometry.
Following your guidance, I found a homologous crystal structure which has the substrate, aligned the two in pymol, and copied the ligand coordinate into my target. But when I tried to convert the mol2 file of the ligand to params file, I always get this error:
Traceback (most recent call last):
File "rosetta3/rosetta_source/src/python/apps/public/molfile_to_params.py", line 1365, in
File "rosetta3/rosetta_source/src/python/apps/public/molfile_to_params.py", line 1284, in main
molfiles = list(read_tripos_mol2(infile, do_find_rings=False))
File "/home/bo/rosetta3/rosetta_source/src/python/rosetta_py/io/mdl_molfile.py", line 393, in read_tripos_mol2
atom.rsd_id = int(f)
ValueError: invalid literal for int() with base 10: 'AA52004'
My guess is there is something wrong with the mol2 file since if I generated a mol file for the ligand, I can convert it to a params file (and also from this post http://www.rosettacommons.org/content/error-recognizing-rotamer-library-...). But I don't know where the mistake is. I've attached the mol2 file. Any suggestions?
One more question, one homologous crystal structure has a Cu ion in its active site, which is functionally important. So I want to keep it during design or dock (just as a part of receptor but not actively moving it). Based on this post http://www.rosettacommons.org/content/how-deal-cu-ligand-or-part-receptor, I guess I should generate a params file for Cu and add it the the database?
Molfile_to_params.py doesn't have a full-fledged, robust mol/mol2 parser - it's just good enough to deal with commonly encountered mol/mol2 files. What's happening is that the columns of the mol2 file are running together (e.g. "O.co22004"). Other mol2 readers may be fine with this, but molfile_to_params.py doesn't like it. You need to separate the atom type and substructure ID fields that are running together by putting a space between them.
Regarding Cu - yes, you'd need a params file for Cu. You can either add it to the database or supply it on the command line with the -extra_res_fa flag (which can take a list of multiple files).
So I tried a run for my target. The design process ran well, but when I checked the output score file, I found a lot of terms were 0, which doesn't make sense. I've attached the score file generated, and the flags I used are:
Basically, I aligned the pdb with substrate to the target pdb, copied the substrate into the target pdb, ran ligand dock to better define the substrate position, then used constraints generated from the pdb with substrate to design the target. One point I'm not sure about is for design, the catalytic residues in the input pdb need to be specified. Since I didn't run match, the input pdb didn't already have catalytic residues placed in like pdbs generated from match. So I manually replaced the original residues in the target pdb with catalytic residues based on the structure alignment, and kept the CA, C, N coordinates the same to prevent chain break introduced by the replacement. Does this cause the strange score terms?
The big concern I see with your scorefile is that total_score is zero. Also, your constraints (all_cst) are really high.
How do the output structures look? Are they reasonable, or are you getting garbage? Also, if you look through the tracer output, is there anything that looks like a warning or an error term? What does the energy table at the bottom of the output PDB files look like (specifically the weights and total line)? Also, are you running enzdes through the enzyme_design application, or through the RosettaScripts interface - if the latter, what's your XML file like?
Regarding the constraints, you don't necessarily need to physically replace the catalytic residues - at least if they're already at the correct identity. Designing/repacking/minimizing with constraints on should result in the sidechains going into about the correct orientation. The problem you'd want to be careful about is if you cut and paste the catalytic residues, you may mess up the backbone geometry, or the distances/angles around the cutpoints where you transition between the two segments. As the enzdes application only does torsional space sampling, it won't fix up residues with bad bond lengths or bond angles (even if they're *really* bad).
The key thing that you need to supply if you're not running match is the REMARK lines which specify which protein residues correspond to which constraints.
I've attached the template pdb, the generated design, the cst file, and the log file. My guess for the high constraint score is I constrained the dihedrals very strictly (I didn't find many references talking about the mechanism, so unlike in the demo where the constraints for dihedrals are loose, I measured the dihedrals from the pdb with substrate and used as tight constraints). It seems the output structure doesn't make sense (in pymol, the substrate seems covalently linked to the residues). I didn't find any warning or error related to the design process. I run it using the enzdes application. The REMARK lines are:
REMARK 666 MATCH TEMPLATE X AA5 0 MATCH MOTIF A TYR 91 1 1
REMARK 666 MATCH TEMPLATE X AA5 0 MATCH MOTIF A CYS 93 2 1
REMARK 666 MATCH TEMPLATE X AA5 0 MATCH MOTIF A LYS 189 3 1
I don't quite understand "you don't necessarily need to physically replace the catalytic residues - at least if they're already at the correct identity." Does this mean I only need to remove the original residues and put the catalytic residues at the same position? (I actually tried this, but this caused chain break in front of and behind the catalytic residue, which I don't think the design will fix, so I cut and pasted the catalytic residues, and kept the backbone CA, N, C coordinates unchanged to prevent chain break).
The first thing I notice is that the scorefunction you look to be using is quite off:
From the output PDB:
# All scores below are weighted scores, not raw scores.
label atom_pair_constraint coordinate_constraint angle_constraint dihedral_constraint total
weights 1 1 1 1 NA
This means that the scorefunction you are using has only the constraint terms, and not any of the other ones like fa_atr, fa_dun, etc.
There's typically two ways this can happen. The first is if you give a -score:weights flag with an empty/minimal weights file. The second is if you have edited the weights file in the database (in this case enzdes.wts) to remove all the other terms. If your flags file is as above, I would try re-running things with a fresh copy of the database.
I wouldn't be concerned about the "covalent bonds" between the ligand and the protein - I think that's from the fact that you didn't have any of the other energy terms on during design. You didn't tell Rosetta to pay attention to steric repulsion (fa_rep) so it didn't bother with keeping atoms appart.
Regarding the cut and paste, my main point was that you have to be careful not to mess up the bond geometry. Keeping the backbone coordinates the same was a good call, though you should also visually check the Calpha-Cbeta geometry to make sure it looks reasonable. (From what I can tell, it's not too bad in the template.)
I'm still concerned about the high constraint scores. Since the only thing you were optimizing was the constraint geometry, the scores and geometry should be as close to perfect as Rosetta can make it. It's not going to get any better when you use the full weights file. The 3.4 release has an application called CstfileToTheozymePDB which takes the constraint file with the -match::geometric_constraint_file flag (long with other standard flags such as -database and -extra_res_fa) and generates PDBs corresponding to all the samplable geometries of your specified geometry. Note that to limit combinatorial explosion, it's recommended to reduce (i.e. set to 360) the periodicity of sampling and reduce the number of samples (last column set to 0) for all constraints (also, *don't* specify sample increasing flags like -ex1/-ex2/etc.). I'd run this application and make sure that the constraint file specifies the geometry you want, and the geometries look capable of fitting on your protein.
Thank you very much for you valuable suggestions, Rocco! Finally, it worked. I checked the enzdes.wts file and I didn't think I changed it. Anyway, I got a new copy of the database, and re-ran the design again. But it still gave me 0 score terms. So I added the flag: -score:weights rosetta3/rosetta_database/scoring/weights/enzdes.wts, which then gave me meaningful scores. I also modified my cst file, which is not that strict for dihedrals now. The interesting point is I have 3.4 installed on another computer, which also gave me 0 score terms if I didn't use the weights flag. One question, the flag -enzdes:lig_packer_weight will not mess up with enzdes.wts, right?
I actually ran the CstfileToTheozymePDB apps before design to check if the cst file generated geometries as expected. I've attached the new design and the score file. Does this make sense to you?
One more question about placing catalytic residues into template without running match. I found a better way to do this. From the alignment of the pdb with substrate and the template pdb, I knew where I should put the catalytic residues. Then I extracted the coordinates for catalytic residues, and for the corresponding residues in the template. I aligned the residues based on their ca, c, n, o atoms in pymol, then replaced the original residue with the aligned catalytic residue. This prevented chain break and also kept the ca cb geometry in catalytic residue. How would you place catalytic residues into template pdb without matching?
Interesting ... it was my understanding that the enzyme_design application should default to enzdes.wts weights if no explicit weights file was given. I might have been mistaken in that, though. At any rate, explicitly specifying the weights file should get around that.
The -enzdes:lig_packer_weight flag shouldn't change the weights. It should just upweight all the designable ligand interactions during design. Output scoring shouldn't change, and the presence/absence of any terms shouldn't change.
I don't see anything majorly off with the new results, although you'll have to be the judge as to if they make physical/experimental sense.
That method of placing residues might also work. There's a number of different ways to do it. Frankly speaking, if I was going to be repacking the catalytic sidechain anyway (especially repacking with constraints), I might just delete off all the sidechain atoms from the PDB and change the residue type with a text editor, and then trust that Rosetta's sidechain rebuilding and repacking algorithms would be able to put in the appropriate geometry.
Two more questions:
1. For ligand dock, the score terms in atomtreediff file are:
SCORES 1GOF_gal_input_00009 angle_constraint 0 atom_pair_constraint 0 chainbreak 0 coordinate_constraint 2.14075 dihedral_constraint 0 dslf_ca_dih 0.395697 dslf_cs_ang 2.57874 dslf_ss_dih 2.16399 dslf_ss_dst -3.70108 fa_atr -2553.23 fa_dun 314.63 fa_intra_rep 4.92855 fa_pair -96.9051 fa_rep 295.422 fa_sol 1230.32 frac_atoms_within_0.5 0 frac_atoms_within_1.0 0 frac_atoms_within_2.0 0 hack_elec -89.7339 hbond_bb_sc -176.705 hbond_lr_bb -524.567 hbond_sc -92.7243 hbond_sr_bb -112.835 if_angle_constraint 0 if_atom_pair_constraint 0 if_buried_sasa 314.734 if_buried_unsat_hbonds 2 if_chainbreak 0 if_coordinate_constraint 0 if_dihedral_constraint 0 if_dslf_ca_dih 0 if_dslf_cs_ang 0 if_dslf_ss_dih 0 if_dslf_ss_dst 0 if_fa_atr -6.98776 if_fa_dun 0 if_fa_intra_rep 0 if_fa_pair 0 if_fa_rep 0.3119 if_fa_sol 7.71845 if_hack_elec -0.0367318 if_hbond_bb_sc -0.187806 if_hbond_lr_bb 0 if_hbond_sc -1.51148 if_hbond_sr_bb 0 if_omega 0 if_p_aa_pp 0 if_pro_close 0 if_rama 0 if_ref 0 interface_delta -0.69343 ligand_auto_rms_no_super 5.76669 ligand_auto_rms_with_super 4.8667e-08 ligand_centroid_travel 4.54129 ligand_is_touching 1 ligand_num_chi 1 ligand_radius_of_gyration 2.25988 omega 31.0942 p_aa_pp -135.458 pro_close 7.24348 rama -3.09845 ref -124 total_score -2022.04
However, when I extracted pdb from this file, the score terms in the pdb were:
label fa_atr fa_rep fa_sol fa_intra_rep hack_elec pro_close fa_pair hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih atom_pair_constraint coordinate_constraint angle_constraint dihedral_constraint rama omega fa_dun p_aa_pp ref chainbreak total
weights 0.8 0.4 0.6 0.004 0.25 1 0.8 2 2 1.3 1.3 0.5 2 5 5 1 1 1 1 0.2 0.5 0.4 0.5 1 1 NA
pose -2554.98 301.507 1230.38 4.92911 -89.9595 7.24447 -96.9268 -112.806 -524.201 -176.186 -92.0269 -3.70126 2.58116 2.16379 0.395847 0 0 0 0 -2.44603 32.8419 315.069 -134.654 -124 0 -2014.78
What caused the score terms being different for the same pdb (both different score functions and different values for the same function)? Is this because in the atomtreediff file, the ligand-protein interaction is included when calculating the score? Are there any flags I can use to get the same score terms in the output pdb?
2. When I used script for ligand dock and score_jd2 to extract pdb from the generated silent file, the score terms in the pdb were the same as in the pdb extracted from atomtreediff. Again, are there any flags I can use to get the score terms related to ligand? Also, I had one ligand dock silent file generated by mpi script docking, and when I tried to extract pdb using score_jd2 from it, I got this error:
core.io.silent: ERROR: add_chain_ending() invalid chain ending 640
ERROR:: Exit from: src/core/io/silent/BinaryProteinSilentStruct.cc line: 186
My guess is the chain for one model didn't have end residue? Any comments on how to solve this error?
Ligand docking uses a different scorefunction from the standard Rosetta scorefunction, and also attaches a number of other non-scorefunction terms to aid in analysis. You don't mention how you extracted your atomtreediffs, but depending on how you did it, that information could be lost. For example, the point of score_jd2 is to rescore the structures, so it will replace all the old scoring information. I believe that the extract_atomtree_diffs application in 3.4 should be able to extract the PDBs with the docked scoring, although the non-scorefunction terms might end up in a separate scorefile instead of the PDB file.
The errors you're getting is probably from a malformed silent file. I would take a look at the lines starting with CHAIN_ENDINGS in the file and see if there's one with an obvious error. (Baring that, did you modify your ligand params file to be a polymer? Binary silent files should work with regular ligands - as long as the program reading them is given the same params file that the program generating them had.)
I tried to fix the errors, but it still doesn't work. I checked the lines starting with CHAIN_ENDINGS, and they are like these:
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0001
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0002
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0003
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0004
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0005
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0006
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0007
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0008
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0009
CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0010
In the input pdb, the protein chain (chain A) has 639 amino acids, and the cofactor is chain B, copper ion is chain C, the ligand is chain X. My understanding is the chain_endings should be 639, so I changed the lines above to these:
CHAIN_ENDINGS 639 1GOF_dex_input_0001
CHAIN_ENDINGS 639 1GOF_dex_input_0002
CHAIN_ENDINGS 639 1GOF_dex_input_0003
CHAIN_ENDINGS 639 1GOF_dex_input_0004
CHAIN_ENDINGS 639 1GOF_dex_input_0005
CHAIN_ENDINGS 639 1GOF_dex_input_0006
CHAIN_ENDINGS 639 1GOF_dex_input_0007
CHAIN_ENDINGS 639 1GOF_dex_input_0008
CHAIN_ENDINGS 639 1GOF_dex_input_0009
CHAIN_ENDINGS 639 1GOF_dex_input_0010
And it gave me a new error:
core.io.silent: ERROR: trying to index off the end of the secstruct array, idx is 641 secstruct_.size() is 640
terminate called after throwing an instance of 'utility::excn::EXCN_BadInput'
I don't quite understand what this error means (the size of the input in bigger than secstruct_.size?).
I also tried to use 640 and 641 as chain_endings, which gave me these errors:
core.io.silent: ERROR: add_chain_ending() invalid chain ending 640
ERROR:: Exit from: src/core/io/silent/BinaryProteinSilentStruct.cc line: 184
core.io.silent: ERROR: add_chain_ending() invalid chain ending 641
ERROR:: Exit from: src/core/io/silent/BinaryProteinSilentStruct.cc line: 184
So I think chain_endings should be 639 (the last amino acid). Any suggestions?
It looks like there's a sequence length mismatch error.
I'd take a look at the first entry (1GOF_dex_input_0001) and check lines to see if the numbering matches up with what you think it should be. Particularly, pay attention to the SEQUENCE: line at the start of the silent file - the length should match the size of the pose. (Silent file support for mixed protein silent files is somewhat sketchy.) For the ligands, there should be a single letter present, which should match the one given in the params file line (probably 'X' or 'Z').
Another thing I'd do is just run the starting structure through score_jd2 with forced binary silent file output. See if the new silent file can be read in, and if so, how does it differs from the entries in the erroring silent file. (The encoded coordinate lines will be different, but the main framework should be the same.)
You are spot on, Rocco. I tried your suggestions and found that the copper ion didn't seem to be read in. The SEQUENCE: line at the start of the silent file is this:
There are only two Z at the end. But my input have one cofactor, one copper ion, and on ligand.
Also, from the first entry, the ANNOTATED_SEQUENCE: is like this:
Again, the copper ion didn't have any chain label (Z).
With the copper ion in the input, score_jd2 can not read in the silent file generated from the input using score_jd2. When I removed the copper ion, the generated docking silent file can be extracted, and score_jd2 can read in the silent file generated from the input using score_jd2.
So I guess the error is caused by the copper ion. To use copper ion, I added the parameters (copied from Zn) to the files: rosetta_database/chemical/atom_type_sets/fa_standard/atom_properties.txt and extras/atom_orbital_hybridization.txt gen_born_params.txt memb_fa_params.txt NACCESS_sasa_radii.txt sasa_radii.txt soft_rep_params.txt std_charges.txt. And I supplied the params file for copper as extra_res_fa for docking. The input and the params file for copper are attached.
Did I do something wrong so the params file of copper can not be read in?
It looks like the copper ion is getting assigned a single space as its one-letter code in the annotated sequence, instead of Z as is suggested by your parameters file - is that correct, or am I misinterpreting it? (The forum inserts a break there so it's hard to tell, but from admin-editing mode on your post it looks like the end is:
VTQ[GLN_p:CtermProteinFull]Z[ACY] [CU2]Z[DEX] 1GOF_dex_input_0001
This "VTQ[GLN_p:CtermProteinFull]Z[ACY] [CU2]Z[DEX] 1GOF_dex_input_0001" is correct. It is what I got from the silent file. The break is in the original silent file. Does that mean something is wrong with the copper ion?
Well, the copper ion should read as "Z" not " ". I guess you could try fixing the SEQUENCE line to include the extra Z, and the ANNOTATED_SEQUENCE line to replace the " " with Z, and see if it works?
Ha, when I added an extra Z to the SEQUENCE line and the ANNOTATED_SEQUENCE line and changed "CHAIN_ENDINGS 639 640 641 1GOF_dex_input_0001" to "CHAIN_ENDINGS 639 1GOF_dex_input_0001", it worked. But I'm still wondering what I did incorrectly for copper incorporation into the database.
Also, I noticed that in the input, the line for copper is:
HETATM 1 CU2 CU2 C 1 19.207 8.966 30.716 1.00 22.36 CU
But in the score_jd2 extracted pdb from the silent file, the line is:
HETATM 9440 CU2 CU B 641 19.207 8.966 30.716 1.00 0.00
Why did the residue name change from CU2 to CU? Is this the reason why copper was not read in?
"But I'm still wondering what I did incorrectly for copper incorporation into the database. "
I haven't spotted anything you did wrong. You're stretching this code well beyond its authors' use cases, so you're discovering bugs as you go.
The params file you posted uses "CU2" exclusively. Maybe there's a second copper ion floating around in your database somewhere that caused the name change, I don't know why it is happening.
Exactly. Look at the IO_STRING line in your copper params file:
I've replace spaces with periods to make the problem clear. We use a confusing mix of whitespace separated and position based input in Rosetta, and the IO_STRING line uses position to figure out where the three-letter and one-letter codes are. That extra space shift things over, so the three letter code is " CU" and the one letter code is " " (the space between CU2 and Z). Note that the full Rosetta name is defined by whitespace separated fields, so is still "CU2".
So I run ligand docking by using both the application and script. For both, I used the same input structure, made the ligand start from the same position, and sampled the same conformational space (translation 0.5A gaussian, rotation 10 gaussian). For the script run, I also used flag -out:file:atom_tree_diff 3COG_script_silent.out to output an atomtreediffs file. I suppose the generated structure should be similar. But for 5000 structures generated by the script, more than 95% are like the one attached: the energy without ligand is quite high, but with ligand, the energy is very low. When I checked the structure generated by the script, I found several residues with very high fa_rep score. This doesn't make sense because HighResDocker and FinalMinimizer mover should remove high-energy rotamers. For 5000 structures generated by the application, all are normal: the energy without ligand is low, the energy with ligand is also low. How did this happen? Also, is the coordinates used for StartFrom mover the position of the center of the ligand?
I need a bit of explanation on how we can prevent Rosetta enzyme design from forming covalent bond between the ligand and the receptor. I see you talk about it above.
(P.S. It might be better to start a new thread and link to the old one - I almost missed your new query buried in the old threads.)
Rosetta enzyme design actually won't form a covalent bond between the ligand and the receptor, not unless you want it to. The way that you tell enzdes you want a covalent bond is by setting the fourth column ("periodicity") of the distanceAB field in the enzdes constraint file to 1. This will make Rosetta put a covalent bond between the two atoms listed in that constraint. If it's zero, it will be a non-covalent constraint.
Generally in enzyme design you'll run into situations where you have close approach (e.g. proton abstractions) where it's techincally not a covalent bond, but you might want to model it as if it is because it's closer than a standard non-covalent interaction. In this context the covalency between the atoms is simply to overcome the non-bonded repulsion. The actual distance between the atoms is set by the constraint, and doesn't have anything to do with standard bonding geometries.
A situation where you might be getting "covalent bonds" when you haven't specified them is when there's something about your system which is causing a too-close approach between atoms. The PDB format doesn't really have (reliable) bonding information in it, so most display programs use heuristics based on atom-atom distances to display bonds. This means that you can get "bonds" between atoms displayed in PyMol and the like which aren't actually being treated as bonds by Rosetta.
I think what may be happening is that you're running into rounding errors with the AtomTreeDiff format. To make a more compact representation, atom tree diff looks at the dihedrals, angles and bond lengths (i.e. the AtomTree) of the output structure, and compares them with those of a reference structure. If the values are different than some threshold, it saves the change, but if it's within the threshold, it just marks them the same (i.e. a diff).
The issue arises when the threshold is set too wide - you get angle changes which cause significant structural changes, but fall below the threshold. This is particularly bad for backbone changes, which results in downstream lever arm effects, and for large structures, where errors can accumulate. The defaults were set for a set of small proteins, and might not be ideal for all structures. (I also think the defaults might differ between the ligand_docking application and RosettaScripts.)
I'd try doing short runs of the script protocol with PDB output, and make sure that it's producing reasonable structures. Then I'd try playing with thresholds. The relevant options are -out:file:atom_tree_diff_bb -out:file:atom_tree_diff_sc and -out:file:atom_tree_diff_bl (for backbone angles, sidechain angles, and bond lengths). Note the values here are digits of precision, so higher numbers are more accurate. The nominal defaults are 6/4/2, but these sometimes get manipulated inside the program (but explicitly setting them should override that).
Thank you very much for your help, Rocco! Now it works. But I have one more question: in my design target, the copper ion is used as the catalytic center and one atom (carbon which will be converted from alcohol to aldehyde) should be near to the copper ion (about 3A). For the initial position of the ligand, I placed the carbon atom near to the copper ion. However, after docking, the carbon atom was far away from the copper ion, no matter how small the space I allowed the ligand to travel. I tried gaussian distribution of 0.5, 1, and 3A for translate mover, and the best structures obtained all had the carbon atom far away from the copper ion. So is there any way I can improve the process to get structures with carbon atom near the copper ion? I guess I can use a constraint file for this, but I didn't find any documentation about using constraints in ligand dock.
Enzdes style constraints should work with ligand docking, although, as you said, they aren't documented.
When you say "far away", how far is it? My guess is that one of the problems you're having is that the carbon atom and the copper atom are being stericly repelled, 3A being less than the sums of the van der Waals radii. My guess is that you're trying to model a (semi-)covalent interaction, and the problem is going to be that there's no energy term in Rosetta that will account for that interaction. You're going to either need to use constraints, or you might want to consider modeling the metal as part of the ligand. (Rosetta is rather permissive in what it can model as a ligand - it doesn't need to be covalent, or even chemically feasible - e.g. take a look at the ligand used in the Jiang retroaldolase papers.)
I see your point. I think I can combine copper ion and the original ligand to make a pseudo ligand. One more question, in the retroaldolase paper, I understand that the lowest energy transition state model was generated by QM calculation, but how can I diversify the conformations of the transition state from this lowest energy model? Can I use openeye omega?
I also read the organophosphate hydrolase design paper by Khare et. al.. It seems in this paper, they generated transition state ensembles which included zinc ion. But I'm not quite sure how they did it. Did they generate the transition state ensembles for the substrate only first and then add zinc ion to the ensemble or generate the transition state ensembles for the whole substrate and zinc ion? I would guess the first, so for my ligand, I would generate conformers first, then add copper ion to the mol2 file, then convert it to params file. Do I need to specify a pseudo bond between copper and the carbon atom in the ligand?
The dirty little secret is that it doesn't seem to matter all that much. Rosetta isn't fine grained enough that minor QM details make a crucial difference. You certainly won't be harmed by using a higher/more rigorous level of theory, but if you're stuck with a low level approximation, that's certainly workable. Do the modeling in whatever way you think is best/most feasible.
Regarding the mol2 file input for molfile_to_params.py, you do need to have a ligand that's completely connected, so you will need to add a dummy bond to connect the copper with the rest of the ligand. As molfile_to_params.py isn't too sophisticated about things, you may want to check the output to make sure that the extra bonds don't mess up atom typing and other things (I find the kinemage output is helpful for checking these things). If it is, you can just manually edit the params file to correct it.
People, I hope this is the best thread to post this.
I have the following issue:
I am designing a protein, which so far is going really well, most of the protein is folding very well according to the structure template, except for the N-terminus helix, it keeps folding the other way round!!! I think the attached picture might help explain my issue (template structure in blue, lowest energy folded structure [abinition 25,000 structures] in green), instead of folding to the correct position is keeps folding away from its position.
I tried refining this helix many time, looked into salt bridges, hydrophobic packing, hydrophylic surface, and now I am stuck, I do not know what else to change to make it fold at the correct position.
What am I doing wrong?
Any ideas, scripts, papers, advice, protocols would be greatly help me.
If you haven't already, look at the "Koga rules" for de novo protein designs. These are particularly about how to design loops such that the secondary structures are oriented correctly.
(If you have additional questions after reading those papers, I'd recommend opening a new thread.)
Thank you, I did not know about the PNAS paper. I will give them both a concise read.
I am using Rosetta 2018 to copy the enzyme design demo. When I ran the CstfileToTheozymePDB.linuxgccrelease apps before design to check if the cst file generated pdb as demo gived, I only obtained a 5.1k result, which only include a model, while the demo include 36 models. I don't know what ‘s wrong. I used the command CstfileToTheozymePDB.linuxgccrelease -extra_res_fa rosetta_inputs/1n1.params -match:geometric_constraint_file rosetta_inputs/mocktim_first_2interactions_example.cst,
and the input file all belong to enzyme design demodem, so please help me solve the problem, thank you!