Ab initio with multiple metal ions

20 posts / 0 new
Ab initio with multiple metal ions
#1

Hi everyone,

I am a beginner in Rosetta and I am wondering if Rosetta could do ab initio folding of a metalloprotein with multiple metal ions.

I read the article by Wang et al.,(Prediction of structures of zinc-binding proteins through explicit modeling of metal coordination geometry, 2010, Protein Science) and was able to set constraints for one metal binding site in residue_pair_jump_cst file and run AbinitioRelax without any issues. However I have no idea how to set things up when two or more metal ions are involved. Do I also have to define every jump between those metal ions?

I would be grateful if someone could help.

Post Situation:
Sun, 2013-12-29 17:58
cheyuk

From best I can tell, things should be able to handle multiple zinc atoms.

For your residue_pair_jump_cst file, simply give a block (each block starts with a BEGIN line and ends with an END line) for each zinc atom you want to model. You have to repeat the geometry lines, but that should be more-or-less a simple copy and paste.

You'll likely also have to update any other references to zinc (e.g. in the -constraints:cst_file and -constraints:cst_fa_file) so that you have a set of references to each zinc atom (each to their own CYS residue).

Tue, 2013-12-31 09:35
rmoretti

I ran the AbinitioRelax application on a metallothionein (PDB code: 2L62) with the input files as attached. Just want to see if Rosetta could reproduce the native fold. The fasta file:

>2L62
GSGC[CYZ]DDKC[CYZ]GC[CYZ]AVPC[CYZ]PGGTGC[CYZ]RC[CYZ]TSARZ[ZN]Z[ZN]

This protein has 26 aa and I appended two ZN ions to its fasta file. Constraints files were modified accordingly. However, I met the following error:

protocols.abinitio.AbrelaxApplication: read fasta sequence: 66 residues
GSGC[CYZ]DDKC[CYZ]GC[CYZ]AVPC[CYZ]PGGTGC[CYZ]RC[CYZ]TSARZ[ZN]Z[ZN]
protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
core.chemical.ResidueTypeSet: Finished initializing centroid residue type set. Created 1981 residue types
core.io.fragments: reading fragments from file: 2L62_09_05.200_v1_3 ...
core.io.fragments: rosetta++ fileformat detected! Calling legacy reader...
core.fragments.ConstantLengthFragSet: finished reading top 25 9mer fragments from file 2L62_09_05.200_v1_3
core.io.fragments: reading fragments from file: 2L62_03_05.200_v1_3 ...
core.io.fragments: rosetta++ fileformat detected! Calling legacy reader...
core.fragments.ConstantLengthFragSet: finished reading top 200 3mer fragments from file 2L62_03_05.200_v1_3
core.scoring.constraints.util: Constraint choice: 2L62.cen_cst
core.fragment: compute strand/loop fractions for 28 residues...
protocols.jumping: read ResiduePair jump-definitions from 2L62.residue_pair_jump_cst
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 9281 residue types
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: score13_env_hb
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/PairEPotential/pdb_pair_stats_fine
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_pp
basic.io.database: Database file opened: scoring/score_functions/rama/Rama_smooth_dyn.dat_ss_6.4
core.pack.dunbrack: Dunbrack 2010 library took 0.31 seconds to load from binary
Segmentation fault (core dumped)

Then I ran Rosetta in debug mode and saw the following error:

protocols.abinitio.AbrelaxApplication: read fasta sequence: 66 residues
GSGC[CYZ]DDKC[CYZ]GC[CYZ]AVPC[CYZ]PGGTGC[CYZ]RC[CYZ]TSARZ[ZN]Z[ZN]
protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
core.chemical.ResidueTypeSet: Finished initializing centroid residue type set. Created 1981 residue types
core.io.fragments: reading fragments from file: 2L62_09_05.200_v1_3 ...
core.io.fragments: rosetta++ fileformat detected! Calling legacy reader...
core.fragments.ConstantLengthFragSet: finished reading top 25 9mer fragments from file 2L62_09_05.200_v1_3
core.io.fragments: reading fragments from file: 2L62_03_05.200_v1_3 ...
core.io.fragments: rosetta++ fileformat detected! Calling legacy reader...
core.fragments.ConstantLengthFragSet: finished reading top 200 3mer fragments from file 2L62_03_05.200_v1_3
core.scoring.constraints.util: Constraint choice: 2L62.cen_cst
core.fragment: compute strand/loop fractions for 28 residues...
protocols.jumping: read ResiduePair jump-definitions from 2L62.residue_pair_jump_cst
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 9123 residue types
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: score13_env_hb
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/PairEPotential/pdb_pair_stats_fine
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_pp
basic.io.database: Database file opened: scoring/score_functions/rama/Rama_smooth_dyn.dat_ss_6.4
core.pack.dunbrack: Dunbrack 2010 library took 0.56 seconds to load from binary
AbinitioRelax.linuxgccdebug: src/utility/pointer/owning_ptr.hh:225: T* utility::pointer::owning_ptr::operator->() const [with T = protocols::jumping::ResiduePairJump]: Assertion p_ != 0' failed.
Aborted (core dumped)

Looks like something wrong in ResiduePairJump? Anyone could help with this? Thanks in advance.

Wed, 2014-01-01 18:07
cheyuk

Unfortunately, there's not much to go on there, and it's a little difficult to debug locally without the fragment files.

Could you run gdb and get a backtrace? To do this, run "gdb AbinitioRelax.linuxgccdebug". Then at the gdb prompt, do a "catch throw", followed by a "run ". The program will run for a while, and then halt at the prompt. If it halts before the place where you saw the error (frequently in the hbond lines), do a "continue". Once it prints the "Dunbrack 2010 library took XXX seconds to load from binary" line and halts, do a "backtrace", and post the resulting lines here. You can then do a "quit" to exit the gdb environment.

Alternatively, if you could upload the fragment files to some (publicly accessible) external site (something like Google Drive or Dropbox), and post or PM me a link, I can download them and test the issue myself. (Include the complete commandline that you used to run things - I'm assuming it might be "AbinitioRelax.linuxgccdebug @flags", though differences might change things.)

Also, which version of Rosetta are you using? Is it one of the weekly releases?

Thu, 2014-01-02 09:33
rmoretti
Thanks rmoretti. Now I attached all the files including fragment files in dropbox: https://www.dropbox.com/s/kcxx0rlw08c8hpp/2L62.zip The command line I used is ~/rosetta/main/source/bin/AbinitioRelax.linuxgccdebug @flags -database ~/rosetta/main/database" The Rosetta version is 2013wk50. Another thing I noticed is that the Robetta server will replace the residue name Z (which refers to ZN in the fasta file) to Q (Gln?). This is observed both in the fasta file (though I don't need it to run Abinitio) and psipred_ss2 file. Will this cause any issue?
Thu, 2014-01-02 17:38
cheyuk

I ran gdb and got the following backtrace:

(gdb) backtrace
#0 0x000000329c2328e5 in raise () from /lib64/libc.so.6
#1 0x000000329c2340c5 in abort () from /lib64/libc.so.6
#2 0x000000329c22ba0e in __assert_fail_base () from /lib64/libc.so.6
#3 0x000000329c22bad0 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff465a37b in utility::pointer::owning_ptr::operator-> (this=0x7fffffffc530)
at src/utility/pointer/owning_ptr.hh:225
this=0x2ba2ff0, fname="2L62.residue_pair_jump_cst")
at src/protocols/jumping/ResiduePairJumpSetup.cc:91
#6 0x00007ffff452f1d3 in protocols::abinitio::AbrelaxApplication::setup_jumps
(this=0x7fffffffdf20, extended_pose=...)
at src/protocols/abinitio/AbrelaxApplication.cc:1370
#7 0x00007ffff452fe86 in protocols::abinitio::AbrelaxApplication::setup_fold (
this=0x7fffffffdf20, extended_pose=..., prot_ptr=...)
at src/protocols/abinitio/AbrelaxApplication.cc:1468
#8 0x00007ffff4537b6d in protocols::abinitio::AbrelaxApplication::run (
this=0x7fffffffdf20) at src/protocols/abinitio/AbrelaxApplication.cc:2156
#9 0x000000000040307c in main (argc=2, argv=0x7fffffffe158)
at src/apps/public/AbinitioRelax.cc:57

Thu, 2014-01-02 17:52
cheyuk

It looks like your first issue is the blank line between the BEGIN and END blocks in 2L62.residue_pair_jump_cst - removing that fixes the error that you saw.

You get an additional error later on, though. This is due to the foldtree setup. It's (incorrectly) thinking that there's going to be a polymeric bond between the two zincs. I think this can be solved by changing the cutpoint range of the second residue from "26 26" to "27 27". (Basically, tell Rosetta you want to break the chain between the two zincs.)

But we're not done yet. There's a third error. Now it's trying to insert fragments across the zincs, which involves changing their backbone coordinates. (Yeah, that works about as well as it sounds.). The issue here is that it looks like you generated fragment files with the full 29 residue sequence, including the two zincs. I'd recommend just generating fragments for just the protein portion of the complex (or equivalently, deleting off the last two positions on the current fragment files), which should work as long as the non-protein portion is at the very end of the pose.

Those three fixes result in a successful run with a nice compact structure, with the two zincs near their cysteines. It looks nothing like 2L62, but who can say what will happen over a tens to hundreds of thousands output structure production run.

Also, I'm not sure why things are changing your Z's to Q's, (was this for fragment generation? If so, the reason would be the fragment generator doesn't know about non-protein residues), but it's likely to be an issue if you use that output for input later on. If you see that happen, you'll want to change it back. The single letter code of the residue should match the one given by the IO_STRING line of the residue type params file in the database (which for zinc is Z).

Fri, 2014-01-03 09:01
rmoretti

Thanks for your reply. A follow-up question, is it necessary and feasible to set up any constraints between zinc atoms in those constraints files?

Wed, 2014-01-01 00:56
cheyuk

You can certainly try it without the 2L62.cen_cst and 2L62.fa_cst files, though I'm guessing it won't go as well as with them. In the residue_pair_jump_file there's some specification of geometry, but I'm not sure in how many stages of the protocol that carries through. Adding the constraints to specify the geometry makes sure that the desired geometry is enforced in more locations.

An additional note regarding scoring: In your flags files your explicitly calling for the "score13_env_hb" weights file. Along with the switch from 3.5 to weekly releases, we also did some significant changes to the way the scorefunction works. Using "-score:weights score13_env_hb" with the weekly releases isn't going to work the same as it would with 3.5 and earlier. I'd recommend doing one of two things. The first is just remove the -score:weights line and use the new default scorefunction (talaris2013), or alternatively add the flag "-restore_pre_talaris_2013_behavior" to your flags file to reverse the changes made, going back to (mostly) what was the 3.5 scorefunction behavior.

Fri, 2014-01-03 09:10
rmoretti

Thanks so much for your help. Now it is working all right. I also tried this protocol with four metal ions and it worked just as well.

Only one small question: Does Rosetta strictly comply with the constraints we provide? When violations of constraints occur (Not all constraints can be satisfied), will Rosetta still outputs decoys as best it can?

Sun, 2014-01-05 17:55
cheyuk

"Constraints" in Rosetta are more properly called "restraints" in the Molecular Dynamics world. That is, Rosetta "constraints" aren't strictly enforced by Rosetta, instead there's just an energy penalty when they're violated. So Rosetta will indeed output decoys the best it can when not all constraints can be satisfied at the same time.

You can see this in the scorefile output. Normally there will be columns corresponding to the various constraint scoreterms in the scorefile. If one or more of the "constraints" are "violated", then those scoreterms will have a non-zero energy. Typically this isn't too bad in most cases (less than an REU per constraint), but when you start getting larger values, then you know something is off about the structure. Often you'll get a mix of high and low constraint values, and you can simply ignore/discard those decoys which have bad constraint violations.

If you expect that only a subset of your constraints will be valid at any one time, you may want to look into AmbigiousConstraints and KofNConstraints, which allow you to pick a certain best scoring subset of constraints.

Mon, 2014-01-06 07:21
rmoretti

Thanks a million for your help, rmoretti. Your explanations are so clear and spot-on. I really appreciate it.

Mon, 2014-01-06 19:12
cheyuk

Hello everyone,

how can I generate fragments with the zinc-atom?! I believe Robetta dont know the Z at the end of the sequence. For the 1dsv-metallo-protein(30 amino acids), which is given in Rosetta, I calculated structures with rmsd-values between 6-12 . Is it normal or could sometime wrong with my structures?!

What mean the values from the 1dsvA.cen_cst file?!

AtomPair CB 5 ZN 32 HARMONIC 3.2 0.2 (CYS)
AtomPair CB 8 ZN 32 HARMONIC 3.2 0.2 (CYS)
AtomPair CB 13 ZN 32 HARMONIC 3.5 0.2 (HIS)
AtomPair CB 18 ZN 32 HARMONIC 3.2 0.2 (CYS)

For Cystein it could be the distance between CB and Zn. But for Histidin I think the value is to small. Or are there radians?!

In the 1dsvA.residue_pair_jump_cst file stand something from A and B. What are this for parameters?!

disAB: 2.20
angleA: 68.0
angleB: 70.5
dihedralA: -150.0 -120.0 -90.0 -60.0 -30.0 0.0 30.0 60.0 90.0 120.0 150.0 180.0
dihedralAB: -150.0 -120.0 -90.0 -60.0 -30.0 0.0 30.0 60.0 90.0 120.0 150.0 180.0
dihedralB: 120.0
END

Thanks

Fri, 2014-11-28 11:37
masterofpuppets

As long as your zinc atom is at the very end of a pose (or more precisely, so long as the protein portion you wish to generate fragments for is all a the beginning of the pose), you can just simply generate the fragments for the protein portion. (That is, give Robetta just the protein sequence without the zinc residue.) Most protocols which use fragments should work if you have "extra" residues at the end of the pose which aren't in the fragments file, which will then be ignored for fragment insertion portions of the run. (Which is what you probably want for your zinc residue.)

The 1dsvA.cen_cst file is a centroid (low resolution) constraints file. Here you're specifying extra terms to add to the scoring function. Particularly, you're saying that the distance between atom pairs involving the ZN atom on residue 32 and the CB atoms on residues 5, 8, 13, and 18 should be restrained. (AtomPair is a distance constraint: https://www.rosettacommons.org/docs/latest/constraint-file.html ) HARMONIC and what follows tells how to do that - you want a harmonic (parabolic) restraint, centered on either 3.2 or 3.5 Ang, with a deviation parameter of 0.2 Ang. (See the constraint file documentation ) for details, but basically the smaller the value the more closely Rosetta will require that distance to be. -- I think you may be right about the histidine distance. One limitation is that in centroid mode you don't have individual sidechain atoms besides CB, so you have to approximate distances.

For 1dsvA.residue_pair_jump_cst, the A and B are the two sides of the jump that are being sampled. A being the first (fixed) side, whereas B is the second (movable) side. The values listed are the ones which specify what the allowable values for the respective parameters are. So in this case you exactly specify the distance, the two angles, and the dihedral on the zinc side of the bond, but allow the dihedral on the protein side and the one around the "bond" connecting the two sides to vary in 30 degree increments.

Sun, 2014-11-30 11:17
rmoretti

Thank you.
I find the 3.5 value for histidin in the 1dsv example from Rosetta. It might be that the value must be 5.3?!
A stands for the first aminoacid, from which the jump is definded and B stands for the other three residues?! I dont understand which angles are concretely that should be.

Sat, 2015-01-03 09:43
masterofpuppets

The 3.5 value probably works in that situation because that's what the value is in the experimental structure for that particular system. A different system may need a more general/loose value to match up with reality.

What A and B are depend on your setup. In this case, I believe that the A residue is your amino acid, and the B residue is the metal ion. So distanceAB is the distance between the two atoms, one in A and one in B, which connect the two residues. (Note that this is the logical connection, rather than any physical/chemical connection.) The angleA is the angle on the "A" side of that connection, so two atoms on the amino acid (the connection atom and the "parent" of the connection atom), and the connection atom on the metal. The angleB is the other way around. (The metal ion residue has "virtual" atoms for orientation, which are the ones used for the second atom on the metal ion residue.) dihedralAB is the dihedral for the "bond" between the two residues, so one where there are two atoms from each residue. dihedralA is the dihedral where there are three atoms on the "A" residue (the connection atom, the parent of the connection atom, and the parent of the parent) and one (the connection atom) from the metal ion. dihedralB is the other way around - which explains why it's not really sampled. It's effectively just rotating the virtual atoms on the metal ion.

If your mental picture of it isn't too good, you might want to dust off your old organic chemistry model building kit and play around with it to see how moving bonds/angles/dihedrals will change how things are oriented. (Just keep in mind that changes propagate toward the "B" side of the interaction.)

Tue, 2015-01-06 10:11
rmoretti

Thanks a lot.
For Example my Aminoacid is cystein:
angle A is: CB-SG-ZN
angle B: SG-V-ZN
dihedral AB: CB-SG-V-ZN
dihedral A: CA-CB-SG-ZN
dihedral B: ZN-SG-CB-CA
in my file i found the fixed value of 68.0 but in my strucure i measure a angle of 110, whats wrong?!
I believe that dihedral B is wrong, because it is equal to dihedral B, but I don´t know which angle it should be.

Fri, 2015-01-16 08:36
masterofpuppets

Hello again,
I also wondering about an additional virtual atom by the cystein residue. I believe that I dont define this atom. Why is this atom there, when I make the zinc coordination. I have the other four virtual atoms to define the zinc coordination.

Tue, 2015-01-20 03:33
masterofpuppets

Is the virtual atom on the cysteine residue, or the zinc residue? If it is actually on the cysteine residue, om is there probably to help define geometries in certain situations. I wouldn't necessarily include it with your constraint files. The virtual atoms on the zinc residue are there to help orient the zinc residue in space. They're supplied automatically by Rosetta even in the case where the input structure doesn't provide them. They are needed to define the geometry of the interaction, but probably shouldn't be constrained, or constrained loosely if at all.

In your situation, I'd recommend looking for a chain of six atoms, three on the amino acid side, and three on the zinc side. From these six atoms, you can build each of your geometries.

So in your case it would be something like (CA-CB-SG)-(ZN-V1-V2)

The distance is from the atoms between them: SG ZN
Angle A is an angle two on the cysteine side, and one on the zinc side: CB SG ZN
Angle B is an angle one on the cysteine side, and two on the zinc side: SG ZN V1
Dihedral A is three on the cysteine side, one on the zinc side: CA CB SG ZN
Dihedral AB is two on the cysteine side, two on the zinc side: CB SG ZN V1
Dihedral B is one on the cysteine side, three on the zinc side: SG ZN V1 V2

Note that order matters, and all of the measures are going to use the same path of six, so there aren't going to be any rearrangements between the measures.

Angle B and dihedrals AB and B are used mainly to orient the virtual atoms on the zinc residue. In something like a spherically symmetric zinc ion, they probably don't really matter all that much, so do not need to be constrained heavily. The first atom on the B side of the chain (the zinc) will be placed with distance, angle A and dihedral A alone.

Mon, 2015-01-26 13:50
rmoretti

Thank you very much. I am first wondering which V are mean, but through your explanation I understand it.

Tue, 2015-01-27 04:26
masterofpuppets