
Ab Initio Modelling of Protein with Small-Molecule Cofactor


Hello everyone,

I need some help with the ab initio folding of a protein in the presence of its cofactor. I have already worked through the tutorial that comes with Rosetta, in which a Zn ion binds to a small peptide. Still, I have no idea how to set up an ab initio relax run with a small molecule.

The actual situation is the following:
The cofactor is FAD.
I know which residues of the protein are most likely to be involved in binding of FAD.

I would like to run an ab initio relax in the presence of the cofactor, with constraints built from what I know about the cofactor-protein interaction. The params file I generated for FAD looks reasonable.

The main questions I have are:
1.) Where do I have to put the params file of FAD?
It seems that for the zinc in the demo, Rosetta is fully aware of the params files folder under /[..]/rosetta_database/chemical/residue_type_sets/fa_standard/residue_types/metal_ions/
Does Rosetta routinely search the whole /rosetta_database/ folder for new entries?

2.) How do I tell Rosetta that it should consider FAD? I already tried a fasta file like the following, but to no avail.
{some sequence}HKLTGN[ASN]RFD{some other sequence}CALIFQ[GLN]GR{rest of sequence}KLMP[FAD]
Here, Asn43 and Gln65 coordinate the FAD via hydrogen bonds.

Unfortunately, Rosetta does not accept this input, and I have no idea how to tell the program to use FAD during the ab initio relax run.

I figured out what the constraints should look like from the following post.
So, I think I know how to set that up.
Nevertheless, what still puzzles me is the "residue_pair_jump_cst" file.

I would be very thankful for any answer to these questions or for general advice.

Best regards,

Thu, 2011-03-10 06:56

1) You can read in extra parameter files with either the flag -extra_res_fa path/to/filename, or by putting your FAD in the main database and adding it to the residue_types.txt file near where you found zinc.

2) This might start working after 1 works...?

3) I don't know anything more about residue_pair_jump_cst than I did then, sorry...

Thu, 2011-03-10 08:34

Thanks for that quick answer. I will try that today and post the result.

Do you have any idea who might know about that "residue_pair_jump_cst" file? Someone must have implemented it.

--- Edit ---

I added the "-extra_res_fa" flag but that does not help. Adding the params file to the residue_types.txt results in the same error message:

ERROR: unrecognized residue name 'FAD'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.hh line: 148

If you look up line 148 in ResidueTypeSet.hh, you find that the program checks whether FAD is part of a "name_map" array, which seems not to be the case; hence the error message.
I don't understand the code well enough to draw any helpful conclusions. The big question is how to add FAD to that array if the latter is not built from the "residue_types.txt" list.

Fri, 2011-03-11 04:23

Can you post your FAD params file? Maybe it's not named FAD in the file. (There is a field in the file for the name; the filename is not relevant.) There should be a button for you to attach files.
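
A quick way to see which names Rosetta will pick up is to print the NAME and IO_STRING lines of the params file. The sketch below assumes the usual params layout, where IO_STRING holds the three-letter code followed by the one-letter code. The file written here is only an illustrative stand-in, not the poster's actual FAD params file.

```shell
# Hypothetical stand-in for the real params file (only the header lines matter here).
cat > FAD_example.params <<'EOF'
NAME FAD
IO_STRING FAD Z
TYPE LIGAND
AA UNK
EOF

# Print the residue name and the three-letter / one-letter codes stored in the file.
awk '$1 == "NAME" || $1 == "IO_STRING"' FAD_example.params
```

If the NAME field says something other than FAD, that other name is the one to use in the fasta brackets.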

Fri, 2011-03-11 08:22

Thank you for your help and suggestions, smlewis.

I had to attach the file with the additional ending .txt; otherwise it was rejected by the attachment system.

Maybe that helps.

Sat, 2011-03-12 02:15

I have some questions. Are you feeding in the sequence above with FAD as the last letters?

Another question: are you using the flag correctly (I do not mean this to sound condescending)?

For example, the flag should look like this: -extra_res_fa FAD.params


Steven Combs

Sat, 2011-03-12 06:42

I'm pretty sure that ab initio goes through centroid mode at some point (I actually think it starts off in centroid mode, though I'm not sure). It might not be the full atom ResidueTypeSet which is missing the FAD; it might be the centroid ResidueTypeSet instead. You might want to try generating a centroid topology file with the params-generation script (see its command line option "-c"), and then passing it to the ab initio program with -extra_res_cen (so your command line would look something like " ... -extra_res_fa FAD.params -extra_res_cen FAD_cen.params ... ").

Sun, 2011-03-13 12:21

Yes, scombs, I added FAD in brackets as the last letters in my fasta sequence file.

The flag seemed okay. What really did the trick was rmoretti's suggestion not only to feed in a full atom parameter file but also to generate a centroid one.

So I am past that first problem now. What I am getting now relates somehow to the declaration of constraints.

The error message is the following:

[..some text..]
core.scoring.constraints: Constraint choice: ./prot.cen_cst
read constraints from ./prot.cen_cst
read constraints section --NO_SECTION---
no section header [ xxx ] found, try reading line-based format... DON'T MIX
read constraints from ./prot.cen_cst
ignored constraint (no such atom in pose!)CB O7 41 129
ERROR: reading of AtomPair failed.
ERROR: reading constraints from file./prot.cen_cst
[..some text.. without errors]
Stage 1
Folding with score0 for max of 2000
segmentation fault

So Rosetta actually tries to start the calculation but fails later. I tried running everything without the cofactor and the constraints, and that works. The problem seems to lie with the constraints. I followed the constraint definitions found in the abinitio_metalloprot_folding demo.

Here is how I did it:
AtomPair CB 41 O7 129 HARMONIC 5.9 0.5
AtomPair CB 46 O5 129 HARMONIC 5.4 1.0
AtomPair CB 74 O8 129 HARMONIC 5.0 0.5
AtomPair CB 84 O9 129 HARMONIC 5.0 0.5
AtomPair CB 103 O9 129 HARMONIC 5.6 1.0

"129" is the pseudo-sequence number of FAD, since I've got 128 residues in that protein chain. O5, O7, O8 and O9 are the oxygen atoms in the nomenclature found in the params file (they would otherwise be O2', O4', O2 and O4).

Mon, 2011-03-14 04:01

Is the protocol adequately initializing the FAD? Could you try a short run without the constraints, and see if an FAD is present in the output file? (That is, are the constraints an issue or just a symptom.) Looking at your input sequence, you may need to include the FAD's single letter code ("Z") prior to the brackets. (So that would be "{rest of sequence}KLMPZ[FAD]".)

As you're getting a segmentation fault, compiling and running in debug mode, if you're able to (compiling without "mode=release" on the scons command line, and using the resulting ".linuxgccdebug" or equivalent executable), should give better diagnostics of the failure, even without using a debugger, though the recompile and debug-mode run can be tediously slow.

(Another thing to double check is that there are atoms of the appropriate name in the centroid topology file and not just the full atom topology file.)
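
One way to run that double check is to compare the ligand atom names used in the constraint file against the ATOM lines of the centroid params file. This is a minimal sketch. It assumes the atom name is the second whitespace-separated field of each ATOM line and that the ligand is residue 129, as in this thread. Both files written here are hypothetical stand-ins, and the atom-type and charge columns are placeholders.

```shell
# Hypothetical stand-ins: a centroid params fragment and a constraint file.
cat > FAD_cen_example.params <<'EOF'
NAME FAD
IO_STRING FAD Z
ATOM  O5  OH   X   -0.66
ATOM  O7  OH   X   -0.66
EOF
cat > prot_example.cen_cst <<'EOF'
AtomPair CB 41 O7 129 HARMONIC 5.9 0.5
AtomPair CB 46 O5 129 HARMONIC 5.4 1.0
AtomPair CB 74 O8 129 HARMONIC 5.0 0.5
EOF

ligand_resnum=129
# Collect the atom names defined in the params file, then report every atom
# name that a constraint places on the ligand residue but the params file lacks.
report=$(awk -v lig="$ligand_resnum" '
    NR == FNR { if ($1 == "ATOM") known[$2] = 1; next }   # first file: params
    $1 == "AtomPair" {
        if ($3 == lig && !($2 in known)) missing[$2] = 1
        if ($5 == lig && !($4 in known)) missing[$4] = 1
    }
    END { for (a in missing) print "missing in centroid params:", a }
' FAD_cen_example.params prot_example.cen_cst)
echo "$report"
```

With these stand-in files only O8 is reported, because the example params fragment deliberately leaves it out.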

Mon, 2011-03-14 11:18

It was really the identifier for the ligand, as rmoretti suggested. After adding "Z" before [FAD] in the fasta input no more errors occurred and I now get structures with a correct coordination of the FAD.


So, in summary for anyone who wants to do something similar, here is the way it works:

1) convert a pdb file of your ligand to a params file using the python script under /[..]/rosetta_source/src/python/apps/public/ WITH the "-c" option, to create both centroid and full atom parameter files
2) add the unique identifiers from your .params file to your fasta sequence (you may want to change NAME and IO_STRING of your ligand in the .params file)
3) adjust your constraint files and the ".residue_pair_jump_cst" accordingly (pay special attention to different atom nomenclature)
4) feed your ligand .params files to rosetta using the "-extra_res_cen" and "-extra_res_fa" flags
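
The input side of these steps can be sketched as follows. This is an illustration under assumptions, not the poster's actual setup: the sequence fragment, the fasta header and every file name are placeholders, and the ".residue_pair_jump_cst" file is not covered. The flags are collected in an options file that Rosetta can read with the @ syntax.

```shell
# Hypothetical sequence fragment; the real protein sequence goes before Z[FAD].
protein_seq="KLMP"
cat > prot_example.fasta <<EOF
>prot_with_FAD
${protein_seq}Z[FAD]
EOF

# Options file collecting the flags discussed in this thread (file names are placeholders).
cat > flags_example <<'EOF'
-in:file:fasta prot_example.fasta
-in:file:frag3 frags.3mers
-in:file:frag9 frags.9mers
-extra_res_fa FAD.params
-extra_res_cen FAD_cen.params
-constraints:cst_file prot.cen_cst
-abinitio:relax
-nstruct 2
EOF

# The run itself is left as a comment because it needs a Rosetta build, and the
# binary and database paths are installation-specific:
# AbinitioRelax.linuxgccrelease -database /path/to/rosetta_database @flags_example
cat prot_example.fasta
```

Keeping the flags in a file makes it easier to spot a missing -extra_res_cen entry, which was the first stumbling block above.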


Thank you all so much for your helpful suggestions. I really appreciate your advice.

Tue, 2011-03-15 02:28


I am facing a problem reading constraints with Rosetta v3.3 during ab initio folding guided by a symmetry file. The command used:
~/ROSETTA/v3.3/rosetta_source/bin/minirosetta.linuxgccrelease \
-seed_offset `echo $RANDOM` \
-run:protocol broker \
-broker:setup ~/input_files/broker_input \
-database ~/ROSETTA/v3.3/rosetta_database \
-nstruct 2 \
-file:frag3 ~/input_files/frags.3mers \
-file:frag9 ~/input_files/frags.9mers \
-constraints:cst_file ./test.cst \
-in:file:fasta ~/input_files/prot.fasta \
-symmetry:symmetry_definition /~/symmfile.sym \
-symmetry:initialize_rigid_body_dofs \
-out:file:silent ./silent.out \
-out:file:silent_struct_type binary \
-out:file:scorefile \
-relax:fast \
-relax:jump_move \
-rg_reweight 0.0001 \
-packing:ex1 \
-packing:ex2 \
-fold_and_dock:move_anchor_points \
-fold_and_dock:set_anchor_at_closest_point \
-fold_and_dock:rigid_body_cycles 100 \
-fold_and_dock:rigid_body_frequency 0.01 \
-fold_and_dock:slide_contact_frequency 0.01

The constraint file test.cst has the following lines:
AtomPair CEN 5 CEN 16 BOUNDED 4.000 7.000 0.500
AtomPair CEN 5 CEN 81 BOUNDED 4.000 7.000 0.500
AtomPair CEN 5 CEN 36 BOUNDED 4.000 7.000 0.500
AtomPair CEN 18 CEN 90 BOUNDED 4.000 7.000 0.500
AtomPair CEN 18 CEN 63 BOUNDED 4.000 7.000 0.500
AtomPair CEN 36 CEN 63 BOUNDED 4.000 7.000 0.500
AtomPair CEN 54 CEN 83 BOUNDED 4.000 7.000 0.500

The error generated during the run:
read constraints from ./test.cst
read constraints section --NO_SECTION---
no section header [ xxx ] found, try reading line-based format... DON'T MIX
read constraints from ./test.cst
ERROR: reading of AtomPair failed.

core.scoring.constraints: combine constraints 1 -> 1
protocols.abinitio: ConstraintFragment Sampler: S_00001
protocols.abinitio: Fragment Sampler: S_00001
protocols.abinitio: max_seq_sep: 3

What is surprising is that when I run with a different constraint file containing 510 constraints, the error "reading of AtomPair failed" does not appear. Any other constraint file with fewer than 510 constraints gives this error. This constraint file looks like:
AtomPair CEN 1 CEN 2 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 3 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 4 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 6 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 19 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 23 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 31 BOUNDED 4.000 7.000 0.500
AtomPair CEN 1 CEN 79 BOUNDED 4.000 7.000 0.500

510 constraints in total
The output by this run:
read constraints from ./test.cst
read constraints section --NO_SECTION---
no section header [ xxx ] found, try reading line-based format... DON'T MIX
read constraints from ./test.cst
core.scoring.constraints: combine constraints 1 -> 1
protocols.abinitio: ConstraintFragment Sampler: S_00001
protocols.abinitio: Fragment Sampler: S_00001
protocols.abinitio: max_seq_sep: 3
This seems to include the constraints, but I am not sure whether it includes all of them. I am trying to apply all the constraints while folding the protein. The line "core.scoring.constraints: combine constraints 1 -> 1" does not tell me whether the other constraints are also included.

Any help in this regard would be greatly appreciated.
Thank you in advance.

Thu, 2014-03-06 02:30

As mentioned in the documentation for bounded constraints, the "tag" entry is not optional (although it's not used for anything).

AtomPair CEN 5 CEN 16 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 5 CEN 81 BOUNDED 4.000 7.000 0.500 tag

I'm guessing there's some issue with line endings and the like which causes problems with your truncated files.
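
Both points can be handled in one pass over the constraint file. The sketch below is illustrative only: it assumes AtomPair BOUNDED lines with exactly nine fields (no tag and no optional fourth number), and the file names are placeholders. It strips a trailing carriage return from each line and appends a placeholder tag where one is missing.

```shell
# Example input: two lines from the thread, the second with a Windows (CRLF) line ending.
printf 'AtomPair CEN 5 CEN 16 BOUNDED 4.000 7.000 0.500\n' >  bounded_example.cst
printf 'AtomPair CEN 5 CEN 81 BOUNDED 4.000 7.000 0.500\r\n' >> bounded_example.cst

# Strip a trailing carriage return, then append a placeholder tag to BOUNDED
# AtomPair lines that stop after the third number (9 whitespace-separated fields).
awk '{ sub(/\r$/, "") }
     $1 == "AtomPair" && $6 == "BOUNDED" && NF == 9 { print $0, "tag"; next }
     { print }' bounded_example.cst > bounded_example_fixed.cst

cat bounded_example_fixed.cst
```

Lines that already carry a tag, or that use another function such as HARMONIC, pass through unchanged apart from the line-ending cleanup.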

Regarding the number of constraints read, in Rosetta3.5 and later there should be a line printed to the tracer at the standard (info) output level which tells you how many constraints are being read. Rosetta3.4 and earlier do not have it. What you may be able to do is see if the output pdb has non-zero values in the appropriate scoreterm for each of the constraints. This is a little iffy, though, as the output pdb may not have been scored at a point when the constraints were being applied. (It also doesn't work if you don't have output PDBs.)

Thu, 2014-03-06 07:58

Thank you for your reply. Editing the constraint file to add the tag helped. The atom pair constraint score has a non-zero value, but I am not sure whether all the constraints are considered.

Thu, 2014-03-06 09:34

If you want to stick with Rosetta3.3, one thing you may want to try is to rescore a centroid mode output structure with PDB output.

A command something like:

score_jd2.linuxgccrelease -score:weights apc.wts -out:file:scorefile -constraints:cst_file ./test.cst -in:file:centroid_input -out:pdb -in:file:silent ./silent.out -in:file:tags S_00001

Should probably do the trick, where apc.wts is a file containing the single line:

atom_pair_constraint 1.0

The output PDB (named something like S_00001_0001.pdb ) should list each residue in a table at the bottom, and for the atom_pair_constraint column should have non-zero energies for each of the residues involved in a constraint.
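
To know which rows of that table to inspect, it helps to list the residues that appear in the constraint file. The following is a small sketch using the seven constraints posted above (with the tag added). The file name is a placeholder, and the residue numbers are assumed to sit in fields 3 and 5 of each AtomPair line.

```shell
# The seven AtomPair constraints from the thread, written to a placeholder file.
cat > count_example.cst <<'EOF'
AtomPair CEN 5 CEN 16 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 5 CEN 81 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 5 CEN 36 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 18 CEN 90 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 18 CEN 63 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 36 CEN 63 BOUNDED 4.000 7.000 0.500 tag
AtomPair CEN 54 CEN 83 BOUNDED 4.000 7.000 0.500 tag
EOF

# Number of AtomPair constraints in the file.
n_cst=$(grep -c '^AtomPair' count_example.cst)

# Unique residue numbers (fields 3 and 5), sorted numerically, joined on one line.
residues=$(awk '$1 == "AtomPair" { print $3; print $5 }' count_example.cst \
    | sort -n -u | paste -s -d ' ' -)

echo "constraints: $n_cst"
echo "residues:    $residues"
```

The residue list gives the rows to look at in the atom_pair_constraint column of the rescored PDB.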

Fri, 2014-03-07 08:37

Hi everyone,

    I found this thread very useful for a very similar problem I was facing a couple of weeks ago, in which ab initio was used to predict the fold of an enzyme in the presence of a HEM group. However, I did not obtain successful results.

    I am posting a few more details here in case any of you can help. I realize this is a very old thread and probably no one will follow up, but just in case.

    The extra parameters for my HEM group were created and passed to the ab initio call as suggested in this thread, for both the centroid and the full-atom representation. The script ran with no errors.

    I was using constraints for both full atom and centroid modes, assigning different weights to each, and without the HEM group everything worked!

    The problem appears when I try to introduce the "small" ligand in the prediction. I generated more than 1000 models and, no matter which of the protocols written below I followed, the models always have the same problem: the HEM ligand is in contact with the first atom of the first residue of the protein model.

    When you open the generated models, you can quickly see that the first atom of the protein (the nitrogen atom) starts at the same coordinates as the first atom of the HEM group (in this case, the iron ion), and all the sampling then seems to proceed with both entities occupying the same place. I also tried translating the HEM and re-parametrizing it to see if this could be a plausible solution, but without success; the problem was not solved.

    I tried everything I found and could think of, but nothing seems to work. I would like to know whether any of you has experienced the same problem in the past and, if so, how it was solved, because documentation for the Abinitio module is pretty scarce and the examples I found are pretty basic.

    The commands used in the execution are detailed below:

AbinitioRelax.static.linuxgccrelease -database PATH2ROSETTA_DB \
    -fasta seq.fasta \
    -frag3  frags.200.3mers -frag9  frags.200.9mers \
    -out:file:silent seq_silent.out -out:pdb -out:path results/  \
    -constant_seed -jran `echo $RANDOM` \
    -loops:extended -loops:build_initial -loops:remodel quick_ccd -loops:refine refine_ccd -loops:relax fastrelax \
    -random_grow_loops_by 4 \
    -select_best_loop_from 1 \
    -nstruct 10  \
    -abinitio:relax -relax:fast \
    -extra_res_fa   HEM.fa.params \
    -extra_res_cen HEM.cen.params \
    -cst_fa_file native.cst -cst_file  native_CEN.cst \
    -cst_weight 50 -cst_fa_weight 5 \
    -ex1 -ex2 -extrachi_cutoff 10  >  AbInitio.log


    Other protocols were tested as well, like this one, but with identical results in terms of generated models:

AbinitioRelax.static.linuxgccrelease -database PATH2ROSETTA_DATABASE \
-seed_offset `echo $RANDOM` \
-run:protocol broker \
-nstruct 10  \
-fasta seq.fasta \
-frag3 frags.200.3mers -frag9  frags.200.9mers \
-out:file:silent seq_silent.out -out:pdb -out:path results/ \
-relax:fast \
-relax:jump_move \
-rg_reweight 0.0001 \
-packing:ex1 \
-packing:ex2 \
-fold_and_dock:move_anchor_points \
-fold_and_dock:set_anchor_at_closest_point \
-fold_and_dock:rigid_body_cycles 100 \
-fold_and_dock:rigid_body_frequency 0.01 \
-fold_and_dock:slide_contact_frequency 0.01 \
-extra_res_fa   HEM.fa.params \
-extra_res_cen  HEM.cen.params \
-cst_fa_file native.cst -cst_file native_CEN.cst \
-run:reinitialize_mover_for_each_job >  AbInitio.log

     Any help will be really welcome.

    And here is the sequence, where the ligand is denoted in brackets with the letter code Z, as suggested in other posts where a ligand is included in the fasta sequence.

    The sequence:



       Many thanks in advance for any help

Mon, 2021-03-08 03:46