Multiple template homology modelling

Hi,

I am working on predicting the structure of a protein from several templates. Each template is homologous to a different part of the target protein. However, it seems that Rosetta picks up only one template. Is there a tutorial or manual for multiple-template homology modeling with Rosetta? What format should the input alignment file have? How should I specify the paths to the template PDB files in the flags file? Is there a way to specify which template should be used for specific regions of the target protein? (Some of the template sequences overlap, and I would like to choose which template is used to predict a specific part of the target protein.)

Thank you for your insights.

Post Situation:

Unsolved

Tue, 2012-10-30 08:12

pdbb

This type of homology modeling is actually covered in the new "hybrid" homology modeling protocol. Unfortunately, that hasn't been released yet. (Though it probably will be in the next release, although I don't know when that would be.)

Until that point, as you have structures which are homologous to different portions of the target protein, you could cheat by making "Frankenstein" templates that manually combine the relevant sections of the different proteins. Rosetta will do its best to model the various sections and mesh up the connections. (What with indels and all, it has to do this for the regular homology modeling cases too.) This might get a little touchy if you're unsure of the relative orientation of the various parts, as my understanding is that Rosetta will cue off of the orientation in the template (although depending on the contacts/constraints, you may get a rather large sampling of orientations).

Tue, 2012-10-30 10:41

rmoretti

Hello!
Dear rmoretti, is the new "hybrid" homology modeling protocol included in Rosetta 3.5? If yes, how can we use it?

Thank you very much!

Sun, 2013-12-08 23:23

grisha

(Reply to #4)

I don't think RosettaCM (what the "hybrid" protocol is called now) is in Rosetta3.5, though it should be available through any of the recent weekly releases. See http://dx.doi.org/10.1016/j.str.2013.08.005 for the paper describing it.

Mon, 2013-12-09 10:50

rmoretti

Thank you !!!

So with RosettaCM we will soon be able to use more than one template PDB for comparative modeling, yes?

And one more question:

Is it a good idea to use the result from Robetta as a template PDB for comparative modeling?

Tue, 2013-12-10 07:58

grisha

(Reply to #6)

I wouldn't recommend using Robetta output as input for comparative modeling. You're probably better off going back to the original input PDB(s) instead of using a model that was based on those input PDBs.

Tue, 2013-12-10 08:37

rmoretti

(Reply to #7)

Dear rmoretti, please help me choose the right method:

I'm trying to model the tertiary structure of a large protein that consists of five domains. Only the structure of its C-terminal domain is known; the other domains have homologs with known structures.

Which method is better for modeling the tertiary structure of my protein?

Tue, 2013-12-10 09:29

grisha

(Reply to #8)

For your four unknown domains, do you have homologs which span multiple domains, or does each domain have a distinct set of homologs?

If each domain has a separate set of homologs, I might recommend splitting each domain into its own modeling run, and then using the homology modeling protocols to model each domain separately from the homologs. You could use the old threading-based protocol, but if you're up for it and have a number of different homologs, the new RosettaCM protocol is likely the way to do it.

If you have homologs which span multiple domains, then I might consider modeling the multiple domains in a single run. (E.g. if you have a homolog which spans domains 2 & 3, then model 2 & 3 as a unit, but 1 & 4 as separate domains.) This would be where the RosettaCM protocol starts to shine, as you may have one homolog which bridges the two domains, and several other homologs which only have one domain or another. If set up appropriately, the RosettaCM protocol should be able to take the best portions of each together.

Alternatively, you could simply model each domain separately, and then superimpose the two modeled domains using the dual-domain homolog as a template for the orientation. Running loop modeling and relax could fix up the connection between the two.

If you don't have a homolog which will bridge the interaction between the domains, then things become a little more speculative regarding the structure of the large complex. You can try protein-protein docking to see if you can find an interaction. You'll probably also have to consider the possibility that there isn't a stable interaction between the domains. Especially in eukaryotes, any interaction between domains in a multidomain protein can be transient, and the protein can exist as two separate domains connected by a flexible, unstructured linker. If you have any experimental evidence to help pin down the inter-domain interactions, that can help tremendously.

Tue, 2013-12-10 09:45

rmoretti

(Reply to #9)

Thank you very much for the detailed answer!

No, each domain has a distinct set of homologs.

As I understand it, the best option is RosettaCM. But when will it be available?

And aren't the Robetta results an alternative option? Could I treat them as a "Frankenstein" or "hybrid" structure that includes all of the domains of my protein, and then use Rosetta comparative modeling to refine it?

Tue, 2013-12-10 10:29

grisha

(Reply to #10)

#10

I believe everything you need to do RosettaCM should be available with the most recent weekly release. If you're missing something, let me know, and I'll talk to the appropriate people such that it gets included in the next weekly release.

I do believe that Robetta is now using the RosettaCM protocol for modeling, so if you wanted to model each domain with Robetta, that would be an option too. It may even work to submit the entire sequence to Robetta, and it may do the domain parsing and assembly itself, although I'm not sure how well the automated system would handle it.

If you don't have any homology-based structural information as to how the domains come together, coming up with a composite structure that contains all of the domains will be slightly tricky. Again, you may want to take the individual domains (modeled locally or through Robetta), and do protein-protein docking to come up with likely interactions. You can then use loop modeling to figure out how to connect the two domains. If you have any experimental evidence (like crosslinking information, or NMR association data), that can greatly help in figuring out how the domains interact (the data can be added as constraints during the protein-protein docking). Though to be realistic, if you don't have any experimental information on how domains interact, and don't have example structures of the interactions, I'd be hesitant to believe any predicted domain-domain interactions.

What I was suggesting against was using the comparative models of the individual domains as templates in a RosettaCM run to model the complex. (Especially if you don't have structures which bridge two domains, I'm not sure how much the RosettaCM protocol by itself would get you.) You can certainly use the output of the individual domain modeling as starting points for further (non-RosettaCM) runs.

Wed, 2013-12-11 08:01

rmoretti

#11

Ok !
Thank you very much dear rmoretti !!!

I look forward to the next weekly release!

I hope very much that it will include RosettaCM.

Wed, 2013-12-11 09:36

grisha

#12

I'm also interested in trying the multi-template version of RosettaCM described in the recent Structure paper. The Supplemental Methods mention that a script (setup_rosettaCM.py) for the necessary preliminary steps is included in the Rosetta release. However, I have been unable to find this script in the most recent release (2013week49).

I did look at the script provided in the Supplemental materials of the paper, but I am running into a problem. This script calls the Rosetta program partial_thread.linuxgccrelease. However, this executable is not present in my bin directory after building Rosetta (2013week42), and I can't find any .cc file for partial_thread in the most recent release (2013week49). Are there compile-time options that I am missing, or is this just something that needs to be included in the weekly release?

If I ignore the error about not finding partial_thread and try to run comparative modeling after the script finishes, I get a lot of warnings about not being able to find side-chain atoms, and an error from the Hybridize protocol: tgt_pos<=nres_tgt. I'm assuming this is because the partial templates were not built.

Thanks in advance for your help,

Marissa

Wed, 2013-12-11 09:51

Marissa

(Reply to #13)

#13

The setup_rosettaCM.py script should be located at Rosetta/tools/protein_tools/scripts/setup_RosettaCM.py with recent weekly releases -- as you guessed, it's more-or-less the same as the script that was included in the supplemental information of the paper.

Unfortunately, the partial_thread application hasn't yet made it to the release. I'm in contact with the appropriate people, and hopefully will get the application released, if not for the next weekly release, at least for the one after.

Fri, 2013-12-13 07:49

rmoretti

(Reply to #14)

#14

Rocco,

Is there any documentation besides the supplemental paper we could use for the script/program? Also, where can we find the rosetta_cm xml script?

Fri, 2013-12-13 08:47

jadolfbr

(Reply to #15)

#15

Thanks, found it. Although I see that it too is dependent on partial_thread. So I'll keep an eye out for the next weekly release or two.

Fri, 2013-12-13 13:22

Marissa

#16

I just wanted to follow-up on this. I looked at the 2013 Week52 release and did not see partial_thread yet. Any estimate on when this will be in the release?

Tue, 2014-01-14 10:24

Marissa

(Reply to #17)

#17

Sorry, one of the people involved was on a long vacation over the holidays, so it's taking longer than I expected to get it into the release. They're working on it, but there are some issues that need to be addressed before it can be released, so I can't say exactly when it will make it into the release.

Wed, 2014-01-15 08:09

rmoretti

#18

Any further news on this, or indeed when the weekly releases will resume? Thanks

Thu, 2014-04-17 02:42

danzinho

(Reply to #19)

#19

So partial_thread should be in the weekly releases when they resume, but there have been some technical/administrative difficulties recently getting the weekly releases put together and out on the website. From what I hear from the people responsible for putting together the weekly releases, there should be a new one going up soon-ish, but don't hold me to that.

Mon, 2014-04-21 07:33

rmoretti

(Reply to #20)

#20

partial_thread is indeed in the weekly releases now. I have found it in rosetta_2014.16.56682_bundle.

Tue, 2014-04-29 05:17

nannemdp

#21

Hi!
Is there any news about the RosettaCM protocol and application?

Wed, 2014-08-27 02:08

grisha

(Reply to #22)

#22

The current weekly releases should now have all the necessary parts for running the RosettaCM protocol.

Thu, 2014-08-28 09:24

rmoretti

(Reply to #23)

#23

Hello,

I have been trying to run the setup_RosettaCM.py script, and I am able to generate the rosettacm/ directory with the flags file, and rosetta_cm.xml protocol. However, when I run the generated command, I am getting the following error: ERROR: tgt_pos<=nres_tgt, ERROR:: Exit from: src/protocols/hybridization/HybridizeProtocol.cc line: 391.

Could you help me resolve the issue with the HybridizeProtocol.cc?

My generated command is:
Users/kulkarnik/Downloads/rosetta_2014.35.57232_bundle/main/source/bin/rosetta_scripts.macosclangrelease @flags -database /Users/kulkarnik/Downloads/rosetta_2014.35.57232_bundle/main/database -nstruct 10

and I have attached my flags, rosetta_cm.xml, and alignment files below.

Thank you!

File attachments:

gadd45align.txt

flags.txt

rosetta_cm.xml_.txt

Thu, 2014-11-06 21:10

kulkarni

(Reply to #24)

#24

I got the exact same error a few months ago (I didn't use the setup script; I ran each step individually and read the setup script as a guide) and asked Frank DiMaio for help. Here is his response, which worked for me. (Yes, I wish we had documentation on all this too. Frank was very helpful though, and hopefully we will have some documentation on Hybridize soon, as it works great.)

" I think the problem is that you need to give hybrid a threaded model, not the template. So the sequence and sequence numbering must match the target fasta (but residues may be missing). So if you run:
partial_thread.default.linuxgccrelease -in:file:fasta MIS.fasta -in:file:alignment MIS.grishin -in::file::template_pdb 1xxxA.pdb
THEN give the output of this to hybrid it will work fine. Remember that the first 5 characters of the 2nd field in the alignment file must match the 1st 5 characters of the template file."

Now, when you run partial_thread, you have to be careful with the output. It does not use JD2, so it behaves a little weirdly: it will overwrite your PDB if you're not careful.
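For anyone scripting this step, here is a minimal Python sketch, not an official Rosetta tool. It only assembles the argument list for the partial_thread command quoted above and checks the two naming rules from Frank's replies (the 5-character match, and the overwrite risk explained in the quote just below). The function names are made up for illustration. Listing several templates after a single template_pdb flag, and the output landing in `<second field>.pdb`, are my assumptions from how the quotes read.

```python
import os
from typing import List, Sequence


def partial_thread_command(
    fasta: str,
    alignment: str,
    templates: Sequence[str],
    exe: str = "partial_thread.default.linuxgccrelease",
) -> List[str]:
    """Build (but do not run) a partial_thread command line."""
    if not templates:
        raise ValueError("at least one template PDB is required")
    cmd = [exe, "-in:file:fasta", fasta, "-in:file:alignment", alignment]
    # Assumption: several templates are listed after one template_pdb flag.
    cmd += ["-in::file::template_pdb", *templates]
    return cmd


def check_thread_tag(header_line: str, template_pdb: str) -> List[str]:
    """Return problems with a '## <target> <template_tag>' alignment header."""
    fields = header_line.split()
    if len(fields) < 3 or fields[0] != "##":
        return ["header must look like '## <target> <template_tag>'"]
    tag = fields[2]
    stem = os.path.splitext(os.path.basename(template_pdb))[0]
    problems = []
    # Rule quoted above: the first 5 characters of tag and file must match.
    if tag[:5] != stem[:5]:
        problems.append("first 5 characters of %r and %r differ" % (tag, stem))
    # Assumption: output is written to <tag>.pdb, so the same name overwrites.
    if tag == stem:
        problems.append("tag %r would overwrite the input PDB" % tag)
    return problems


cmd = partial_thread_command("MIS.fasta", "MIS.grishin", ["1xxxA.pdb"])
print(" ".join(cmd))
print(check_thread_tag("## t000 1xxxA_thread", "1xxxA.pdb"))
print(check_thread_tag("## t000 1xxxA", "1xxxA.pdb"))
```

Once the paths are right, the list can be passed to subprocess.run(cmd, check=True).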

"
The header is something like:
## t000 1xxx_out

The second field (1xxx_out above) is used to specify both the input and output template. So if you give the full input template, it will overwrite your file. If you add something to the end, _thread or whatever, then it will not overwrite things.
"

Other useful flags/options/etc that were suggested to me. The setup script pretty much did all this for you, but they will soon go into documentation:

"
With multiple templates:
Generate a single ali file
call partial thread with the ali file, give it ALL the input templates
partial thread will make threaded models for all inputs
make a separate template line in the xml for each partial thread
The grishin file will look something like this for multiple templates:

## 1xxx 1xxx_thread
# hhsearch 1
scores_from_program: 0.0 1.0
0 AAAAAA
0 AAAAAA
--
## 1yyy 1yyy_thread
# hhsearch 1
scores_from_program: 0.0 1.0
0 AAAAAA
0 AAAAAA
--

etc.

Modelling with multiple chains:
create alignment file and template as if it were a single chain and call partial thread
when calling hybrid, add a '/' in the fasta file at each chainbreak

Recommended flags:
I think the defaults are all reasonable. You might want to play with atom_pair_constraint weights in all three stages. Something like:
<stage1 weights=score3 symmetric=0>
<Reweight scoretype=atom_pair_constraint weight=0.5/>
</stage1>
<stage2 weights=score4_smooth_cart symmetric=0>
<Reweight scoretype=atom_pair_constraint weight=0.5/>
</stage2>
<fullatom weights=talaris2013_cart symmetric=0>
<Reweight scoretype=atom_pair_constraint weight=0.5/>
</fullatom>
...
<Hybridize name=hybridize stage1_scorefxn=stage1 stage2_scorefxn=stage2 fa_scorefxn=fullatom/>

Other options to Hybridize:
stage1_increase_cycles=1 stage2_increase_cycles=1 allows you to scale the number of centroid cycles. 0.5 for both runs a bit faster and doesn't hurt things if the alignment is pretty good

Turn off fullatom:
Add batch=0 to hybridize mover options. Make sure to give a centroid scorefunction to -score:weights
"

Mon, 2014-11-10 11:55

jadolfbr

(Reply to #25)

#25

Thank you very much for your response! I'll try it out.

Wed, 2014-11-19 07:14

kulkarni

#26

Hello,

I've been trying to follow the very nice instructions for using RosettaCM given by jadolfbr, but I still have several questions. I am trying to model a multidomain protein (3 domains). For 2 domains I have a known structure, and for one there are no homologs at all. The domains do not overlap (the gaps between them are around 50-100 AA or more). I also have cross-linking data on this protein that I'd like to use during modeling.

1) Do I need to generate an ab-initio model of first domain before running RosettaCM protocol? It's about 500 AA long... Or it may be done automatically during the threading command? In the paper I see "Rosetta de novo fragments" often quoted but what do they mean exactly? Are they ab-initio generated models or these are fragment files coming from, e.g. Robetta fragment server?

2) If I run partial_thread and rosetta_scripts as described before, I get myriads of warnings, but the run continues to the end. Do I need to worry about them? Examples:

core.io.pdb.file_data: (6) [ WARNING ] discarding 13 atoms at position 1 in file ./input_files/rosetta_cm/threading/3ghgA_thread.pdb. Best match rsd_type: LEU:NtermProteinFull
core.io.pdb.file_data: (6) [ WARNING ] discarding 10 atoms at position 2 in file ./input_files/rosetta_cm/threading/3ghgA_thread.pdb. Best match rsd_type: VAL

.......

core.conformation.Conformation: (1) [ WARNING ] missing heavyatom: CEN on residue VAL 2
core.conformation.Conformation: (1) [ WARNING ] missing heavyatom: CEN on residue SER 3

.......

core.io.pdb.file_data: (5) [ WARNING ] can't find atom for res 1 atom CD1 (trying to set temp)
core.io.pdb.file_data: (5) [ WARNING ] can't find atom for res 1 atom CD2 (trying to set temp)

To me it looks like it removes all the side chains in order to build centroids and then complaining about missing atoms that were removed... :)

Also, there are more warnings like that:

core.optimization.AtomTreeMinimizer: (17) ***************************************************************
core.optimization.AtomTreeMinimizer: (17) ** WARNING: Non-ideal minimization used with a ScoreFunction **
core.optimization.AtomTreeMinimizer: (17) ** which isn't set up for non-ideal minimization **
core.optimization.AtomTreeMinimizer: (17) ***************************************************************
core.optimization.Minimizer: (16) WARNING: LBFGS MAX CYCLES 200 EXCEEDED, BUT FUNC NOT CONVERGED!

Is it related to wrong scoring functions mentioned in the xml file (please see the attached file)? I could try to change -default_max_cycles in the flags file...

And the last one - at the end of the job I get:

protocols.jd2.JobDistributor: (18) WARNING: The following options have been set, but have not yet been used:
-constraints:cst_file hsgf29.cst
-constraints:sog_upper_bound 15
-out:file:silent_struct_type binary

Why didn't it work? I checked the log file, and the constraint file was read by the program at the start.

Gregory.

File attachments:

hsgf29.aln_.txt

rosetta_cm.xml_.txt

flags.txt

Sun, 2015-01-04 14:23

azazello654

(Reply to #27)

#27

Your weights file needs to include the cart_bonded term - so when the cartesian minimizer is called it will appropriately close the chain.
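A quick pre-flight check along these lines can be sketched in Python. This is a hypothetical helper, not part of Rosetta. It assumes the weights file is plain text with one `term weight` pair per line and `#` comments, and the example values are invented. The pro_close rule comes from the follow-up reply further down this thread.

```python
from typing import List


def check_cartesian_weights(text: str) -> List[str]:
    """Return problems found in the text of a Rosetta-style weights file."""
    terms = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            terms.add(line.split()[0])  # first token is the score term name
    problems = []
    if "cart_bonded" not in terms:
        problems.append("cart_bonded is missing")
    elif "pro_close" in terms:
        problems.append("pro_close is set together with cart_bonded")
    return problems


# Invented example content; the weights are placeholders, not recommendations.
example = """\
fa_atr 0.8
pro_close 1.0
cart_bonded 0.5
"""
print(check_cartesian_weights(example))
```

An empty list means neither of the two issues discussed here was found; it says nothing about whether the rest of the weights are sensible.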

Thu, 2015-01-08 13:20

jadolfbr

(Reply to #28)

#28

Thanks, jadolfbr

I have figured this out with a little help from Yifan Song:

"Rosetta gives you an error when cart_bonded and pro_close terms are used together. It should work if you remove the pro_close line in the stage3_rlx.wts file"

Fri, 2015-01-09 02:01

azazello654

(Reply to #29)

#29

Thanks to Frank DiMaio, I'm posting here the answers to my previous questions in case someone is interested:

1) You do not need to generate an ab initio model; RosettaCM will treat this as a (very large) insertion and generate models. However, you will likely get better results running ab initio on the missing domain first. Unless the constraints you have are very dense, it is unlikely that ab initio will produce anything very reasonable at that length, unfortunately.
Fragments may be generated from the Rosetta fragment server and passed to RosettaCM. Alternately, fragments can be automatically generated in the protocol (using a simplified version of the protocol run on the server).

2) These are harmless warnings. (discarding/missing atoms etc)

3) question about weights is already answered.

4) You will need to give constraint files through the xml rather than the command line. Replace cst_file="AUTO" with your constraint file. You can either replace the automatically generated constraints with your own, or append your constraints to that file (using SCALARWEIGHTEDFUNCTION ##) to reweight them relative to the homologue constraints.
-out:file:silent_struct_type binary is only used if you are writing binary silent files.

Fri, 2015-01-09 02:12

azazello654

(Reply to #30)

#30

Hello azazello654, I know this is an old post, but did you find a solution to the error regarding the

Kind regards

Dan

Mon, 2017-07-17 09:18

Daniel Hall

(Reply to #31)

#31

You should be using the cartesian scorefunction. talaris_cart or ref2015_cart. If you post your full commandline and xml script I can fix it for you.

-Jared

Mon, 2017-07-17 12:12

jadolfbr

#32

Hello dear Rosetta users and developers!
I have some questions about RosettaCM. Please help me if you can.
1) When I run setup_RosettaCM.py, I get this error:
File "./setup_RosettaCM.py", line 708, in <module>
thread_fullnames = ["%s/%s.pdb"%(run_dir, x.target_tag) for x in alignment.alignments ];
NameError: name 'alignment' is not defined
What is this error and how can I fix it?

2) How can I supply my own template PDBs instead of downloading them from the database?

Thanks in advance!

Thu, 2015-01-29 11:06

grisha

(Reply to #33)

#33

It looks like there's a bug in the script. You have several options. The first is to find a version of the script from an older version of Rosetta (prior to Oct 2014). Or it looks like providing an alignment file in the "modeller", "hhsearch", "clustalw", or "fasta" format (specify with the --alignment_format option) will skirt the problem.

Alternatively, you can try to patch the issue (this patch is untried). Try adding the following line after both 662 and 669, right after the two "os.system" commands (and indented to the same level):

alignment = Alignment(); alignment.read_grishin( open(converted_aln).readlines() )

2) It doesn't look like there's a method for skipping the PDB file download. However, you can patch things such that it won't download the file if it already exists in the directory. Just go to lines 548-549 and put those lines into an if statement, so they look like the following (indent the if statement to the same level as the existing lines):

if not os.path.exists(dest[:-3]): 
    log_lines = os.popen( wget_cmd ).readlines() 
    os.system("gunzip -f %s"%dest)

(Again, untested). Then simply provide your desired template as a pre-existing appropriately named file in the directory.
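The idea behind that patch, shown outside the script as a small self-contained Python sketch. The names `fetch_if_missing` and `fake_download` are made up for illustration, and the stand-in download just writes a placeholder file instead of calling wget and gunzip.

```python
import os
import tempfile
from typing import Callable


def fetch_if_missing(dest: str, fetch: Callable[[str], None]) -> bool:
    """Call fetch(dest) only when dest does not exist; return True if fetched."""
    if os.path.exists(dest):
        return False  # a user-supplied template is already in place
    fetch(dest)
    return True


def fake_download(path: str) -> None:
    # Stand-in for the wget + gunzip step in the real script.
    with open(path, "w") as handle:
        handle.write("HEADER    PLACEHOLDER\n")


with tempfile.TemporaryDirectory() as workdir:
    pdb_path = os.path.join(workdir, "1xxx.pdb")
    first = fetch_if_missing(pdb_path, fake_download)
    second = fetch_if_missing(pdb_path, fake_download)
    print(first, second)
```

The second call is skipped because the file already exists, which is the behaviour the patched script would need in order to pick up a template you placed there yourself.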

Mon, 2015-02-09 11:54

rmoretti

#34

Dear Rocco and Others,

Thanks for all the questions and replies on this thread. Based on them, I have been able to model a heterodimeric protein based on a single template structure, and it looks great. However, there is a large 15-residue insertion, which requires some extra effort. I think part of this insertion is helical. Is there a way within the hybrid modelling protocol to add extra constraints? I tried adding - as a test - the traditional rosetta fragments commands (-loops:frag_size, -loops:frag_files, -loops:quick_ccd) into the hybrid flags file, but I got a warning at the end of the run that these had been specified but not used. I would particularly like to specify that certain stretches of loop/protein are helical. What would the constraint file look like in this instance?

Sun, 2015-02-15 20:08

Derek Smith

(Reply to #35)

#35

The Hybridize mover should take fragment files in the XML itself to model those sections of the protein which aren't modeled by the templates. If those fragment files are predominantly alpha helical in the region of the insertion, Rosetta should model things as alpha helical. (Take a look at the secondary structure prediction files which are generated by the fragment creation process - they should indicate if things were adequately modeled as alpha helical.)

The other option if you know that this insertion should be structured in a certain way is to simply find a template which has a suitably similar structure, and use that as a template for just that region. RosettaCM is built to assemble a structure out of different parts, so you don't need to have template homologs which are homologous to the entire protein - you can certainly use "homologs" which match only a subset of the protein. When doing the alignments, just make sure that everything that doesn't match is unaligned to your desired sequence.

Thu, 2015-04-16 11:33

rmoretti

(Reply to #36)

#36

Thanks for the reply, Rocco!

I had already managed to work out how to add the fragment files into the XML file. I had actually tried to delete my comment, as I had managed to sort out most of my problems.

When working with templates where you only have one region of interest, can you simply cut out that region of interest and call partial threading with the entire model sequence, and use this as one of the 'templates' in the CM XML file? Or is that just a re-statement of what you wrote above?

Thu, 2015-04-23 18:22

Derek Smith

(Reply to #37)

#37

In your alignments you want to have the full sequence of your desired protein and the full sequence of the template PDB you're passing to the threading application. However, you don't need everything "aligned" - you can just have a small portion with correspondences in the two sequences, and then the rest of the proteins (template and target) aligned to gap characters. The threading application will pick out only those portions you wish to use.

That said, if it's easier for you, you certainly can manually truncate your template protein, and use a cut down region for your template PDB input. Just make sure that the template sequence you use in your alignments is cut down to match exactly the cut-down template PDB.
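To make the gap-padding concrete, here is a rough Python sketch that aligns one block of a template to the target and pushes everything else against gap characters, then prints it in the alignment layout quoted earlier in this thread. The function names, sequences, and tags are hypothetical, and the header and score lines are simply copied from that earlier example rather than from any format specification.

```python
from typing import Tuple


def partial_alignment(
    target: str, template: str, t_start: int, p_start: int, length: int
) -> Tuple[str, str]:
    """Align `length` residues (0-based starts); gap out everything else."""
    t_end, p_end = t_start + length, p_start + length
    if t_end > len(target) or p_end > len(template):
        raise ValueError("aligned block runs past the end of a sequence")
    # Unaligned target residues face gaps, and so do unaligned template residues.
    target_row = (target[:t_start] + "-" * p_start + target[t_start:t_end]
                  + target[t_end:] + "-" * (len(template) - p_end))
    template_row = ("-" * t_start + template[:p_start] + template[p_start:p_end]
                    + "-" * (len(target) - t_end) + template[p_end:])
    return target_row, template_row


def grishin_entry(target_tag: str, template_tag: str,
                  target_row: str, template_row: str) -> str:
    """Format one entry in the layout shown earlier in the thread."""
    lines = [
        "## %s %s" % (target_tag, template_tag),
        "# hhsearch 1",
        "scores_from_program: 0.0 1.0",
        "0 " + target_row,
        "0 " + template_row,
        "--",
    ]
    return "\n".join(lines) + "\n"


rows = partial_alignment("MKTAYIAKQR", "GGAYIAKPP", t_start=3, p_start=2, length=5)
print(grishin_entry("t000", "1xxxA_thread", *rows))
```

Both rows keep the full sequences (stripping the gaps gives back the inputs), so the template row still matches the template PDB while only the chosen block is actually aligned.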

Fri, 2015-04-24 12:07

rmoretti

#38

Rocco,

I have another question regarding multiple template modelling.

I am building a heterodimer, where one chain is a known structure, and the second requires homology modelling. I have good homologues, but some conformational changes are seen in the complexed homologue, so I have used both the template complex and a second homologous structure as follows:

SEQUENCE: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
TEMPLATE 1: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/---------------CCCCC-------------CCCCCC------------CCCCCC--------
TEMPLATE 2: ---------------------------------------------------------------/DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

I have used all of the suggested hybridisation flags above, and added a weight of 1.0 for template 1 and 0.8 for template 2, and the model looks good with the appropriate conformational changes in the second chain. I also see some changes in chain A, mostly sidechain rotamers, but some other backbone movements. Is there a way to restrain the chain A in the hybridisation protocol? I'd like to fix the rotamers and backbone.

Thu, 2015-07-02 22:21

Derek Smith

(Reply to #39)

#39

I've never done anything like it myself, but from what I can tell, you should be able to add constraints to the pose prior to the hybridization stage, and if you add the option "keep_pose_constraint=1" to the Hybridize tag in the XML, it should keep the constraints on during the Hybridize run.

I might recommend trying the AtomCoordinateCstMover (https://www.rosettacommons.org/docs/latest/scripting_documentation/Roset...). Use native=1 and supply a structure with the chain A you want to constrain as the -in:file:native.

I'm not entirely sure that will work, though. There may be bugs to work out.

The other option is to attempt to model chain B as a monomer in the absence of chain A, and then attempt to dock them later.

Fri, 2015-07-03 14:14

rmoretti

#40

One more question...

I would like to model a symmetric dimer or dimer of dimers. Can you use symmetry constraints within RosettaCM? I have never used the symmetry option before so I need to play around with it first, but I would like to include this if possible.

I have just noticed that Frank DiMaio's Density Tutorial online may have what I am looking for. I will check it out.

Tue, 2015-09-08 23:16

Derek Smith

(Reply to #41)

#41

To use symmetry with RosettaCM, simply add the option "symmdef=xxx.symm" to the Hybridize mover tag to set the symmetry file.

Wed, 2015-09-09 13:48

rmoretti

#42

Hi guys....

I'm trying to use RosettaCM. According to the article and its supplement, an XML file must be created from a fasta file and an alignment file. Citing the article: "A script for setting up and running RosettaCM jobs, setup_RosettaCM.py, is included with the Rosetta release. Given the fasta-format target sequence and an alignment file in hhsearch or CLUSTALw format, a configuration file can be generated by running the command: setup_rosettaCM.py --fasta input.fasta --alignment input.alignment This creates two configuration files. A modeling job can be run using the Rosetta command: rosetta_scripts.linuxgccrelease -parser:protocol rosetta_cm.xml -in:file:fasta input.fasta @flags"

Running setup_RosettaCM.py returns this error:

usage: setup_RosettaCM.py [-h] --fasta FASTA [--alignment ALIGNMENT]
[--alignment_format ALIGNMENT_FORMAT]
[--templates [TEMPLATES [TEMPLATES ...]]]
[--rosetta_bin ROSETTA_BIN] [--build BUILD]
[--platform PLATFORM] [--compiler COMPILER]
[--compiling_mode COMPILING_MODE]
[--setup_script SETUP_SCRIPT] [-j J] [--run]
[--keep_files] [--run_dir RUN_DIR] [--equal_weight]
[--use_dna] [--verbose]
setup_RosettaCM.py: error: argument --fasta is required

I don't understand whether the input files I'm using are wrong, or whether the way I'm calling the program is wrong (./setup_RosettaCM.py fasta.fasta alinhamento.ali).

What are the next steps after generating the XML file?

Sun, 2015-10-04 15:47

jrcf

(Reply to #43)

#43

You actually need to use the "--fasta" and "--alignment" labels with the inputs, rather than just having them be positional arguments. For example:

./setup_RosettaCM.py --fasta fasta.fasta --alignment alinhamento.ali
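The usage message suggests the script uses Python's argparse, where an option declared as required must be given with its label. A minimal stand-alone sketch of that behaviour follows. It is not the real script: only the two option names are taken from the usage text above, and the parser itself is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Tiny stand-in parser with the two options discussed above."""
    parser = argparse.ArgumentParser(prog="setup_sketch.py")
    parser.add_argument("--fasta", required=True)
    parser.add_argument("--alignment")
    return parser


# Labelled arguments are accepted.
args = build_parser().parse_args(
    ["--fasta", "fasta.fasta", "--alignment", "alinhamento.ali"]
)
print(args.fasta, args.alignment)

# Bare positional arguments are rejected; argparse exits with an error.
try:
    build_parser().parse_args(["fasta.fasta", "alinhamento.ali"])
    rejected = False
except SystemExit:
    rejected = True
print("positional call rejected:", rejected)
```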

Wed, 2015-10-07 16:41

rmoretti

#44

Here's another question about RosettaCM...

I'm trying to model an enzyme with low (~20%) sequence id to the best templates. I have 11 templates and have generated all the necessary files for the basic RosettaCM run. I would also like to incorporate the evolutionary constraints from my templates as described in the Thompson and Baker (2011) paper, but I have no idea where to start, and I can't find any documentation. Can anyone help me with this? Is this already taken care of using the 'cst_file="AUTO"'?

Wed, 2016-02-03 05:57

Derek Smith

(Reply to #45)

#45

My understanding of cst_file=AUTO is that it is used to keep the pieces of the various templates aligned with each other while you're doing the template swapping. It would not have the additional evolutionary constraint information from Thompson and Baker.

If you did have that information in a Rosetta-formatted constraint file, then I think you should be able to pass it to the fa_cst_file option in the main Hybridize tag (not the individual template tags). Unfortunately, that will only apply during the fullatom refinement stage, and not during the template section swapping stage.

Thu, 2016-02-04 08:21

rmoretti

#46

Hi guys,

I've noticed that recently, when the tutorial pages were updated, the basic RosettaCM method was adapted to include a 'relax' step after the comparative modelling. I tried implementing this by adding the relax line into the XML file, but what I end up with is ruined structures. Do I need to add some kind of restraining options somewhere? Is it as simple as adding an options file to the flags list, or do I need to call the options file from the XML script? Is this also implemented in Robetta? What does the options file look like?

Tue, 2017-09-19 22:25

Derek Smith

(Reply to #47)

#47

I can't say for sure why your structure is getting messed up, but if separating out the relax step into a separate run doesn't give you the same problems, one possible fix is to clear the constraints from the pose. (As constraints don't get passed through the saving/loading as PDB/silent file process.) To do this I'd use the ClearConstraintsMover (https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/ClearConstraintsMover).

Tue, 2017-10-10 10:03

rmoretti

(Reply to #48)

#48

Thanks Rocco,

I have one more question..

I'd like to introduce some constraints to enable coordination of a ligand during the homology modelling protocol. I created a file containing a list of distances and added them in the XML file, replacing cst_file='AUTO' with my distance.cst file. The program crashes out, saying that it can't find any of the atom types I have used (even though they are standard amino acid sidechains, and the ligand has been parameterised and shows up in the model if I don't add any constraints). How do I correctly add my constraints file in this case?

Mon, 2017-10-23 22:00

Derek Smith

#49

Hi guys,

When using multiple templates, there are certain regions in my target sequence where I would like to use the coordinates from one particular template. Is there a way to specify this in the XML file?

Tue, 2018-01-09 23:57

Derek

(Reply to #50)

#50

Not that I'm aware of. However, in your alignment files you can align just that template to that region of the protein, and de-align (align to gaps) all the other templates. In that way Rosetta will only use that template as the template for that region of the protein, and will completely ignore the structures of the other templates in that region.
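If you already have pairwise alignments, the de-aligning can be done mechanically. Below is a small Python sketch under my own assumptions: the two rows are equal-length strings with '-' for gaps, the region is given in 1-based target numbering, and the function name is made up. Every aligned column inside the region is split into two gapped columns, so the template keeps all its residues but no longer covers that stretch of the target.

```python
from typing import Tuple


def dealign_region(
    target_row: str, template_row: str, start: int, end: int
) -> Tuple[str, str]:
    """De-align a template over target residues start..end (1-based, inclusive)."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    new_target, new_template = [], []
    position = 0  # number of target residues seen so far, ignoring gaps
    for t_char, p_char in zip(target_row, template_row):
        if t_char != "-":
            position += 1
        in_region = t_char != "-" and start <= position <= end
        if in_region and p_char != "-":
            # Split the aligned column into two columns, each facing a gap.
            new_target.append(t_char + "-")
            new_template.append("-" + p_char)
        else:
            new_target.append(t_char)
            new_template.append(p_char)
    return "".join(new_target), "".join(new_template)


target_row, template_row = dealign_region("ACDEFG", "ACDEFG", 3, 4)
print(target_row)
print(template_row)
```

Apply it to every template except the one you want used for that region, then thread as usual.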

Mon, 2018-01-15 15:53

rmoretti

#51

Hi guys,

I recently downloaded Rosetta 3.9 and noticed that when I used the 'setup_rosettaCM.py' script, I ended up with an XML file where the atom_pair_constraint weight is set to 0.1 rather than the value of 0.5 recommended here in this thread and used in earlier scripts. Is this a deliberate change due to the REF2015 scoring function, or just an oversight by those editing the script?

Derek.

Wed, 2018-04-25 21:26

Derek
