I want to do homology modelling with the structures below. First, I'm trying to create a threaded model following the tutorial here https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/RosettaCM. I'm receiving the error below and would appreciate help debugging it. Note also that my structures have two chains, unlike the multiple-template tutorial here https://www.rosettacommons.org/demos/latest/tutorials/rosetta_cm/rosetta_cm_tutorial.
The error:
[labusr@luxor scripts]$ /usr/bin/python /home/labusr/rosetta/tools/protein_tools/scripts/setup_RosettaCM.py --fasta /home/labusr/Ahmad/Rosetta/CM/5ucy.fasta --alignment /home/labusr/Ahmad/Rosetta/CM/alignment.clustal --alignment_format clustalw --templates /home/labusr/Ahmad/Rosetta/CM/1JFF.pdb --rosetta_bin $ROSETTA3/source/bin/ --verbose
Using rosetta from /home/labusr/rosetta/main//source/bin/
Switching to directory: /tmp/tmptpxMnK
['##', '1JFF:A|PDBID|CHAIN|SEQUENCE', '5UCY:_thread']
Traceback (most recent call last):
File "/home/labusr/rosetta/tools/protein_tools/scripts/setup_RosettaCM.py", line 609, in <module>
-> You may have to rename or swap the order of sequences in your alignment file."
AssertionError: The first sequence ID in a clustalw alignment file must match the fasta file name.
-> You may have to rename or swap the order of sequences in your alignment file.
[labusr@luxor scripts]$
The alignment file:
CLUSTAL O(1.2.4) multiple sequence alignment
5UCY:B|PDBID|CHAIN|SEQUENCE MREIVHIQGGQCGNQIGAKFWEVISDEHGIDPTGTYHGDSDLQ--LERINVYYNEATGGR
1JFF:B|PDBID|CHAIN|SEQUENCE MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQ--LERINVYYNEAAGNK
5UCY:A|PDBID|CHAIN|SEQUENCE MREVISIHVGQGGIQVGNACWELFCLEHGIQPDGQMPSDKTIGGGDDAFNTFFSETGAGK
1JFF:A|PDBID|CHAIN|SEQUENCE MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGK
*** : *: ** * *:* **: . ****:* * .*. : : :*.::.*: ..:
5UCY:B|PDBID|CHAIN|SEQUENCE YVPRAILMDLEPGTMDSVRAGPFGQLFRPDNFVFGQTGAGNNWAKGHYTEGAELIDSVLD
1JFF:B|PDBID|CHAIN|SEQUENCE YVPRAILVDLEPGTMDSVRSGPFGQIFRPDNFVFGQSGAGNNWAKGHYTEGAELVDSVLD
5UCY:A|PDBID|CHAIN|SEQUENCE HVPRAVFLDLEPTVIDEVRTGTYRQLFHPEQLISGKEDAANNFARGHYTIGKEIVDLCLD
1JFF:A|PDBID|CHAIN|SEQUENCE HVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLD
:****:::**** .:*.**:* : *:*:*:::: *: .*.**:*:**** * *::* **
5UCY:B|PDBID|CHAIN|SEQUENCE VVRKEAEGCDCLQGFQITHSLGGGTGSGMGTLLISKVREEYPDRIMETFSVVPSPKVSDT
1JFF:B|PDBID|CHAIN|SEQUENCE VVRKESESCDCLQGFQLTHSLGGGTGSGMGTLLISKIREEYPDRIMNTFSVVPSPKVSDT
5UCY:A|PDBID|CHAIN|SEQUENCE RIRKLADNCTGLQGFLVFNSVGGGTGSGLGSLLLERLSVDYGKKSKLGFTIYPSPQVSTA
1JFF:A|PDBID|CHAIN|SEQUENCE RIRKLADQCTGLQGFSVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTA
:** :: * **** : :*.*******: :**:.:: :* .: *:: *:*:** :
5UCY:B|PDBID|CHAIN|SEQUENCE VVEPYNATLSVHQLVENADECMVIDNEALYDICFRTLKLTTPTYGDLNHLVSAAMSGVTC
1JFF:B|PDBID|CHAIN|SEQUENCE VVEPYNATLSVHQLVENTDETYCIDNEALYDICFRTLKLTTPTYGDLNHLVSATMSGVTT
5UCY:A|PDBID|CHAIN|SEQUENCE VVEPYNSILSTHSLLEHTDVAVMLDNEAIYDICRRNLDIERPTYTNLNRLIAQVISSLTA
1JFF:A|PDBID|CHAIN|SEQUENCE VVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLIGQIVSSITA
******: *:.* :*::* :****:**** *.*.: *** :**:*:. :*.:*
5UCY:B|PDBID|CHAIN|SEQUENCE CLRFPGQLNSDLRKLAVNLIPFPRLHFFMIGFAPLTSRGSQQYRALTVPELTQQMFDAKN
1JFF:B|PDBID|CHAIN|SEQUENCE CLRFPGQLNADLRKLAVNMVPFPRLHFFMPGFAPLTSRGSQQYRALTVPELTQQMFDAKN
5UCY:A|PDBID|CHAIN|SEQUENCE SLRFDGALNVDITEFQTNLVPYPRIHFMLSSYAPIISAEKAYHEQLSVAEITNSAFEPAN
1JFF:A|PDBID|CHAIN|SEQUENCE SLRFDGALNVDLTEFQTNLVPYPRGHFPLATYAPVISAEKAYHEQLSVAEITNACFEPAN
.*** * ** *: :: .*::*:** ** : :**: * . :. *:* *:*: *: *
5UCY:B|PDBID|CHAIN|SEQUENCE MMCAADPRHGRYLTASALFRGRMSTKEVDEQMLNVQNKNSSYFVEWIPNNIKSSICDIPP
1JFF:B|PDBID|CHAIN|SEQUENCE MMAACDPRHGRYLTVAAVFRGRMSMKEVDEQMLNVQNKNSSYFVEWIPNNVKTAVCDIPP
5UCY:A|PDBID|CHAIN|SEQUENCE MMAKCDPRHGKYMACSMMYRGDVVPKDVNASIATIKTKRTIQFVDWCPTGFKVGINYQPP
1JFF:A|PDBID|CHAIN|SEQUENCE QMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRTIQFVDWCPTGFKVGINYEPP
* .*****:*:: . ::** : *:*: : .::.*.: **:* *...* .: **
5UCY:B|PDBID|CHAIN|SEQUENCE KG--------LKMAVTFVGNSTAIQEMFKRVAEQFTAMFRRKAFLHWYTGEGMDEMEFTE
1JFF:B|PDBID|CHAIN|SEQUENCE RG--------LKMSATFIGNSTAIQELFKRISEQFTAMFRRKAFLHWYTGEGMDEMEFTE
5UCY:A|PDBID|CHAIN|SEQUENCE TVVPGGDLAKVMRAVCMISNSTAIAEVFSRLDHKFDLMYAKRAFVHWYVGEGMEEGEFSE
1JFF:A|PDBID|CHAIN|SEQUENCE TVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSE
: :. ::.*:*** * : *: .:* *: ::**:***.****:* **:*
5UCY:B|PDBID|CHAIN|SEQUENCE AESNMNDLVSEYQQYQDAT----------------
1JFF:B|PDBID|CHAIN|SEQUENCE AESNMNDLVSEYQQYQDATADEQGEFEEEGEEDEA
5UCY:A|PDBID|CHAIN|SEQUENCE AREDLAALEKDYEEVGIETAE--------------
1JFF:A|PDBID|CHAIN|SEQUENCE AREDMAALEKDYEEVGVDSVEGEGE--EEGEEY--
Thanks a ton.
To debug this error, I tried the tutorial scripts on the tutorial's own example. Target: 1u19.fasta, template: 2rh1.pdb. I converted the alignment to Grishin format manually because setup_RosettaCM.py couldn't do it automatically. The alignment is:
I ran the following for the threading step, and it seemed to run fine:
For the hybridization, I used the XML provided in the tutorial, which gave an error. I had to convert it to the new XML format using rewrite_rosetta_script.py; this is the error I received:
I would appreciate any help with this.
On the issue with setup_RosettaCM.py in your first post: you need to make sure the names of all the sequences you're using match, and this includes capitalization. "5UCY" is different from "5ucy". Another thing to keep in mind during the process is that Rosetta structure prediction pipelines prefer 5-character identifiers (the four-letter PDB code plus a chain identifier). They should handle general names, but they don't always. If you're having issues, try changing all your names to five-character ones (e.g. by padding four-letter or shorter ones with underscores).
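A minimal Python sketch of the name check described above, useful for catching the problem before running setup_RosettaCM.py. It assumes, from the wording of the AssertionError rather than from the script's source, that the first sequence ID in the CLUSTAL file is compared exactly and case-sensitively to the fasta file name without its extension. The helper names and the renaming scheme (PDB code plus chain, e.g. `5ucyB`) are hypothetical.

```python
from pathlib import Path

CONSERVATION_CHARS = set("*:.")

def clustal_ids(text: str) -> list[str]:
    """Sequence IDs of a CLUSTAL alignment, in order of first appearance."""
    ids: list[str] = []
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL header and indented conservation lines.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        name = line.split()[0]
        # Conservation lines whose indentation was lost consist only of '*', ':' and '.'.
        if set(name) <= CONSERVATION_CHARS:
            continue
        if name not in ids:
            ids.append(name)
    return ids

def check_first_id(aln_text: str, fasta_path: str) -> None:
    """Assumed rule: first alignment ID == fasta file name without extension (case-sensitive)."""
    ids = clustal_ids(aln_text)
    stem = Path(fasta_path).stem
    if not ids or ids[0] != stem:
        first = ids[0] if ids else None
        raise ValueError(f"first alignment ID {first!r} does not match fasta name {stem!r}")

def five_char_id(name: str) -> str:
    """Pad a short name with underscores to five characters, e.g. '5ucy' -> '5ucy_'."""
    if len(name) > 5:
        raise ValueError(f"{name!r} is longer than five characters")
    return name.ljust(5, "_")

def rename_ids(aln_text: str, mapping: dict[str, str]) -> str:
    """Replace sequence IDs (the first token of each sequence line) using `mapping`."""
    out = []
    for line in aln_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] in mapping:
            rest = parts[1] if len(parts) > 1 else ""
            line = f"{mapping[parts[0]]:<10} {rest}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, renaming `5UCY:B|PDBID|CHAIN|SEQUENCE` to `5ucyB` with `rename_ids` and saving the target sequence as `5ucyB.fasta` would satisfy this assumed check.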
On your second issue, the key bit is the following:
[ ERROR ] Sequence mismatch between input fasta and template /home/labusr/Ahmad/Rosetta/CM/Test/2rh1.pdb at residue 1
[ ERROR ] Expected: P Saw: V
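To make that message concrete, here is a minimal sketch that reads the one-letter sequence from a PDB's ATOM records and reports the first position where it disagrees with the fasta sequence, in the same "Expected/Saw" form. This is not Rosetta's own check: residues are simply counted in file order starting at 1, which is an assumption about how the positions line up.

```python
from typing import Optional

# One-letter codes for the 20 standard amino acids; anything else becomes "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def pdb_sequence(pdb_text: str) -> str:
    """One letter per residue from ATOM records, in file order."""
    seq, seen = [], set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Chain ID (column 22) plus residue number and insertion code (columns 23-27).
        key = (line[21], line[22:27])
        if key in seen:
            continue
        seen.add(key)
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))  # residue name, columns 18-20
    return "".join(seq)

def first_mismatch(expected: str, saw: str) -> Optional[tuple[int, str, str]]:
    """1-based position and residues of the first disagreement, or None if identical."""
    for i, (e, s) in enumerate(zip(expected, saw), start=1):
        if e != s:
            return i, e, s
    if len(expected) != len(saw):
        n = min(len(expected), len(saw)) + 1
        return n, expected[n - 1:n] or "-", saw[n - 1:n] or "-"
    return None
```

With a fasta starting with `P` and an unthreaded template starting with `V`, `first_mismatch` returns `(1, "P", "V")`, the same pair as in the log.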
If I had to guess, you're running the hybridization step with the structures from before threading, rather than with the threaded structures. Once you're done with the threading step, you're done with the original templates (and the Grishin alignments): you should use the output of the threading step for the rosetta_scripts hybridization step.
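As an illustration of pointing the hybridization step at the threading output, this sketch writes one `<Template .../>` line per threaded model. It assumes the `pdb`, `cst_file="AUTO"` and `weight` attributes used in the RosettaCM tutorial's Hybridize XML (check them against your converted XML); the file name `2rh1_thread.pdb` and the equal default weight are hypothetical.

```python
from typing import Optional
from xml.sax.saxutils import quoteattr

def template_tags(threaded_pdbs: list[str], weights: Optional[dict[str, float]] = None) -> str:
    """Build one <Template .../> line per threaded model for the Hybridize mover.

    The paths should be threading outputs, not the original template PDBs.
    """
    weights = weights or {}
    lines = []
    for pdb in threaded_pdbs:
        weight = weights.get(pdb, 1.0)  # equal weighting unless specified (assumption)
        lines.append(f'<Template pdb={quoteattr(pdb)} cst_file="AUTO" weight="{weight:.3f}" />')
    return "\n".join(lines)

print(template_tags(["2rh1_thread.pdb"]))  # hypothetical threading output name
```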
If this is not the issue, then something went wrong in your threading step. When threading, make sure the sequences you use match up exactly: the template sequence in the Grishin alignment should exactly match the sequence actually in the template PDB, and the target sequence should exactly match the sequence in the fasta file. The fasta file used for threading should also be identical to the one used for the rosetta_scripts hybridization stage.
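A minimal sketch of these consistency checks on toy strings rather than real files. It assumes the Grishin layout `## target template`, a `#` line, `scores_from_program: 0`, then one `<start> <gapped sequence>` row for the target and one for the template, with the start read as a 0-based offset into the full sequence; verify that reading against the Rosetta alignment-format documentation. `template_pdb_seq` would come from the template PDB's ATOM records.

```python
def read_fasta_seq(text: str) -> str:
    """Concatenate the sequence lines of a single-record fasta, skipping the '>' header."""
    return "".join(l.strip() for l in text.splitlines() if l.strip() and not l.startswith(">"))

def parse_grishin(text: str) -> tuple[str, str, int, str, int, str]:
    """Return (target_id, template_id, target_start, target_row, template_start, template_row)."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    target_id, template_id = lines[0].lstrip("#").split()[:2]
    # Sequence rows are assumed to look like "<start offset> <gapped sequence>".
    rows = [l.split(None, 1) for l in lines if l[0].isdigit()]
    (t_start, t_row), (p_start, p_row) = rows[0], rows[1]
    return target_id, template_id, int(t_start), t_row, int(p_start), p_row

def check_grishin(text: str, fasta_seq: str, template_pdb_seq: str) -> list[str]:
    """List inconsistencies between a Grishin alignment and the sequences it describes."""
    _, _, t_start, t_row, p_start, p_row = parse_grishin(text)
    problems = []
    if len(t_row) != len(p_row):
        problems.append("aligned rows have different lengths")
    target = t_row.replace("-", "")
    if fasta_seq[t_start:t_start + len(target)] != target:
        problems.append("target row does not match the fasta sequence")
    template = p_row.replace("-", "")
    if template_pdb_seq[p_start:p_start + len(template)] != template:
        problems.append("template row does not match the template PDB sequence")
    return problems
```

Comparing `read_fasta_seq` on the threading fasta and on the hybridization fasta covers the last condition: the two strings should be identical.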