Error when renumbering PDB

Hi, I am new to Rosetta and am currently working through the protein-protein docking tutorials.
During the protein preparation step, I got the following error:
$ python ~/rosetta_scripts/scripts/clean_pdb.py complex.pdb ignorechain
Looking for: complex.pdb
Found preoptimised or otherwise fixed PDB file.
Traceback (most recent call last):
File "/home_soft/home/simm05/rosetta_scripts/scripts/clean_pdb.py", line 156, in ?
if (chainid == line[21] or ignorechain or removechain):
IndexError: string index out of range

1. Is there anything wrong with my PDB file? I just put 2 docking proteins in 1 pdb file, removed waters and hydrogens, and named them as two chains.
2. Are there any specific requirements for the PDB file?
3. One protein is missing the structure of 22 amino acids, so its structure is broken in two. Does this affect the docking process? Do I need to add the missing structure by homology modeling or something similar?
4. Does the presence of small-molecule ligands affect docking?

Thanks a lot!

Wed, 2012-07-18 01:03
dzhao

I don't think the Python script should crash, but it does say "Found preoptimised or otherwise fixed PDB file.", which implies you're OK to move on to docking.
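The traceback does narrow things down: line 156 reads `line[21]`, the chain ID column, and because Python's `or` evaluates its operands left to right, that index is evaluated before `ignorechain` is even checked. So any line shorter than 22 characters that reaches that statement (a bare `TER` or `END`, or a blank line) raises `IndexError`. The minimal sketch below is not part of clean_pdb.py; its function names are hypothetical, and it assumes only the standard fixed-column PDB layout. It lists the short lines so you can see which one the script is likely tripping over.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def short_lines(pdb_text, min_len=22):
    """Return (line number, line) pairs too short to contain a chain ID.

    In the fixed-column PDB format the chain identifier sits in column 22,
    which is index 21 of a 0-based Python string.
    """
    return [
        (n, line)
        for n, line in enumerate(pdb_text.splitlines(), start=1)
        if len(line) < min_len
    ]


def chain_id(line):
    """Return the chain ID of a PDB record, or None if the line is too short."""
    return line[21] if len(line) > 21 else None


example = "\n".join([pdb_record("ATOM", 1, "N", "MET", "A", 1), "TER", "END"])
for n, line in short_lines(example):
    print(n, repr(line))
```

Running this on the example prints lines 2 and 3 (`'TER'` and `'END'`); on a real file, any line it reports is a candidate for the crash.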

"I just put 2 docking proteins in 1 pdb file, removed waters and hydrogens, and named them as two chains."

This is correct.

"3. One protein is missing the structure of 22 amino acids, so its structure is broken in two. Does this affect the docking process? Do I need to add the missing structure by homology modeling or something similar?"

A gap in the numbering will not affect docking, so long as the chain letter stays the same across the gap and there are no terminus markers (an OXT atom, or a TER card in the PDB) at the false termini. Of course, if it docks against the part of the surface where the 22 residues are missing, you should filter those structures out as impossible.
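To check those two conditions, a short script can scan the file for numbering gaps within a chain, OXT atoms, and TER records. This is a hypothetical helper, not a Rosetta tool. It assumes a gap in residue numbering marks the break, ignores insertion codes, and will not catch a physical break that has no numbering gap. A TER at the real end of each chain is normal; only a TER or OXT sitting at the false terminus next to a gap needs removing.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def scan_breaks(pdb_text):
    """Report numbering gaps, OXT atoms and TER records in a PDB file."""
    gaps = []        # (chain, residue before gap, residue after gap)
    oxt_atoms = []   # (chain, residue number) of residues carrying an OXT atom
    ter_lines = []   # 1-based line numbers of TER records
    last_seen = {}   # chain ID -> last residue number read
    for n, line in enumerate(pdb_text.splitlines(), start=1):
        record = line[:6].strip()
        if record == "TER":
            ter_lines.append(n)
            continue
        if record not in ("ATOM", "HETATM") or len(line) < 26:
            continue
        chain, resseq = line[21], int(line[22:26])
        if line[12:16].strip() == "OXT":
            oxt_atoms.append((chain, resseq))
        prev = last_seen.get(chain)
        if prev is not None and resseq > prev + 1:
            gaps.append((chain, prev, resseq))
        last_seen[chain] = resseq
    return gaps, oxt_atoms, ter_lines


example = "\n".join([
    pdb_record("ATOM", 1, "CA", "MET", "A", 1),
    pdb_record("ATOM", 2, "CA", "GLY", "A", 2),
    pdb_record("ATOM", 3, "OXT", "GLY", "A", 2),   # OXT on the false terminus
    pdb_record("ATOM", 4, "CA", "LYS", "A", 25),   # residues 3-24 missing
    "TER",
    pdb_record("ATOM", 5, "CA", "ALA", "B", 1),
])
print(scan_breaks(example))  # ([('A', 2, 25)], [('A', 2)], [5])
```

Here the OXT on residue A2 sits right before the gap, so it is the one to delete; the TER on line 5 separates the two chains and can stay.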

"4. Does the presence of small-molecule ligands affect docking?"
Yes? Can you be more specific?

Wed, 2012-07-18 06:39
smlewis

4. One of the proteins (A) is a protein kinase; its autophosphorylation may lead to the other protein (B) binding to it. The ligands, AMP and formic acid, are also present in the structure of A.
(1) I am not sure whether B's binding site overlaps with ATP's, or whether B binds to A after ADP has left. Maybe I can dock the two systems separately, using A with AMP and A without AMP?
(2) The formic acid molecules do interact with A, but their physiological function is unknown.
I kept all the ligands in the PDB file, and I wonder whether they caused the error.

Thu, 2012-07-19 00:18
dzhao

I think the first thing I should do is point out that errors like these are not uncommon. Even for Rosetta pros it takes a few tries to get a new modeling simulation working properly. Don't fret over errors; we can probably work them out.

It is possible the ligands are leading to this error.

If you want to keep the ligands present, you need to make sure they have Rosetta-readable parameter files. So long as you do not want them to move internally, this is only a medium challenge. I strongly suggest docking without them first (for simplicity and to smooth out the learning curve), then docking with them later if you decide you need them.
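For the "dock without them first" route, the ligands can be removed by dropping their HETATM records by residue name. This is a minimal sketch with a hypothetical function name. It assumes the ligands use the PDB chemical component codes AMP and FMT (formic acid), so check the names actually used in your file. Other HETATM records, such as a modified residue like PTR, are kept because they belong to the protein chain, and CONECT records are left untouched.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def strip_ligands(pdb_text, ligand_names=("AMP", "FMT")):
    """Drop HETATM records whose residue name (columns 18-20) is in ligand_names."""
    kept = [
        line for line in pdb_text.splitlines()
        if not (line.startswith("HETATM") and line[17:20].strip() in ligand_names)
    ]
    return "\n".join(kept) + "\n"


example = "\n".join([
    pdb_record("ATOM", 1, "CA", "ALA", "A", 1),
    pdb_record("HETATM", 2, "P", "PTR", "A", 2),    # modified residue: kept
    pdb_record("HETATM", 3, "P", "AMP", "A", 401),  # ligand: removed
    pdb_record("HETATM", 4, "C", "FMT", "A", 402),  # ligand: removed
])
print(strip_ligands(example))
```

Keeping the original file and writing the stripped copy separately makes it easy to run the with-AMP and without-AMP systems side by side later.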

Thu, 2012-07-19 07:43
smlewis

I rechecked the PDB file. Its numbering does start from 1, but it is not continuous. Is that OK?

You said it's OK to move on to docking? So "IndexError: string index out of range" can be ignored?

Does it matter whether the bigger protein's numbering starts from 1 or the smaller one's does?

I couldn't find a Rosetta 3.3 tutorial, so I learned from tutorials for earlier versions, which don't match the 3.3 manual well. Could you send me a link to a general Rosetta 3.3 tutorial, or any other websites that are useful for a beginner?

Thanks so much for your kind reply!

Thu, 2012-07-19 00:40
dzhao

"I rechecked the PDB file. Its numbering does start from 1, but it is not continuous. Is that OK?"

Rosetta won't care. It will internally renumber them to start from one, so be careful - if Rosetta says "residue number 45 has a problem", it might not mean PDB #45.
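Because of that renumbering, it helps to keep a table that translates Rosetta's sequential residue numbers back to the PDB's (chain, number, insertion code). A minimal sketch with hypothetical names, assuming Rosetta reads every residue in the file in order; if it skips any residue, its real numbering shifts relative to this table.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def pose_numbering(pdb_text):
    """Map sequential residue numbers (1, 2, 3, ...) to PDB (chain, resSeq, iCode).

    A new residue starts whenever chain, residue number or insertion code
    changes between consecutive ATOM/HETATM records.
    """
    mapping = {}
    last_key = None
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        key = (line[21], int(line[22:26]), line[26].strip())
        if key != last_key:
            mapping[len(mapping) + 1] = key
            last_key = key
    return mapping


example = "\n".join([
    pdb_record("ATOM", 1, "N", "MET", "A", 1),
    pdb_record("ATOM", 2, "CA", "MET", "A", 1),
    pdb_record("ATOM", 3, "CA", "GLY", "A", 2),
    pdb_record("ATOM", 4, "CA", "LYS", "A", 25),
    pdb_record("ATOM", 5, "CA", "ALA", "B", 1),
])
print(pose_numbering(example)[3])  # ('A', 25, '')
```

So a message about "residue 3" in this example would refer to PDB residue 25 of chain A, and chain B's first residue becomes number 4.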

"You said it's OK to move on to docking? So "IndexError: string index out of range" can be ignored?"
The only way to find out is to try. I've never bothered with the Python script in question, so I don't know why it's spitting out errors.

"Does it matter whether the bigger protein's numbering starts from 1 or the smaller one's does?"
If there is a huge size difference, the big one should be first. This won't affect the science, but the code will run faster that way.
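If the larger partner is not already first in the file, the chains can be reordered with a short script. This is a hypothetical sketch, not a Rosetta utility: it keeps only ATOM/HETATM records (HEADER, REMARK and CONECT lines are dropped), writes a TER after each chain, and leaves chain letters and residue numbers unchanged.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def larger_chain_first(pdb_text):
    """Rewrite coordinate records so chains are ordered by residue count, largest first."""
    chains = {}  # chain ID -> its ATOM/HETATM records, in file order
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 26:
            chains.setdefault(line[21], []).append(line)

    def residue_count(records):
        # resSeq plus insertion code (columns 23-27) identifies a residue within a chain
        return len({rec[22:27] for rec in records})

    # sorted() is stable, so equally sized chains keep their original order
    order = sorted(chains, key=lambda c: residue_count(chains[c]), reverse=True)
    out = []
    for c in order:
        out.extend(chains[c])
        out.append("TER")
    out.append("END")
    return "\n".join(out) + "\n"


example = "\n".join([
    pdb_record("ATOM", 1, "CA", "ALA", "A", 1),
    pdb_record("ATOM", 2, "CA", "GLY", "B", 1),
    pdb_record("ATOM", 3, "CA", "SER", "B", 2),
    pdb_record("ATOM", 4, "CA", "LYS", "B", 3),
])
print(larger_chain_first(example))
```

In the example, chain B (three residues) is written before chain A (one residue).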

There should be demos included with your copy of Rosetta; I think it's a folder called rosetta_demos? We've rearranged things a lot since 3.3, so I forget how it used to be arranged...

Thu, 2012-07-19 07:45
smlewis

I moved on to docking. Since one residue is phosphorylated, there is another error:
$ docking_protocol.linuxgccrelease @flag_1 > test.txt
ERROR: unrecognized aa PTR
ERROR:: Exit from: src/core/io/pdb/file_data.cc line: 655
I suppose I should add parameter files for the phosphorylated residue, just as for the ligands. But how do I add such files?

Thu, 2012-07-19 19:36
dzhao

The first thing to do is try renaming the residue from PTR to TYR (e.g. with a text editor). There's a phosphotyrosine patch in the standard database, but I believe older versions of Rosetta (3.3 and before) don't recognize the "PTR" name and expect TYR instead. That should be fixed in the development version, and I believe also in 3.4.
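If you'd rather script the rename than edit by hand, a minimal sketch (hypothetical function name) only needs to rewrite the residue name field, columns 18-20, of ATOM/HETATM records. It leaves the record type, atom names and all other columns as they are, and does not touch SEQRES or MODRES lines.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def rename_residue(pdb_text, old="PTR", new="TYR"):
    """Rename residues in ATOM/HETATM records, keeping every column aligned."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == old:
            # resName occupies indices 17-19; replace it right-justified in 3 characters
            line = line[:17] + f"{new:>3}" + line[20:]
        out.append(line)
    return "\n".join(out) + "\n"


example = "\n".join([
    pdb_record("HETATM", 1, "P", "PTR", "A", 2),
    pdb_record("ATOM", 2, "CA", "ALA", "A", 3),
])
print(rename_residue(example))
```

Working on fixed columns rather than a global search-and-replace avoids accidentally changing "PTR" wherever else it might appear in the file.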

The worst case scenario with renaming is that Rosetta will discard the extra atoms and treat it like a regular tyrosine, which may be okay, depending on your system. (You can do a test run with the edited file - or simply use the scoring application with the pdb output option - to find out. If the output doesn't have the phosphate group, Rosetta isn't recognizing it. If it does, it did.)
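Checking the output for the phosphate group can also be scripted. This sketch, with a hypothetical function name, lists every residue that contains an atom named "P". The atom name is an assumption (it is what the PDB's PTR entries use), so adjust `atom_name` if Rosetta writes the phosphorus under another name. An empty result means no phosphate was written out, i.e. Rosetta treated the residue as a plain tyrosine.

```python
def pdb_record(kind, serial, name, resname, chain, resseq):
    """Build a minimal fixed-column coordinate record (zeroed coordinates) for testing."""
    return (f"{kind:<6}{serial:>5} {' ' + name:<4} {resname:>3} {chain}{resseq:>4}"
            "    " + "   0.000   0.000   0.000  1.00  0.00")


def residues_with_phosphorus(pdb_text, atom_name="P"):
    """Return sorted (chain, resSeq, resName) for residues containing atom_name."""
    hits = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        if line[12:16].strip() == atom_name:
            hits.add((line[21], int(line[22:26]), line[17:20].strip()))
    return sorted(hits)


example = "\n".join([
    pdb_record("ATOM", 1, "CA", "TYR", "A", 10),
    pdb_record("ATOM", 2, "P", "TYR", "A", 10),
    pdb_record("ATOM", 3, "CA", "TYR", "A", 20),
])
print(residues_with_phosphorus(example))  # [('A', 10, 'TYR')]
```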

I'm 90% confident that should work. If not, you'll either have to adjust patches, or make a new noncanonical amino acid parameter file - but again, that probably won't be necessary.

Fri, 2012-07-20 10:32
rmoretti