
Structure Prediction: Max Sequence Length?



I have been getting good structure predictions for short and medium-length sequences, roughly 212 residues or fewer.

I had already noticed that some longer sequences are rather troublesome. Today I tried running a sequence of about 400 residues and got a segmentation fault. The previous steps actually succeeded, including secondary structure prediction and fragment file generation. I also set my system's stack size and max locked memory to unlimited.
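One way to confirm those limits actually took effect is to read them back from a process started in the same shell session, since `ulimit` changes apply only to the current shell and its children. A minimal sketch using Python's standard `resource` module (POSIX only); the helper names are illustrative, not part of Rosetta.

```python
import resource

# The two limits mentioned in the post, mapped to the standard POSIX rlimits.
LIMITS = {
    "stack size": resource.RLIMIT_STACK,
    "max locked memory": resource.RLIMIT_MEMLOCK,
}


def describe_limit(value):
    """RLIM_INFINITY becomes 'unlimited'; anything else is a byte count (ulimit shows KiB)."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for each limit as seen by this process.

    Run it from the same shell you launch AbinitioRelax from: both processes
    inherit that shell's limits, so the values should match.
    """
    limits = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        limits[name] = (describe_limit(soft), describe_limit(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft}, hard={hard}")
```

If either soft value is not `unlimited` here, the setting did not reach the shell that starts the Rosetta run.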

Since I'm fairly new to Rosetta: is there a maximum sequence length for structure prediction?
Or could this be a problem with my compilation, or perhaps with system memory allocation?

Finally, I am using Debian 5.0 64-bit on an Intel Xeon SMP.

Here is the output I get from running AbinitioRelax.linuxgccrelease:

core.init: command: /home/jcsaborio/Rosetta3.0/rosetta3_source/bin/AbinitioRelax.linuxgccrelease -in::file::fasta 1706.fasta -database /home/jcsaborio/Rosetta3.0/rosetta3_database/ -in:file:frag9 aa1706_09_05.200_v1_3 -in:file:frag3 aa1706_03_05.200_v1_3 -out:pdb true -out:nstruct 250 -abinitio:relax true -out:file:silent 1706.out
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-1019192820 seed_offset=0 real_seed=-1019192820
core.init.random: RandomGenerator:init: Normal mode, seed=-1019192820 RG_type=mt19937
protocols.abinitio.AbrelaxApplication: read fasta sequence: 410 residues
Segmentation fault
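The bare `Segmentation fault` line is printed by the shell after the process is killed. When launching the run from a script, the same information is available from the return code: on POSIX systems Python's `subprocess` reports a process killed by signal N as return code -N. A minimal sketch; the helper names are hypothetical, and the flags are copied verbatim from the command in the log.

```python
import signal
import subprocess


def abrelax_command(binary, fasta, database, frag9, frag3,
                    nstruct=250, silent_out="1706.out"):
    """Build the AbinitioRelax argument list using the flags from the log above."""
    return [
        binary,
        "-in::file::fasta", fasta,
        "-database", database,
        "-in:file:frag9", frag9,
        "-in:file:frag3", frag3,
        "-out:pdb", "true",
        "-out:nstruct", str(nstruct),
        "-abinitio:relax", "true",
        "-out:file:silent", silent_out,
    ]


def describe_exit(returncode):
    """Explain a subprocess return code; a negative value -N means 'killed by signal N'."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:  # signal number unknown on this platform
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd to completion and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)
```

With the paths from the log, `run_and_report(abrelax_command(...))` should return `"killed by SIGSEGV"` for a crash like the one shown, which distinguishes a real crash from Rosetta exiting with an error status.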

Any help will be greatly appreciated, thanks in advance.

Tue, 2009-07-14 12:10

Caveat: I've never used abrelax!

I'm not aware of any hard length restrictions on abrelax, and I find it very unlikely that one would have been coded as a hard crash instead of an error message.

I'm assuming that's a DNA sequence (it would surely be a strange-looking protein!). Are you sure the abrelax application will fold DNA? I would guess that it doesn't, but I have no evidence either way.

If this really is all protein, one possibility is that with so many cysteines (C in the one-letter amino acid code), the disulfide machinery is overloading and imploding.
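This reasoning rests on the fact that a nucleotide sequence is also a valid string of one-letter amino acid codes: A, C, G and T read as alanine, cysteine, glycine and threonine, so DNA handed to a protein folder looks like a cysteine-rich protein. A quick heuristic check, as a sketch; the 95% threshold and the function names are illustrative assumptions, not anything Rosetta itself does.

```python
# Letters expected in a nucleotide FASTA record (N = any base, U = RNA uracil).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta_sequence(text):
    """Concatenate the sequence lines of FASTA text, skipping '>' header lines."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(l for l in lines if l and not l.startswith(">")).upper()


def looks_like_nucleotides(seq, threshold=0.95):
    """Heuristic: call it DNA/RNA if nearly all letters are A/C/G/T/U/N."""
    if not seq:
        return False
    hits = sum(1 for ch in seq if ch in NUCLEOTIDE_LETTERS)
    return hits / len(seq) >= threshold


def cysteine_fraction(seq):
    """Fraction of 'C' when the string is read as one-letter protein code."""
    return seq.count("C") / len(seq) if seq else 0.0
```

A real protein could in principle consist only of these letters, which is why this is a sanity check rather than a classifier.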

You could always try folding a 400-residue poly-alanine chain (AAAAAAA...), just to check whether that crashes too.
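A poly-alanine control input takes only a few lines to generate. This sketch writes a 400-residue FASTA file; the header, file name and line width are arbitrary choices. You would presumably also need fragment files generated for this sequence, since the 9-mer and 3-mer fragment files in the command are specific to the original input.

```python
from pathlib import Path


def poly_ala_fasta(length=400, header="polyA_test", width=60):
    """Return FASTA text for a poly-alanine chain, wrapped at `width` residues per line."""
    seq = "A" * length
    body = "\n".join(seq[i:i + width] for i in range(0, length, width))
    return f">{header}\n{body}\n"


def write_poly_ala_fasta(path="polyA_400.fasta", length=400):
    """Write the control sequence to `path` (a hypothetical file name) and return it."""
    out = Path(path)
    out.write_text(poly_ala_fasta(length))
    return out
```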

It seems strange that it prints far fewer than 400 characters of sequence before crashing...

Wed, 2009-07-15 13:49

Thanks for your answer. I am receiving these sequences from somebody else who didn't provide any information about them, but you are probably right that it's DNA. I will talk to them about it.

Also, I found this passage in the Robetta FAQ about possible issues when processing long sequences:

** Begin **

The Rosetta folding program itself is a FORTRAN program that has to be careful with its memory usage. Extremely long sequences cannot be fit into memory, so any domain level models are limited to no more than __250 residues for the de novo protocol__, and __600 residues for the comparative modeling protocol__.

The de novo protocol suffers, as do all such methods, from a limitation in the ability to sample conformations available to the protein. Larger targets that are high contact-order are more difficult to sample in a reasonable amount of computer time, and probably require much larger decoy ensembles than the Robetta server can afford to generate. Therefore, we impose a de novo domain size limit of about 200 residues, which is clearly often incorrect, but necessary. It is hoped that in such cases, features of the target are still captured by the models.

There is additionally a limit on the length of the full chain of about 1000 residues, so that the independently modeled domains may be assembled into a contiguous chain.

** End **
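The FAQ's figures can be restated as a simple pre-flight check. Keep in mind these are Robetta server limits (and the roughly 200-residue de novo domain size is a sampling policy rather than a memory cap), not documented limits of Rosetta 3.0's AbinitioRelax; the dictionary below only repeats the quoted numbers.

```python
# Figures quoted from the Robetta FAQ above; they describe the Robetta server,
# not necessarily a local Rosetta 3.0 build.
ROBETTA_LIMITS = {
    "de novo memory limit (per domain)": 250,
    "de novo domain size limit": 200,      # "about 200 residues"
    "comparative modeling (per domain)": 600,
    "full chain": 1000,                    # "about 1000 residues"
}


def exceeded_limits(n_residues, limits=ROBETTA_LIMITS):
    """Return the names of the quoted limits that a chain of n_residues would exceed."""
    return [name for name, cap in limits.items() if n_residues > cap]
```

For the 410-residue sequence in the log, both de novo figures are exceeded; whether anything similar applies to a local Rosetta 3.0 run is not established in this thread.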

Found here: [|]

It's possible that Robetta was running an older version of Rosetta under the hood (the FAQ describes a FORTRAN program, whereas Rosetta 3.0 is written in C++), but it does suggest that there is indeed some kind of length restriction worth considering.

Again, thanks for your help.

Thu, 2009-07-16 11:30