I am performing de novo structure prediction for a 1135 amino acid sequence. I ran the Robetta server to generate fragments for it, but got the error: Sequence length must be between 27 and 1000 residues
Can you please let me know how to get fragments in this scenario? Thanks!
I don't believe there's a way to get the Robetta server to give fragments for proteins that large. If you want to get them, you'll have to run the fragment picking protocol locally.
That said, attempting to do ab initio prediction directly on a 1135 aa protein is probably misguided. Rosetta ab initio generally maxes out at ~150 residues, pushing up into the 300+ aa region only if you have strong experimental constraints. Trying to use ab initio on a 1000+ aa protein is almost certainly going to give you rubbish - the conformational space is just too large to sample effectively in the way ab initio does.
A much better approach is to attempt to separate out the protein into domains. It's highly unlikely that the actual protein is a single cooperatively folded domain in the organism. Instead, it's likely that this large protein is composed of several domains which fold independently and then assemble together.
As such, I'd recommend that the computation follows the biology and that you split the protein into whatever independent domains exist. Then you can fold each domain separately - either by ab initio, or, if there's a conserved domain fold, by comparative modeling techniques. Once you have the structure of each domain in isolation, you can then use something like protein-protein docking to orient the domains with respect to each other. Finally, you can use loop remodeling (or comparative modeling with your partial model as your "homolog" template) to create the linkers which connect the individual domains.
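As a rough illustration of the splitting step, here is a minimal Python sketch that cuts a full-length sequence into per-domain FASTA records and flags each piece against the size limits mentioned in this thread. The domain names and boundaries are purely hypothetical placeholders (you would get real ones from a domain prediction tool or from homology), and the 150-residue cutoff is just the approximate ab initio ceiling quoted above, not a hard rule.

```python
from typing import List, Tuple

# Limits quoted in the Robetta error message from the question.
SERVER_MIN_LEN = 27
SERVER_MAX_LEN = 1000
# Approximate ab initio ceiling mentioned above (an assumption, not a hard rule).
AB_INITIO_SOFT_LIMIT = 150


def split_into_domains(
    sequence: str, boundaries: List[Tuple[str, int, int]]
) -> List[Tuple[str, str]]:
    """Cut `sequence` into named pieces.

    `boundaries` holds (name, start, end) with 1-based, inclusive residue numbers.
    """
    records = []
    for name, start, end in boundaries:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"{name}: range {start}-{end} is outside the sequence")
        records.append((name, sequence[start - 1:end]))
    return records


def classify(length: int) -> str:
    """Say roughly which regime a piece of the given length falls into."""
    if length < SERVER_MIN_LEN:
        return "too short for the fragment server"
    if length > SERVER_MAX_LEN:
        return "too long for the fragment server"
    if length > AB_INITIO_SOFT_LIMIT:
        return "above the usual ab initio range"
    return "within the usual ab initio range"


def to_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    full_sequence = "A" * 1135  # stand-in for the real 1135 aa sequence
    # Hypothetical domain boundaries, for illustration only.
    hypothetical_domains = [("domain1", 1, 140), ("domain2", 161, 500)]
    for name, seq in split_into_domains(full_sequence, hypothetical_domains):
        print(name, len(seq), classify(len(seq)))
```

Each piece that lands within the server limits could then be submitted for fragments on its own; anything flagged as above the usual ab initio range is a candidate for further splitting or for comparative modeling instead.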
(The other thing I'd highly recommend is running something like disorder prediction on the sequence. If there's a large intrinsically disordered region in the protein, Rosetta won't be able to model a structure for it. You can still include it as a representative conformation or to "take up space", but how to model intrinsically disordered regions tends to depend heavily on the downstream application and on why you need a structure for an unstructured protein region.)
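If your disorder predictor gives you a per-residue score, something like the following sketch could pull out the long disordered stretches so you know which regions to leave out of (or treat separately from) the domain folding. It assumes scores between 0 and 1 where higher means more disordered; the 0.5 threshold and 30-residue minimum length are illustrative choices of mine, not values from any particular tool.

```python
from typing import List, Sequence, Tuple


def disordered_regions(
    scores: Sequence[float], threshold: float = 0.5, min_len: int = 30
) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, 1-based inclusive, of long disordered runs.

    A run is a stretch of consecutive residues with score >= threshold;
    only runs of at least `min_len` residues are reported.
    """
    regions = []
    run_start = None
    for pos, score in enumerate(scores, start=1):
        if score >= threshold:
            if run_start is None:
                run_start = pos  # a new run begins here
        elif run_start is not None:
            # The run ended at pos - 1, so its length is pos - run_start.
            if pos - run_start >= min_len:
                regions.append((run_start, pos - 1))
            run_start = None
    # Handle a run that extends to the C-terminus.
    if run_start is not None and len(scores) - run_start + 1 >= min_len:
        regions.append((run_start, len(scores)))
    return regions
```

The residue ranges it returns can be compared against your putative domain boundaries: a long disordered stretch between two ordered regions is often a reasonable place to cut.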