Hello. I have a few questions about ab initio structure prediction.
1) Can ab initio predict secondary structure as well, or does it just take secondary structure as given from the inputs to the fragment picker?
2) Is it worth using Rosetta to fold an 800-residue protein? Are any tips or tricks needed to run predictions on a protein this large?
To my knowledge, Rosetta contains no secondary structure (SS) prediction algorithms. There is a DSSP lookalike that reports which SS is present in an existing structure, but nothing like PSIPRED. So, no, ab initio can't guess; it just takes what it's given.
I don't think 800-residue proteins are encouraged. The public fragment generator on Robetta quits at 400 residues. Anything that large probably consists of multiple domains; perhaps they can be folded individually? I'm not aware of any Rosetta papers that mention folding something that large, but if there are any, we can try to contact those developers directly.
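If the protein does split into domains, each one would be submitted as its own sequence. Below is a minimal Python sketch, assuming you already know the domain boundaries: it cuts a 1-based, inclusive residue range out of the full sequence, checks it against the 400-residue Robetta limit mentioned above, and writes it as FASTA. The function names, the placeholder sequence, and the 100-300 range used in the example are illustrative assumptions, not part of Rosetta or Robetta.

```python
MAX_LEN = 400  # length at which the public Robetta fragment server stops, per this thread


def extract_segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    return seq[start - 1:end]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA, wrapping sequence lines at `width` residues."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Usage with a hypothetical 800-residue placeholder (not the real Tbx6 sequence).
full = "M" + "A" * 799
domain = extract_segment(full, 100, 300)  # assumed domain boundaries
if len(domain) > MAX_LEN:
    raise ValueError("segment is still longer than the fragment server accepts")
print(to_fasta("segment_100_300", domain), end="")
```

Each resulting FASTA file could then be run through fragment generation and folding separately; how to reassemble the domains afterwards is a separate problem.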
The protein I am working with is a Tbx protein containing a T-box domain, which is highly conserved. Apart from the T-box domain, however, no other domains have been identified in this protein. In fact, the SS predictions I have from PSIPRED and SAM show coil for every part of the protein except the T-box domain. The T-box domain spans roughly residues 100-300, and the rest has no detectable homology to anything. I am not quite sure how to approach this problem.
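To put a number on "coil everywhere except the T-box", one could tabulate the PSIPRED per-residue states region by region. This is a sketch in Python, assuming the usual PSIPRED .ss2 layout (header lines starting with '#', then one line per residue giving the index, amino acid, predicted state C/H/E, and three probabilities). read_ss2 and ss_fractions are hypothetical helper names, and SAM output would need its own parser.

```python
from collections import Counter


def read_ss2(text: str) -> list[tuple[int, str]]:
    """Parse PSIPRED .ss2 text into (residue number, SS state) pairs.

    Assumes '#' header lines and blank lines, then one line per residue of
    the form 'index aa state P(C) P(H) P(E)', with state one of C, H, E.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip header and blank lines
        records.append((int(fields[0]), fields[2]))
    return records


def ss_fractions(records: list[tuple[int, str]], start: int, end: int) -> dict[str, float]:
    """Fraction of residues start..end (1-based, inclusive) predicted as C, H, or E."""
    states = [ss for idx, ss in records if start <= idx <= end]
    if not states:
        return {}
    counts = Counter(states)
    return {state: counts[state] / len(states) for state in "CHE"}
```

For example, reading your .ss2 file into a string and comparing ss_fractions(records, 100, 300) with ss_fractions(records, 301, 800) would show whether the helix and strand fractions outside the T-box are essentially zero, which is what the coil-only prediction implies.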
I am interested in using Rosetta because when we replace the zebrafish Tbx6 gene (formerly tbx24) with the mouse gene, it appears to work perfectly well, even though the only conserved region is the T-box domain (mouse Tbx6 is about 400 residues and zebrafish Tbx6 is about 800). It would be interesting to see the tertiary structures of both proteins and try to identify similarities.
I don't think anyone, Rosetta included, has a protein folding algorithm that can handle hundreds of contiguous residues with no secondary structure.
Is there even evidence that those parts of the protein adopt a stable fold at all?
There is some code under development meant to handle disordered regions, but I don't think it will scale to this.
Our lab does not deal with figuring out protein structures, so no, we have no evidence that these parts fold at all. We are interested in which part of the Tbx6 protein is required for it to behave normally, and we are exploring this by creating truncated versions of the protein. Since it is a blind search, I was hoping to use Rosetta to get some information that might guide it.
What is the realistic upper limit of protein size that can be folded in Rosetta?
My impression is that 100 residues is a soft cap and 200 is as far as anyone takes seriously. I'm not an ab initio person, nor is my lab, so I may be wrong, but that's roughly what you'll see if you look through the CASP history.
I am pretty sure that the fragment generator uses a few SS prediction algorithms to make its fragments for a given sequence...
Have you tried homology modeling first? Are there parts of your protein that are homologous to other domains? Do you have any experimental information you could use as constraints? There are domain-assembly-type applications floating around... The only one I've used is RosettaRemodel, which unfortunately didn't make it into the 3.4 release...
The fragment generator takes SS prediction files as inputs; the hardest part of using it is getting all of its inputs in order so that it works correctly. The infamous make_fragments.pl is... challenging to use, because it has so many external dependencies on SS prediction programs.