I have multiple X-ray structures (PDB files) of the same protein. All of them have missing stretches of residues. I want to use all of them to build a single model, with as few missing residues as possible. How can I do this?
In a particular application, I have one additional requirement. One of the PDB files is special (call it the template), in the sense that I only want to add missing residues to it. That is, the resulting model should contain the template PDB file and align perfectly to it, the only difference being that the resulting model has some additional residues that were completed from the information in the other PDB files. Okay, the resulting model does not have to align "perfectly" to the template. I'll allow some relaxation at the connecting points. But still, the idea is that the template PDB should carry a "larger weight".
So, those are the two problems: 1) How to reconstruct a more complete model from a bunch of PDB files of the same protein? 2) How to complete missing residues in one PDB file using information from other PDB files of the same protein?
Note that I am asking here because I assume that Rosetta (or a Rosetta related web-server) has tools to do this. But if Rosetta is not adequate and you know of another tool, please point that out too!
My understanding is that you have a template structure of your protein. Let's say it's a 100 residue protein and your template has 80. You then have some other structures which, collectively, have another 15 of the 20 missing residues. I assume the missing regions are loops (they usually are). This is one of those problems that is simple to describe but hard to do correctly.
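Before choosing a protocol, it may help to tabulate which residues are actually missing from the template and which of those are observed in at least one other structure. Here is a minimal sketch in plain Python (no Rosetta needed). It assumes standard fixed-width PDB ATOM records, a single chain ID that is the same in every file, and consistent residue numbering across files; insertion codes are ignored. The function names and the `first`/`last` sequence range are my own, not part of any Rosetta tool.

```python
def observed_residues(pdb_text, chain="A"):
    """Residue numbers with at least one ATOM record in the given chain."""
    seen = set()
    for line in pdb_text.splitlines():
        # Columns 22 (chain ID) and 23-26 (residue number) of the PDB format.
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
            seen.add(int(line[22:26]))
    return seen


def gap_report(template_text, donor_texts, first, last, chain="A"):
    """Compare the template against the other structures.

    donor_texts maps a label (e.g. a file name) to the PDB text.
    Returns (missing, recoverable, unresolved):
      missing     - residues in first..last absent from the template
      recoverable - {residue number: [labels of donors that contain it]}
      unresolved  - missing residues that no donor contains
    """
    missing = set(range(first, last + 1)) - observed_residues(template_text, chain)
    recoverable = {}
    for label, text in donor_texts.items():
        for resnum in sorted(missing & observed_residues(text, chain)):
            recoverable.setdefault(resnum, []).append(label)
    unresolved = missing - set(recoverable)
    return missing, recoverable, unresolved
```

In the 100-residue example above, `missing` would hold 20 residues, `recoverable` 15, and `unresolved` the 5 that would have to be built by loop modeling in any case.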
First issue: these regions that are present in some models and missing in others are crystallographically very suspect. Look at those regions versus their electron density, their B-factors, and perhaps their MolProbity results - you may find that they don't contain much good information anyway, and you can just ignore them and proceed to loop modeling.
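As a quick first pass on the B-factor part of that check, something like the following could list residues whose average B-factor is high. This is only a sketch: it reads the B-factor from columns 61-66 of standard ATOM records, the function names are mine, and the cutoff of 60 is an arbitrary placeholder rather than an accepted threshold. It is no substitute for looking at the electron density or running MolProbity.

```python
from collections import defaultdict


def mean_b_factors(pdb_text, chain="A"):
    """Average B-factor per residue number, over all ATOM records of a chain."""
    totals = defaultdict(lambda: [0.0, 0])  # residue number -> [sum, atom count]
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 66 and line[21] == chain:
            entry = totals[int(line[22:26])]
            entry[0] += float(line[60:66])  # temperature factor field
            entry[1] += 1
    return {resnum: total / count for resnum, (total, count) in totals.items()}


def flag_high_b(pdb_text, chain="A", cutoff=60.0):
    """Residue numbers whose mean B-factor exceeds the (arbitrary) cutoff."""
    means = mean_b_factors(pdb_text, chain)
    return sorted(resnum for resnum, b in means.items() if b > cutoff)
```

Running `flag_high_b` on each of the other structures and comparing the result with the stretches missing from the template gives a rough idea of whether the "extra" residues are worth carrying over at all.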
If you want your template to stay _perfectly the same_ and add extra atoms from the other structures, there isn't a great way to do this. Grafting on residue bits from other structures without allowing the template to relax into their presence will leave you with odd geometries at the connection points. If you let the grafted residues move to get the connection geometries right, you'll lose a lot of the information from the new residues. I don't know of a C++ executable to do this, but I think it's readily achievable in PyRosetta - just write snippets of code to copy the desired residues from each source into your template, fix the connection geometries, and then minimize/repack/whatever the copied residues.
If some modification of the template is ok, you may be able to set this up as "homology modeling" where you use your desired structure as both the sequence and the template, and generate fragments for the missing regions from your other PDBs. I don't know enough about using this code to guide you through it, but does that experiment even make sense for your needs?
I edited the question. You're right, I should allow some relaxation of the template. What I really want is that the template "carry more weight" than the other files. It seems that the "homology modeling" you mention is closest to my intention in question 2). But I don't know how to do that.
What about question 1)? Here all PDBs are equivalent, there's no special template. Is there a tool to get a "balanced" model that exploits the information in all the PDBs, all with "equal weights"?
It sounds like you want to use the Comparative Modeling protocols of Rosetta. While comparative modeling (homology modeling) is typically used to model a sequence that hasn't been crystallized based on the structures of homologs which have been, it's also fully capable of modeling the parts of a sequence that are missing from the crystal structures, based on partial crystal structures of that same protein. (A protein is technically a homolog to itself.)
There are two protocols for comparative modeling. The older one is the single-template approach, also known as threading and loop remodeling. This is the protocol that's covered in the comparative modeling tutorials at http://meilerlab.org/index.php/jobs/resources - this is routinely used in our lab to rebuild loops with missing density.
The newer one is the RosettaCM multi-template comparative modeling protocol (http://www.ncbi.nlm.nih.gov/pubmed/24035711). This allows production of structures incorporating structural information from multiple templates (multiple crystal structures). It also has fragment-based loop rebuilding which will be able to fill in regions missing from all of the crystal structures. There's also a "weight" option available in the tag for each template PDB which should allow you to adjust how "important" each structure is, if you want to do that.
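To give an idea of what the per-template weights look like, here is a small Python sketch that writes out the mover block of a RosettaCM RosettaScripts XML with one Template tag per PDB. Treat it as an assumption-laden illustration: the attribute names (`pdb`, `cst_file`, `weight`, and the score function attributes on the Hybridize tag) are as I recall them from the RosettaCM tutorial material and should be checked against the tutorial and the Song et al. paper. The score function names are assumed to be defined elsewhere in the script, and the file names and weight values are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def hybridize_block(templates):
    """Build a Hybridize mover block from (pdb_path, weight) pairs.

    Assumption: attribute names follow the RosettaCM tutorial as recalled;
    verify them before use. "stage1", "stage2" and "fullatom" are score
    functions that must be declared elsewhere in the full XML script.
    """
    lines = [
        '<Hybridize name="hybridize" stage1_scorefxn="stage1" '
        'stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1">'
    ]
    for pdb_path, weight in templates:
        # quoteattr adds the surrounding quotes and escapes special characters.
        lines.append(
            "    <Template pdb={} cst_file=\"AUTO\" weight=\"{:.3f}\"/>".format(
                quoteattr(pdb_path), weight
            )
        )
    lines.append("</Hybridize>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names: the preferred structure gets the largest weight.
    print(hybridize_block([
        ("template_threaded.pdb", 1.0),
        ("other1_threaded.pdb", 0.25),
        ("other2_threaded.pdb", 0.25),
    ]))
```

For question 1 (all structures equivalent) you would give every template the same weight; for question 2 you would give the preferred structure a larger weight than the others. How much larger is something to experiment with.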
Can you include links to the user guide pages for these protocols? I found this: https://www.rosettacommons.org/docs/latest/application_documentation/str..., but I don't see the option to add weights? And what's the syntax to add more than one PDB template file?
What you linked is the documentation for the older, single template comparative modeling protocol.
For multiple templates you want to use the RosettaCM protocol. The protocol as a whole doesn't have a documentation page on the rosettacommons.org site. The Song et al. paper (linked above) is the best resource, and there is a tutorial put out by Frank DiMaio about using it in the context of electron density, which may be helpful (https://faculty.washington.edu/dimaio/files/density_tutorial.pdf).
Also, if you want to private message me, I can send you a pre-release version of the RosettaCM tutorial the Meiler lab put together, which should be up on the Meiler lab website later in the year.
We decided there wasn't any reason to put off releasing the current version of the RosettaCM tutorial, even if it is in the process of being updated.
See the entry for RosettaCM in the "Rosetta Tutorials" tab at http://meilerlab.org/index.php/jobs/resources for a link to the download.