Rebuilding the structure from unknown residues


I am trying to rebuild a structure that contains unknown residues (UNK). I have the corresponding EM map and the sequence, but the PDB file is missing some residues and the rest are labeled "UNK". I am trying to build an all-atom model using Rosetta. Please advise, since Rosetta does not accept undefined residues. Any input is welcome.




Mon, 2016-06-27 09:32

If you know the sequence, put the right sequence in over the UNKs in the PDB file.  I guess if you don't know the sequence, make it all alanine or something.  I don't fully understand the question.

Mon, 2016-06-27 09:35

Rosetta needs to model *something*. There isn't really an "unknown" residue type, because any sort of modeling you do will make assumptions about what the residue properties are.

As Steven says, if you know what the sequence should be (e.g. from other sequencing results), you should put that in. Rosetta electron density fitting takes something of a "best fit" approach, and will try to match up the experimental density with the actual sequence. Unlike some other density fitting programs, Rosetta isn't really befuddled by having extra atoms in the structure. You're not going to mess up your fit by having atoms present which aren't well represented by the density. Rosetta is built to accommodate the presence of atoms, residues and even loops which are poorly represented in the density data. It will fit the density the best it can, and if there are atoms for which there isn't density, it will model them with the standard Rosetta structure prediction energy function.

If you don't know what the sequence is for some reason, you can just pick a semi-arbitrary amino acid to use at that position. Alanine is a fine choice, as it's somewhat neutral. Valine is also a good one for positions you know are hydrophobic, as it's a good middle-sized hydrophobic. (Poly-valine is typically used in Rosetta de novo structure design projects for this reason.) For a hydrophilic/surface residue, something like serine might be a good choice. Basically, you're trying to do a best-guess matchup between the amino acid used and the likely properties of the real amino acid at that position.

You also might want to take an iterative approach. Do an initial run with a simple sidechain (poly-ALA/poly-VAL), then look at the density/surface exposure at each UNK position and adjust the identity of the amino acid to match, refining the fit with the new amino acid identities. Do this a couple of times based on your best guess at the identities at each position.
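For the first pass of that iterative approach, a small script can do the renaming instead of hand-editing. This is only a sketch in plain Python, not a Rosetta tool: the function name `rename_unk` and the `overrides` argument are made up for illustration, and it assumes the file follows the standard fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26).

```python
def rename_unk(pdb_lines, default="ALA", overrides=None):
    """Return a copy of pdb_lines with UNK residue names replaced.

    default   -- three-letter code used for every UNK without an override
    overrides -- optional dict mapping (chain_id, residue_number) to a
                 three-letter code, e.g. your best guess after inspecting
                 the density at that position (hypothetical helper argument)
    """
    overrides = overrides or {}
    renamed = []
    for line in pdb_lines:
        # Only coordinate records carry a residue name in columns 18-20.
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == "UNK":
            chain_id = line[21]
            residue_number = int(line[22:26])
            new_name = overrides.get((chain_id, residue_number), default)
            line = line[:17] + new_name + line[20:]
        renamed.append(line)
    return renamed


# One backbone atom in the same layout as a typical UNK record.
sample = "ATOM  52802  CA  UNK Y  14     105.993   2.712 -96.646  1.00 30.00           C"

first_pass = rename_unk([sample])                                  # poly-ALA placeholder
second_pass = rename_unk([sample], overrides={("Y", 14): "VAL"})   # refined guess
```

On later rounds you would only grow the `overrides` dictionary as your guesses improve, then rerun the refinement on the rewritten file.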

If you're looking at an unknown sequence length in addition to unknown identity, things get harder. But you can do multiple runs, each with a different loop length, and pick the best. You can also take an iterative approach, starting with a mid-length loop, and then doing density-guided loop remodeling to extend/shorten the loop, as appropriate.

Mon, 2016-06-27 10:25

Thank you for your suggestions. To clarify my question, here is residue 14 in the PDB file as an example.

"ATOM  52801  N   UNK Y  14     104.651   2.058 -96.784  1.00 30.00           N
ATOM  52802  CA  UNK Y  14     105.993   2.712 -96.646  1.00 30.00           C
ATOM  52803  C   UNK Y  14     105.883   4.238 -96.676  1.00 30.00           C
ATOM  52804  O   UNK Y  14     105.099   4.790 -97.453  1.00 30.00           O"

and as per the sequence, it should be Valine.

So, as I understand it, simply replacing the UNK names in the PDB file with the corresponding residues from the sequence file should work.

Thanks again.



Mon, 2016-06-27 10:51

Right. Take your favorite text editor and change the "UNK" to "VAL". As long as the backbone heavy atoms are present, Rosetta should be able to take it from there. 
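If there are too many UNK residues to edit by hand, the same substitution can be scripted. The following is a rough sketch in plain Python rather than anything shipped with Rosetta. The function name `thread_sequence` and the `first_resnum` parameter are invented for this example, and it assumes the residue numbers in the PDB line up with positions in your sequence (residue `first_resnum` is the first letter). Check that assumption against your own file before trusting the output. The example sequence is a placeholder, not the real chain Y.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def thread_sequence(pdb_lines, chain_id, sequence, first_resnum=1):
    """Replace UNK names in one chain using a one-letter sequence.

    Residue number first_resnum is assumed to correspond to sequence[0].
    UNK residues that fall outside the sequence, or whose letter is not
    a standard amino acid, are left unchanged.
    """
    result = []
    for line in pdb_lines:
        is_coordinate_record = line.startswith(("ATOM", "HETATM"))
        if is_coordinate_record and line[21] == chain_id and line[17:20] == "UNK":
            index = int(line[22:26]) - first_resnum
            if 0 <= index < len(sequence):
                new_name = THREE_LETTER.get(sequence[index].upper())
                if new_name is not None:
                    # Columns 18-20 hold the residue name in the PDB format.
                    line = line[:17] + new_name + line[20:]
        result.append(line)
    return result


# Placeholder sequence whose 14th letter is V, matching the example above.
example_sequence = "A" * 13 + "V"
sample = "ATOM  52801  N   UNK Y  14     104.651   2.058 -96.784  1.00 30.00           N"
fixed = thread_sequence([sample], "Y", example_sequence)
```

To use it on a real file, read the lines with `open(path).readlines()`, pass them through the function once per chain, and write the result to a new file so the original stays untouched.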

Tue, 2016-07-12 09:35