Any backrub expert?


Dear all,

I have just started working in a computational protein design lab.
The first algorithm I learned was backrub, which I understand can be used for flexible-backbone design.
After using it for a while, I have some questions.
Would you mind sharing your experience with backrub in the following situations?

1) I have generated 50k structures with backrub.
After removing redundant sequences, only 1k unique amino acid sequences remained.
The redundancy varied widely between sequences (e.g. sequence A -> 10k structures, sequence B -> 1k structures, sequence C -> 1 structure).
Assuming I want to find the sequence that gives the most favorable energy score, how would you analyze the results?
Would you consider only the best energy score of each unique sequence, or compute statistics over the population of structures generated for each sequence?

2) Could I force backrub not to generate structures with redundant sequences?

3) Is it a bad idea to use backrub to design an interface? My idea is: initial pose by docking -> backrub to generate sequence variation -> analyze sequence-binding energy correlation -> (iterate with backrub, experiment, ...)

Thank you very much!

 

Fri, 2019-03-01 01:58
johnnytam100

Hi Johnny,

Are you using backrub through the Rosetta executable, or through the backrub server?

-Amanda

PhD student, Kortemme lab

Fri, 2019-03-01 11:25
aloshbau

Hi Amanda,

I am using backrub through the Rosetta executable.

Out of curiosity, why does it matter?

Fri, 2019-03-01 16:22
johnnytam100

Hi,

I wanted to know which one you are using so I can give you the best advice. It's great that you're using the executable; this means you are comfortable with the command line and can explore the many features Rosetta has to offer.

The correct way to design with backrub is to first generate backbone diversity using the backrub application, and then do sequence design on each member of the ensemble using the FixBB application. This is known as "BackrubEnsemble" design because you first generate an ensemble of conformationally diverse structures, then design on that. Is this what you're doing now? Or are you doing design with the backrub application itself? The options allow that, but it's not recommended. Also, 50k is more structures than you need. How many positions in the interface are you designing? I wouldn't go over 1000 structures.

I also encourage you to consider using the CoupledMoves application for design instead. This application iterates repeatedly between backrub moves and side chain design, in contrast to the BackrubEnsemble method, which first moves the backbone, then designs side chains, then finishes without repeating the cycle. I've been comparing the CoupledMoves and BackrubEnsemble methods on a number of experimental datasets, and it turns out CoupledMoves does a much better job of recapitulating observed sequences than BackrubEnsemble. CoupledMoves performs backbone moves using the same backrub algorithm; the only difference is how the backbone and side chain moves are interleaved. CoupledMoves goes back and forth many, many times between backbone and side chain moves, which it turns out gives more realistic answers than doing many, many backbone moves followed by many, many side chain moves, as in BackrubEnsemble design.

As for the frequency of sequences generated by each method, don't filter for redundancy. If a sequence is modeled more frequently, it's more likely to be good IRL. I recommend looking at both the frequency of each side chain at a given position and the total energy rank of each given side chain. When you're picking which amino acid side chain to test experimentally, pick one that's modeled frequently and has a favorable energy ranking. The energy table can be found at the end of the PDB output by Rosetta. CoupledMoves ranks the energy of known consensus positions better, and at greater frequency, than BackrubEnsemble. Note that I recommend looking at the energies and frequencies position by position, rather than for the whole protein.
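
For anyone who wants to try the position-by-position analysis described above, here is a minimal sketch in Python. It is not part of Rosetta: the function name `position_profile` and the input layout (a list of `(sequence, per_residue_energies)` pairs, with the energies already extracted from each PDB's energy table) are assumptions made for illustration, and using the mean per-residue energy as the ranking value is one possible choice, not a prescribed one.

```python
from collections import defaultdict


def position_profile(designs, position):
    """Summarize which amino acids were modeled at one position.

    designs  -- list of (sequence, per_residue_energies) pairs; this layout is
                hypothetical and assumes the energies were parsed beforehand
    position -- 0-based index into the sequence
    """
    energies_by_aa = defaultdict(list)
    for sequence, residue_energies in designs:
        energies_by_aa[sequence[position]].append(residue_energies[position])

    total = len(designs)
    rows = []
    for aa, energies in energies_by_aa.items():
        rows.append({
            "aa": aa,
            "frequency": len(energies) / total,
            "mean_energy": sum(energies) / len(energies),
        })
    # Most frequent first; ties broken by the more favorable (lower) energy.
    rows.sort(key=lambda row: (-row["frequency"], row["mean_energy"]))
    return rows


# Toy data only, not real Rosetta output.
designs = [
    ("ACDE", [-1.0, -2.0, -0.5, -1.5]),
    ("ACDE", [-1.2, -2.2, -0.4, -1.4]),
    ("AWDE", [-1.1, -3.0, -0.6, -1.6]),
    ("AYDE", [-1.0, -0.5, -0.5, -1.5]),
]
profile = position_profile(designs, 1)
for row in profile:
    print(row["aa"], row["frequency"], round(row["mean_energy"], 2))
```

Redundant sequences are deliberately kept in the input, since the frequency column depends on them. A candidate that sits near the top by frequency and also has a favorable energy is the kind of side chain the advice above points toward.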

I think it's a good idea to use BackrubEnsemble and/or CoupledMoves for interface design. To get the most sequence diversity, you can generate a backrub ensemble (use Backrub application to generate 400 structures without changing sequence) and then use those as input for CoupledMoves design.

Are you trying to design a new interface or redesign an existing one?  Do you plan to experimentally test individual designs, or do a library screen?

-Amanda

Sun, 2019-03-03 16:29
aloshbau

Hi Amanda,

I wanted to know which one you are using so I can give you the best advice. It's great that you're using the executable; this means you are comfortable with the command line and can explore the many features Rosetta has to offer.

I see! Yes, I am using the command line, but I still have a lot to learn.

The correct way to design with backrub is to first generate backbone diversity using the backrub application, and then do sequence design on each member of the ensemble using the FixBB application. This is known as "BackrubEnsemble" design because you first generate an ensemble of conformationally diverse structures, then design on that. Is this what you're doing now? Or are you doing design with the backrub application itself? The options allow that, but it's not recommended. Also, 50k is more structures than you need. How many positions in the interface are you designing? I wouldn't go over 1000 structures.


Oops... I thought the sequences that come with the structures straight from backrub would be my designs, but it seems that is the wrong approach if the correct way of using backrub is what you describe. In that case I have some questions:
1) What are the drawbacks of using the sequences generated directly by backrub?
2) For example, if the backbone RMSD between the ensemble members and the starting structure is within 0.5 Å (so there is essentially not much structural variation across the ensemble), wouldn't the result be the same as just running FixBB?
3) Is there a major difference in nature between FixBB and backrub (apart from whether they move the backbone) that gives a reason to do backrub -> FixBB?

I also encourage you to consider using the CoupledMoves application for design instead. This application iterates repeatedly between backrub moves and side chain design, in contrast to the BackrubEnsemble method, which first moves the backbone, then designs side chains, then finishes without repeating the cycle. I've been comparing the CoupledMoves and BackrubEnsemble methods on a number of experimental datasets, and it turns out CoupledMoves does a much better job of recapitulating observed sequences than BackrubEnsemble. CoupledMoves performs backbone moves using the same backrub algorithm; the only difference is how the backbone and side chain moves are interleaved. CoupledMoves goes back and forth many, many times between backbone and side chain moves, which it turns out gives more realistic answers than doing many, many backbone moves followed by many, many side chain moves, as in BackrubEnsemble design.

I am checking out CoupledMoves right now, thanks for the suggestion!

As for the frequency of sequences generated by each method, don't filter for redundancy. If a sequence is modeled more frequently, it's more likely to be good IRL. I recommend looking at both the frequency of each side chain at a given position and the total energy rank of each given side chain. When you're picking which amino acid side chain to test experimentally, pick one that's modeled frequently and has a favorable energy ranking. The energy table can be found at the end of the PDB output by Rosetta. CoupledMoves ranks the energy of known consensus positions better, and at greater frequency, than BackrubEnsemble. Note that I recommend looking at the energies and frequencies position by position, rather than for the whole protein.

What is IRL short for? And why is a sequence that is modeled more frequently more likely to be good IRL? I still don't understand how to interpret how often a sequence occurs. For the energy rank, I understand that the lower the energy, the more stable the protein, so we choose those designs. But for sequence frequency, my thinking is: if a sequence is modeled more frequently, isn't it natural that it also has a better chance of producing some structures with low energy values? Conversely, it also has a higher chance of producing some structures with poor energy values. This is the heart of my doubt: how should I interpret sequence frequency when a higher frequency gives a wider range of energy values on both the good and poor sides?
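
To make the question concrete, here is a minimal sketch of how the spread of scores per sequence could be tabulated. The function name `summarize_by_sequence` and the `(sequence, total_score)` record layout are hypothetical, and the numbers are toy values rather than real Rosetta output. The median is reported next to the best and worst scores because, unlike the extremes, it does not tend to drift outward simply because a sequence was sampled more often.

```python
from collections import defaultdict
from statistics import median


def summarize_by_sequence(records):
    """Group total scores by sequence and report count and spread.

    records -- iterable of (sequence, total_score) pairs (hypothetical layout)
    """
    scores = defaultdict(list)
    for sequence, score in records:
        scores[sequence].append(score)

    summary = []
    for sequence, values in scores.items():
        summary.append({
            "sequence": sequence,
            "count": len(values),
            "best": min(values),      # lowest (most favorable) score
            "median": median(values),
            "worst": max(values),
        })
    # Order by best score, most favorable first.
    summary.sort(key=lambda row: row["best"])
    return summary


# Toy data only.
records = [("AAA", -10.0), ("AAA", -12.0), ("AAA", -8.0), ("AAG", -11.0)]
summary = summarize_by_sequence(records)
for row in summary:
    print(row)
```

In this toy example the frequently sampled sequence has both the best and the worst score, which is exactly the effect described above, while its median sits between them.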

Are you trying to design a new interface or redesign an existing one?  Do you plan to experimentally test individual designs, or do a library screen?

Actually, I am designing a nanobody to bind my target protein. That means the ligand will be a mutant protein, the receptor is a wild-type protein, and the final interface is novel. Should that be called a redesign? Our collaborator will do a pull-down of my design with the target protein and calculate the Kd of binding. As they do not have expertise in any molecular display techniques, we will not do any experimental library screens.
 

Amanda, thank you so much for your detailed reply. It helped me a lot!

Sun, 2019-03-03 23:42
johnnytam100