I have two questions concerning conformers in the enzyme design application:
1) Is hydrogen bonding calculated from heavy atoms only or should the conformer library contain, for example, hydroxyl groups with the hydrogen pointed in multiple directions for it to recognize when a hydrogen bond can be formed?
2) I noticed that the ligand in the output pdbs is always the ligand that was in the last chain of my protein receptor file, and never one from the library (even though the terminal output clearly tells me that it read the conformer file). Does Rosetta only take the library into account while doing the calculations in the background, or should it actually replace the ligand in the output file with the optimal conformer if everything goes right?
I know basically nothing about enzyme design, but can comment that the Rosetta hydrogen bond term is fully atomic, and the position of the H atom does matter for hydrogen bonding energies. I don't think enzyme design uses a separate hydrogen bonding term from everyone else. (Throw my comment out the window if someone who knows more objects).
Thank you for your reply!
Perhaps someone else can weigh in on the second problem?
When you generate the params file for your ligand with the molfile_to_params.py script, it should be smart enough to recognize rotatable hydrogen-bondable hydrogens, and add lines to sample those hydrogen positions in the params file. (You can look for "PROTON_CHI" lines in the params file for the rotatable hydrogens.) The conformers in the conformer files only need to be the heavy atom conformers.
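If it helps, here is a minimal Python sketch for checking a params file for those lines. It only looks at the first token of each line, so it doesn't depend on the exact layout of the rest of the line. The function name and the example fragment are made up for illustration; the fragment is not real molfile_to_params.py output.

```python
def find_params_lines(params_text, keywords=("PROTON_CHI", "PDB_ROTAMERS")):
    """Return {keyword: [lines]} for lines whose first token is one of the keywords."""
    found = {key: [] for key in keywords}
    for line in params_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in found:
            found[tokens[0]].append(line.strip())
    return found


# Illustrative fragment only (hypothetical ligand, not real script output).
example_params = """\
NAME LG1
CHI 1  C1   C2   O1   H1
PROTON_CHI 1 SAMPLES 3 60 -60 180 EXTRA 1 20
NBR_ATOM  C2
NBR_RADIUS 5.0
PDB_ROTAMERS LG1_conformers.pdb
"""

hits = find_params_lines(example_params)
for keyword, lines in hits.items():
    # An empty list means the keyword is missing from the params file.
    print(keyword, len(lines), lines)
```

For a real file you would pass in the file contents, e.g. `find_params_lines(open("LG1.params").read())`, where `LG1.params` is a placeholder name.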
Rosetta should replace the conformer in the output file with the one it selects. If it isn't, that might mean there is an issue with your conformer sampling. There are several things to check.
First, make sure the PDB_ROTAMERS line is included in the params file you use.
Second, check your protocol and see if you've accidentally turned off repacking for the ligand in the resfile or task operations which you specified.
Third, check the atom naming on the ligand on your input PDB. The names should match up with those in the params file, and the PDB file spit out by molfile_to_params. If not, Rosetta will mess up atom name assignment, and things can get funky.
Fourth, take a look at how your ligand is sitting in your active site. Pay particular attention to the location of the atom which is annotated as the NBR_ATOM in the params file. Rosetta will superimpose the ligands based on the location and orientation of the NBR_ATOM. If you're in a tight pocket, some or all of your rotamers might clash with the protein when superimposed on the neighbor atom, which would result in them never being chosen and the input rotamer being kept. Also, if you have a critical interaction at one end of a long and skinny ligand, superimposing the rotamers on a NBR_ATOM in the middle of the ligand might result in those interactions being broken for all of the rotamers, leading to them never being chosen. You can move the NBR_ATOM so that superposition gives better results. (If you move the NBR_ATOM, be sure to update the NBR_RADIUS to be slightly more than the longest distance from the NBR_ATOM to any heavy atom in any rotamer.)
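For the NBR_RADIUS update, a rough Python sketch of the distance calculation is below. It makes a few assumptions that you should check against your own files: the conformer file uses standard fixed-column PDB records, a new conformer starts whenever an atom name repeats, and hydrogens are identified from the element column (falling back to the first letter of the atom name). The helper names and the two toy conformers are hypothetical.

```python
import math


def parse_conformers(pdb_text):
    """Split PDB text into conformers: a list of {atom_name: (xyz, is_heavy)}."""
    conformers, current = [], {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        element = line[76:78].strip().upper()
        if not element:  # heuristic fallback: first letter of the atom name
            element = name.lstrip("0123456789")[:1].upper()
        if name in current:  # assumption: a repeated name starts a new conformer
            conformers.append(current)
            current = {}
        current[name] = (xyz, element != "H")
    if current:
        conformers.append(current)
    return conformers


def max_nbr_distance(conformers, nbr_atom):
    """Longest distance from nbr_atom to any heavy atom, over all conformers."""
    longest = 0.0
    for conf in conformers:
        center, _ = conf[nbr_atom]
        for xyz, is_heavy in conf.values():
            if is_heavy:
                longest = max(longest, math.dist(center, xyz))
    return longest


def pdb_line(serial, name, x, y, z, element):
    """Format one fixed-column HETATM record for the toy example."""
    return (f"HETATM{serial:5d} {name:<4s} LG1 X{1:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}")


# Two toy conformers of a made-up three-atom ligand; C1 plays the NBR_ATOM.
example_pdb = "\n".join([
    pdb_line(1, "C1", 0.0, 0.0, 0.0, "C"),
    pdb_line(2, "O1", 3.0, 0.0, 0.0, "O"),
    pdb_line(3, "H1", 10.0, 0.0, 0.0, "H"),
    pdb_line(1, "C1", 1.0, 0.0, 0.0, "C"),
    pdb_line(2, "O1", 1.0, 4.0, 0.0, "O"),
    pdb_line(3, "H1", 1.0, 0.0, 9.0, "H"),
])

conformers = parse_conformers(example_pdb)
longest = max_nbr_distance(conformers, "C1")
print(f"{len(conformers)} conformers, longest heavy-atom distance {longest:.3f}")
```

The NBR_RADIUS you put in the params file would then be this value plus a small margin of your choosing. Running the same calculation with a few candidate atoms as `nbr_atom` is one way to compare possible NBR_ATOM choices.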
Thank you, your answer has been very helpful.
I do have one more question. I'm currently running EnzDes on multiple processors (MPI). However, for every processor, the application creates separate output folders outdir_0 to outdir_(n-1). In each of these folders, the output pdb numbering restarts from 1. This means that, for 10 processors, I would have ten files named output_DE_1.pdb, ten files named output_DE_2.pdb, etc.
The problem is that in the scorefile only the filename, not the path, is given in the description column, making it impossible to link a given score to its output pdb. Is there a flag that could solve this problem?
Thank you in advance.
Unfortunately, given the way the enzyme_design application is set up, there really isn't a way to control this.
However, there isn't necessarily any benefit to running the enzyme_design application with MPI. That is, there isn't really anything different from you just manually launching n separate non-MPI jobs in folders outdir_0 to outdir_(n-1). The benefit of the multiple serial jobs approach is that you can use the flags -out:suffix or -out:prefix to control naming of the output files, so you can give each run in each directory its own label. (You unfortunately can't do that with MPI, as all processors get the same command line, and thus the same suffix/prefix.)
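As a sketch of the serial approach, the Python snippet below just builds one command line per directory, each with its own -out:suffix, and prints them so you can paste them into a job script. The executable path, the `@flags` options file and the `_run` label are placeholders for whatever you actually use; nothing is launched.

```python
import shlex


def build_serial_commands(executable, shared_args, n_jobs, label="run"):
    """Return [(working_dir, argv)] with a distinct -out:suffix for each job."""
    jobs = []
    for i in range(n_jobs):
        argv = [executable, *shared_args, "-out:suffix", f"_{label}{i}"]
        jobs.append((f"outdir_{i}", argv))
    return jobs


# Placeholder executable path and options file; substitute your own.
jobs = build_serial_commands("path/to/enzyme_design", ["@flags"], 3)
for workdir, argv in jobs:
    # Printed only; run these however your cluster expects serial jobs.
    print(f"(cd {workdir} && {shlex.join(argv)})")
```

Because the suffix differs per job, the output file names (and so the description column of the scorefile) should be distinguishable between runs.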
The only reason to run MPI would be if your cluster administrators require it of you (though you should be able to run enzyme_design as a single-processor MPI run). In that case you might be able to cobble together something that matches up names and scores, or you could rename the output PDBs and then rescore them using the -enzdes:enz_score option of the enzyme_design application (which just rescores the structures without designing them).
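For the renaming route, something like the following Python sketch could work. It copies every PDB from the outdir_* folders into one directory and prepends the folder name, so the names become unique. The directory layout follows what you described; the function name, the `collected` folder and the demo files are made up for illustration, and it copies rather than renames so the originals stay untouched.

```python
import shutil
import tempfile
from pathlib import Path


def collect_with_unique_names(root, dest, pattern="outdir_*/*.pdb"):
    """Copy PDBs into dest as <folder>_<filename>; return {old path: new name}."""
    root, dest = Path(root), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    mapping = {}
    for pdb in sorted(root.glob(pattern)):
        new_path = dest / f"{pdb.parent.name}_{pdb.name}"
        shutil.copy2(pdb, new_path)
        mapping[pdb.relative_to(root).as_posix()] = new_path.name
    return mapping


# Demo on throwaway files that mimic two MPI output folders.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        folder = Path(tmp) / f"outdir_{i}"
        folder.mkdir()
        (folder / "output_DE_1.pdb").write_text("REMARK demo\n")
    mapping = collect_with_unique_names(tmp, Path(tmp) / "collected")

for old, new in mapping.items():
    print(old, "->", new)
```

The renamed copies could then be rescored in one go, so that each score line refers to a unique file name.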