You are here

Garbled Ligands from docking_protocol

22 posts / 0 new
Last post
Garbled Ligands from docking_protocol
#1

Greetings,

I'm trying to construct a homo-dimer of a protein+ligand (so a total of two peptide chains and two ligand chains) using docking_protocol with the -docking::partners flag.
I've parameterized the ligand using molfile_to_params.py and docking_protocol runs without generating any fatal errors (although there are numerous warnings about discarding atoms, missing heavyatoms, can't find atoms, etc.).
However, when I check the output structures, I find that both peptide chains are intact but both ligands have completely come apart.

My suspicion is that when Rosetta is switching from centroid back to all-atom that somehow the ligand bonds are not being reassembled correctly.

I've attached the two ligand parameter files as well as the input file (with .txt extensions added solely to allow uploading) so that the problem might first be reproduced, then I welcome any advice the community may have.

Thanks,
Sándor

AttachmentSize
BCL.fa_.params.txt16.63 KB
BCL.cen_.params.txt8.25 KB
CsmA_Input.pdb_.txt152.81 KB
Post Situation: 
Tue, 2011-12-20 07:55
skovacs

If you take a closer look at the "Missing Atoms" warning lines in the output, you'll likely notice that most of your ligand atoms are disappearing. molfile_to_params.py has a habit of renaming ligand atoms names, and it's the atom name which allows Rosetta to match up the atoms in the PDB file with the corresponding one in the params topology file. It can't find an atom named CHA in the params file, so it discards it and tries to rebuild it. It can find H43, but H43 in the PDB is unlikely to be the same atom as H43 in the params file, so it generates a contorted pose, where the hydrogens are sticking off in strange directions.

One solution, and what I typically do, is to use the .pdb file that is output from molfile_to_params.py for the PDB lines in the ligand. This contains all of the atoms with their original input coordinates, but with their new names. (You may have to use a program like PyMol to translate and superimpose the ligand onto the appropriate location for the second monomer.)

The other option is to manually rename each of the atoms in the params file to match those in the PDB. (Whitespace sensitive, in the format ATOM.XXXX, where "." is a space and XXXX is the whitespace sensitive four charachter atom name that is in the PDB file.)

Tue, 2011-12-20 13:45
rmoretti

Thank you rmoretti--you were spot-on!

The original log file mentioned discarding 105 out of 140 ligand atoms, and ended up rebuilding the ligand with contorted atoms and bonds.

When I used your first solution, i.e. retaining the atoms as named by molfile_to_params.py and translating them, it worked! No atoms were discarded and no missing atom warnings were generated.
I also understand how your second solution would work, and may have to switch to that method once we extend to larger systems so that I don't have to superimpose multiple ligands.

If you would indulge my curiosity again, I would like to ask a follow-up question:
The documentation accompanying molfile_to_params.py mentions generating a rotamer library by including a a PDB_ROTAMERS: line and then concatenating additional information (a list of .pdb filenames, not the actual coordinates, right?)
As my ligand is rather large with flexible chains, this is something that I would like to do--I currently do not have any additional rotamers.
I presume that these too should share the same atom names as listed in the main part of the parameter file above, so assuming I begin with the .pdb file generated by molfile_to_params.py, what is the best way to go about generating such a library? What are the most important considerations to keep in mind?

Many Thanks,
Sándor

Wed, 2011-12-21 09:05
skovacs

Right, the PDB_ROTAMERS line takes a filename (a single filename, rather than a list of ones). This file should contain multiple PDB structures (ATOM/HETATM lines only) separated by TER lines (technically any non ATOM/HETATM line works at present), one section for each rotamer.

The way I typically produce such rotamers is to use a third-party program like OpenEye Omega, which takes a single input structure and produces a sdf or mol2 file containing all the different possible rotamers. molfile_to_params.py can take such an input molfile, and produces a number of PDB-formatted structures, all with the appropriate atom names, which you can then concatenate together (e.g. simply with the unix "cat" command) into the PDB_ROTAMERS file.

Fri, 2011-12-23 12:41
rmoretti

Greetings again and Happy New Year!

I was able to obtain a license for OpenEye Omega and successfully construct a set of 1000 rotamers. However, when I try to use molfile_to_params.py to generate the PDB-formatted structures (I presume I'll need to create two PDB_ROTAMERS files, one for full-atom, a second for centroid), I get the following output, and then error:

Centering ligands at ( -1.004, 5.066, 2.318)
Total naive charge -2.600, desired charge 0.000, offsetting all atoms by 0.019
Traceback (most recent call last):
File "../for_ash/python/apps/public/molfile_to_params.py", line 1207, in ?
sys.exit(main(sys.argv[1:]))
File "../for_ash/python/apps/public/molfile_to_params.py", line 1183, in main
num_frags = fragment_ligand(m)
File "../for_ash/python/apps/public/molfile_to_params.py", line 474, in fragment_ligand
raise ValueError("More than 9 ligand fragments!")
ValueError: More than 9 ligand fragments!

This is interesting because I do NOT get the same error if I use a molfile of only a single ligand, so I am confused by the "More than 9 ligand fragments" error. Any suggestions?
Also, the .mol2 file is over 10MB in size, so it won't attach...

Thanks,
Sándor

Fri, 2012-01-06 16:48
skovacs

Interesting. The "more than 9 ligand fragments" typically happens to me when there's an issue with the input file (though I can't rule out other cases). Is it the same ligand in the single-ligand and the multiple rotamer cases? (e.g. from the same original input file) If so, I wonder if Rosetta isn't liking the format that your version of omega is outputting. How did you run omega (what was the command line)? You can also try reducing the number of conformations you generate (I think it's something like -maxconfs with recent versions of omega). Try running it with just one rotamer generated, and if that works try doing just a few (2-3). At the very least that will reduce the size of the mol2 file.

Fri, 2012-01-06 19:36
rmoretti

Hello again,

So I think I figured out what happened with regard to the "more than 9 ligand fragments" error...

When I first ran Omega, I specified .pdb as the output type and loaded the Omega output in VMD to check. Then since molfile_to_params.py reminded me that it needed a .mol/.sdf or .mol2 file, I had VMD save the coordinates in .mol2, but in doing so, none of the bond information transferred to the file. I suspected that this might be a source of the error, so when I reran Omega this afternoon (using the command and flags "omega2 -in BCL_RosettaNames_NoMg.pdb -out 1000Omega2Rotamers.mol2 -prefix BCL_NoMg -strictstereo false -commentEnergy true -verbose true -fromCT false -enumRing false -maxConfs 1000"), I tried directly specifying .mol, .sdf, and .mol2 as different output filetypes. In two of the three cases, molfile_to_params.py ran without the "more than 9 ligand fragments" error, writing out the 1000 .pdb files for both full-atom and centroid when I used the .sdf or .mol2 omega files as input. (When I specified the .mol file, perhaps as expected, a single full-atom and centroid set of .pdb files was written.)

However, since Omega doesn't have support for metal ligands, in order to properly represent my ligand, I deleted the four H atoms capping the ring Nitrogen atoms, added the single Magnesium atom back in, reassigned the atom indices for those four bonds, and changed the number of atoms from 143 to 140. As it turns out, this was not too difficult to do with a few find/change commands in the .mol2 file as the additional atoms and modified bonds were at the end of the list, so no renumbering needed to take place. (Phew!)

I then tried molfile_to_params.py using this modified .mol2 input. It seems to have worked for full-atom mode, writing out the 1000 files as expected, but when it tried to start writing the centroid parameters, I get the following output and error:

/usr/bin/python ../../for_ash/python/apps/public/molfile_to_params.py -n BCL -k OmegaPlusMg.kin -c --clobber 1000Omega2RotamerswithMg.mol2
Centering ligands at ( -1.004, 5.066, 2.318)
WARNING: structure contains double bonds but no aromatic bonds
Aromatic bonds must be identified explicitly --
alternating single/double bonds (Kekule structure) won't cut it.
This warning does not apply to you if your molecule really isn't aromatic.
Total naive charge -3.260, desired charge 0.000, offsetting all atoms by 0.023
WARNING: fragment 1 has 140 total atoms including H; protein residues have 7 - 24 (DNA: 33)
WARNING: fragment 1 has 66 non-H atoms; protein residues have 4 - 14 (DNA: 22)
WARNING: fragment 1 has 22 rotatable bonds; protein residues have 0 - 4
Average 140.0 atoms (66.0 non-H atoms) per fragment
(Proteins average 15.5 atoms (7.8 non-H atoms) per residue)
WARNING: no root atom specified, using auto-selected NBR atom instead.
Wrote params file BCL.fa.params
Wrote kinemage file OmegaPlusMg.fa.kin
Wrote PDB file BCL_0001.fa.pdb
Wrote PDB file BCL_0002.fa.pdb
(skipping a few lines here...)
Wrote PDB file BCL_0999.fa.pdb
Wrote PDB file BCL_1000.fa.pdb
Wrote params file BCL.cen.params
Wrote kinemage file OmegaPlusMg.cen.king
Wrote PDB file BCL_0001.cen.pdb
Traceback (most recent call last):
File "../../for_ash/python/apps/public/molfile_to_params.py", line 1207, in ?
sys.exit(main(sys.argv[1:]))
File "../../for_ash/python/apps/public/molfile_to_params.py", line 1201, in main
ret = write_all_files(m, molfiles, num_frags, options, suffix=".cen")
File "../../for_ash/python/apps/public/molfile_to_params.py", line 1028, in write_all_files
write_ligand_pdb(pdb_file, m, molfile, options.name, options.center)
File "../../for_ash/python/apps/public/molfile_to_params.py", line 983, in write_ligand_pdb
raise ValueError("Atom order mismatch between first and subsequent conformer")
ValueError: Atom order mismatch between first and subsequent conformer

It actually partially writes the second .cen.pdb file, and when I examine it, the first 65 lines look identical to the first except all the coordinates have been ever-so-slightly shifted. The difference is the second file is missing the entire MG atom line (line 66) and the TER at the end (line 67), so I'm puzzled as to what could be causing this error. Furthermore, why would it able to get through the full-atom representation, only to fail when writing centroid files?
To assist with diagnosis, I've attached the first 10 conformers from both the original (as output from Omega) and my modified (Mg added) .mol2 input files (again, with .txt extensions appended to allow for uploading) which are able to reproduce the results I obtained with the larger rotamer sets above.

Thanks,
Sándor

Sat, 2012-01-07 15:49
skovacs

There's an implicit assumption in a lot of Rosetta code that hydrogens are listed after heavy atoms. We try to be forgiving of input files which don't follow that convention, but we frequently fall short.

From what I can tell, in generating the centroid mode parameters molfile_to_params strips off the hydrogens, which then changes the numbering of the magnesium, resulting in the error message. If possible, try placing the magnesium atom before any of the hydrogens. Hopefully that should fix it.

Sat, 2012-01-07 16:23
rmoretti

I see--I was hoping to avoid such intense pre- and post-processing, and fortunately, I think I may have done just that...

This morning I tried a workaround by using an easily recognizable (read: unique) dummy atom in the input file I used to generate the rotamers with Omega.
I was lucky because the magnesium atom is far from any of the rotatable bonds, so it should have little, if any, impact on the structures Omega generates.
This way, I already have a placeholder in my system so I wouldn't have to renumber all the hydrogen atoms.

The trick there was preserving the original optimized planar ring geometry using the OMEGA flags "-fromCT false" and "-enumRing false" (otherwise the ring became distorted).

Next, a simple find/change switched the dummy atom with Mg and allowed for complete and successful ligand parameterization using molfile_to_params.py, both full-atom and centroid.

Hooray!

I have some questions about the size of the library I should generate using this method--namely, how do I know if I've generated enough rotamers? There are 22 rotatable bonds in my ligand, and (for the sake of argument) if we assume each has three low-energy positions, that suggests 3^22 or more than 31 billion possible conformers. For the rotamer library I am generating to be used in Rosetta would 1,000 structures enough? 10,000? A million? Certainly I shouldn't have to exhaustively generate all 31.3 billion to be certain I am sampling a reasonable set structures...right?

Perhaps a bit more information about what I intend to do now that I have the ligand parameterized would be helpful:
I first plan on running RosettaLigand to generate bound pigment-protein complexes and then docking_protocol to attempt to predict the structure of a protein+ligand dimer system.
From these dimer structures, I plan to take a representative handful of those most favorable and calculate binding free energy and predict the mechanism of assembly using Molecular Dynamics.

One challenge I have is that no experimental reference structures for these dimer systems exist--we only have the coordinates for a monomer, and not even one with a bound pigment ligand at that, so how could I go about evaluating the literally hundreds of thousands of possibilities that are generated? I don't think docking score alone will be enough--I'm open to suggestions about a clever structure-based technique (clustering, binning, etc.) that could help narrow down the list of 'possible suspects' that I would eventually be further evaluated using MD.

Thanks,
Sándor

Mon, 2012-01-09 12:47
skovacs

Glad to hear you were able to get the conversion with the dummy atom to work.

Regarding the number of conformations, you technically only need one - as long as that one is the final desired conformation. Given that predicting a priori which rotamer that is isn't easy, in practice you need to include enough rotamers so that the (unknown) desired one is represented. Note that you can include too many rotamers - the more you include, the larger the search space, the longer it takes to search, and the more chance the stochastic search might miss it.

22 rotatable bonds is a *lot*. Most docking protocols won't work well with that much flexibility. It would help if you can use your chemical knowledge to reduce the amount of flexibility in the system. For example, if you know that a long flexible chain is in an extended conformation, you can avoid sampling the gauche conformers. Likewise, if there's an intramolecular hydrogen bond, that'll limit the degrees of freedom. As I understand it, recent versions of omega have facilities to explicitly specify how to sample specific rotatable bonds, as opposed to using the defaults.

A large number of possible conformers will be unlikely due to the geometry of the binding site. You can incorporate that information as above, or you might also want to try a multi-step protocol, where you do a very coarse rotamer sampling, and then do a finer rotamer sampling based on the results of the coarse sampling. (You might miss narrow binding wells that way, but narrow-well binding modes are probably less likely due to entropic considerations.) Also note that standard ligand docking allows for rotamer minimization, so even your fine sampling doesn't need to be *too* fine - if you get "close enough", you can minimize into a better conformation.

Regarding selecting structures, the standard RosettaLigand protocol is to select the top 5% by total score (to get rid of the unlikely conformations), and then rank them by the protein-ligand interaction energy. Those structures are then clustered (as a number of the top structures are likely to be similar), and the top ranking clusters are presented as the likely structures.

Outside of any additional information, the clustered results produced by the standard protocol would be where I'd start. If you have additional information (e.g. there's experimental evidence that there's a hydrogen bond to a particular residue) you can attempt to incorporate that either in the docking protocol itself, or as a post-analysis filter.

Mon, 2012-01-09 18:22
rmoretti

Greetings again,

I was able to successfully run ligand_dock, generating 128,000 possible decoys. From this, I used extract_atomtree_diffs and best_ifaceE.py to extract the top 5% (6,400) as established in the ligand_docking documentation.
The first problem I have comes when running the clustering algorithm from the silent.out file(s)--no structures are read in and nothing is written. The log shows:

core.init: Mini-Rosetta version Release 3.3, from SVN 42942 from https://svn.rosettacommons.org/source/branches/releases/rosetta-3.3/rose...
core.init: command: ~/cluster.linuxgccrelease -database ~/rosetta_database -in:file:silent silent.out -in::file::fullatom -native ../CsmABCL_LigandDockInput.pdb
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-1796857391 seed_offset=0 real_seed=-1796857391
core.init.random: RandomGenerator:init: Normal mode, seed=-1796857391 RG_type=mt19937
(intentionally skipping the description)
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: standard
core.scoring.ScoreFunctionFactory: SCOREFUNCTION PATCH: score12
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: pdb_pair_stats_fine
basic.io.database: Database file opened: scoring/score_functions/hbonds/standard_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/standard_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/standard_params/HBEval.csv
basic.io.database: Database file opened: env_log.txt
basic.io.database: Database file opened: cbeta_den.txt
basic.io.database: Database file opened: pair_log.txt
basic.io.database: Database file opened: cenpack_log.txt
basic.io.database: Database file opened: env_log.txt
basic.io.database: Database file opened: cbeta_den.txt
basic.io.database: Database file opened: pair_log.txt
basic.io.database: Database file opened: cenpack_log.txt
basic.io.database: Database file opened: CEN6_mem_env_log.txt
basic.io.database: Database file opened: CEN10_mem_env_log.txt
basic.io.database: Database file opened: memcbeta_den.txt
basic.io.database: Database file opened: mem_pair_log.txt
basic.io.database: Database file opened: mem_pair_log.txt
basic.io.database: Database file opened: P_AA
basic.io.database: Database file opened: P_AA_n
basic.io.database: Database file opened: P_AA_pp
basic.io.database: Database file opened: Rama_smooth_dyn.dat_ss_6.4
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 4326 residue types
Assigning extra structures ...
protocols.cluster: Redistributing groups ...0 cluster centers---------- Summary ---------------------------------
protocols.cluster: ----------------------------------------------------
protocols.cluster: Clusters: 0
protocols.cluster: Structures: 0
protocols.cluster: ----------------------------------------------------
protocols.cluster: Sorting each cluster's structures by energy:
protocols.cluster: ---------- Summary ---------------------------------
protocols.cluster: ----------------------------------------------------
protocols.cluster: Clusters: 0
protocols.cluster: Structures: 0
protocols.cluster: ----------------------------------------------------
Timing:
Readin:3s
Cluster: 0s
Additional Clustering: 0s
Total: 3

However, when I use the 6,400 extracted.pdb files as input, the default options are able to generate a single cluster, but I see the following warning for every structure (full log file too large to attach):

core.init: Mini-Rosetta version Release 3.3, from SVN 42942 from https://svn.rosettacommons.org/source/branches/releases/rosetta-3.3/rose...
core.init: command: ~/cluster.linuxgccrelease -database ~/rosetta_database -in:file:s *.pdb -in::file::fullatom -cluster:sort_groups_by_energy -native ../CsmABCL_LigandDockInput.pdb -out:prefix ClusteredCsmaBCL_LigandDock
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-976789190 seed_offset=0 real_seed=-976789190
core.init.random: RandomGenerator:init: Normal mode, seed=-976789190 RG_type=mt19937
(skipping much of the rest of the intro--it's identical to that above)
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 4326 residue types
core.pack.task: Packer task: initialize from command line()
core.pack.dunbrack: Dunbrack library took 0.05 seconds to load from binary
protocols.cluster: RESCORING: CsmABCL_LigandDockChild10_0018.pdb
protocols.cluster: Adding struc: 2.3254e-310
core.pack.task: Packer task: initialize from command line()
protocols.cluster: RESCORING: CsmABCL_LigandDockChild10_0037.pdb
protocols.cluster: Adding struc: 2.3254e-310
(skipping several thousand lines here)
core.pack.task: Packer task: initialize from command line()
protocols.cluster: RESCORING: CsmABCL_LigandDockChild9_1958.pdb
protocols.cluster: Adding struc: 2.3254e-310
core.scoring.rms_util: WARNING: CA_rmsd out of range... range1 60 requested but only 59 CA atoms found in fill_rmsd_coordinates
protocols.cluster: Structure identical to existing structure - ignoring
protocols.cluster: Adding a CsmABCL_LigandDockChild9_1958.pdb 16.3052
core.pack.task: Packer task: initialize from command line()
protocols.cluster: RESCORING: CsmABCL_LigandDockChild9_1999.pdb
protocols.cluster: Adding struc: 2.3254e-310
core.scoring.rms_util: WARNING: CA_rmsd out of range... range1 60 requested but only 59 CA atoms found in fill_rmsd_coordinates
protocols.cluster: Structure identical to existing structure - ignoring
protocols.cluster: Adding a CsmABCL_LigandDockChild9_1999.pdb 16.3708
protocols.cluster: ---------- Summary ---------------------------------
protocols.cluster: Cluster: 0 N: 400GN: 401 c.0.*.pdb
protocols.cluster: CsmABCL_LigandDockChild1_1062.pdb 16.2101
protocols.cluster: CsmABCL_LigandDockChild11_1162.pdb 16.2186
(skipping some more lines here)
protocols.cluster: ----------------------------------------------------
protocols.cluster: Clusters: 1
protocols.cluster: Structures: 400
protocols.cluster: ----------------------------------------------------
protocols.cluster: Sorting each cluster's structures by energy:
(skipping some more lines here)
protocols.cluster: ----------------------------------------------------
protocols.cluster: Clusters: 1
protocols.cluster: Structures: 400
protocols.cluster: ----------------------------------------------------
protocols.cluster: 181 CsmABCL_LigandDockChild1_1062.pdb 0 0
protocols.cluster: 210 CsmABCL_LigandDockChild11_1162.pdb 0 1
(lastly, skipping a few lines here)
protocols.cluster: 171 CsmABCL_LigandDockChild11_0483.pdb 0 398
protocols.cluster: 262 CsmABCL_LigandDockChild1_1563.pdb 0 399
Timing:
Readin:46s
Cluster: 29s
Additional Clustering: 724s
Total: 799

I presume this has to do with the fact that my 60th residue (as listed in the extracted .pdb files) is the ligand, and it doesn't possess a CA atomtype (it shouldn't anyway--it's not part of the protein backbone)
I've tried dropping -cluster:radius down (as suggested elsewhere) to as low as 0.5 Å to try and get some differentiation, but I still only get the initial single cluster of 400, presumably because the RMSD of all the structures isn't being calculated, so the structures appear identical, and are therefore ignored.

I also tried clustering using the flag -cluster:gdtmm
This produced a similar output (one cluster, 400 structures), but no warning lines about CA_rmsd being out of range. It also took alot longer:

Timing:
Readin:45s
Cluster: 863s
Additional Clustering: 732s
Total: 1640

Any help deciphering/correcting these errors (inability to use silent.out and inability to find more than one cluster) is greatly appreciated!

Sándor

Mon, 2012-01-16 23:02
skovacs

I have attached the docking "funnel" obtained for the top 5% of structures. It seems like there should be at least two (if not more) clusters, based on RMSD values alone...

Tue, 2012-01-17 10:30
skovacs

The atomtree diff format is a bit of a specialized format for ligand docking, so it has some quirks. It's only the (non-JD2) ligand docking applications like extract_atomtree_diffs which recognize it when passed through -in:file:silent. Other protocols can automatically recognize protein versus binary silent file formats, but won't recognize atomtree diffs (which is why cluster doesn't find any structures in the file). While (at least of 3.3, I believe) JD2 applications can recognize atomtree diff files with the -in:file:atom_tree_diff command line option, there are a number of applications (cluster is one of which) which don't work off of the JD2 machinery, and thus don't necessarily recognize the option (cluster apparently doesn't).

By the way, even if you were able to use the silent file as input, you wouldn't want to - at least the way you're doing it now. I don't recall off-hand the algorithmic complexity for the default method Rosetta uses, but clustering typically scales quadratically or even worse. Clustering 128,000 decoys and then ignoring 95% of them is going to be a lot more resource intensive (both memory and computing power) than clustering just the 6,400 you're actually interested. If you wanted to use a silent file as input to clustering, I'd highly recommend limiting it to the structures you wanted with the -in:file:tags flag, like you did with extract_atomtree_diffs.

You're correct that the "60 requested but only 59 CA atoms found" error has to do with the ligand not having a CA atom. This isn't usually a problem ... except that it points to the key issue: what you're clustering over. The cluster application used CA RMSD as the distance measure over which to cluster the structures. As ligand docking is a mostly-fixed backbone protocol (there's some small (< ~0.5 Ang) backbone minimization, depending on options) the CA RMSD won't change all that much between structures, leaving you with a single large cluster.

What you really want to do is cluster not on the CA RMSD, but on the non-superimposed ligand RMSD. To do this, the easiest way is to use the select_best_unique_ligand_poses application. Unfortunately, it looks like this isn't covered in the ligand docking documentation - which may mean that it isn't included in the released version. (It isn't really a clustering application, rather it walks down the ranked list of structures, outputting each structure that's at least min_rms away from any of the previously output structures.)

If you have it, a sample command line is:

select_best_unique_ligand_poses.linuxgccrelease -database ~/rosetta_database -extra_res_fa {any ligand .params files} -docking:ligand:max_poses 10 -docking:ligand:min_rms 1.0 -out:path:pdb pdbs -in:file:silent silent.out -out::file::silent selected.out

Unfortunately, it doesn't look like it's currently possible to use the standard cluster application to do all atom RMSD for a specific residue, although there is a separate cluster_ligand_poses application in the ligand docking toolset, although it doesn't look like that is part of the released version.

Tue, 2012-01-17 10:39
rmoretti

Greetings again,

I checked both ~/rosetta-3.3/bin and ~/rosetta-3.3/build/src/release/linux/2.6/64/x86/gcc (the install directories), but haven't found either select_best_unique_ligand_poses or cluster_ligand_poses.
A second look at the downloads section confirmed this.

Since I now seem to be at an impasse, how would you recommend I proceed?
Is the source containing select_best_unique_ligand_poses and cluster_ligand_poses available (as a beta, perhaps)?

Thanks,
Sándor

Tue, 2012-01-17 14:01
skovacs

I believe that the only reason that select_best_unique_ligand_poses and cluster_ligand_poses aren't distributed is that they aren't documented and lack decent tests.

I'll look into getting them into the 3.4 release, but until then, I'll email you the the source code for the two applications. Simply copy the files to rosetta_source/src/apps/public/ligand_docking, remove the .txt suffix, and then add lines to rosetta_source/src/apps.source.settings for the two applications, right underneath the line for ligand_dock.

Fri, 2012-01-20 11:50
rmoretti

After adding the source code of the two applications and successfully recompiling, I was able to obtain clustered results for my docked ligand. Thank you!!!
However, I noticed that the clustered "representative" ligands were in an configuration IDENTICAL to the initial, which makes me wonder if the additional rotamers were sampled at all...

The original docking command (with flags taken from ligand_dock documentation) was as follows:
~/ligand_dock.linuxgccrelease
-in:path ~/rosetta_database
-in:file:s ../$input.pdb
-nstruct 2000
-ex1
-ex1aro
-ex2
-extrachi_cutoff 1
-mute protocols.geometry.RB_geometry core.scoring.dunbrack.SingleLigandRotamerLibrary core.scoring.rms_util
-uniform_trans 5
-improve_orientation 1000
-docking:ligand:minimize_ligand
-harmonic_torsions 10
-minimize_backbone
-harmonic_Caplhas 0.3
-soft_rep
-old_estat
-protocol abbrev2
-start_from -2 8 2
-start_from -2 7 2
-start_from -1 7 2
-start_from -1 8 2
-start_from -2 7 1
-start_from -2 8 1
-out:file:fullatom true
> $stem-Proc$SGE_TASK_ID.log

Note that my cluster uses SGE, not Condor, to achieve parallelization (not that it should matter), so the above was run 64 times (in different folders and run on different processors) to obtain the 128,000 decoys.
Did I miss a simple flag somewhere?

Tue, 2012-01-24 13:19
skovacs

One possibility is that Rosetta isn't recognizing the extra rotamers. Check that you have an appropriate PDB_ROTAMERS line in your ligand params file. You can also check the log output (stdout) for lines that tell you how many ligand rotamers were loaded. Your actual runs might not have that line, as I think it's printed to the core.scoring.dunbrack.SingleLigandRotamerLibrary tracer. You may need to redo a short run without the -mute line (but everything else the same) to see it.

The other possibility is that Rosetta *is* finding the extra ligand rotamers, but doesn't sample them, or does sample them, but finds them terribly high energy. Try extracting a number of the some of the higher energy structures to see if the ligand in them is also in the same conformation. In fact, try re-extracting the cluster center tags from the original output file, to see if the clustering application messed things up. (I did mention that the ligand clustering application is not necessarily thoroughly tested, right?)

Another issue is where you placed the "center" of the ligand. When Rosetta tries new ligand rotamers, it overlays the "neighbor" atom and the surrounding atoms. If your ligand has a long tail, or is in a deeply buried/narrow pocket, depending on where the neighbor atom was placed, you could get lever-arm effects, with all alternative rotamers being placed with the flexible tails clashing with the backbone.

P.S. It doesn't look like you're using a resfile, but if you were, that might be a problem as well.

Wed, 2012-01-25 11:53
rmoretti

Greetings again,

A quick re-run with -mute off didn't mention how many ligand rotamers were loaded. The following lines appeared in the output:

ligand_dock.main: Starting Proc1CsmABCL_LigandDockInput2_0001 ...
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 4326 residue types
core.pack.task: Packer task: initialize from command line()
core.pack.dunbrack: Dunbrack library took 0.12 seconds to load from binary
protocols.moves.RigidBodyMover: Translate: Jump (before): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 31.1938 -10.2177 -16.326
protocols.moves.RigidBodyMover: Translate: Jump (after): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 29.6909 -13.3344 -16.5206
protocols.ligand_docking.LigandDockProtocol: Making ligand grid ...
protocols.ligand_docking.LigandDockProtocol: Input score atr = 0 ; rep = 0 ; ha = 66
protocols.docking.DockingInitialPerturbation: Reading options...
protocols.docking.DockingInitialPerturbation: uniform_trans: 5
protocols.moves.RigidBodyMover: Translate: Jump (before): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 29.6909 -13.3344 -16.5206
protocols.moves.RigidBodyMover: Translate: Jump (after): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 30.796 -11.0684 -20.4552
protocols.ligand_docking.LigandDockProtocol: Accepting ligand position with nbr_atom at 1.455619896773046 9.026017917256407 3.536061841619027
protocols.ligand_docking.LigandDockProtocol: Doing 1000 trials of improvement for ligand orientation...
protocols.ligand_docking.LigandDockProtocol: Starting orientation energy atr = -14 ; rep = 1
protocols.geometry.RB_geometry: random_reorientation_matrix phi: 245.23 psi: 4.93238 theta: 99.9664
(skipping about 1000 lines here)
protocols.geometry.RB_geometry: random_reorientation_matrix phi: 142.659 psi: 168.643 theta: 83.9059
protocols.ligand_docking.LigandDockProtocol: Best random orientation energy atr = -31 ; rep = 0
protocols.ligand_docking.LigandDockProtocol: Found 0 'perfect' poses, kept 0 with rms >= 5.28062
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 29.4975 -9.762 -21.2344
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 29.4975 -9.762 -21.2344
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.1467 -10.4152 -20.8448
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.1467 -10.4152 -20.8448
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.4713 -10.7418 -20.65
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.4713 -10.7418 -20.65
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.6337 -10.9051 -20.5526
protocols.moves.RigidBodyMover: Translate: Jump (before): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.6337 -10.9051 -20.5526
protocols.moves.RigidBodyMover: Translate: Jump (after): RT 0.208044 -0.464236 0.860931 0.44876 0.827387 0.337706 -0.869099 0.316094 0.380463 30.796 -11.0684 -20.4552
protocols.ligand_docking.LigandBaseProtocol: Backbone interface is 14 / 60 residues (23.3333%)
protocols.ligand_docking.LigandBaseProtocol: Moved ligand FOLD_TREE EDGE 1 19 -1 EDGE 19 59 -1 EDGE 19 60 1
protocols.ligand_docking.LigandBaseProtocol:
protocols.ligand_docking.LigandBaseProtocol: Final loops foldtree FOLD_TREE EDGE 1 19 -1 EDGE 19 21 -1 EDGE 21 35 -1 JEDGE 21 48 2 CA CA END EDGE 48 36 -1 EDGE 48 59 -1 EDGE 19 60 1
protocols.ligand_docking.LigandBaseProtocol:
protocols.ligand_docking.LigandBaseProtocol: Interface is 21 / 60 residues (35%)
protocols.ligand_docking.LigandBaseProtocol: Backbone interface is 14 / 60 residues (23.3333%)
protocols.ligand_docking.LigandBaseProtocol: Interface is 21 / 60 residues (35%)
protocols.ligand_docking.LigandBaseProtocol: Interface is 21 / 60 residues (35%)
core.pack.task: Packer task: initialize from command line()
protocols.ligand_docking.LigandBaseProtocol: Interface is 21 / 60 residues (35%)
core.pack.task: Packer task: initialize from command line()
core.pack.rotamer_set.RotamerSet_: [ WARNING ] including current in order to get at least 1 rotamer !!!!!! 60 BCL
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.pack_rotamers: built 1734 rotamers at 21 positions.
core.pack.pack_rotamers: IG: 3168692 bytes
core.scoring.NeighborList: Minimization stats: 23 score/deriv cals, 5 narrow-from-wide updates, 1 full updates.
core.scoring.NeighborList: Minimization stats: 30 score/deriv cals, 5 narrow-from-wide updates, 1 full updates.
core.scoring.NeighborList: Minimization stats: 6 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 7 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 6 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 7 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.pack_rotamers: built 1751 rotamers at 21 positions.
core.pack.pack_rotamers: IG: 3222744 bytes
core.scoring.NeighborList: Minimization stats: 6 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 7 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 6 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 7 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 6 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
core.scoring.NeighborList: Minimization stats: 7 score/deriv cals, 0 narrow-from-wide updates, 0 full updates.
protocols.moves.MonteCarlo: MonteCarlo:: last_accepted_score,lowest_score: -71.3147 -73.2223
core.scoring.NeighborList: Minimization stats: 156 score/deriv cals, 51 narrow-from-wide updates, 4 full updates.
core.scoring.NeighborList: Minimization stats: 207 score/deriv cals, 51 narrow-from-wide updates, 4 full updates.
protocols.moves.RigidBodyMover: Translate: Jump (before): RT -0.0164393 -0.500663 0.865486 0.402814 0.788934 0.46403 -0.915134 0.356259 0.188705 30.5833 -11.9161 -20.9586
protocols.moves.RigidBodyMover: Translate: Jump (after): RT -0.0164393 -0.500663 0.865486 0.402814 0.788934 0.46403 -0.915134 0.356259 0.188705 30.0471 -11.118 -21.2334
protocols.moves.RigidBodyMover: Translate: Jump (before): RT -0.0164393 -0.500663 0.865486 0.402814 0.788934 0.46403 -0.915134 0.356259 0.188705 30.0471 -11.118 -21.2334
protocols.moves.RigidBodyMover: Translate: Jump (after): RT -0.0164393 -0.500663 0.865486 0.402814 0.788934 0.46403 -0.915134 0.356259 0.188705 298.16 -410.161 116.172
ligand_dock.main: Finished Proc1CsmABCL_LigandDockInput2_0001 in 31 seconds.

It's this line that I'm particularly suspect of:
core.pack.rotamer_set.RotamerSet_: [ WARNING ] including current in order to get at least 1 rotamer !!!!!! 60 BCL

Accordingly, I've attached both my .params files and the first part of my confs.pdb files (as referenced in the PDB_ROTAMERS line).
Recall that the conformers were generated by concatenating the results of running molfile_to_params.py on a .mol2 file generated by OMEGA with the substitution of Mg in place of a dummy atom.

Thanks,
Sándor

Wed, 2012-01-25 16:06
skovacs

I'm guessing it's the format of your PDB_ROTAMERS line. There shouldn't be a colon. It should just be something like

PDB_ROTAMERS BCL.fa.confs.pdb

Thu, 2012-01-26 10:47
rmoretti

Aaaahhhh! Thank you! That was it!

When testing, I now see the following:
ligand_dock.main: Starting Proc1CsmABCL_LigandDockInput2_0001 ...
core.chemical.ResidueTypeSet: Finished initializing fa_standard residue type set. Created 4326 residue types
core.pack.task: Packer task: initialize from command line()
core.pack.dunbrack: Dunbrack library took 0.06 seconds to load from binary
core.pack.dunbrack.SingleLigandRotamerLibrary: Read in 2000 rotamers for BCL!
protocols.moves.RigidBodyMover: Translate: Jump (before): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 31.1938 -10.2177 -16.326
protocols.moves.RigidBodyMover: Translate: Jump (after): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 29.6909 -13.3344 -16.5206
protocols.ligand_docking.LigandDockProtocol: Making ligand grid ...
protocols.ligand_docking.LigandDockProtocol: Input score atr = 0 ; rep = 0 ; ha = 66
protocols.docking.DockingInitialPerturbation: Reading options...
protocols.docking.DockingInitialPerturbation: uniform_trans: 5
protocols.moves.RigidBodyMover: Translate: Jump (before): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 29.6909 -13.3344 -16.5206
protocols.moves.RigidBodyMover: Translate: Jump (after): RT -0.90471 0.357446 -0.231803 0.179854 0.813691 0.552774 0.386203 0.458409 -0.800442 31.0145 -16.2819 -17.2087
protocols.ligand_docking.LigandDockProtocol: Accepting ligand position with nbr_atom at -2.912123871582086 13.09575301988507 2.077092314617596
protocols.ligand_docking.LigandDockProtocol: Doing 1000 trials of improvement for ligand orientation...
protocols.ligand_docking.LigandDockProtocol: Starting orientation energy atr = 0 ; rep = 0
protocols.geometry.RB_geometry: random_reorientation_matrix phi: 40.1049 psi: 257.006 theta: 103.138
core.pack.dunbrack.SingleLigandRotamerLibrary: Added 2000 rotamers for BCL
etc. etc. etc.
ligand_dock.main: Finished Proc1CsmABCL_LigandDockInput2_0001 in 348 seconds.
etc. etc. etc.
ligand_dock.main: Finished Proc1CsmABCL_LigandDockInput2_0020 in 445 seconds.
ligand_dock.main: Finished all 20 structures in 7818 seconds.

A complete re-run is in order (which, based on the above, should take a little over a week) now that the rotamers are recognized, before trying cluster_ligand_poses again.
Although the default options for cluster_ligand_poses worked (i.e. generated output), is there a way to specify either number or size of clusters?
For example, I tried using the -cluster:size and -cluster:sort_groups_by_energy flags, but they didn't seem to have any effect.

Thanks,
Sándor

Thu, 2012-01-26 15:25
skovacs

For the select_best_unique_ligand_poses greedy algorithm, you can control the number of clusters with -docking::ligand::max_poses, and the size (radius) of the clusters with -docking::ligand::min_rms

In contrast to the standard clustering application, the cluster_ligand_poses application uses the affinity propagation technique of Frey and Dueck ("Clustering by Passing Messages Between Data Points", Science 315 (2007) ), which is controlled differently than other clustering algorithms. Changing how the self-similarities are calculated is how the algorithm controls the number/size of the clusters. You can control things somewhat through -docking::ligand::max_poses, but if you want full control, you'll need to alter the code and recompile. If you go that route, I'd recommend reading the Frey & Dueck reference, and taking a look at rosetta_source/src/protocols/cluster/APCluster.hh (& .cc). Unfortunately, I'm not too familiar with the code, so I can't be all that much help.

That said, I'd recommend spending a little bit of time thinking about what you're looking for in clustering results. My guess is that the select_best_unique_ligand_poses approach (select the lowest energy structure that's at least -docking::ligand::min_rms ligand rmsd from any previous output structures) is one that will result in good choices of your cluster representative (specifically, the representative will be the lowest (ligand) energy structure for that cluster). I'd only recommend cluster_ligand_poses if you require the representative structure to be centrally located w/r/t rmsd across all members of the cluster.

Thu, 2012-01-26 17:13
rmoretti

Just a heads-up that this issue has been resolved!
Thanks again Rocco for all your help!

Wed, 2012-03-07 16:50
skovacs