Ligand Docking with Rosetta Scripts memory problem


Hi,

I am trying to dock a small molecule to a holo protein dimer (153 aa, 1 Cu and 1 Zn per chain). I followed the Ligand Docking Tutorial, and everything works fine until I try to generate more than ~50 output structures. With both the MPI and default builds, all 8 GB of RAM and 5 GB of swap slowly fill up until my Linux system kills the program. Below I have listed the flags I tried and how many structures each run generated before exhausting memory. The basic options file and XML file are attached.

-linmem_ig 10 produces 41 output structures

-qsar:grid_dir produces 27 output structures

-jd2:delete_old_poses produces 23 output structures

I am running Linux Mint 18.2 on an 8-core processor with 8 GB of RAM. I am using the 2017 week 29 release.

Any help is appreciated.

Attachments: options file (1009 bytes), xml file (1.79 KB)
Thu, 2017-08-17 14:59
Swillard

With recent Rosetta builds, you can add the `-analytic_etable_evaluation true` option to any run which has `-restore_pre_talaris_2013_behavior true` set to greatly reduce memory usage.
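For concreteness, the pair of flags would look like this in your options file (combine them with your existing flags; this is just the relevant fragment):

```
-restore_pre_talaris_2013_behavior true
-analytic_etable_evaluation true
```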

Thu, 2017-08-17 15:24
rmoretti

Thanks for the quick reply!

I added that option and it reduced the amount of memory that was initially taken up, like you said. Unfortunately, as the run continues, memory and swap still slowly fill up; the run completed 63 output structures before being killed. Is this normal behavior, or is there something else that I could try?

I should probably add that I am not very experienced with Rosetta, so I apologize if I'm missing something simple.

Sun, 2017-08-20 19:32
Swillard

You can combine all of the flags which you listed in your original post with the `-analytic_etable_evaluation true` option, to see if you can further reduce memory.

Another flag you can try is `-qsar:max_grid_cache_size`. If you set this to something small (I might even try 0 or 1), it will limit how much memory the in-memory grid cache uses, which may help.
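Putting these suggestions together with the flags from the first post, the options-file additions might look like this (the specific values are starting points to experiment with, not tested settings):

```
-analytic_etable_evaluation true
-linmem_ig 10
-jd2:delete_old_poses
-qsar:max_grid_cache_size 1
```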

But one issue is that you're running up against the "normal" size limit of Rosetta runs. Generally speaking, we recommend having at least 1 GB of memory for each processor running Rosetta; depending on the protocol, this may increase to 1.5 or 2 GB per processor. I might suggest turning down the number of simultaneous runs on that machine. That is, if you only have 8 GB, don't launch 8 Rosetta jobs - try launching only 6 (despite having 8 cores). Hopefully each run will stabilize at some amount of memory greater than 1 GB but less than 8/6 GB.
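As a quick back-of-the-envelope check on that budget (plain arithmetic, just illustrating the 8 GB / 6 jobs suggestion):

```python
# Rule of thumb from above: give each Rosetta job more than 1 GB of RAM.
total_ram_gb = 8
jobs = 6          # launch only 6 jobs on the 8-core machine
per_job_gb = total_ram_gb / jobs
print(round(per_job_gb, 2))  # 1.33 GB available per job
```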

Mon, 2017-08-21 09:47
rmoretti

The `-qsar:max_grid_cache_size 1` flag fixed it, thank you! For my understanding, what does this flag actually do, and what does the number represent?

Out of curiosity, would this have anything to do with how big my scoring grid is? I set it to 70 since I'm trying to search the entire surface of the protein for docking.

I was only giving Open MPI 3 processors. My understanding was that 6+ GB per processor was excessive, which is why I didn't just restart the job whenever it ran out of memory. Also, even a single processor eventually filled up all of the memory, which seemed even more excessive.

I really appreciate all the help.

Sat, 2017-09-02 10:25
Swillard

For speed in virtual high-throughput screening, Rosetta can cache the pre-calculated interaction grids used by the Transform mover. The `-qsar:max_grid_cache_size` option controls how large that in-memory cache gets. With a very large grid (which 70 definitely is), caching many of them takes a large amount of memory. By setting the option to 1, you'll only cache the most recently computed grid. This results in a small slowdown as grids get re-computed, but it greatly reduces memory usage.
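To make the behavior concrete, here is a minimal Python sketch of a size-capped grid cache. This only illustrates the idea; it is not Rosetta's actual implementation, and the class and method names are invented:

```python
from collections import OrderedDict

class GridCache:
    """Illustrative size-capped cache, like -qsar:max_grid_cache_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._cache = OrderedDict()

    def get_or_compute(self, center, compute):
        if center in self._cache:
            self._cache.move_to_end(center)  # mark as most recently used
            return self._cache[center]
        grid = compute(center)               # the expensive grid calculation
        self._cache[center] = grid
        while len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the oldest grid
        return grid

# With max_size=1, only the last computed grid stays in memory.
cache = GridCache(max_size=1)
cache.get_or_compute((0, 0, 0), lambda c: "grid_A")
cache.get_or_compute((5, 0, 0), lambda c: "grid_B")
print(len(cache._cache))  # 1
```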

By the way, you might not need such a large grid if you're using a recent weekly release with multiple starting positions. There's a bug in earlier releases with multiple starting positions, but in recent weeklies the grids will be re-calculated (recentered) for each different starting point. This way your grid only needs to be big enough to cover the protein for the travel from that particular starting point. (i.e. a bit more than the maximum width of the ligand plus twice the box size)
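As a rough illustration of that sizing rule (the ligand width and box size below are made-up example numbers, not values from your system):

```python
# Example numbers only: a ligand about 12 A across and a Transform
# box_size of 5 A. Rule of thumb from above:
#   grid width >= ligand width + 2 * box_size
ligand_width_A = 12.0
box_size_A = 5.0
min_grid_width_A = ligand_width_A + 2 * box_size_A
print(min_grid_width_A)  # 22.0
```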

Mon, 2017-09-04 10:10
rmoretti