
Rosetta CM - Ignore Sporadic Errors


Hey all,

I'm using Rosetta CM based on this tutorial: I run the jobs in batches of 1000, and most batches don't complete all 1000 because of a "Cannot normalize xyzVector of length() zero" error, although they do complete anywhere between 60 and 300 structures before quitting. From some quick searches, it seems this is due to an unlucky random seed, so I was wondering whether there is any way to tell Rosetta to ignore the error and either restart that job or skip it.

I tried "-in:file:skip_failed_simulations true" and "-jd2:failed_job_exception false": I created a bad input PDB and ran it through score_jd2 to test whether they would work, but neither seemed to do the trick. Any suggestions would be greatly appreciated.



Wed, 2018-06-13 05:47

Unfortunately, there isn't a way to bypass that error.

This is one of the more infuriating errors in Rosetta. Developers are looking into ways of getting around it, but at the moment you can only restart the job and hope to make further progress.

Wed, 2018-06-13 14:12

Thanks for the response.

In that case, is there an option to pass to the combine_silent executable that would tell it to renumber the outputs completely, instead of just appending _1, _2, etc.?

Thu, 2018-06-14 05:00

You can always give up on the job distributor.  If you write a shell script (look at JD0 in the tools repository adjacent to your main repository), you can run all your nstruct as separate command lines: instead of one Rosetta call at nstruct 1000, make 1000 Rosetta calls at nstruct 1 each.  Be sure to track your random number seed if you do this.

This is not a GOOD solution or a PRETTY solution, but it will at least keep one run's death from affecting the others downstream.
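A minimal sketch of that approach, assuming a RosettaScripts-style command line. The binary name, the `rosettacm.flags` file, and the seed base are placeholders for your own setup; `-run:constant_seed` and `-run:jran` are the standard options for pinning the seed per run:

```shell
# Run each nstruct as its own command line so one crash cannot kill the batch.
# rosetta_scripts.default.linuxgccrelease and rosettacm.flags are placeholders.
for i in $(seq 1 1000); do
    rosetta_scripts.default.linuxgccrelease @rosettacm.flags \
        -nstruct 1 \
        -run:constant_seed -run:jran $((1000000 + i)) \
        -out:file:silent run_${i}.silent \
        || echo "run ${i} (seed $((1000000 + i))) died, skipping" >&2
done
```

Because each run records its seed in its name, a crashed run can be retried later with a different `-run:jran` value.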


Wed, 2018-06-13 14:21

Thanks for the suggestion; it sounds like a fun challenge. I haven't looked extensively at the options for this one yet, but is there a way to tell Rosetta to append the output to the end of an already-existing silent file instead of overwriting it or writing a whole new file?

Thu, 2018-06-14 05:05

I am 99% certain that if you give it a silent file path that already exists on disk, it will just append.  I'm also pretty sure it's smart enough to check the indices and pick an output name that's not already in use in the file.  It will cause problems if the score fields change, though: the SCORE line with the score term names is only printed at the top of the file, so if the terms change, the later scores become uninterpretable because the labels don't get repeated.  That should be a non-issue here.

Thu, 2018-06-14 10:29

It will definitely append results to a silent file that already exists.

However, to support decent restart behavior (so you don't need to re-run your entire protocol if your cluster dies at 99999 structures out of 100000), it will check the silent file for the desired output prior to starting the job. If that particular output already exists, then it will skip that job.

In your case, this can help. If you write a wrapper script which restarts the job (with exactly the same commandline) if it detects there's insufficient output jobs, Rosetta should pick up right where it left off.
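A hedged sketch of such a wrapper, assuming a plain-text silent format where each finished structure contributes one SCORE: line. The silent file name, target count, and Rosetta command line are placeholders:

```shell
SILENT=batch1.silent   # placeholder output silent file
TARGET=1000            # desired number of structures
MAX_RESTARTS=50        # safety cap so the loop cannot spin forever

count_structs() {
    # Every finished structure adds one SCORE: line; the first SCORE: line
    # in the file is the column header, so subtract it.
    if [ -f "$SILENT" ]; then
        echo $(( $(grep -c '^SCORE:' "$SILENT") - 1 ))
    else
        echo 0
    fi
}

tries=0
while [ "$(count_structs)" -lt "$TARGET" ] && [ "$tries" -lt "$MAX_RESTARTS" ]; do
    tries=$((tries + 1))
    # Exactly the same command line each time; Rosetta's restart check
    # skips outputs that already exist in the silent file.
    rosetta_scripts.default.linuxgccrelease @rosettacm.flags \
        -nstruct "$TARGET" -out:file:silent "$SILENT"
done
```

Each restart re-reads the silent file, skips the tags already present, and continues from where the previous run died.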

The other way around this is to tell Rosetta to use a unique label for each output structure. You can do this with -out:prefix or -out:suffix. For example, adding something like `-out:suffix _${JOBID}` to the commandline will allow you to launch different jobs, each going to uniquely labeled outputs. (Presuming you're using a loop which sets the JOBID environment variable to a unique number for each run.) -- These can go into the same silent file, so long as there's only one process writing to that silent file at a time. (Multiple non-MPI processes writing to the same silent file at the same time is a recipe for disaster.) Alternatively, you can have each of them write to a separate silent file, and then just concatenate the files later (with `cat`, or with the combine_silent application).
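An illustrative loop for the separate-files variant; the JOBID range, binary name, and file names are assumptions to adapt:

```shell
# Each job writes uniquely suffixed tags to its own silent file, so no two
# processes ever write to the same file.
for JOBID in $(seq 1 10); do
    rosetta_scripts.default.linuxgccrelease @rosettacm.flags \
        -nstruct 100 \
        -out:suffix _${JOBID} \
        -out:file:silent part_${JOBID}.silent
done

# Merge afterwards with cat (or the combine_silent application).
cat part_*.silent > all.silent
```

Because the `_${JOBID}` suffix makes every tag unique across jobs, the merged file has no colliding structure names.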

Fri, 2018-06-15 07:12