random seed

Hi,

First, sorry for repeating this question, but I really need to get randomness in Rosetta straight, and it would be great to have a good explanation of it somewhere on the web, for example on this forum.

I want to run multiple Rosetta instances for the same protein on several computer clusters. From the manual and previous posts I can see two options:
1) -constant_seed -jran
2) -seed_offset

Which one is better? I think option 1) is the more commonly used, but how does it differ from 2)? Do they do the same thing?

Is there any constraint on which integers between 1 000 000 and 4 000 000 can be used with -jran? If I submit 100 jobs on cluster 1 and 100 jobs on cluster 2, can I simply use 1 000 000 to 1 000 099 on cluster 1 and 1 000 100 to 1 000 199 on cluster 2? Or is it more sophisticated than that?

Thanks for your help,

Janek Kosinski

Thu, 2008-05-08 03:11
kosa

If you run jobs on a cluster of parallel CPUs, you will sometimes get exactly the same results. This is a general problem for Rosetta. The reason is that some CPUs start from exactly the same random seed. One solution is to add -seed_offset followed by an integer to your command line; this way you can force each CPU to start from a different seed, using that offset number.
You can also run with -constant_seed -jran on multiple clusters. Yes, the integer given to -jran should be between 1 million and 4 million; the default value is 1111111. You can certainly choose different values if you want to submit jobs to different clusters.
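
To make the numbering scheme from the question concrete, here is a minimal Python sketch of how non-overlapping -jran values could be handed out per cluster and per job. Only the flags -constant_seed and -jran and the 1 000 000 to 4 000 000 range come from this thread; the function names and the 0-based cluster/job indices are my own assumptions for illustration, not anything Rosetta provides.

```python
# Hypothetical helper for handing out unique -jran values across clusters.
# The seed range is the one quoted in this thread; everything else is assumed.
JRAN_MIN = 1_000_000
JRAN_MAX = 4_000_000


def jran_for(cluster_index, job_index, jobs_per_cluster):
    """Return a unique seed for one job (all indices are 0-based)."""
    if cluster_index < 0:
        raise ValueError("cluster_index must be non-negative")
    if not 0 <= job_index < jobs_per_cluster:
        raise ValueError("job_index must be in [0, jobs_per_cluster)")
    # Each cluster gets its own contiguous block of jobs_per_cluster seeds.
    seed = JRAN_MIN + cluster_index * jobs_per_cluster + job_index
    if seed > JRAN_MAX:
        raise ValueError("seed falls outside the 1,000,000-4,000,000 range")
    return seed


def seed_flags(cluster_index, job_index, jobs_per_cluster):
    """Build the seed-related command-line arguments for one job."""
    seed = jran_for(cluster_index, job_index, jobs_per_cluster)
    return ["-constant_seed", "-jran", str(seed)]


if __name__ == "__main__":
    # First and last job on each of two clusters with 100 jobs apiece.
    for cluster in range(2):
        for job in (0, 99):
            print(cluster, job, " ".join(seed_flags(cluster, job, 100)))
```

With 100 jobs per cluster this gives 1000000 to 1000099 on the first cluster and 1000100 to 1000199 on the second, which is the scheme proposed in the question. The returned arguments would be appended to whatever Rosetta command line you already use.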

Thu, 2008-05-08 08:21
huxz

So it sounds like -seed_offset and -constant_seed -jran are just two alternative solutions to the same problem. I used -seed_offset and it did appear to work, but I think the community prefers -constant_seed -jran (although, as I understand it, it does not matter). -constant_seed -jran looks better because it gives greater control over your seed values...

Thanks for the explanations.

Fri, 2008-05-09 02:41
kosa

I am only 4 years late, but if you don't use the MPI version and instead, say, submit 100 jobs each producing 500 decoys, there should be no need to use -constant_seed -jran, or am I wrong?
D.

Mon, 2012-11-12 09:39
pardave

I am only 4 years late, too. I run thousands of jobs on a cluster with Rosetta 3.8, and usually hundreds of jobs start simultaneously. I have never observed any identical decoys, even though I do not use either option. Are these options still needed?

Thu, 2017-06-15 06:56
attesor

This thread is in the Rosetta++ (Rosetta2) portion of the forums, so keep in mind that Rosetta++ has different behavior from Rosetta3.

I haven't double-checked the Rosetta++ behavior (I don't have the code handy), but for Rosetta3 if you don't use the -constant_seed option Rosetta will automatically pull from the system random number source to get the seed for the pseudorandom number generator. It should print a message to the core.init tracer listing details. 

If this is working correctly (again, for Rosetta3), this should avoid any issues with restarting jobs and having a large number of jobs start simultaneously. Each time Rosetta restarts it will pull a new seed, so it shouldn't reproduce the same decoys, as it might with a constant seed. It's also pulling the number from the system random number source rather than a clock, so even two jobs starting at the identical instant should get different random number seeds.
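
As a rough illustration of why the seed source matters, here is a small Python sketch. It is not Rosetta code, and it uses Python's own random module rather than Rosetta's generator; the function names are made up for the example. It only shows the general idea: a seed derived from a whole-second clock is identical for two jobs that start in the same second, while a seed drawn from the operating system's random source (os.urandom here) is not.

```python
import os
import random


def clock_seed(timestamp):
    """Seed derived from a clock reading, truncated to whole seconds."""
    return int(timestamp)


def system_seed(num_bytes=8):
    """Seed drawn from the operating system's random number source."""
    return int.from_bytes(os.urandom(num_bytes), "big")


def first_draws(seed, n=3):
    """Return the first n numbers produced by a generator with this seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    # Two jobs that start half a second apart, within the same clock second.
    start_a = 1497948420.25
    start_b = start_a + 0.5
    same_stream = first_draws(clock_seed(start_a)) == first_draws(clock_seed(start_b))
    print("clock-seeded jobs produce the same stream:", same_stream)

    # Two jobs seeded from the system source at the same instant.
    seed_a, seed_b = system_seed(), system_seed()
    print("system-seeded jobs share a seed:", seed_a == seed_b)
```

The first comparison comes out equal because both timestamps truncate to the same integer, which is the duplicate-decoy situation described earlier in the thread. The second is almost certainly unequal, since two independent 64-bit draws collide only with negligible probability.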

Tue, 2017-06-20 08:47
rmoretti