Does the fold count reset if abinitio is restarted?


I have been running abinitio on a server using mpi to utilize 13 cores. There have been a few times over the last month when we had to restart abinitio. My question is: will abinitio pick up the fold count where it left off, or will it start the count from one again?

Abinitio didn't overwrite the silent out files; it just kept writing to the end of them. But it seems to be taking longer than I expected; maybe I'm just underestimating how long this should take. I also looked at one of the silent out files, and after about 6700 folds the numbering went back down to 14. Does this mean that the folding started over after we had to restart? This is the command I used to run abinitio.

mpirun -np 14 AbinitioRelax.linuxgccrelease -database /opt/rosetta-3.2.1/rosetta_database/ -in:file:fasta t000_.fasta -in:file:frag3 aat000_03_05.200_v1_3 -in:file:frag9 aat000_09_05.200_v1_3 -abinitio:relax -relax:fast -abinitio::increase_cycles 10 -abinitio::rg_reweight 0.5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -use_filters true -psipred_ss2 t000_.psipred_ss2 -out:file:silent t000_silent.out -nstruct 20000

I just want a total of 20000 folds; is that what this command will give, or will it do 20000 folds 13 times?

Thanks for any help with this problem.

Mon, 2011-06-13 11:39
burkheadlab

Non-intuitively and non-obviously, abinitio is not compatible with MPI in 3.2.1. I suspect your problem is related to that. If you look at your result file, you will probably find that it does structure 1 13 times, then structure 2 13 times, etc.
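
One quick way to check for that (a sketch, assuming your silent file is the t000_silent.out from your command and that the structure tag is the last column of each SCORE line):

grep -c '^SCORE:' t000_silent.out   # total number of structures written so far
grep '^SCORE:' t000_silent.out | awk '{print $NF}' | sort | uniq -c | sort -rn | head   # how many times each tag appears; duplicates show counts > 1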

It should know where it left off and restart there, but I can't say for sure that it does.

If you'd like the abinitio MPI patch, let me know, and I'll email it to the address you registered your forum account to. You can also take a look at these threads on the issue:
https://www.rosettacommons.org/node/1829
https://www.rosettacommons.org/node/2292

Mon, 2011-06-13 13:31
smlewis

Damn, well that sucks. Thanks for the reply, and yes, I would like to get that patch from you. I will also look at the other threads you listed for more info.

Mon, 2011-06-13 16:39
burkheadlab

smlewis, thanks for e-mailing me that patch. I have another question. It seems like there were a lot of duplicate structures in the silent out files. I ran extract_pdbs on the silent out files and it listed "SilentStruct with S_00000715 already exists!" for a bunch of different structures, with a total of 24200 structures in one of the files; there were 13 silent out files from the mpi run, all with about the same file size and thus probably roughly the same number of structures.

My question is: are these structures each a unique fold, or are the structures with the same number the same fold (does this question make sense)? I guess what I'm asking is, are all the duplicate structures just going to give the same data/results, or does the Monte Carlo procedure produce different folds every time it runs? Let me know if you need me to explain any further. Thanks for your help.

Tue, 2011-06-14 16:52
burkheadlab

Whether you have different structures with the same name, or the same structure with the same name, depends on whether the different runs started with different random number seeds. If you didn't manually assign seeds, then the seed each run gets depends on the system you run it on; I think it comes from /dev/urandom on Linux systems.
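
If you want to control that, you can assign the seed yourself. A minimal sketch, assuming the -run:constant_seed / -run:jran flags behave in 3.2.1 the way I remember, and with your other options kept in a flags file:

# give each of the parallel runs its own explicit seed instead of relying on /dev/urandom
AbinitioRelax.linuxgccrelease @flags -run:constant_seed -run:jran 1111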

I would try just extracting the first one from each set. If those are different, then probably they're all different. If they're all the same, you can throw a lot of it out as duplicates. If you get something like 5 copies of A, 3 copies of B, 4 copies of C, and one of D, then you have 4 results, and you can throw out the duplicate trajectory sets.
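
For example, something like this (a sketch; the per-process file names, the S_00000001 tag, and extract_pdbs accepting -in:file:tags and -out:prefix are my assumptions, so adjust to what you actually have):

# pull the first structure out of each silent file so you can compare them
for f in t000_silent_*.out; do
  extract_pdbs.linuxgccrelease -database /opt/rosetta-3.2.1/rosetta_database/ \
      -in:file:silent "$f" -in:file:tags S_00000001 -out:prefix "${f%.out}_"
done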

You shouldn't get behaviors like one output appearing from processor 1 at structure 0001, but then reappearing from processor 10 at structure 0245. If they're going to be the same the numbers should be the same. (This might not hold true if you had multiple job restarts in the same output file, which may cause it to number "first results" as one plus the end of the last one.)

Fri, 2011-06-17 13:02
smlewis

Okay, thanks for the informative reply. It seems like there was some duplication, at least in file name. After I extracted them, I deleted all the structures whose names were followed by a number, indicating a file with the same name had already been created.

Fri, 2011-06-17 18:22
burkheadlab

I am also facing the same problem when restarting: the run stopped at "S_00002017.pdb", and I expected it to continue from "S_00002017.pdb" or "S_00002018.pdb", but with the same script it starts again from "S_00000001.pdb".

(Kindly note: I am using a single node with 8 processors, Rosetta version "rosetta_bin_linux_2016.17.58663".)

I am expecting your support and suggestions.

Script used to restart:

mpirun -np 8 \
/PATH/rosetta_bin_linux_2016.17.58663_bundle/main/source/bin/AbinitioRelax.mpi.linuxgccrelease \
-database /PATH/rosetta_bin_linux_2016.17.58663_bundle/main/database/ \
-in:file:fasta 74re.fasta \
-in:file:frag9 aat000_09_05.200_v1_3 \
-in:file:frag3 aat000_03_05.200_v1_3 \
-psipred_ss2 t000_.psipred_ss2 \
-abinitio:relax \
-use_filters true \
-abinitio::increase_cycles 10 \
-abinitio::rg_reweight 0.5 \
-abinitio::rsd_wt_helix 0.5 \
-abinitio::rsd_wt_loop 0.5 \
-nstruct 30000 \
-out:pdb \
-out:file:silent silent.out

Sun, 2016-06-26 23:49
venkatazb

It might be because you're using both -out:pdb and -out:file:silent.

Nobody has ever fixed this particular version of abrelax to use the better job distributor, and I've forgotten anything I knew about the old one from 8 years ago.

I think the combine_silent tool can just renumber the structures for you (to unique numbers) after the fact.
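
A rough sketch of what that would look like (the file names are placeholders, and whether it renumbers by default or needs an extra flag is something I'd double-check against your build):

combine_silent.linuxgccrelease -database /PATH/rosetta_bin_linux_2016.17.58663_bundle/main/database/ \
    -in:file:silent silent_1.out silent_2.out \
    -out:file:silent combined.out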

Mon, 2016-06-27 09:23
smlewis

Using Rosetta 3.6

Like the previous user, I am also getting duplicates in default.out, except that I did not use either -out:pdb or -out:file:silent. It looks like I am getting as many duplicates as the -np parameter (i.e. mpirun -np 10 yields 10 models with the same name in default.out for each nstruct).

Submitting this job to our SLURM cluster:

salloc -N 2 mpirun -np 2 /depot/rosetta/3.6/main/source/bin/AbinitioRelax.default.linuxgccrelease @flags > abinitio.out 2> abinitio.err &

resulted in a default.out where every other model has the same name as the previous one, although all the models were distinct (different scores), e.g.:

SCORE:     score     fa_atr     fa_rep     fa_sol    fa_intra_rep    fa_elec    pro_close    hbond_sr_bb    hbond_lr_bb    hbond_bb_sc    hbond_sc    dslf_fa13    atom_pair_constraint    coordinate_constraint    angle_constraint    dihedral_constraint       rama      omega     fa_dun    p_aa_pp    yhh_planarity        ref    Filter_Stage2_aBefore    Filter_Stage2_bQuarter    Filter_Stage2_cHalf    Filter_Stage2_dEnd    co    clashes_total    clashes_bb       time description
SCORE:  -386.981  -1063.484    133.980    590.879           2.937    -83.298        1.503        -67.975        -16.145        -14.916     -15.160        0.000                 -85.736                    0.000               0.000                  0.000    -21.011     17.257    280.673    -41.167            0.122     -5.440                    0.000                     0.000                  0.000                 0.000 24.393            0.000         0.000   1192.000  S_00000001
SCORE:  -457.134  -1139.665    151.095    638.124           2.957    -81.250        0.895        -63.663        -24.682        -23.702     -14.814        0.000                -151.746                    0.000               0.000                  0.000    -18.974     26.806    286.585    -39.979            0.319     -5.440                    0.000                     0.000                  0.000                 0.000 31.262            0.000         0.000   2150.000  S_00000001
SCORE:  -431.520  -1114.407    146.608    623.038           2.875    -75.896        1.017        -71.586        -17.001        -19.279     -13.235        0.000                -147.458                    0.000               0.000                  0.000    -15.957     30.079    282.052    -36.953            0.024     -5.440                    0.000                     0.000                  0.000                 0.000 39.731            0.000         0.000   2102.000  S_00000002
SCORE:  -456.403  -1142.623    146.135    646.783           3.101    -92.779        1.241        -62.741        -24.531        -16.490     -18.518        0.000                -151.755                    0.000               0.000                  0.000    -15.470     23.233    305.379    -51.934            0.005     -5.440                    0.000                     0.000                  0.000                 0.000 41.281            0.000         0.000   2112.000  S_00000002
SCORE:  -441.175  -1110.798    136.214    621.876           3.043    -77.155        1.372        -60.945        -16.107        -24.633     -16.463        0.000                -151.997                    0.000               0.000                  0.000    -14.308     24.913    290.717    -41.872            0.409     -5.440                    0.000                     0.000                  0.000                 0.000 29.163            0.000         0.000   2162.000  S_00000003
SCORE:  -441.159  -1157.146    164.662    634.912           3.143    -97.415        1.366        -66.306        -17.415        -17.080     -22.135        0.000                -143.394                    0.000               0.000                  0.000    -11.400     28.260    301.588    -37.477            0.119     -5.440                    0.000                     0.000                  0.000                 0.000 31.182            0.000         0.000   2267.000  S_00000003


etc...

Wed, 2017-01-25 17:32
iphan

You're using the non-MPI compile of AbinitioRelax (the "default" in AbinitioRelax.default.linuxgccrelease). This means that when you launch 10 copies of the program with the mpirun -np 10 command, they're all independent and don't talk to each other, so each one is going to make its own S_00000001. (But if they get different random seeds, then each of the S_00000001s will be different structures, with different energies.)

If you want to run Rosetta programs with MPI, you should be using the MPI-compiled version of the program (AbinitioRelax.mpi.linuxgccrelease). To get this, you need to compile with "extras=mpi" on the scons command line, and often have to tweak the compile settings (this is quite cluster-dependent). There are a fair number of threads on this forum about troubleshooting MPI compiles.
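
For reference, the build step is roughly as follows (a sketch; the path matches the bundle you mentioned, the -j count is arbitrary, and cluster-specific compiler settings usually go in the files under tools/build/):

cd /PATH/rosetta_bin_linux_2016.17.58663_bundle/main/source
./scons.py -j8 mode=release extras=mpi bin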

Thu, 2017-02-02 08:32
rmoretti

If your silent file is similarly problematic - combine_silent can renumber them.

If you have PDBs instead - well, the file system ate every other output, there's nothing to do about it now but not run it the same way again.

Thu, 2017-01-26 07:17
smlewis