
std::out_of_range with mpiexec and relax


I'm running a very simple relax to optimize and score 100,000 PDB files.

Rosetta was compiled with MPI+MYSQL support. 

If I call the relax application on a single PDB file, it works.

Since I have a 64-core machine with 196 GB RAM (and MPI installed), I would like to relax 63 structures in parallel.
This is the command I'm executing with MPI:

mpiexec -np 63 /rosetta2020/main/source/bin/relax.mpimysql.linuxgccrelease @relax.flags.database

After loading the 100k PDBs into the job distributor (this seems to work), the relax protocol starts, and after a few seconds the processes crash with this message:

(see flags at the bottom, after the error)

.
.
.
.
protocols.relax: (1) turning off DNA bb and chi move
protocols.relax: (1) turning off DNA bb and chi move
protocols.relax: (1) turning off DNA bb and chi move
protocols.relax: (1) turning off DNA bb and chi move
protocols.relax: (1) turning off DNA bb and chi move
protocols.relax: (1) turning off DNA bb and chi move
core.scoring.hbonds.hbonds_geom: (2) hb_energy_deriv has H(83.7006,28.3743,130.151) D(84.477,29.084,129.338)  distance out of range 1.32937
core.scoring.hbonds.hbonds_geom: (1) hb_energy_deriv has H(83.7006,28.3743,130.151) D(84.477,29.084,129.338)  distance out of range 1.32937
core.scoring.hbonds.hbonds_geom: (2) hb_energy_deriv has H(86.5347,29.8745,60.4846) D(87.159,30.578,61.424)  distance out of range 1.32937
core.scoring.hbonds.hbonds_geom: (1) hb_energy_deriv has H(86.5347,29.8745,60.4846) D(87.159,30.578,61.424)  distance out of range 1.32937
core.pack.dunbrack.RotamerLibrary: (2) shapovalov_lib_fixes_enable option is true.
basic.io.database: (2) Using '/rosetta2020/main/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin' as the cached file.
core.pack.dunbrack.RotamerLibrary: (2) shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: (2) Binary rotamer library selected: /rosetta2020/main/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
basic.io.database: (2) Using '/rosetta2020/main/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin' as the cached file.
core.pack.dunbrack.RotamerLibrary: (2) Using Dunbrack library binary file '/rosetta2020/main/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.


  what():  map::at
[Svr1:224248] *** Process received signal ***
[Svr1:224248] Signal: Aborted (6)
[Svr1:224248] Signal code:  (-6)
[Svr1:224248] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f8b63b714b0]
[Svr1:224248] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f8b63b71428]
[Svr1:224248] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f8b63b7302a]
[Svr1:224248] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x16d)[0x7f8b641ab84d]
[Svr1:224248] [ 4] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libutility.so(_ZN7utility17terminate_handlerEv+0xd3)[0x7f8b64d02713]
[Svr1:224248] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7f8b641a96b6]
[Svr1:224248] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7f8b641a9701]
[Svr1:224248] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919)[0x7f8b641a9919]
[Svr1:224248] [ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt20__throw_out_of_rangePKc+0x3f)[0x7f8b641d22cf]
[Svr1:224248] [ 9] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.4.so(_ZNK4core7scoring7methods16ProClosureEnergy12measure_chi4ERKNS_12conformation7ResidueES6_+0xa4c)[0x7f8b5fcac12c]
[Svr1:224248] [10] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.4.so(_ZNK4core7scoring7methods16ProClosureEnergy19residue_pair_energyERKNS_12conformation7ResidueES6_RKNS_4pose4PoseERKNS0_13ScoreFunctionERNS0_10EMapVectorE+0x21f)[0x7f8b5fcac6cf]
[Svr1:224248] [11] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.3.so(_ZNK4core7scoring13ScoreFunction10eval_ci_2bERKNS_12conformation7ResidueES5_RKNS_4pose4PoseERNS0_10EMapVectorE+0x68)[0x7f8b5e2a1f78]
[Svr1:224248] [12] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.3.so(_ZNK4core7scoring13ScoreFunction35asym_eval_twobody_neighbor_energiesERNS_4pose4PoseE+0xd3f)[0x7f8b5e2abb7f]
[Svr1:224248] [13] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.3.so(_ZNK4core7scoring13ScoreFunctionclERNS_4pose4PoseE+0xca)[0x7f8b5e2b2c0a]
[Svr1:224248] [14] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.4.so(_ZN9protocols5relax9FastRelax5applyERN4core4pose4PoseE+0x3dc)[0x7f8b66ab690c]
[Svr1:224248] [15] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.1.so(_ZN9protocols3jd214JobDistributor11run_one_jobERSt10shared_ptrINS_5moves5MoverEElRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESD_RmSE_b+0xc9c)[0x7f8b660e396c]
[Svr1:224248] [16] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.1.so(_ZN9protocols3jd214JobDistributor7go_mainESt10shared_ptrINS_5moves5MoverEE+0x166)[0x7f8b660e5cb6]
[Svr1:224248] [17] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.1.so(_ZN9protocols3jd225MPIWorkPoolJobDistributor2goESt10shared_ptrINS_5moves5MoverEE+0x2fb)[0x7f8b6612dadb]
[Svr1:224248] [18] /rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.4.so(_ZN9protocols5relax10Relax_mainEb+0x1d34)[0x7f8b66adc514]
[Svr1:224248] [19] /rosetta2020/main/source/bin/relax.mpimysql.linuxgccrelease[0x40894b]
[Svr1:224248] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f8b63b5c830]
[Svr1:224248] [21] /rosetta2020/main/source/bin/relax.mpimysql.linuxgccrelease[0x4089f9]
[Svr1:224248] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 224248 on node Srv1 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

 

This is the FLAGS file:

-in:ignore_unrecognized_res
-list pdblist.txt
-relax:script relax.script
-relax:bb_move false

-score:output_residue_energies true
-score:weights dna

-nstruct 1
-overwrite
-adducts dna_major_groove_water

-out:file:output_pose_energies_table
-out:use_database
-out:database_protocol_id 0
-out:level 500
-out:levels all:debug

-inout:dbms:mode mysql
-inout:dbms:database_name test
-inout:dbms:host localhost
-inout:dbms:user xxxxxx
-inout:dbms:password xxxxxx
-inout:dbms:port  3306

-run:random_delay 2
-jd2:delete_old_poses true
-jd2:max_nstruct_in_memory 800

-exclude_patches patches/carbohydrates/2-amination.txt patches/carbohydrates/2-branch.txt patches/carbohydrates/3-amination.txt patches/carbohydrates/3-branch.txt patches/carbohydrates/3-methylation.txt patches/carbohydrates/5-acetylation.txt patches/carbohydrates/6-branch.txt patches/carbohydrates/branch_lower_term.txt patches/carbohydrates/cutpoint_lower.txt patches/carbohydrates/cutpoint_upper.txt patches/carbohydrates/lower_terminus.txt patches/carbohydrates/Me_glycoside.txt patches/carbohydrates/N-acetyl-2-amination.txt patches/carbohydrates/N-linked_glycosylation.txt patches/carbohydrates/upper_terminus.txt patches/branching/aryl-C-conjugated.txt patches/branching/aryl-O-conjugated.txt patches/branching/C-terminal_conjugation.txt patches/branching/N-linked_conjugation.txt patches/branching/O-linked_conjugation.txt patches/branching/phg_cd1_conjugation.txt patches/branching/phg_cd2_conjugation.txt patches/branching/S-linked_conjugation.txt patches/branching/sidechain_carboxyl_conjugation.txt patches/branching/sidechain_electrophile_conjugation.txt patches/NtermConnect.txt patches/CtermConnect.txt patches/oop_post.txt patches/oop_pre.txt patches/hbs_post.txt patches/hbs_pre.txt patches/a3b_hbs_post.txt

 

This is the ROSETTA_CRASH.log:

[START_CRASH_REPORT]
[ROSETTA_VERSION]: 2020.11+release.ce6f14f
[COMMIT_DATE]: 2020-03-14T19:33:48.401771
[APPLICATION]: /rosetta2020/main/source/bin/relax.mpimysql.linuxgccrelease
[MODE]: Release
[EXTRAS]: mpi (OpenMPI 1.10.2) mysql 
[OS]: GNU/Linux
[COMPILER]: GCC version "5.4.0 20160609"
[STDLIB]: libstdc++ version 20160609
[START_OPTIONS]
 -in:ignore_unrecognized_res -in:file:list=pdblist.txt -inout:dbms:mode=mysql -inout:dbms:database_name=test -inout:dbms:host=localhost -inout:dbms:user=xxxxxxxx -inout:dbms:password=xxxxxxx -inout:dbms:port=3306 -out:overwrite -out:nstruct=1 -out:use_database -out:database_protocol_id=0 -out:level=500 -out:levels=all:debug -out:file:output_pose_energies_table -run:random_delay=2 -jd2:delete_old_poses -jd2:max_nstruct_in_memory=63 -score:weights=dna -score:output_residue_energies -packing:adducts=dna_major_groove_water -chemical:exclude_patches=patches/carbohydrates/2-amination.txt patches/carbohydrates/2-branch.txt patches/carbohydrates/3-amination.txt patches/carbohydrates/3-branch.txt patches/carbohydrates/3-methylation.txt patches/carbohydrates/5-acetylation.txt patches/carbohydrates/6-branch.txt patches/carbohydrates/branch_lower_term.txt patches/carbohydrates/cutpoint_lower.txt patches/carbohydrates/cutpoint_upper.txt patches/carbohydrates/lower_terminus.txt patches/carbohydrates/Me_glycoside.txt patches/carbohydrates/N-acetyl-2-amination.txt patches/carbohydrates/N-linked_glycosylation.txt patches/carbohydrates/upper_terminus.txt patches/branching/aryl-C-conjugated.txt patches/branching/aryl-O-conjugated.txt patches/branching/C-terminal_conjugation.txt patches/branching/N-linked_conjugation.txt patches/branching/O-linked_conjugation.txt patches/branching/phg_cd1_conjugation.txt patches/branching/phg_cd2_conjugation.txt patches/branching/S-linked_conjugation.txt patches/branching/sidechain_carboxyl_conjugation.txt patches/branching/sidechain_electrophile_conjugation.txt patches/NtermConnect.txt patches/CtermConnect.txt patches/oop_post.txt patches/oop_pre.txt patches/hbs_post.txt patches/hbs_pre.txt patches/a3b_hbs_post.txt -relax:script=relax.script -relax:bb_move=false

[END_OPTIONS]

[START_BACKTRACE]: RAW_LIBC
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libutility.so(utility::save_crash_report(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x33) [0x7f3b786d6373]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libutility.so(utility::terminate_handler()+0x184) [0x7f3b786d67c4]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6) [0x7f3b77b7d6b6]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701) [0x7f3b77b7d701]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919) [0x7f3b77b7d919]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(std::__throw_out_of_range(char const*)+0x3f) [0x7f3b77ba62cf]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.4.so(core::scoring::methods::ProClosureEnergy::measure_chi4(core::conformation::Residue const&, core::conformation::Residue const&) const+0xa4c) [0x7f3b7368012c]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.4.so(core::scoring::methods::ProClosureEnergy::residue_pair_energy(core::conformation::Residue const&, core::conformation::Residue const&, core::pose::Pose const&, core::scoring::ScoreFunction const&, core::scoring::EMapVector&) const+0x21f) [0x7f3b736806cf]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.3.so(core::scoring::ScoreFunction::eval_ci_2b(core::conformation::Residue const&, core::conformation::Residue const&, core::pose::Pose const&, core::scoring::EMapVector&) const+0x68) [0x7f3b71c75f78]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.3.so(core::scoring::ScoreFunction::asym_eval_twobody_neighbor_energies(core::pose::Pose&) const+0xd3f) [0x7f3b71c7fb7f]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libcore.3.so(core::scoring::ScoreFunction::operator()(core::pose::Pose&) const+0xca) [0x7f3b71c86c0a]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.4.so(protocols::relax::FastRelax::apply(core::pose::Pose&)+0x3dc) [0x7f3b7a48a90c]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.1.so(protocols::jd2::JobDistributor::run_one_job(std::shared_ptr<protocols::moves::Mover>&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&, unsigned long&, bool)+0xc9c) [0x7f3b79ab796c]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.1.so(protocols::jd2::JobDistributor::go_main(std::shared_ptr<protocols::moves::Mover>)+0x166) [0x7f3b79ab9cb6]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.1.so(protocols::jd2::MPIWorkPoolJobDistributor::go(std::shared_ptr<protocols::moves::Mover>)+0x2fb) [0x7f3b79b01adb]
/rosetta2020/main/source/build/src/release/linux/4.4/64/x86/gcc/5.4/mpi-mysql/libprotocols.4.so(protocols::relax::Relax_main(bool)+0x1d34) [0x7f3b7a4b0514]
/rosetta2020/main/source/bin/relax.mpimysql.linuxgccrelease() [0x40894b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f3b77530830]
/rosetta2020/main/source/bin/relax.mpimysql.linuxgccrelease() [0x4089f9]

[END_BACKTRACE]

[FILE]: St12out_of_range
[LINE]: 0
[START_MESSAGE]
map::at

[END_MESSAGE]
[END_CRASH_REPORT]

 

Fri, 2020-06-19 10:39
pedro.guillem

This turned out not to be an MPI error.

It appears that the problem does not occur when the DNA atoms come first in the PDB file.
My 100k PDBs had the DNA atoms at the end, and somehow Rosetta was getting confused by that ordering.

Copying and pasting the DNA atoms to the beginning of the PDB files seems to have solved the issue.
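
For 100k files, the copy-and-paste step could be scripted. The following is only a minimal Python sketch of that reordering, not the procedure actually used here. It assumes single-model PDB files whose DNA residues are named DA, DC, DG, or DT in columns 18-20; the names DNA_RESNAMES and dna_first are made up for illustration. Atom serial numbers are left as they are, and each TER record stays with the record that preceded it.

```python
# Residue names treated as DNA. This set is an assumption; adjust it to
# whatever naming your PDB files actually use.
DNA_RESNAMES = {"DA", "DC", "DG", "DT"}


def dna_first(lines):
    """Return PDB lines with all DNA coordinate records moved ahead of the rest.

    Lines before the first coordinate record stay at the top, and any
    non-coordinate lines after it (END, CONECT, ...) are kept at the bottom.
    """
    header, dna, other, footer = [], [], [], []
    last = other          # list that received the most recent coordinate record
    seen_coords = False
    for line in lines:
        if line.startswith(("ATOM", "HETATM", "ANISOU")):
            # The residue name occupies columns 18-20 (0-based slice 17:20).
            last = dna if line[17:20].strip() in DNA_RESNAMES else other
            last.append(line)
            seen_coords = True
        elif line.startswith("TER") and seen_coords:
            # A TER record follows whichever group it terminates.
            last.append(line)
        elif seen_coords:
            footer.append(line)
        else:
            header.append(line)
    return header + dna + other + footer
```

To use it, read a file with readlines(), pass the list to dna_first, and write the result to a new file with writelines(). It is worth checking a few outputs by hand before processing the whole set.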

 

Wed, 2020-06-24 04:49
pedro.guillem