
Possible Shellshock Patch problem results in no output files being written


Folks,
In September 2014 (before the Shellshock patch) a user ran a loop modeling job on our system (RHEL 6.3, IBM Platform HPC 4.1.1.1).
The Rosetta version is rosetta_2014.30.57114_bundle.
The user is running minirosetta.mpi.

The command is:
mpiexec -launcher ssh -f /PATH/machines -n 50 -ppn 8 /PATH/rosetta_2014.30.57114_bundle/main/source/bin/minirosetta.mpi.linuxgccrelease @/PATH/DGCR8_NOE_10172014/2LZM_broker_cst.options -database /PATH/rosetta_2014.30.57114_bundle/main/database

The options file looks like this:

# Make sure all variable names have been replaced with absolute path and that no line begins with a $ or ~s
-in
-file
-native 2YT4_1_0030.pdb # native PDB file (optional)
-fasta 2YT4.fasta # protein sequence in fasta format
-frag3 aaDGCR803_05.200_v1_3 # protein 3-residue fragments file
-frag9 aaDGCR809_05.200_v1_3 # protein 9-residue fragments file
-abinitio
-increase_cycles 10 # Increase the number of cycles at each stage in AbinitioRelax by this factor
-rg_reweight 0.5 # Reweight contribution of radius of gyration to total score by this scale factor
-rsd_wt_helix 0.5 # Reweight env, pair, and cb scores for helix residues by this factor
-rsd_wt_loop 0.5 # Reweight env, pair, and cb scores for loop residues by this factor
-relax # At the end of de novo protein_folding, do a relax step
-use_filters true
-relax
-fast # Type of relax protocol. This has been shown to be the best deal for speed and robustness.
-psipred_ss2 t000_.psipred_ss2.txt # psipred_ss2 secondary structure definition file (required for -use_filters)
-broker
-setup topology_broker_cst.tpb
-run
-protocol broker
-reinitialize_mover_for_each_job # jd generate fresh copy of its mover before each apply (once per job)
-score
-find_neighbors_3dgrid # Use a 3D lookup table for doing neighbor calculations. For spherical, well-distributed conformations
-evaluation
-rmsd NATIVE _core 2YT4_core.txt # compute CA-RMSD for model comparing to native structure, name of column you want info under, and name of file defining over which residues you want RMSD computed
-out
-output # use this to tell Rosetta you actually want output
-nstruct 100 # how many structures do you want to generate? Usually want to fold at least 1,000.
-file
-silent 2YT4_broker_cst.out # full path to silent file output
-silent_struct_type binary # we want binary silent files
-scorefile 2YT4_broker_cst.fsc
-overwrite # overwrite any existing output with the same name you may have generated
-fold_cst
-force_minimize # minimize the structure after making a move, even if no restraints given
-constraints
-cst_file NOE.centroid.cst # full path to your restraints file
-cst_weight 4 # The factor by which the cst score is multiplied. A weight of 4 is suggested for protein_folding T4-lysozyme.
-cst_fa_file NOE.centroid.cst
-cst_fa_weight 4
-epr_distance

The September job ran to completion, produced the requested number of PDB output files, and reported no errors.
The output file ended like this:

: (1) Master Node: Finished sending spin down signals to slaves
protocols.jd2.MPIFileBufJobDistributor: (1) Master Node stats: jobs-send out: 100 returned: 100 bad jobs: 0

The error file was empty.

Today, October 22, 2014 (after the Shellshock patch), I ran the user's script as before, changing only the names of the output files.
As before, the output file ended like this:

: (1) Master Node: Finished sending spin down signals to slaves
protocols.jd2.MPIFileBufJobDistributor: (1) Master Node stats: jobs-send out: 100 returned: 100 bad jobs: 0

The error file contained these lines:
/bin/sh: BASH_FUNC_module(): line 0: syntax error near unexpected token `)'
/bin/sh: BASH_FUNC_module(): line 0: `BASH_FUNC_module() () { eval `/usr/bin/modulecmd bash $*`'
/bin/sh: error importing function definition for `BASH_FUNC_module'

The difference is that the PDB files were not written to the designated output directory.

Has anyone else reported such an issue?

Thanks

Wed, 2014-10-22 13:48
hazards

It looks to me like it might be an issue with your MPI system.

The modulecmd program is used with the "module" system for controlling environment variables. It's not, to my knowledge, used anywhere in Rosetta. I'm guessing that at some point during the MPI program dispatch process the BASH_FUNC_module() function is being set up, but because of the patch its definition is no longer accepted. The setup thus fails, and this affects the communication needed for output.

Do you get a similar error message if you try running other, non-Rosetta MPI programs?
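
One way to test that guess is to look at how the "module" function is exported on each machine. The sketch below is only an illustration: the helper name exported_function_names is made up for this post, and the idea that the nodes might be running bash builds with different Shellshock patch levels is an assumption, not something confirmed in this thread. The helper reads `env` output on standard input and prints the names of entries whose value starts with "() {", which is how bash marks an exported function.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rosetta or of any MPI package).
# Reads `env` output on stdin and prints the name of every entry whose
# value begins with "() {", the marker bash uses for exported functions.
exported_function_names() {
    grep -E '^[^=]+=\(\) \{' | cut -d= -f1
}

# Typical use on the submit host:
#   env | exported_function_names
# and on a compute node (NODE is a placeholder for one of your hosts):
#   ssh NODE env | exported_function_names
```

If the submit host and the compute nodes print different names for the same function (for example "module" on one and "BASH_FUNC_module()" on another), comparing the output of `bash --version` on those machines would be a reasonable next step. If the remote processes do not need the module function, running `unset -f module` in the job script before mpiexec may also be worth a try.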

Wed, 2014-10-22 14:59
rmoretti

I DO see the message with other non-Rosetta MPI programs, but in those cases the output is produced as expected.

Thu, 2014-10-23 08:13
hazards

Are you seeing any other error messages in the log file? Is the logging indicating that Rosetta is attempting to output structures? (You may just want to attach the log file here, assuming it's not too huge.) It's strange that Rosetta would produce a clean log message, but not output any files. At least one of the processes should be indicating that there's a difficulty writing to disk.

The other thing I might suggest checking is which directory the MPI jobs are being launched from on the remote machines. Rosetta tends to put output files in the current directory, so if the "current" directory on the remote machines is not what you think it is for some reason (because settings or startup scripts are working differently now), then the output files may be going into the wrong directory. If write permissions on that directory aren't good, then you might not even be getting any output. I don't know if there's some way to induce your MPI system to print/log which directory it's launching the remote applications from.
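
If the MPI system has no such option, a small stand-in program launched the same way as Rosetta can report the directory itself. This is a rough sketch under assumptions: the function name report_launch_dir and the script name report_dir.sh mentioned below are invented for illustration, and it presumes a plain shell script can be started by your mpiexec in place of the Rosetta binary.

```shell
#!/bin/sh
# Hypothetical helper: prints one line with the host name, the current
# working directory, and whether that directory is writable by this process.
report_launch_dir() {
    dir=$(pwd)
    if [ -w "$dir" ]; then
        status=writable
    else
        status=not-writable
    fi
    printf '%s %s %s\n' "$(uname -n)" "$dir" "$status"
}

report_launch_dir
```

Saved as an executable script (say /PATH/report_dir.sh) and started with the same options as the real job, for example `mpiexec -launcher ssh -f /PATH/machines -n 50 -ppn 8 /PATH/report_dir.sh`, each process should print where it landed. Any line showing an unexpected directory, or "not-writable", would point to where the PDB files are going or why they are missing.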

Mon, 2014-11-03 16:39
rmoretti