
Compilation and unit test failures


Hi all,

I installed Rosetta 3.13 on the KISTI supercomputer 5 (Nurion) (https://www.ksc.re.kr/eng/index/main).

I then ran the unit tests, which show an 85% success rate.

Below is what I did; I have attached the relevant files.

$ cd rosetta/main/source

$ module purge

$ module load craype-network-opa python/2.7.15 gcc/8.3.0 mvapich2/2.3.1 craype-mic-knl

$ vi tools/build/site.settings

        "overrides" : {
                 "cc" : "/apps/compiler/gcc/8.3.0/mvapich2/2.3.1/bin/mpicc",
                 "cxx" : "/apps/compiler/gcc/8.3.0/mvapich2/2.3.1/bin/mpicxx"

$ ./scons.py -j64 bin mode=release extras=mpi log=environment 2>&1 |tee make_1.log

$ ./scons.py -j64 mode=debug extras=mpi log=environment 2>&1 |tee make_2.log

$ ./scons.py -j64 mode=debug extras=mpi cat=test log=environment 2>&1 |tee make_3.log

$ cat test.sh
#!/bin/sh
#PBS -V
#PBS -N mpi_test_job
#PBS -q debug
#PBS -A etc
#PBS -l select=1:ncpus=64:mpiprocs=64:ompthreads=1
#PBS -l walltime=04:00:00
#PBS -W sandbox=PRIVATE

cd $PBS_O_WORKDIR

module purge
module load craype-network-opa python/2.7.15 gcc/8.3.0 mvapich2/2.3.1 craype-mic-knl

TOTAL_CPUS=$(wc -l $PBS_NODEFILE | awk '{print $1}')

python /scratch/a1376a01/rosetta/main/source/test/run.py --database=/scratch/a1376a01/rosetta/main/database --mode=debug --extras=mpi --compiler=gcc --jobs=64

$ qsub test.sh

 

How can I fix these problems?

Thank you in advance.

Attachment    Size
make_1.txt    317.51 KB
make_2.txt    275.51 KB
make_3.txt    508.64 KB
Wed, 2021-09-01 22:09
kjs1728

Here are the unit test results.

Wed, 2021-09-01 22:14
kjs1728

From the log, it looks like the majority of the failures are along the lines of

'./core.test RotamerSetsTests --database /scratch/a1376a01/rosetta/main/database -mute all -no_fconfig' exceeded the timeout and will be killed!

Some of the tests can take a fair amount of time to run, so it looks like they're not completing in time on your computer. You can try passing `--timeout 0` to the test/run.py script to turn off the timeout completely. Also, check your cluster job settings, to see if you're bumping up against cluster runtime limits. (You got the final summary, so it doesn't look like you ran into the full job being canceled, but I don't know if your cluster has process-level timeout control.)
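To see which tests hit the timeout without reading through the whole log, the matching lines can be filtered out. This is a minimal sketch. It assumes the run.py output was saved to a file and that every timeout message has the same shape as the RotamerSetsTests line quoted above. The function name timed_out_tests is made up for illustration.

```sh
#!/bin/sh
# Sketch (hypothetical helper): list the unit tests that test/run.py reported as
# timed out. Assumes log lines shaped like the one quoted in this thread:
#   './core.test RotamerSetsTests --database ... -mute all -no_fconfig' exceeded the timeout and will be killed!
# Prints one "binary suite" pair per timed-out test, sorted and de-duplicated.
timed_out_tests() {
    # Capture the binary name after './' and the suite name that follows it;
    # lines without the timeout message are not printed (sed -n ... p).
    sed -n "s|.*'\./\([^ ]*\) \([^ ]*\) .*' exceeded the timeout.*|\1 \2|p" "$1" | sort -u
}
```

For example, `timed_out_tests mpi_test_job.o12345` would list the affected suites. The file name is hypothetical; PBS names the output file after the job name and ID. The list shows which suites to watch when the job is resubmitted with `--timeout 0` added to the run.py line.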

Another thing to keep in mind is that even for an MPI compile, the tests here are all run serially (not through MPI). I don't know your cluster, but if there are issues with running 64 separate single-CPU, non-MPI jobs under your PBS setup, that might be contributing to the timeouts.
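One way to check whether running many tests side by side on the node is the problem is to rerun the timed-out tests one at a time. Each test binary can be called directly with the same arguments as in the log line quoted above. This is a minimal sketch. The function name rerun_serially and the DRY_RUN switch are hypothetical. TEST_DIR is also hypothetical: it stands for the directory in your build tree that holds core.test, which is not given in this thread.

```sh
#!/bin/sh
# Sketch (hypothetical helper): rerun unit tests one at a time, outside run.py,
# so no per-test timeout applies and no other tests compete for the node.
# Input on stdin: one "binary suite" pair per line, e.g. "core.test RotamerSetsTests".
# TEST_DIR (hypothetical) must point at the directory that holds core.test etc.
# run.py may set up extra environment (e.g. library paths) that this does not reproduce.
# Set DRY_RUN=1 to print the commands instead of running them.
ROSETTA_DB=${ROSETTA_DB:-/scratch/a1376a01/rosetta/main/database}

rerun_serially() {
    while read -r binary suite; do
        [ -n "$binary" ] || continue    # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "./$binary $suite --database $ROSETTA_DB -mute all -no_fconfig"
        else
            # Same arguments as the run.py log line. The subshell keeps the cd local,
            # and </dev/null stops the test from swallowing the remaining input lines.
            ( cd "${TEST_DIR:?set TEST_DIR to the test binary directory}" &&
              "./$binary" "$suite" --database "$ROSETTA_DB" -mute all -no_fconfig </dev/null ) ||
                echo "FAILED: $binary $suite" >&2
        fi
    done
}
```

For example, `printf 'core.test RotamerSetsTests\n' | rerun_serially` runs that single suite inside a PBS job like test.sh once TEST_DIR is set. Running the tests one after another takes longer overall. However, if they now pass, that points to contention on the node rather than a problem with the build.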

 

All that said, the tests are there primarily for developers during development. Running them isn't a part of the typical installation process. You certainly can, but be prepared that there may be a few tests which fail not due to issues with the code, but just because of quirks of your machine/setup.

 

Thu, 2021-09-02 09:04
rmoretti

Dear rmoretti,

Thank you very much for your comment.

I followed your advice and reran the tests one by one.

Although some tests still failed because of cluster runtime limits, most of the previously failing tests now pass.

Also, the Rosetta build I compiled seems to be working well, as I can get calculation results with MPI.

Thanks again.

Mon, 2021-09-27 18:59
kjs1728