kinematic loop modeling/sequence design keeps crashing with segmentation fault


Hi,
I am running a kinematic loop modeling/design run, but it keeps crashing with a segmentation fault. Could you tell me how to avoid this? Thanks.

command line:
%loopmodel.gccrelease -database XXX @loopmodelflags

flagfile:
-loops:input_pdb XXX.pdb
-loops:loop_file XXX.loop
-loops:remodel perturb_kic
-loops:refine refine_kic
-loops:relax fastrelax
-loops:extended
-in:file:fullatom
-loops:max_kic_build_attempts 10000
-out:file:fullatom
-out:overwrite
-out:prefix AAA
-out:path ./
-out:file:scorefile score.sc
-ex1
-ex2
-nstruct 10000
-resfile XXX.resfile
-mute core.util.prof ## don't show timing info
-mute core.io.database ## don't show database info

loop file:
LOOP 32 41 36 0 1
LOOP 45 49 47 0 1

output:
~~~~

protocols.jobdist.JobDistributors: Looking for an available job: 17 1 S 17
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: standard
core.scoring.ScoreFunctionFactory: SCOREFUNCTION PATCH: score12
protocols.looprelax: ==== Loop protocol: =================================================
protocols.looprelax: remodel perturb_kic
protocols.looprelax: intermedrelax no
protocols.looprelax: refine refine_kic
protocols.looprelax: relax fastrelax
protocols.looprelax: ====================================================================================
protocols.looprelax: ===
protocols.looprelax: === Remodel
protocols.looprelax: ===
protocol.loops.LoopMover: ALL_LOOPS:LOOP begin end cut skip_rate extended
protocol.loops.LoopMover: LOOP 32 41 36 0 1
protocol.loops.LoopMover: LOOP 45 49 47 0 1
protocol.loops.LoopMover:
protocol.loops.LoopMover: SELECTEDLOOPS:LOOP begin end cut skip_rate extended
protocol.loops.LoopMover: LOOP 32 41 36 0 1
protocol.loops.LoopMover: LOOP 45 49 47 0 1
protocol.loops.LoopMover:
protocols.loops.loops_main: Pose fold tree FOLD_TREE EDGE 1 30 -1 EDGE 30 36 -1 EDGE 30 43 1 EDGE 43 37 -1 EDGE 43 346 -1
protocols.loops.loops_main:
protocol.loops.LoopMover: Setting extended torsions: LOOP 32 41 36 0 1
protocol.loops.LoopMover: Building Loop: LOOP 32 41 36 0 1
protocol.loops.LoopMover: Building Loop attempt: 0
protocol.loops.LoopMover: perturb_one_loop_with_KIC: 32 10
protocol.loops.LoopMover: remodel init temp: 2

protocol.loops.LoopMover: remodel final temp: 1
protocol.loops.LoopMover: kinematic initial perturb with start_res: 32 middle res: 36 end_res: 41
protocol.loops.LoopMover: loop rmsd before initial kinematic perturbation:0
protocol.loops.LoopMover: Attempting loop building: 0 ...
protocol.loops.LoopMover: Attempting loop building: 1 ...
protocol.loops.LoopMover: Attempting loop building: 2 ...
protocol.loops.LoopMover: Attempting loop building: 3 ...
protocol.loops.LoopMover: Attempting loop building: 4 ...
protocol.loops.LoopMover: Attempting loop building: 5 ...
protocol.loops.LoopMover: Attempting loop building: 6 ...
protocol.loops.LoopMover: Attempting loop building: 7 ...
protocol.loops.LoopMover: Attempting loop building: 8 ...
protocol.loops.LoopMover: Attempting loop building: 9 ...
protocol.loops.LoopMover: Attempting loop building: 10 ...
protocol.loops.LoopMover: Attempting loop building: 11 ...
protocol.loops.LoopMover: Attempting loop building: 12 ...
protocol.loops.LoopMover: Attempting loop building: 13 ...
protocol.loops.LoopMover: Attempting loop building: 14 ...
protocol.loops.LoopMover: Attempting loop building: 15 ...
protocol.loops.LoopMover: Attempting loop building: 16 ...
protocol.loops.LoopMover: Attempting loop building: 17 ...
protocol.loops.LoopMover: Attempting loop building: 18 ...
protocol.loops.LoopMover: Attempting loop building: 19 ...
protocol.loops.LoopMover: Attempting loop building: 20 ...
protocol.loops.LoopMover: Attempting loop building: 21 ...
protocol.loops.LoopMover: Attempting loop building: 22 ...
protocol.loops.LoopMover: Attempting loop building: 23 ...
protocol.loops.LoopMover: Attempting loop building: 24 ...
protocol.loops.LoopMover: initial kinematic perturbation complete
protocol.loops.LoopMover: loop rmsd after initial kinematic perturbation:7.65966
protocols.moves.MonteCarlo: MonteCarlo:: last_accepted_score,lowest_score: -3.74435 -3.74435
protocol.loops.LoopMover: new centroid perturb rmsd: 7.66692
protocol.loops.LoopMover: new centroid perturb rmsd: 7.64564
protocol.loops.LoopMover: new centroid perturb rmsd: 7.6479
protocol.loops.LoopMover: new centroid perturb rmsd: 7.60719
protocol.loops.LoopMover: new centroid perturb rmsd: 7.60153
protocol.loops.LoopMover: new centroid perturb rmsd: 7.63606
protocol.loops.LoopMover: new centroid perturb rmsd: 7.63672
protocol.loops.LoopMover: new centroid perturb rmsd: 7.63659
protocol.loops.LoopMover: new centroid perturb rmsd: 7.63877
protocol.loops.LoopMover: new centroid perturb rmsd: 7.63516
protocol.loops.LoopMover: new centroid perturb rmsd: 7.63873
protocol.loops.LoopMover: new centroid perturb rmsd: 7.61386
protocol.loops.LoopMover: new centroid perturb rmsd: 7.59655
protocol.loops.LoopMover: new centroid perturb rmsd: 7.59749
~~~~

protocol.loops.LoopMover: new centroid perturb rmsd: 7.75632
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76036
protocol.loops.LoopMover: new centroid perturb rmsd: 7.74645
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76038
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76153
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76194
protocol.loops.LoopMover: new centroid perturb rmsd: 7.75659
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76038
protocol.loops.LoopMover: new centroid perturb rmsd: 7.77386
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76188
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76014
protocol.loops.LoopMover: new centroid perturb rmsd: 7.75756
protocol.loops.LoopMover: new centroid perturb rmsd: 7.75578
protocol.loops.LoopMover: new centroid perturb rmsd: 7.75525
protocol.loops.LoopMover: new centroid perturb rmsd: 7.74483
protocol.loops.LoopMover: new centroid perturb rmsd: 7.75071
protocol.loops.LoopMover: new centroid perturb rmsd: 7.74632
protocol.loops.LoopMover: new centroid perturb rmsd: 7.82138
protocol.loops.LoopMover: new centroid perturb rmsd: 7.81393
protocol.loops.LoopMover: new centroid perturb rmsd: 7.77752
protocol.loops.LoopMover: new centroid perturb rmsd: 7.81771
protocol.loops.LoopMover: new centroid perturb rmsd: 7.79596
protocol.loops.LoopMover: new centroid perturb rmsd: 7.80156
protocol.loops.LoopMover: new centroid perturb rmsd: 7.80308
protocol.loops.LoopMover: new centroid perturb rmsd: 7.79805
protocol.loops.LoopMover: new centroid perturb rmsd: 7.81848
protocol.loops.LoopMover: new centroid perturb rmsd: 7.76296
protocol.loops.LoopMover: new centroid perturb rm

thanks!

Thu, 2013-03-28 09:52
banshee

Seg faults alone are unfortunately useless as diagnostic tools. Can you make the debug build (mode=debug when compiling) and see what it returns? We may have to go to GDB.

Almost all segfaults are ultimately caused by bad indices - looking for atoms or residues that don't exist. Is your loop file in PDB numbering or indexed-from-1 numbering? Does your PDB file contain anything other than protein that we need to be careful of? Do all residues in your PDB have all 4 backbone heavy atoms (N CA C O) present, and with nonzero occupancies (next to last PDB column)? Do you have any weird residues that might not have a proper centroid residue type present (post-translational modifications, noncanonicals, etc)?
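The input questions above (loop-file numbering, complete backbone atoms, nonzero occupancies) can be checked with a short script before launching a long run. What follows is a minimal Python sketch, not part of Rosetta: the function names `backbone_problems` and `loop_problems` are made up for illustration, it assumes well-formed fixed-column `ATOM` records, and it assumes the loop file uses 1-based numbering with a cut point of 0 meaning "let Rosetta choose".

```python
BACKBONE = ("N", "CA", "C", "O")


def backbone_problems(pdb_text):
    """Return (messages, residue_count) for the ATOM records in a PDB string.

    Assumes standard fixed-column PDB records: atom name in columns 13-16,
    chain in column 22, residue number + insertion code in columns 23-27,
    occupancy in columns 55-60.
    """
    residues = {}  # (chain, residue id) -> {atom name: occupancy}
    order = []     # residue keys in file order
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        key = (line[21], line[22:27].strip())
        occupancy = float(line[54:60])
        if key not in residues:
            residues[key] = {}
            order.append(key)
        residues[key][name] = occupancy

    problems = []
    for key in order:
        atoms = residues[key]
        for name in BACKBONE:
            if name not in atoms:
                problems.append(f"{key[0]}{key[1]}: missing {name}")
            elif atoms[name] == 0.0:
                problems.append(f"{key[0]}{key[1]}: zero occupancy on {name}")
    return problems, len(order)


def loop_problems(loop_text, n_residues):
    """Check 'LOOP begin end cut ...' lines against a 1-based residue count.

    Assumption: a cut point of 0 means the cut is chosen automatically,
    so only nonzero cut points are range-checked.
    """
    problems = []
    for line in loop_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LOOP":
            continue
        begin, end, cut = (int(f) for f in fields[1:4])
        if not (1 <= begin < end <= n_residues):
            problems.append(f"{line.strip()}: range outside 1..{n_residues}")
        elif cut != 0 and not (begin <= cut <= end):
            problems.append(f"{line.strip()}: cut point outside the loop")
    return problems
```

For the inputs in this thread, one would pass the contents of `XXX.pdb` to `backbone_problems`, then pass the contents of `XXX.loop` and the returned residue count to `loop_problems`; an empty list from both means none of these particular problems was found, not that the inputs are guaranteed safe.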

Finally, for reproducible crashes, you'll get a better log if you run directly to terminal instead of catching with rosetta > log. The to-log output is buffered in 32 KB blocks (system-dependent), so the last 32 KB of output is lost on crash. Direct to terminal output is not buffered and sometimes carries more information.

Thu, 2013-03-28 10:53
smlewis

hi

My PDB is numbered from 1 onward. It has two chains, and the residues are numbered continuously, so no two positions in different chains share a residue number. The PDB contains only the two proteins, with all natural amino acids, and all residues have the heavy atoms with nonzero occupancy.

I tried your suggestion of running in debug mode and it has not crashed yet. Could you explain why the normal mode would crash and the debug mode doesn't? Should I just run in debug mode even if it takes longer? Also, what is the GDB that you mention?

thanks again for your help.

Wed, 2013-04-03 08:15
banshee

Going to debug mode is not expected to prevent a crash - it's just that debug mode sometimes has more useful error messages, particularly when debug mode can exit with an assert() statement failure as opposed to a segfault. If the same inputs crash in release mode but not debug mode, that can happen; it's generally due to something nasty happening in the compiler optimization steps (the optimization that makes "release" faster than "debug"). I've seen crashes of this type but was never able to fix them - going into the code to try to identify exactly where the crash occurs alters optimization and makes the bug go away.

GDB is the GNU debugger. It lets you run code inside a "wrapper" that watches what the code is doing, line-by-line. If the code crashes, instead of dumping the memory, the debugger intercepts the crash and keeps the memory state alive to examine what the code was doing - you can know exactly what line it was on and what it did to crash. Usually you find out that some code was looking for residue 101 of a 100 residue protein, or a similar error. At minimum, if the crash occurs in debug mode, and you run in the debugger, you can issue the command "backtrace" to the debugger after the crash to get a list of exactly what line of code the crash occurred on, and what function called that line of code, etc, up through the whole stack.
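As a rough illustration of the "run in the debugger, then backtrace" workflow described above, here is a small Python sketch that assembles a non-interactive gdb command line. The gdb options used (`-batch`, `-ex`, `--args`) are standard gdb options; the helper name `gdb_backtrace_command` and the executable name `loopmodel.debug_build` are placeholders for illustration only, since the real name of the debug executable depends on how it was built.

```python
import shlex


def gdb_backtrace_command(binary, program_args):
    """Build an argv list that runs `binary` under gdb in batch mode.

    gdb executes the `run` command first; once the program stops
    (for example on a segmentation fault) it executes `backtrace`
    and then exits.
    """
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "backtrace",
        "--args", binary, *program_args,
    ]


# Placeholder executable name; the arguments mirror the command line in the first post.
cmd = gdb_backtrace_command(
    "loopmodel.debug_build",
    ["-database", "XXX", "@loopmodelflags"],
)
print(shlex.join(cmd))  # a shell-ready string to paste into a terminal
```

The same list can be handed to `subprocess.run(cmd)` to launch the job from Python. Running gdb interactively (plain `gdb --args ...`, then typing `run` and `backtrace`) works just as well and leaves the session open for inspecting variables.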

Wed, 2013-04-03 09:42
smlewis

Keep in mind that the segmentation fault could be contingent on the value of a certain variable, and thus may only manifest occasionally, due to the random number trajectory. You may want to repeat the debug runs multiple times with different random seeds to see if you can get one that triggers the crash.

Unfortunately, due to differences because of optimization, etc., you can't just reuse a release-mode seed in debug mode and expect to see the same trajectory. (You really can't even expect to see the same trajectories for the same seed on different machines.)

Wed, 2013-04-03 11:36
rmoretti

OK, so my debug mode run finally crashed with the following output:

~~~
protocols::checkpoint: Deleting checkpoints of Loopbuild
protocols::loopbuild: loop_cenrms: 0
protocols::loopbuild: loop_rms: 0
protocols::loopbuild: total_energy: -808.297
protocols::loopbuild: chainbreak: 0.0927452
protocols.jobdist.JobDistributors: Looking for an available job: 192 1 S 192
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: standard
core.scoring.ScoreFunctionFactory: SCOREFUNCTION PATCH: score12
protocols.looprelax: ==== Loop protocol: =================================================
protocols.looprelax: remodel perturb_kic
protocols.looprelax: intermedrelax no
protocols.looprelax: refine refine_kic
protocols.looprelax: relax fastrelax
protocols.looprelax: ====================================================================================
protocols.looprelax: ===
protocols.looprelax: === Remodel
protocols.looprelax: ===
protocol.loops.LoopMover: ALL_LOOPS:LOOP begin end cut skip_rate extended
protocol.loops.LoopMover: LOOP 32 41 36 0 1
protocol.loops.LoopMover: LOOP 45 49 47 0 1
protocol.loops.LoopMover:
protocol.loops.LoopMover: SELECTEDLOOPS:LOOP begin end cut skip_rate extended
protocol.loops.LoopMover: LOOP 32 41 36 0 1
protocol.loops.LoopMover: LOOP 45 49 47 0 1
protocol.loops.LoopMover:
protocols.loops.loops_main: Pose fold tree FOLD_TREE EDGE 1 30 -1 EDGE 30 36 -1 EDGE 30 43 1 EDGE 43 37 -1 EDGE 43 346 -1
protocols.loops.loops_main:
protocol.loops.LoopMover: Setting extended torsions: LOOP 32 41 36 0 1
protocol.loops.LoopMover: Building Loop: LOOP 32 41 36 0 1
protocol.loops.LoopMover: Building Loop attempt: 0
protocol.loops.LoopMover: perturb_one_loop_with_KIC: 32 10
protocol.loops.LoopMover: remodel init temp: 2
protocol.loops.LoopMover: remodel final temp: 1
protocol.loops.LoopMover: kinematic initial perturb with start_res: 32 middle res: 36 end_res: 41
protocol.loops.LoopMover: loop rmsd before initial kinematic perturbation:0
protocol.loops.LoopMover: Attempting loop building: 0 ...
protocol.loops.LoopMover: Attempting loop building: 1 ...
Segmentation fault: 11

So the cause seems to be memory related, but I have no clue why this is happening or how to fix it. Thanks.

Fri, 2013-04-05 07:35
banshee

A) Wow, you got a segfault in debug mode!

B) It looks like it's on the 192nd model? Not the first? Can you confirm that?

C) Is the crash reproducible in release mode? Does it always fail in exactly the same place (when tested with -constant_seed)? If the crash is NOT reproducible, then we should consider hardware errors (bad RAM).

Fri, 2013-04-05 09:11
smlewis

A) yes i did
B) Yes, it crashed at the 192nd model. All my segfaults have happened after Rosetta outputs a number of successful models, sometimes in the 10s and sometimes in the 100s, but definitely below 300.
C) I don't know; I guess I need to try a couple of release runs with a constant seed? I will write the result here after I try it. Thanks.

Fri, 2013-04-05 11:02
banshee

C) Yes, or you can use a random seed that failed quickly in an earlier test - look at the top of the log file from a test that failed in the 10s, if you have one. -jran lets you pass in a desired RNG seed.
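Pulling the seed out of an old log and reusing it can be scripted. The sketch below is only an illustration: it assumes the log contains a line with `seed=<integer>` near the top (the exact wording of that line may differ between Rosetta versions, so check your own log), and it assumes `-jran` is passed together with `-constant_seed`, the two flags mentioned in this thread. The function names are hypothetical.

```python
import re

# Assumed log format: some line near the top contains "seed=<integer>".
SEED_PATTERN = re.compile(r"seed=(-?\d+)")


def first_seed(log_text):
    """Return the first integer following 'seed=' in the log, or None."""
    for line in log_text.splitlines():
        match = SEED_PATTERN.search(line)
        if match:
            return int(match.group(1))
    return None


def rerun_command(base_cmd, seed):
    """Append the fixed-seed flags to an existing argv list."""
    return [*base_cmd, "-constant_seed", "-jran", str(seed)]
```

If the rerun with the recovered seed crashes at the same model every time, the crash is reproducible and worth chasing in the debugger; if it does not, that points toward the hardware explanation raised earlier in the thread.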

Fri, 2013-04-05 13:52
smlewis