NaNs for the p_aa_pp Score Term

I've been having a very strange issue with the energies in a PyRosetta script I have been working on. The total score comes back as NaN as I make mutations to a pose, and on further inspection the source of the NaNs is the p_aa_pp term. Specifically, the score function subroutine "eval_ci_1b" is the one producing the NaNs; "eval_ci_2b" doesn't seem to produce NaNs at all. It also seems to be a random occurrence: reinitializing with a new seed sometimes causes p_aa_pp to return a real number, although the real number is different each time. Is p_aa_pp supposed to be randomized? I wrote a minimal script to isolate the problem, and p_aa_pp varies between runs even though I'm not changing the pose. All the other score terms remain constant.

The really weird thing is that this only happens on the build I compiled manually for a BlueGene Q cluster. It never happened on the other two clusters we've been using (which were using the latest downloadable pre-compiled PyRosetta package). Does anyone have any ideas on how to fix this? I don't know if it's a bug or (more likely) something I messed up during compilation; compiling on BGQ was a major headache. I used the instructions in rosetta3.4/rosetta_source/src/python/bindings/building.txt to build it, although I couldn't get all the protocols to build, so I only built the ones I needed.

Obviously we'd like to try to get it to work on BGQ for the supercomputing power since we have a lot of parallel code, but if I disable p_aa_pp will that cause problems?

Here are the specifications of the BGQ if it's useful information:
RHEL 6, PPC64, Big-Endian
Python 2.7.3 (also had to be recompiled for BGQ)

Mon, 2013-09-09 06:23

As far as I know, the p_aa_pp term should be deterministic. That's not to say that a different seed wouldn't be an issue, as a different seed would result in a different structure, and thus a different p_aa_pp evaluation. Does the NaN come just from scoring, or is it in the context of packing/minimization?

It might help if you could come up with a reproducible example, preferably one that isn't dependent on the RNG seed. Try dumping the structure giving you NaNs as a binary silent file. (PDBs lose precision on coordinates - a binary silent file will keep the full precision.) You can then read in and rescore the structure both under BGQ and the local cluster - that should tell us if it's an issue with the structure itself, or with the p_aa_pp evaluation under BlueGene. I'm guessing it's some bug that only manifests itself under BlueGene. The BlueGene compiler tends to be more finicky than GCC, and we don't do much testing with it.
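To make the cross-machine comparison concrete, here is a minimal sketch in plain Python. It assumes you have already pulled the per-term energies for the same structure into a dictionary on each machine; how you extract them from PyRosetta is not shown. The function name `compare_term_scores` and the numbers in the example are hypothetical.

```python
import math

def compare_term_scores(scores_a, scores_b, tol=1e-6):
    """Return {term: (a, b)} for every term that is missing on one side,
    is NaN on either side, or differs by more than tol."""
    flagged = {}
    for term in sorted(set(scores_a) | set(scores_b)):
        a = scores_a.get(term)
        b = scores_b.get(term)
        if a is None or b is None:
            flagged[term] = (a, b)      # term present on one machine only
        elif math.isnan(a) or math.isnan(b):
            flagged[term] = (a, b)      # NaN never compares equal, so test it explicitly
        elif abs(a - b) > tol:
            flagged[term] = (a, b)      # real numbers, but they disagree
    return flagged

# Hypothetical per-term values for one structure scored on two machines.
bgq = {"fa_atr": -310.5, "p_aa_pp": float("nan"), "rama": 12.25}
local = {"fa_atr": -310.5, "p_aa_pp": -20.75, "rama": 12.25}
flagged = compare_term_scores(bgq, local)
```

If only p_aa_pp shows up in `flagged` for an identical input structure, that points at the evaluation on one machine rather than at the structure itself.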

Another option you could try, if you're using a recent-ish version of PyRosetta, is to set the -corrections:score:use_bicubic_interpolation option to true. This tells Rosetta to use a slightly different evaluation technique for a number of score terms, including p_aa_pp, and so may avoid the NaN issues.

Mon, 2013-09-09 10:39

Sorry for the late response; I got busy with other things yesterday and forgot to reply. I dumped the first pose that generated a NaN in my script (using pose.dump_pdb) and then ran the following simple script:

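from rosetta import *

# Initialize Rosetta (loads options and the database) before creating a pose.
init()

# Load the dumped pose and score it with the "standard" score function.
pose = pose_from_pdb("bad_pose.pdb")
scorefxn = create_score_function("standard")
print scorefxn(pose)

Output: nan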

Although if I reinitialize and do the same thing, it sometimes gives me a real number. I think you're right that it's most likely something weird with the BG compiler. I might try to recompile in the future, but it's a major hassle. I tried the bicubic interpolation option you suggested by calling 'init(extra_options="-corrections:score:use_bicubic_interpolation")', and it reports that the option is unknown, even though the same option is recognized on the other architectures. What if I just turn off p_aa_pp? Would that give unrealistic scores?

Wed, 2013-09-11 05:45

Are you using the same PyRosetta release on all the computers? The use_bicubic_interpolation option dates from about a year ago, so if your BG PyRosetta version is older than that it might not have the option.

If it's more recent than that, I'd seriously question the state of the compile. I'd highly recommend recompiling, as a botched compile has the possibility to give you bad/inaccurate results even in cases where there isn't an obvious error.

Regarding turning off the p_aa_pp term (probability of amino acid given phi and psi): if you're doing fixed sequence, fixed backbone protocols, p_aa_pp won't change on you, and you can safely turn it off (as long as you're not comparing to energies/thresholds made with the p_aa_pp score on). If you're doing fixed backbone design, you may be all right turning it off, although you may get some small position-specific sequence biases. If you're doing flexible backbone protocols (either fixed sequence or design), I'd hesitate to turn it off, as you might not get the correct backbone conformations. You may want to benchmark your particular case to see if it greatly affects things. You may need to up-weight the rama term (the conceptual inverse of p_aa_pp) to compensate.

Wed, 2013-09-11 11:59

The PyRosetta version on the non-BG computers is the latest version and the version on BG was compiled using Rosetta 3.4, so I think that bicubic interpolation should be available. I think you're right about the compile being bad. I'll try to recompile it next week and let you know what happens. I can't be sure that the rest of the energy calculation isn't messed up in some way as you said in your post.

Fri, 2013-09-13 08:41

Hey, sorry for the really late reply, but I got busy with other things for the past month. Anyway, I just wanted to let you know that I tried recompiling PyRosetta on BlueGene Q to get rid of the NaNs that were showing up in the p_aa_pp score term. It took a while to recompile, but I ended up getting it to work again. But the NaNs were still there! I got frustrated because I thought this would fix the problem, so I logged out of the BlueGene Q to do something else for a while. When I came back and ran a job again to debug it, the NaNs had magically disappeared, and they stayed gone for several consecutive runs.

I searched through the Rosetta code for anything related to p_aa_pp and found that it loads data from rosetta_database. I thought one of the files in rosetta_database might have been corrupt, so I pointed PyRosetta back to the old rosetta_database, and sure enough the NaNs came back. So the problem was with rosetta_database all along! At least it seems highly probable that the database was the problem. When I submitted the first job after recompiling, the PYROSETTA_DATABASE environment variable was still set from the old PyRosetta build; my next login updated it to point at the new database.

I don't know what happened exactly, because the old and new rosetta_database contain the same number of files. I did learn the hard way that the data partition the sysadmin told me to use (because the normal one is too small) is volatile and files get erased automatically after a certain amount of time, so maybe that has something to do with corrupting the files. Anyway, I just thought I'd post back here in case this helps someone else. Thanks for your help!
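A matching file count does not rule out changed or truncated file contents. One way to narrow down which files differ between two database directories is to compare checksums file by file. This is a minimal standard-library sketch, not something from the Rosetta tools; the function names are made up for illustration, and you would pass in your own old and new rosetta_database paths.

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests

def compare_trees(old_root, new_root):
    """Report files missing from either tree and files whose contents differ."""
    old = digest_tree(old_root)
    new = digest_tree(new_root)
    return {
        "only_in_old": sorted(set(old) - set(new)),
        "only_in_new": sorted(set(new) - set(old)),
        "changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

Calling `compare_trees(old_db, new_db)` returns three sorted lists of relative paths. Any entry under "changed" is a file that exists in both directories but no longer has the same bytes, which would be the first place to look for corruption.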

Mon, 2013-10-14 10:32