ETABLE constraint function type

I am running rosetta_bin_linux_2016.15.58628_bundle and I would like to run a prediction with a set of constraints defined by a potential using the ETABLE function type.

As it was not implemented in the FuncFactory.cc file, I have added: 

#include <core/scoring/func/EtableFunc.hh> in the //Package headers

and 

FuncFactory::add_type( "ETABLE", FuncOP( new EtableFunc( 0, 0, 0) ) );

After that, I recompiled and the ETABLE function type was recognized. :)

 

Next I ran a test with the AbInitio protocol and a single constraint, and the following error occurred:

ERROR: dfunc not implemented!

ERROR:: Exit from: src/core/scoring/func/EtableFunc.cc line: 90


I couldn't find out how to solve this, or even whether what I did to enable EtableFunc was reasonable. I am not a computer expert after all...
 

 

Tue, 2017-01-10 08:01
allan.ferrari

Let me clarify:

The "etable" - which is no longer a table - is a grouped scorefunction term in the normal scorefunction, which evaluates the tight physics terms like fa_rep, fa_atr, (collectively van der Waals) and solvation.  (It's called etable because it used to be faster to do table lookups than calculate; now we calculate it instead).

Constraints are user-provided "extra" terms to the scorefunction, which basically say "I want a conformation like THIS", or "here is some experimental data to use to bias scoring".

Funcs - Functions - convert the value measured by a constraint, like an atom-atom distance, into an energy value so that it can be added to the scorefunction.

Given those definitions - what are you trying to do?  You want to write a func for constraints that depends on the etable?  Why?  Should this be a constraint at all, or just a new scorefunction term?  Should you just modify the existing etable instead?

Tue, 2017-01-10 09:10
smlewis

Well, 

I have defined a function that relates the distance between two atoms to an energy, using a functional form different from those available in Rosetta. As I don't have much programming experience, I decided to use the ETABLE function type to interpolate values of the function over a given range of distances, as defined here: 

https://www.rosettacommons.org/docs/latest/rosetta_basics/file_types/constraint-file

That page defines the function type "ETABLE min max [many numbers]", in which min and max are the minimum and maximum values over which the function is defined, and the many numbers are the values of the function from min to max with a step of 0.1.

The first problem was that ETABLE was not explicitly registered in FuncFactory.cc, as I mentioned. This one I could manage.

The second was that the derivative of the function is not defined in EtableFunc.cc, so the AbInitio protocol stops with the message: dfunc not implemented (EtableFunc.cc line: 90).

I am not sure whether I can set it to zero somehow (and what the consequences would be). If that is not an option, implementing it seems reasonably simple, based on interpolating the given values.

 

Tue, 2017-01-10 14:05
allan.ferrari

(UW had a power outage - a lot of the Rosetta sites were down.  They're back.  https://www.rosettacommons.org/docs/latest/rosetta_basics/file_types/constraint-file)

Great!  Reading your notes and then reading the code, I can come to a few conclusions.  First, you are totally doing this right!

A) Yup, you're totally using EtableFunc, I didn't know that existed.

B) Its absence from FuncFactory indicates that nobody is using it in constraint files...but it has a read_data function defined, so I guess it's a reversion and a bug.  Your addition to FuncFactory.cc is correct!

C) dfunc is "derivative of this function".  Look at HarmonicFunc for the simplest example - the function (the method named func) is x^2, so dfunc returns 2x.  (I'm ignoring the sd variable here, but you get the idea).  So: 1) why does it crash on dfunc, and 2) what should you do?

1) It crashes on dfunc because the function can't know its own derivative.  Rosetta can't do numeric derivatives automatically, and it certainly can't do analytical derivatives.  Since your function is user defined, only you know the derivative.  The implication here is that the original author felt that this code should never go through minimization (which is where dfunc is used).  

2) if you set it to zero - it will give incorrect results in minimization, basically.  The minimizer won't see this constraint when minimizing, so it will detect minima in the wrong places.  How bad a problem is this?  I can't know, it depends on your function and your problem.  If the real derivative is zero in the "good" parts of your conformational space, it might be ok.  Setting it nonzero is definitely bad.
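
To make the func/dfunc pairing from point C) concrete, here is a minimal standalone sketch. HarmonicSketch is a hypothetical stand-in written for illustration, not Rosetta's actual HarmonicFunc class; the x0 and sd members are assumptions modelled on the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for a harmonic Func; not Rosetta's HarmonicFunc.
struct HarmonicSketch {
    double x0;  // value at which the energy is zero (assumed parameter)
    double sd;  // width of the well (assumed parameter)

    // Energy for a measured value x: ((x - x0) / sd)^2
    double func(double x) const {
        assert(sd != 0.0);
        double const z = (x - x0) / sd;
        return z * z;
    }

    // Analytical derivative of func with respect to x: 2 (x - x0) / sd^2
    double dfunc(double x) const {
        assert(sd != 0.0);
        return 2.0 * (x - x0) / (sd * sd);
    }
};
```

With x0 = 0 and sd = 1 this reduces to the x^2 and 2x pair mentioned above. A minimizer would call dfunc to learn which direction lowers the energy, which is why a missing or wrong dfunc matters only when minimization is involved.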

Tue, 2017-01-10 13:56
smlewis

Thank you, smlewis! I think we are on the same page!

My function is something like f(x) = A/(1+exp((x-x0)/dx)), where A, x0 and dx are constants. As I don't have enough programming knowledge to write the code from scratch, I have computed the values of the function numerically and tabulated them over the range of interest. For my purposes, I believe the interpolation is accurate enough.
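
For readers who want to reproduce the tabulation step, a minimal sketch follows. The function names and the parameter values used with them are illustrative assumptions, not part of Rosetta; only the "ETABLE min max [many numbers]" layout and the 0.1 step come from the constraint-file page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// f(x) = A / (1 + exp((x - x0) / dx)), the functional form given in the post.
double sigmoid_energy(double x, double A, double x0, double dx) {
    return A / (1.0 + std::exp((x - x0) / dx));
}

// Tabulate f on [min, max]. The point count is computed once as an integer
// so that floating-point accumulation cannot add or drop the last point.
std::vector<double> tabulate(double min, double max, double step,
                             double A, double x0, double dx) {
    assert(max >= min && step > 0.0);
    std::size_t const n =
        static_cast<std::size_t>(std::llround((max - min) / step)) + 1;
    std::vector<double> values;
    values.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(
            sigmoid_energy(min + static_cast<double>(i) * step, A, x0, dx));
    }
    return values;
}

// Build the "ETABLE min max v0 v1 ..." part of a constraint line.
std::string etable_func_definition(double min, double max,
                                   std::vector<double> const& values) {
    std::ostringstream out;
    out.precision(12);  // keep more digits than the default 6
    out << "ETABLE " << min << ' ' << max;
    for (double v : values) out << ' ' << v;
    return out.str();
}
```

For example, tabulate(3.0, 10.0, 0.1, -2.5, 6.0, 0.5) yields 71 values between -2.5 and 0 (x0 = 6.0 and dx = 0.5 are made-up numbers). How many values the reader actually consumes should be checked against EtableFunc::read_data in your Rosetta version.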

Regarding the derivative:

A) I could simply derive it from my function, but EtableFunc is not defined in terms of the variables of my function. So it would not be possible to write the derivative directly in the EtableFunc.cc file, and I believe it would be easier (or the only option) to write new code for my function. Hard work for me!!

or B) One way to do it, taking advantage of the EtableFunc.cc code, would be to define the derivative approximately from the interpolation, as (f(x) - f(x - 0.1)) / 0.1.  I just don't know how to do that precisely in the code. It doesn't appear to be that difficult... what do you think?

In principle, I don't know whether optimizing based on the derivative of my function is critical in my case. So, C) could I just ignore this derivative in my prediction (as the original author apparently intended for this function type)? At the end of the day, what I want is that, every time a model is selected to survive to the next step of fragment assembly, the constraint function is taken into account in the final score. (I don't necessarily want to minimize it!)

To try option C) as an example (and as a noob), I set the derivative to 0. Note that my function has values between -2.5 and 0 and is defined for distances between 3 and 10 (and my file had only one constraint). The result was an atom_pair_constraint raw score of -inf (a huge negative number). I have no idea why this is happening, do you? More precisely (as far as I can read the output on the screen): in Stages 1 and 2 of AbInitio, AtomPair has a raw score of 0, but in Stage 3 it takes the value -inf! Why only in that stage?

Some additional information: 1) all my tabulated data are double-precision values. I suspected this could have something to do with the problem... I don't know... (more tests?) 2) A number of red messages reading "Inaccurate G! step = ...." show up after core.optimization.LineMinimizer. Is this telling me that something is not OK?

 

Tue, 2017-01-10 14:53
allan.ferrari

dfunc: I guess you can do it the way you describe, sure.
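
A minimal sketch of deriving dfunc from the tabulated values is below. TableSketch is a hypothetical standalone struct, not Rosetta's EtableFunc, and it assumes func does linear interpolation between neighbouring table points. Instead of the backward difference (f(x) - f(x - 0.1)) / 0.1, it returns the slope of the table segment that contains x, which is the exact derivative of the interpolated func and so stays consistent with it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabulated function; not Rosetta's EtableFunc.
struct TableSketch {
    double x_min;                // x of the first table entry
    double step;                 // grid spacing (0.1 for ETABLE)
    std::vector<double> values;  // values[i] = f(x_min + i * step)

    // Index of the lower grid point of the segment containing x.
    // x is assumed to lie inside the tabulated range.
    std::size_t lower_index(double x) const {
        assert(values.size() >= 2 && step > 0.0);
        assert(x >= x_min &&
               x <= x_min + step * static_cast<double>(values.size() - 1));
        std::size_t i = static_cast<std::size_t>(std::floor((x - x_min) / step));
        if (i > values.size() - 2) i = values.size() - 2;  // x at upper edge
        return i;
    }

    // Linear interpolation between the two neighbouring table points.
    double func(double x) const {
        std::size_t const i = lower_index(x);
        double const t = (x - (x_min + static_cast<double>(i) * step)) / step;
        return values[i] + t * (values[i + 1] - values[i]);
    }

    // Slope of the segment containing x: a finite difference of the table.
    double dfunc(double x) const {
        std::size_t const i = lower_index(x);
        return (values[i + 1] - values[i]) / step;
    }
};
```

The derivative is piecewise constant and jumps at the grid points, which may or may not bother a minimizer; with a 0.1 step and a smooth underlying function the jumps should be small.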

-inf: hmm.  I'm going to guess that you're going out of bounds?  There appears to be no bounds checking in func for EtableFunc.  I can't do it in my head, but I'm guessing that if the input value is beyond the legal range, maybe you end up with a divide by zero and thus inf or -inf.  I don't think this has to do with dfunc.  Try increasing your defined range, doing something to measure that value directly to see if it goes out of range, or (god forbid) putting a cout statement in EtableFunc::func to print the incoming x.  (that last one will make the code SUPER SLOW).
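
If out-of-range input does turn out to be the cause, one possible guard is to clamp x to the tabulated range before the lookup. The sketch below is a hypothetical standalone helper, not a patch to EtableFunc::func, and it assumes that returning the edge value for out-of-range distances is acceptable for your potential.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear interpolation in a table, with x clamped to
// the tabulated range so the index can never run past either end.
double clamped_table_lookup(std::vector<double> const& values,
                            double x_min, double step, double x) {
    assert(values.size() >= 2 && step > 0.0);
    double const x_max =
        x_min + step * static_cast<double>(values.size() - 1);
    double const xc = std::max(x_min, std::min(x, x_max));  // clamp into range
    // xc - x_min is non-negative, so truncation acts as floor here.
    std::size_t i = static_cast<std::size_t>((xc - x_min) / step);
    i = std::min(i, values.size() - 2);  // keep i + 1 a valid index
    double const t = (xc - (x_min + step * static_cast<double>(i))) / step;
    return values[i] + t * (values[i + 1] - values[i]);
}
```

Below the range this returns the first table value and above it the last one. For a sigmoid that has already flattened out at both ends of the table, that matches the intended behaviour; for a function still changing at the edges, extending the tabulated range is probably the better fix.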

The inaccurate G! messages are broadly a statement that the minimizer senses a problem.  They occur for reasons none of us have ever fully tracked down, but they're 100% consistent with you using a derivative of zero - I think they mean the minimizer was upset that the expected value after a minimization step didn't match the real score value (which, since one derivative is wrong, is no surprise).

 

 

Tue, 2017-01-10 15:11
smlewis

Thank you smlewis!

I was able to implement the function and its derivative. From some tests I have made, it is working as expected. 

More questions will come soon. :p

Fri, 2017-01-13 07:29
allan.ferrari

I was wondering: I have a different [many numbers] list for each of my AtomPairs, which makes the cst_file quite cluttered. Is there a way to load the many-numbers part of the line from another file? Like this:

AtomPair CA 13 CA 24 min max [many numbers]

to 

AtomPair CA 13 CA 24 min max $file

 

I have tried something here, but it was unsuccessful.

Sat, 2017-01-14 06:26
allan.ferrari

I assume there should be an ETABLE in there to identify the func?

For your question: no, EtableFunc's read_data is not that smart:


void
EtableFunc::read_data( std::istream& in ) {
    in  >> min_ >> max_;
    stepsize_ = 0.1;
    for ( Real r = min_; r <= max_; r += stepsize_ ) {
        core::Real func_temp;
        in >> func_temp;
        func_.push_back( func_temp  );
    }
}
 

You are welcome to rewrite the function to take a different data source!  
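
As a starting point, here is a minimal standalone sketch of a reader that takes "min max filename" and loads the numbers from that file. Everything in it (TableData, read_table_values, read_table_reference, parse_table_reference) is hypothetical and written for illustration; it is not Rosetta code, and wiring it into EtableFunc::read_data is left as an exercise.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parsed table.
struct TableData {
    double min = 0.0;
    double max = 0.0;
    std::vector<double> values;
};

// Read every whitespace-separated number from a stream until it ends.
std::vector<double> read_table_values(std::istream& in) {
    std::vector<double> values;
    double v;
    while (in >> v) values.push_back(v);
    return values;
}

// Parse "min max filename" from a stream, then load the values from the file.
TableData read_table_reference(std::istream& in) {
    TableData data;
    std::string filename;
    in >> data.min >> data.max >> filename;
    if (!in) throw std::runtime_error("expected: min max filename");
    assert(!filename.empty());  // a successful string read is never empty
    std::ifstream table(filename);
    if (!table) throw std::runtime_error("cannot open table file: " + filename);
    data.values = read_table_values(table);
    return data;
}

// Convenience wrapper for parsing the same three fields from a string.
TableData parse_table_reference(std::string const& text) {
    std::istringstream in(text);
    return read_table_reference(in);
}
```

The table file would simply hold the same step-0.1 values that otherwise sit on the constraint line, separated by spaces or newlines. A real implementation would also want to check that the number of values matches the min/max range.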

Sun, 2017-01-15 18:14
smlewis