 

How to calculate gradients in a numerically stable fashion

I would like to compute the derivatives of a ratio f = - a / b in a numerically stable fashion using tensorflow, but I run into problems when a and b are small (< 1e-20 in 32-bit floating-point representation). The derivative of f with respect to b is df_db = a / b ** 2, but because of operator precedence, the square in the denominator is computed first; it underflows and leads to an undefined gradient.

If the derivative were instead calculated as df_db = (a / b) / b, the underflow would not occur and the gradient would be well-defined, as illustrated in the figure below, which shows the gradient as a function of a = b. The blue line corresponds to the domain in which tensorflow can calculate the derivative. The orange line corresponds to the domain in which the denominator underflows, yielding an infinite gradient. The green line corresponds to the domain in which the denominator overflows, yielding a zero gradient. In both problematic domains, the gradient can be calculated using the modified expression above.

[Figure: gradient of f as a function of a = b, showing the stable, underflowing, and overflowing domains]

I've been able to get a more numerically stable expression by using the ugly hack

g = exp(log(a) - log(b))

which is equivalent to f up to sign but yields a different tensorflow graph. However, I run into the same problem if I want to calculate a higher-order derivative. The code to reproduce the problem can be found here.

Is there a recommended approach to alleviate such problems? Is it possible to explicitly define a derivative of an expression in tensorflow if one doesn't want to rely on autodifferentiation?

asked Nov 02 '25 by Till Hoffmann

1 Answer

Thanks to Yaroslav Bulatov's pointer, I was able to implement a custom function with the desired gradient.

import tensorflow as tf
from tensorflow.python.framework import function


# Define the gradient of division: d(x/y)/dx = 1/y and d(x/y)/dy = -(x/y)/y.
# Dividing twice instead of squaring y avoids the underflow in y ** 2.
@function.Defun(tf.float32, tf.float32, tf.float32)
def newDivGrad(x, y, grad):
    return tf.reciprocal(y) * grad, -tf.div(tf.div(x, y), y) * grad


# Division op that uses the custom gradient instead of the autodiff one
@function.Defun(tf.float32, tf.float32, grad_func=newDivGrad)
def newDiv(x, y):
    return tf.div(x, y)

Full notebook is here. PR is here.


answered Nov 04 '25 by Till Hoffmann