 

How to calculate gradients in a numerically stable fashion

I would like to compute the derivatives of a ratio f = - a / b in a numerically stable fashion using tensorflow, but I run into problems when a and b are small (< 1e-20 in 32-bit floating-point representation). The derivative of f with respect to b is df_db = a / b ** 2, but because of operator precedence, the square in the denominator is computed first; it underflows and leads to an undefined gradient.

If the derivative were instead calculated as df_db = (a / b) / b, the underflow would not occur and the gradient would be well-defined, as illustrated in the figure below, which shows the gradient as a function of a = b. The blue line corresponds to the domain in which tensorflow can calculate the derivative. The orange line corresponds to the domain in which the denominator underflows, yielding an infinite gradient. The green line corresponds to the domain in which the denominator overflows, yielding a zero gradient. In both problematic domains, the gradient can be calculated using the modified expression above.

[Figure: gradient of f as a function of a = b, showing the stable, underflowing, and overflowing domains]

I've been able to get a more numerically stable expression by using the ugly hack

g = exp(log(a) - log(b))

which is equivalent to f up to sign but yields a different tensorflow graph. However, I run into the same problem if I want to calculate a higher-order derivative. The code to reproduce the problem can be found here.

Is there a recommended approach to alleviate such problems? Is it possible to explicitly define a derivative of an expression in tensorflow if one doesn't want to rely on autodifferentiation?

asked Nov 02 '25 by Till Hoffmann

1 Answer

Thanks to Yaroslav Bulatov's pointer, I was able to implement a custom function with the desired gradient.

import tensorflow as tf
from tensorflow.python.framework import function


# Define the gradient of division: d(x/y)/dx = 1/y and d(x/y)/dy = -(x/y)/y.
# Dividing twice instead of squaring y avoids the underflow in y ** 2.
@function.Defun(tf.float32, tf.float32, tf.float32)
def newDivGrad(x, y, grad):
    return tf.reciprocal(y) * grad, -tf.div(tf.div(x, y), y) * grad


# Division op that uses the custom gradient instead of the autodiff one
@function.Defun(tf.float32, tf.float32, grad_func=newDivGrad)
def newDiv(x, y):
    return tf.div(x, y)

Full notebook is here. PR is here.


answered Nov 04 '25 by Till Hoffmann