(Note: this is not a question about back-propagation.) I am trying so solve on a GPU a non-linear PDE using PyTorch tensors in place of Numpy arrays. I want to calculate the partial derivatives of an arbitrary tensor, akin to the action of the center finite-difference numpy.gradient function. I have other ways around this problem, but since I am already using PyTorch, I'm wondering if it is possible use the autograd module (or, in general, any other autodifferentiation module) to perform this action.
I have created a tensor-compatible version of the numpy.gradient function - which runs a lot faster. But perhaps there is a more elegant way of doing this. I can't find any other sources that address this question, either to show that it's possible or impossible; perhaps this reflects my ignorance with the autodifferentiation algorithms.
I've had this same question myself: when numerically solving PDEs, we need access to spatial gradients (which the numpy.gradients function can give us) all the time - could it be possible to use automatic differentiation to compute the gradients, instead of using finite-difference or some flavor of it?
"I'm wondering if it is possible use the autograd module (or, in general, any other autodifferentiation module) to perform this action."
The answer is no: as soon as you discretize your problem in space or time, then time and space become discrete variables with a grid-like structure, and are not explicit variables which you feed into some function to compute the solution to the PDE.
For example, if I wanted to compute the velocity field of some fluid flow u(x,t), I would discretize in space and time, and I would have u[:,:] where the indices represent positions in space and time. 
Automatic differentiation can compute the derivative of a function u(x,t). So why can't it compute the spatial or time derivative here? Because you've discretized your problem. This means you don't have a function for u for arbitrary x, but rather a function of u at some grid points. You can't differentiate automatically with respect to the spacing of the grid points.
As far as I can tell, the tensor-compatible function you've written is probably your best bet. You can see that a similar question has been asked in the PyTorch forums here and here. Or you could do something like
dx = x[:,:,1:]-x[:,:,:-1]
if you're not worried about the endpoints.
You can use PyTorch to compute the gradients of a tensor with respect to another tensor under some constraints. If you're careful to stay within the tensor framework to ensure a computation graph is created, then by repeatedly calling backward on each element of the output tensor and zeroing the grad member of the independent variable you can iteratively query the gradient of each entry. This approach allows you to gradually build the gradient of a vector valued function.
Unfortunately this approach requires calling backward many times which may be slow in practice and may result in very large matrices.
import torch
from copy import deepcopy
def get_gradient(f, x):
    """ computes gradient of tensor f with respect to tensor x """
    assert x.requires_grad
    x_shape = x.shape
    f_shape = f.shape
    f = f.view(-1)
    x_grads = []
    for f_val in f:
        if x.grad is not None:
            x.grad.data.zero_()
        f_val.backward(retain_graph=True)
        if x.grad is not None:
            x_grads.append(deepcopy(x.grad.data))
        else:
            # in case f isn't a function of x
            x_grads.append(torch.zeros(x.shape).to(x))
    output_shape = list(f_shape) + list(x_shape)
    return torch.cat((x_grads)).view(output_shape)
For example, given the following function:
f(x0,x1,x2) = (x0*x1*x2, x1^2, x0+x2)
The Jacobian at x0, x1, x2 = (1, 2, 3) could be computed as follows
x = torch.tensor((1.0, 2.0, 3.0))
x.requires_grad_(True)   # must be set before further computation
f = torch.stack((x[0]*x[1]*x[2], x[1]**2, x[0]+x[2]))
df_dx = get_gradient(f, x)
print(df_dx)
which results in
tensor([[6., 3., 2.],
        [0., 4., 0.],
        [1., 0., 1.]])
For your case if you can define an output tensor with respect to an input tensor you could use such a function to compute the gradient.
A useful feature of PyTorch is the ability to compute the vector-Jacobian product. The previous example required lots of re-applications of the chain rule (a.k.a. back propagation) via the backward method to compute the Jacobian directly. But PyTorch allows you to compute the matrix/vector product of the Jacobian with an arbitrary vector which is much more efficient than actually building the Jacobian. This may be more in line with what you're looking for since you can finagle it to compute multiple gradients at various values of a function, similar to the way I believe numpy.gradient operates.
For example, here we compute f(x) = x^2 + sqrt(x) for x = 1, 1.1, ..., 1.8 and compute the derivative (which is f'(x) = 2x + 0.5/sqrt(x)) at each of these points
dx = 0.1
x = torch.arange(1, 1.8, dx, requires_grad=True)
f = x**2 + torch.sqrt(x)
f.backward(torch.ones(f.shape))
x_grad = x.grad
print(x_grad)
which results in
tensor([2.5000, 2.6767, 2.8564, 3.0385, 3.2226, 3.4082, 3.5953, 3.7835])
Compare this to numpy.gradient
dx = 0.1
x_np = np.arange(1, 1.8, dx)
f_np = x_np**2 + np.sqrt(x_np)
x_grad_np = np.gradient(f_np, dx)
print(x_grad_np)
which results in the following approximation
[2.58808848 2.67722558 2.85683288 3.03885421 3.22284723 3.40847554 3.59547805 3.68929417]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With