Creating a custom interpolation function for pandas

Question

I am trying to clean up and fill in some missing time-series data using pandas. The interpolate function works quite well, but it lacks a few less widely used interpolation methods that I need for my data set. Two examples: a simple "last valid data point" fill, which would produce something like a step function, and a logarithmic or geometric interpolation.

Browsing through the docs, I could not find a way to pass a custom interpolation function. Does such functionality exist directly within pandas? If not, has anyone done any pandas-fu to efficiently apply custom interpolations through other means?

jdehesa · Accepted Answer

The interpolation methods offered by pandas are largely those offered by scipy.interpolate.interp1d, which unfortunately does not seem to be extensible in any way. I had to do something similar to apply SLERP quaternion interpolation (using numpy-quaternion), and I managed to do it quite efficiently. I'll copy the code here in the hope that you can adapt it for your purposes:

import numpy as np
import quaternion  # provided by the numpy-quaternion package

def interpolate_slerp(data):
    if data.shape[1] != 4:
        raise ValueError('Need exactly 4 values for SLERP')
    vals = data.values.copy()
    # quaternions has shape (N,): each row of 4 floats becomes one quaternion
    quaternions = quaternion.as_quat_array(vals)
    # This is a mask of the elements that are NaN
    empty = np.any(np.isnan(vals), axis=1)
    # These are the positions of the valid values
    valid_loc = np.argwhere(~empty).squeeze(axis=-1)
    # These are the indices (e.g. time) of the valid values
    valid_index = data.index[valid_loc].values
    # These are the valid values
    valid_quaternions = quaternions[valid_loc]
    # Positions of the missing values
    empty_loc = np.argwhere(empty).squeeze(axis=-1)
    # Missing values before first or after last valid are discarded
    empty_loc = empty_loc[(empty_loc > valid_loc.min()) & (empty_loc < valid_loc.max())]
    # Index value for missing values
    empty_index = data.index[empty_loc].values
    # Important bit! This tells you which valid values must be used as interpolation ends for each missing value
    interp_loc_end = np.searchsorted(valid_loc, empty_loc)
    interp_loc_start = interp_loc_end - 1
    # These are the actual values of the interpolation ends
    interp_q_start = valid_quaternions[interp_loc_start]
    interp_q_end = valid_quaternions[interp_loc_end]
    # And these are the indices (e.g. time) of the interpolation ends
    interp_t_start = valid_index[interp_loc_start]
    interp_t_end = valid_index[interp_loc_end]
    # This performs the actual interpolation
    # For each missing value, you have:
    #   * Initial interpolation value
    #   * Final interpolation value
    #   * Initial interpolation index
    #   * Final interpolation index
    #   * Missing value index
    interpolated = quaternion.slerp(interp_q_start, interp_q_end, interp_t_start, interp_t_end, empty_index)
    # This puts the interpolated values into place
    data = data.copy()
    data.iloc[empty_loc] = quaternion.as_float_array(interpolated)
    return data
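
The bracket lookup used in the function above can be seen in isolation in the following minimal sketch. The mask is made-up illustration data, and np.flatnonzero is used here as a shorthand for the argwhere/squeeze pair; everything else mirrors the variable names in the function.

```python
import numpy as np

# Hypothetical mask: True marks a row with missing data
empty = np.array([False, True, True, False, True, False])

valid_loc = np.flatnonzero(~empty)  # positions of valid rows: 0, 3, 5
empty_loc = np.flatnonzero(empty)   # positions of missing rows: 1, 2, 4

# For each missing position, the index into valid_loc of the first
# valid position after it; the one just before it is the other end
interp_loc_end = np.searchsorted(valid_loc, empty_loc)
interp_loc_start = interp_loc_end - 1

# Row positions of the valid values that bracket each missing row
bracket_start = valid_loc[interp_loc_start]
bracket_end = valid_loc[interp_loc_end]
```

Here rows 1 and 2 are bracketed by rows 0 and 3, and row 4 by rows 3 and 5. Because valid_loc is sorted, the whole lookup is one vectorized binary search rather than a Python loop.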

The trick is in np.searchsorted, which very quickly finds the right interpolation ends for each missing value. This method has two limitations:

Your interpolation function must work somewhat like quaternion.slerp, that is, it must take arrays of start values, end values, start indices, end indices and target indices (this should not be surprising, since slerp has regular ufunc broadcasting behaviour).
It only works for interpolation methods that need just one value on each end. Something like a cubic interpolation would not work this way (though you would not need it, since pandas already provides one).
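
As a rough illustration of adapting the same pattern to the geometric interpolation mentioned in the question, here is a minimal sketch for a single Series. The function name interpolate_geometric is hypothetical, and the sketch assumes a sorted numeric index and strictly positive valid values; it is not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def interpolate_geometric(series):
    """Fill interior NaNs geometrically (hypothetical helper).

    Assumes a sorted numeric index and strictly positive valid values.
    """
    vals = series.to_numpy(dtype=float)
    index = series.index.to_numpy(dtype=float)
    empty = np.isnan(vals)
    valid_loc = np.flatnonzero(~empty)
    if valid_loc.size < 2:
        # Nothing to interpolate between
        return series.copy()
    empty_loc = np.flatnonzero(empty)
    # Missing values before the first or after the last valid one are left as NaN
    empty_loc = empty_loc[(empty_loc > valid_loc[0]) & (empty_loc < valid_loc[-1])]
    # Same trick as above: find the valid neighbours of each missing value
    end = np.searchsorted(valid_loc, empty_loc)
    start = end - 1
    v0, v1 = vals[valid_loc[start]], vals[valid_loc[end]]
    t0, t1 = index[valid_loc[start]], index[valid_loc[end]]
    # Fraction of the way from the start index to the end index
    frac = (index[empty_loc] - t0) / (t1 - t0)
    out = series.astype(float)
    # Geometric interpolation: linear in log space
    out.iloc[empty_loc] = v0 * (v1 / v0) ** frac
    return out


s = pd.Series([1.0, np.nan, 4.0, np.nan, np.nan, 32.0, np.nan])
filled = interpolate_geometric(s)
# The interior gaps become approximately 2, 8 and 16; the trailing NaN stays
```

Only the last assignment is specific to the geometric case, so swapping in another two-point formula (for example a logarithmic one) should only require changing that line. For the "last valid value" step function, no custom code appears to be needed, since Series.ffill already does that.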

Tags:

python

pandas

interpolation

MarkD

