I would like to least-squares match data (a numpy array of floats) against many known signal shapes. My code works, but it is too slow for the many runs I plan to do:
import numpy
import time

samples = 50000
width_signal = 100
data = numpy.random.normal(0, 1, samples)
signal = numpy.random.normal(0, 1, width_signal)  # Placeholder

t0 = time.perf_counter()
for i in range(samples - width_signal):
    data_chunk = data[i:i + width_signal]
    residuals = data_chunk - signal
    squared_residuals = residuals**2
    summed_residuals = numpy.sum(squared_residuals)
t1 = time.perf_counter()
print('Time elapsed (sec)', t1 - t0)
EDIT: Corrected a mistake: First square residuals, then sum them.
This takes about 0.2 sec to run on my machine. As I have many datasets and signal shapes, this is too slow. My specific problem does not allow for typical MCMC methods because the signal shapes are too different. It has to be brute force.
Typical volumes are 50,000 floats for the data and 100 for the signal. These can vary by a factor of a few.
My tests show that:

numpy.sum(residuals) eats 90% of the time. I tried Python's sum(residuals), which is faster for small arrays (roughly fewer than 50 elements) but slower for bigger ones. Should I insert an if condition?
I tried numpy.roll() instead of fetching the data chunks directly, and .roll() is slower.

Questions:

What is the fastest way to compute these summed squared residuals for all offsets?
Based on the various methods proposed in Compute mean squared, absolute deviation and custom similarity measure - Python/NumPy, we can solve our case here.
Approach #1
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows and thus have our first solution, like so -
import numpy as np
from skimage.util import view_as_windows

d = view_as_windows(data, (width_signal)) - signal  # windowed diffs
out = np.einsum('ij,ij->i', d, d)                   # per-window sum of squared diffs
More info on the use of as_strided-based view_as_windows.
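If scikit-image is not at hand, recent NumPy (1.20+) exposes the same kind of window view. A minimal sketch of the equivalent idea with numpy.lib.stride_tricks.sliding_window_view, assuming data, signal and width_signal from the question are in scope -

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same windowed diffs as above, using NumPy's own sliding-window view (NumPy 1.20+)
d = sliding_window_view(data, width_signal) - signal
out = np.einsum('ij,ij->i', d, d)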
Approach #2
Again, based on the matrix-multiplication trick in that answer post, we can improve the performance, like so -
def MSD_strided(data, signal):
    w = view_as_windows(data, (width_signal))
    return (w**2).sum(1) + (signal**2).sum(0) - 2*w.dot(signal)
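The trick rests on the per-window expansion sum((w - s)**2) = sum(w**2) + sum(s**2) - 2*w.dot(s), so the expensive part reduces to one matrix-vector product. A quick sanity check against Approach #1 (a sketch, assuming data, signal and width_signal from the question are in scope) -

import numpy as np
from skimage.util import view_as_windows

# The expanded form should agree with the direct per-window squared-residual sums
w = view_as_windows(data, (width_signal))
direct = np.einsum('ij,ij->i', w - signal, w - signal)
expanded = (w**2).sum(1) + (signal**2).sum(0) - 2*w.dot(signal)
print(np.allclose(direct, expanded))  # expected: True (up to round-off)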
Approach #3
We can improve on Approach #2 by bringing in uniform filtering and convolution -
from scipy.ndimage import uniform_filter

def MSD_uniffilt_conv(data, signal):
    hW = width_signal//2
    l = len(data) - len(signal) + 1
    parte1 = uniform_filter(data**2, width_signal)[hW:hW+l]*width_signal  # sliding sums of data**2
    parte3 = np.convolve(data, signal[::-1], 'valid')                     # cross term
    return parte1 + (signal**2).sum(0) - 2*parte3
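Here uniform_filter supplies the sliding sums of data**2 (it returns a moving average, hence the multiplication by width_signal and the hW offset into the valid region), while the reversed-signal convolution supplies the cross term. A quick check against Approach #1, under the same assumptions as above -

# Sanity check: filter/convolution variant vs. the direct windowed version
w = view_as_windows(data, (width_signal))
ref = np.einsum('ij,ij->i', w - signal, w - signal)
print(np.allclose(MSD_uniffilt_conv(data, signal), ref))  # expected: True (up to round-off)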
Benchmarking
Timings on posted sample -
In [117]: %%timeit
    ...: for i in range(samples - width_signal + 1):
    ...:     data_chunk = data[i:i + width_signal]
    ...:     residuals = data_chunk - signal
    ...:     squared_residuals = residuals**2
    ...:     summed_residuals = numpy.sum(squared_residuals)
1 loop, best of 3: 239 ms per loop
In [118]: %%timeit
...: d = view_as_windows(data,(width_signal))-signal
...: np.einsum('ij,ij->i',d,d)
100 loops, best of 3: 11.1 ms per loop
In [209]: %timeit MSD_strided(data, signal)
10 loops, best of 3: 18.4 ms per loop
In [210]: %timeit MSD_uniffilt_conv(data, signal)
1000 loops, best of 3: 1.71 ms per loop
~140x speedup there with the third one!
Apart from the versions given by Divakar, you could simply use a compiler like Numba or Cython.
Example
import numpy as np
import numba as nb

@nb.njit(fastmath=True, parallel=True)
def sq_residuals(data, signal):
    summed_residuals = np.empty(data.shape[0] + 1 - signal.shape[0], dtype=data.dtype)
    for i in nb.prange(data.shape[0] - signal.shape[0] + 1):
        acc = 0.
        for j in range(signal.shape[0]):
            acc += (data[i + j] - signal[j])**2
        summed_residuals[i] = acc
    return summed_residuals
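A minimal usage sketch (the arrays just mirror the question's setup; the first call triggers JIT compilation, later calls run the compiled, parallel loop) -

import numpy as np

data = np.random.normal(0, 1, 50000)
signal = np.random.normal(0, 1, 100)

sq_residuals(data, signal)           # first call: pays the compilation overhead
summed = sq_residuals(data, signal)  # later calls: compiled machine code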
Timings
Numba 0.4dev, Python 3.6, Quadcore i7
MSD_uniffilt_conv (Divakar): 2.4 ms
sq_residuals (after the first call, which invokes some compilation overhead): 1.7 ms