I'm trying to compute average values of shifted diagonals of a square array.
Given an input matrix like this (in reality much larger than 3x3):
[[a, b, c],
 [d, e, f],
 [g, h, i]]
the correct answer would be
[g, (d+h)/2, (a+e+i)/3, (b+f)/2, c]
Code to compute such averages could be:
import numpy as np

def offset_diag_mean(mat):
    n = len(mat)
    return np.array([np.mean(np.diag(mat, k)) for k in range(-n + 1, n)])
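For example, on a small concrete matrix (values chosen only to make the result easy to check) this gives:

mat = np.arange(9, dtype=float).reshape(3, 3)
# [[0., 1., 2.],
#  [3., 4., 5.],
#  [6., 7., 8.]]
print(offset_diag_mean(mat))
# [6. 5. 4. 3. 2.]  i.e. [g, (d+h)/2, (a+e+i)/3, (b+f)/2, c]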
I assume one can do this without a list comprehension (speed is quite important for me). Any clever solutions I'm missing?
One efficient solution is to accumulate the rows of the input 2D array directly into the output array at the appropriate positions and then perform the division.
The idea is to zero-initialize an output array of length 2*n-1, then add [a, b, c] to output[2:5], add [d, e, f] to output[1:4], and add [g, h, i] to output[0:3]. Finally, we divide the result by the diagonal lengths [1, 2, 3, 2, 1].
Here is an implementation:
import numpy as np

# Use the decorator @nb.njit here to use Numba (requires `import numba as nb`)
def compute(mat):
    n = mat.shape[0]
    output = np.zeros(n * 2 - 1, dtype=np.float64)
    # Add each row of mat to the output slice covering its diagonals
    for i in range(n - 1, -1, -1):
        output[i:i+n] += mat[n - 1 - i]
    # Divide by the diagonal lengths [1, 2, ..., n, ..., 2, 1]
    output[0:n] /= np.arange(1, n + 1, 1, dtype=np.float64)
    output[n:] /= np.arange(n - 1, 0, -1, dtype=np.float64)
    return output
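If Numba is installed, the function can be compiled without modifying its body. A minimal sketch (compute_nb is just an illustrative name; the first call is slower because it triggers compilation):

import numba as nb

compute_nb = nb.njit(compute)  # JIT-compile the same function

mat = np.random.rand(1000, 1000)
compute_nb(mat)  # warm-up call triggers compilation
assert np.allclose(compute(mat), compute_nb(mat))  # results match the pure-NumPy version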
Here is the resulting performance on my machine on a 1000x1000 array:
Initial code:       15.31 ms
Kevin's code:        3.26 ms  (x4.7)
This code:           1.42 ms  (x10.8)
This code + Numba:   0.66 ms  (x23.2)
Thus, this implementation is 10.8 times faster than the initial implementation and 2.3 times faster than the fastest alternative version. Note that using Numba makes the code even faster with almost no change to it: the Numba version is 23.2 times faster than the initial implementation.
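For reference, here is a minimal sketch of how such timings could be reproduced with timeit, assuming the offset_diag_mean and compute functions defined above (exact numbers will of course differ between machines):

import timeit

mat = np.random.rand(1000, 1000)
for name, func in [("Initial code", offset_diag_mean), ("This code", compute)]:
    # Time 10 calls per repeat, keep the best repeat, and report the per-call time
    best = min(timeit.repeat(lambda: func(mat), number=10, repeat=5)) / 10
    print(f"{name}: {best * 1e3:.2f} ms")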