Suppose I have an image and I want to find the 3x3 subarray with the maximum sum over all 3x3 subarrays.
How do I do that efficiently in Python (as fast as possible)? If you can provide sample code, that would be great.
My specific problem: I want to extract the location of the center of the blob in this heatmap

I don't want to just take the maximum point, because the resulting coordinate would not be very precise: the true center of the blob could actually lie between two pixels. It's therefore better to take a weighted average over several points to obtain subpixel precision. For example, if there are two points (x1,y1) and (x2,y2) with values 200 and 100, then the averaged coordinate is x = (200*x1 + 100*x2)/300, y = (200*y1 + 100*y2)/300.
One of my solutions is to use a convolution, but I suspect it's not efficient enough because it multiplies every pixel by the kernel values (which are all ones). I'm looking for a fast implementation, and I can't just write the loop myself in Python because I doubt it would be fast enough.
I want to apply this algorithm to 50 images every few milliseconds (the images come in as a batch). Concretely, think of these images as the output of a machine learning model that produces heatmaps. To obtain a coordinate from each heatmap, I need some kind of weighted average over the high-intensity coordinates. My idea is to take a weighted average over a 3x3 area of the image, as sketched below. I am also open to other approaches that are faster or more elegant.
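For what it's worth, here is a minimal sketch of that batched 3x3 weighted-average idea in plain NumPy (the (N, H, W) batch layout, the function name, and taking the window around each per-image argmax are all my assumptions, not something prescribed above):

import numpy as np

def weighted_centroids(batch):
    # batch: (N, H, W) array of heatmaps.
    # Returns an (N, 2) array of subpixel (x, y) centroids, each computed
    # as the intensity-weighted average over the 3x3 window around the
    # per-image integer argmax.
    n, h, w = batch.shape
    ys, xs = np.unravel_index(batch.reshape(n, -1).argmax(axis=1), (h, w))
    coords = np.empty((n, 2), dtype=np.float64)
    for i, (y, x) in enumerate(zip(ys, xs)):
        # Clamp so the 3x3 window stays inside the image.
        y0 = min(max(y - 1, 0), h - 3)
        x0 = min(max(x - 1, 0), w - 3)
        win = batch[i, y0:y0 + 3, x0:x0 + 3].astype(np.float64)
        gy, gx = np.mgrid[y0:y0 + 3, x0:x0 + 3]
        s = win.sum()
        coords[i] = (gx * win).sum() / s, (gy * win).sum() / s
    return coords

Note that the Python loop here runs only over the ~50 images, not over pixels, so the per-pixel work stays vectorized.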
Looking for the "subarray of shape 3x3 with the maximum sum" is the same as looking for the maximum of the image after it has been filtered with an un-normalized 3x3 box filter. So it boils down to efficiently finding the maximum of an image, which you assume is a (perhaps noisy) discrete sample of an underlying continuous and smooth signal - hence your desire to find a sub-pixel location.
You really need to split the problem in 2 parts:

1. Find the pixel location m = (xm, ym) of the maximum value of the image. This requires no more than a visit of every pixel in the image, and one comparison per pixel, so it's O(N) and hence optimal as long as you are operating at the native image resolution. In OpenCV it is done using the minMaxLoc function.
2. Estimate the sub-pixel location of the "true" maximum from the pixel values in a neighborhood of m.

To clarify point (2): you write
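For concreteness, a minimal sketch of step (1) in OpenCV (the file name follows the other answer, and cv2.boxFilter with normalize=False is one way to apply the un-normalized box filter without spelling out an all-ones kernel):

import cv2
import numpy as np

img = cv2.imread('blob.png', 0).astype(np.float32)  # heatmap as grayscale float

# Un-normalized 3x3 box filter: each output pixel is the sum of its
# 3x3 neighborhood, i.e. filtering with an all-ones kernel.
sums = cv2.boxFilter(img, -1, (3, 3), normalize=False)

# One O(N) pass to locate the maximum 3x3 sum.
_, max_val, _, max_loc = cv2.minMaxLoc(sums)
xm, ym = max_loc  # integer pixel location m = (xm, ym)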
I don't want to just get the maximum point because that would cause the coordinate to not be very precise. The true center of the blob could actually be between 2 pixels
While intuitively plausible, this assertion needs to be made more precise in order to be computable. That is, you need to express mathematically what assumptions you make about the image that lead you to search for a "true" maximum between pixel-sampled locations.
A simple example of such an assumption is quadratic smoothness. In this scenario you assume that, in a small (say, 3x3 or 5x5) neighborhood of the "true" maximum location, the image signal z is well approximated by a quadratic:
z = A00 dx^2 + A01 dx dy + A11 dy^2 + A02 dx + A12 dy + A22
where:
dx = x - xm; dy = y - ym
This assumption makes sense if the underlying signal is expected to be at least 3rd-order continuous and differentiable, because of Taylor's theorem. Geometrically, it means that you assume (hope?) that the signal looks like a quadric (a paraboloid, or an ellipsoid) near its maximum.
You can then evaluate the above equation for each of the pixels in a neighborhood of m, substituting the actual image values for z, and thus obtain a linear system in the unknown Aij, with as many equations as there are neighbor pixels (so even a 3x3 neighborhood yields an over-constrained system). Solving the system in the least-squares sense gives you the "optimal" coefficients Aij. The theoretical maximum as predicted by this model is where the first partial derivatives vanish:
del z / del dx = 2 A00 dx + A01 dy + A02 = 0
del z / del dy = A01 dx + 2 A11 dy + A12 = 0
This is a linear system in the two unknowns (dx, dy), and solving it yields the estimated location of the maximum and, through the above equation for z, the predicted image value at the maximum.
In terms of computational cost, all such model estimations are extremely fast, compared with traversing an image of even moderate size.
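As an illustration, here is a minimal sketch of this quadratic fit in NumPy, assuming a 3x3 neighborhood that lies fully inside the image (the function name and the use of np.linalg.lstsq are my choices, not part of the answer above):

import numpy as np

def refine_maximum(patch):
    # Least-squares fit of
    #   z = A00 dx^2 + A01 dx dy + A11 dy^2 + A02 dx + A12 dy + A22
    # to a 3x3 patch centered on the integer maximum m; returns the
    # (dx, dy) offset of the fitted maximum and the predicted value there.
    gy, gx = np.mgrid[-1:2, -1:2]
    dx, dy = gx.ravel().astype(float), gy.ravel().astype(float)
    # 9 equations (one per neighbor pixel), 6 unknowns: over-constrained.
    M = np.column_stack([dx * dx, dx * dy, dy * dy, dx, dy, np.ones(9)])
    A, *_ = np.linalg.lstsq(M, patch.ravel().astype(float), rcond=None)
    A00, A01, A11, A02, A12, A22 = A
    # Vanishing first partial derivatives: a 2x2 linear system in (dx, dy).
    H = np.array([[2 * A00, A01], [A01, 2 * A11]])
    off = np.linalg.solve(H, [-A02, -A12])
    zmax = (A00 * off[0]**2 + A01 * off[0] * off[1] + A11 * off[1]**2
            + A02 * off[0] + A12 * off[1] + A22)
    return off, zmax

# Usage, with (xm, ym) from step (1):
# (dx, dy), zmax = refine_maximum(img[ym-1:ym+2, xm-1:xm+2])
# subpixel location = (xm + dx, ym + dy)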
I am sorry, I did not exactly understand the meaning of your last paragraph, so I stopped at the point where I get all the coordinates holding the maximum value. I used cv2.filter2D for convolution on a thresholded image, and then np.amax and np.where to find those coordinates.
import cv2
import numpy as np
from timeit import default_timer as timer

img = cv2.imread('blob.png', 0)  # load as grayscale

start = timer()
# Binarize: pixels above 240 become 1, everything else 0.
_, thresh = cv2.threshold(img, 240, 1, cv2.THRESH_BINARY)
# Filter with an all-ones 3x3 kernel: each output pixel is then the
# count of above-threshold pixels in its 3x3 neighborhood (at most 9,
# so it still fits in uint8).
mask = np.ones((3, 3), np.uint8)
res = cv2.filter2D(thresh, -1, mask)
# Coordinates of all pixels attaining that maximum count.
result = np.where(res == np.amax(res))
end = timer()
print(end - start)
I don't know whether it is as efficient as you want, but the measured time was 0.0013461999999435648 s.
P.S. The image you provided had a white border, which I had to crop out for this method.