In my project, I am trying to detect shop signs in my dataset using Mask R-CNN. The image size is 512x512.
results = model.detect([image], verbose=1)
r = results[0]
masked = r['masks']
rois = r['rois']
After running the code above, r['rois'] gives me the bounding-box coordinates of each shop sign (e.g. [40, 52, 79, 249]), and r['masks'] gives me a boolean matrix representing each mask in the image: a pixel value is True if the pixel lies inside the mask region and False if it lies outside. If the model detects 7 shop signs (i.e. 7 masks) in the image, r['masks'] has shape 512x512x7, with each channel representing a different mask.
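To make the layout concrete, here is a minimal sketch of how that mask stack is indexed; the `masks` array below is an assumed synthetic stand-in for a real r['masks'], not output from the model:

```python
import numpy as np

# Assumed stand-in for r['masks']: a boolean stack of shape (512, 512, 7),
# one channel per detected instance
masks = np.zeros((512, 512, 7), dtype=bool)
masks[100:150, 200:300, 0] = True  # fake "first shop sign" region

n_instances = masks.shape[-1]   # number of detected instances (7 here)
first_mask = masks[:, :, 0]     # 512x512 boolean mask of the first instance
area = int(first_mask.sum())    # count of True pixels in that mask
```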
I have to deal with each mask individually, so I separated the channels; let's say we take the first one. Then I collected the coordinates of the True pixels in the mask array.
array = masked[:, :, 0]
true_points = []
for i in range(512):
    for j in range(512):
        if array[i][j]:
            true_points.append([j, i])
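As an aside, the double loop can be replaced by a single vectorized call, which is much faster at 512x512; a minimal sketch, using a small synthetic array as a stand-in for the real mask channel:

```python
import numpy as np

# Hypothetical stand-in for masked[:, :, 0]
array = np.zeros((512, 512), dtype=bool)
array[10:20, 30:50] = True

# np.argwhere returns (row, col) pairs of True pixels;
# reversing the columns gives the same [j, i] = [x, y] order as the loop
true_points = np.argwhere(array)[:, ::-1]
```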
So, my question is: how can I get the coordinates of the corners of the mask (i.e. the shop sign) from this boolean matrix? Most of the shop signs are rectangular, but they can be rotated. I have the bounding-box coordinates, but they are not accurate when the shop sign is rotated. I also have the coordinates of the True points. Can you suggest an algorithm to find the corner True values?
A perspective transform can be used for this problem, via cv2.getPerspectiveTransform and cv2.warpPerspective. For corner-point detection, we can use cv2.findContours and cv2.approxPolyDP.
cv2.findContours finds contours in a binary image:
contours = cv2.findContours(r['masks'][:, :, 0].astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]  # [-2] works on both OpenCV 3.x (3 return values) and 4.x (2)
Then approximate the rectangle from the contour using cv2.approxPolyDP. The crucial parameter in cv2.approxPolyDP is epsilon (the approximation accuracy); below is a custom bisection over epsilon that forces a four-point result:
def Contour2Quadrangle(contour):
    def getApprox(contour, alpha):
        epsilon = alpha * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        return approx

    # find an appropriate epsilon by bisecting on alpha
    def getQuadrangle(contour):
        alpha = 0.1
        beta = 2  # must be larger than 1
        approx = getApprox(contour, alpha)
        if len(approx) < 4:
            while len(approx) < 4:
                alpha = alpha / beta
                approx = getApprox(contour, alpha)
            alpha_lower = alpha
            alpha_upper = alpha * beta
        elif len(approx) > 4:
            while len(approx) > 4:
                alpha = alpha * beta
                approx = getApprox(contour, alpha)
            alpha_lower = alpha / beta
            alpha_upper = alpha
        if len(approx) == 4:
            return approx
        alpha_middle = (alpha_lower * alpha_upper) ** 0.5
        approx_middle = getApprox(contour, alpha_middle)
        while len(approx_middle) != 4:
            if len(approx_middle) < 4:
                alpha_upper = alpha_middle
            if len(approx_middle) > 4:
                alpha_lower = alpha_middle
            alpha_middle = (alpha_lower * alpha_upper) ** 0.5
            approx_middle = getApprox(contour, alpha_middle)
        return approx_middle

    # reorder the four points so that the top-left corner comes first
    def getQuadrangleWithRegularOrder(contour):
        approx = getQuadrangle(contour)
        hashable_approx = [tuple(a[0]) for a in approx]
        sorted_by_axis0 = sorted(hashable_approx, key=lambda x: x[0])
        sorted_by_axis1 = sorted(hashable_approx, key=lambda x: x[1])
        topleft_set = set(sorted_by_axis0[:2]) & set(sorted_by_axis1[:2])
        assert len(topleft_set) == 1
        topleft = topleft_set.pop()
        topleft_idx = hashable_approx.index(topleft)
        approx_with_regular_order = [approx[(topleft_idx + i) % 4] for i in range(4)]
        return approx_with_regular_order

    return getQuadrangleWithRegularOrder(contour)
Lastly, we generate a new image with our desired destination coordinates:
contour = max(contours, key=cv2.contourArea)
corner_points = Contour2Quadrangle(contour)
src = np.float32([p[0] for p in corner_points])
rect_img_w, rect_img_h = 400, 200
dst = np.float32([[0, 0], [0, rect_img_h], [rect_img_w, rect_img_h], [rect_img_w, 0]])
M = cv2.getPerspectiveTransform(src, dst)
transformed = cv2.warpPerspective(img, M, (rect_img_w, rect_img_h))  # img is the original image
plt.imshow(transformed)  # check the result
If you know the rotation angle, just rotate the bbox corners, e.g. by applying an affine rotation to the corner points. If you don't, then you can find the extrema more or less easily, like this:
H, W = array.shape
left_edges = np.where(array.any(axis=1), array.argmax(axis=1), W + 1)
flip_lr = array[:, ::-1]   # horizontal flip (plain NumPy; cv2.flip rejects bool arrays)
right_edges = W - np.where(flip_lr.any(axis=1), flip_lr.argmax(axis=1), W + 1)
top_edges = np.where(array.any(axis=0), array.argmax(axis=0), H + 1)
flip_ud = array[::-1, :]   # vertical flip
bottom_edges = H - np.where(flip_ud.any(axis=0), flip_ud.argmax(axis=0), H + 1)
leftmost = left_edges.min()
rightmost = right_edges.max()
topmost = top_edges.min()
bottommost = bottom_edges.max()
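To see the scans in action, here is the same logic on a tiny hand-checkable mask (the 5x6 array and its True block are assumptions for illustration; note that the right and bottom results are exclusive bounds):

```python
import numpy as np

# Tiny 5x6 mask: True block covering rows 1..3, cols 2..4
array = np.zeros((5, 6), dtype=bool)
array[1:4, 2:5] = True

H, W = array.shape
# Per-row first/last True column, per-column first/last True row
left_edges = np.where(array.any(axis=1), array.argmax(axis=1), W + 1)
right_edges = W - np.where(array[:, ::-1].any(axis=1), array[:, ::-1].argmax(axis=1), W + 1)
top_edges = np.where(array.any(axis=0), array.argmax(axis=0), H + 1)
bottom_edges = H - np.where(array[::-1].any(axis=0), array[::-1].argmax(axis=0), H + 1)

leftmost, rightmost = left_edges.min(), right_edges.max()
topmost, bottommost = top_edges.min(), bottom_edges.max()
# leftmost=2 and topmost=1 are inclusive; rightmost=5 and bottommost=4 are exclusive
```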
Your bbox then has corners (leftmost, topmost) and (rightmost, bottommost). BTW, if you find yourself looping over pixels, you should know there is almost always a vectorized NumPy operation that will do it a lot faster.