How is the IoU metric calculated for multiple bounding box predictions in the TensorFlow Object Detection API?
I'm not sure exactly how TensorFlow does it, but here is one approach that I recently got working, since I couldn't find a good solution online. I used NumPy matrices to get the IoU and the other metrics (TP, FP, TN, FN) for multi-object detection.
Let's say for this example that your image is 6x6.
import cv2
import numpy as np

# Blank 6x6 "image"; the later steps copy it as `empty`
empty = np.zeros(36).reshape([6, 6])
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
And you have the ground truth for 2 objects, one in the bottom left of the image and one smaller one in the top right.
bbox_actual_obj1 = [[0, 3], [2, 5]] # [x, y] of top left corner & [x, y] of bottom right corner
bbox_actual_obj2 = [[4, 0], [5, 1]]
Using OpenCV, you can add these objects to a copy of the empty image array.
actual = empty.copy()
actual = cv2.rectangle(
    actual,
    bbox_actual_obj1[0],
    bbox_actual_obj1[1],
    1,   # value written into ground-truth pixels
    -1   # negative thickness draws a filled rectangle
)
actual = cv2.rectangle(
    actual,
    bbox_actual_obj2[0],
    bbox_actual_obj2[1],
    1,
    -1
)
array([[0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])
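If OpenCV is not available, the same filled mask can probably be built with plain NumPy slicing. This is only a sketch: the helper name `fill_box` is hypothetical, and it assumes the same inclusive `[[x1, y1], [x2, y2]]` corner convention as above, with x indexing columns and y indexing rows.

```python
import numpy as np

def fill_box(mask, bbox, value):
    """Fill an inclusive [[x1, y1], [x2, y2]] box in place (hypothetical helper)."""
    (x1, y1), (x2, y2) = bbox
    # Rows are indexed by y and columns by x; the +1 keeps the far corner inclusive.
    mask[y1:y2 + 1, x1:x2 + 1] = value
    return mask

actual_np = np.zeros((6, 6))
fill_box(actual_np, [[0, 3], [2, 5]], 1)  # bottom-left object
fill_box(actual_np, [[4, 0], [5, 1]], 1)  # top-right object
print(actual_np)
```

For these two boxes the result should match the array printed above, with 13 pixels set to 1.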
Now let's say that below are our predicted bounding boxes:
bbox_pred_obj1 = [[1, 3], [3, 5]] # top left coord & bottom right coord
bbox_pred_obj2 = [[3, 0], [5, 2]]
Now we do the same thing as above, but fill the boxes with 2 instead of 1 so that predicted pixels can be told apart from ground-truth pixels once the two arrays are added.
pred = empty.copy()
pred = cv2.rectangle(
    pred,
    bbox_pred_obj1[0],
    bbox_pred_obj1[1],
    2,
    -1
)
pred = cv2.rectangle(
    pred,
    bbox_pred_obj2[0],
    bbox_pred_obj2[1],
    2,
    -1
)
array([[0., 0., 0., 2., 2., 2.],
       [0., 0., 0., 2., 2., 2.],
       [0., 0., 0., 2., 2., 2.],
       [0., 2., 2., 2., 0., 0.],
       [0., 2., 2., 2., 0., 0.],
       [0., 2., 2., 2., 0., 0.]])
If we convert these arrays to matrices and add them, we get the following result:
actual_matrix = np.matrix(actual)
pred_matrix = np.matrix(pred)
combined = actual_matrix + pred_matrix
matrix([[0., 0., 0., 2., 3., 3.],
        [0., 0., 0., 2., 3., 3.],
        [0., 0., 0., 2., 2., 2.],
        [1., 3., 3., 2., 0., 0.],
        [1., 3., 3., 2., 0., 0.],
        [1., 3., 3., 2., 0., 0.]])
Now all we need to do is count how many times each value occurs in the combined matrix to get the TP, FP, TN, and FN pixel counts.
combined = np.squeeze(
    np.asarray(
        pred_matrix + actual_matrix
    )
)
unique, counts = np.unique(combined, return_counts=True)
zipped = dict(zip(unique, counts))
{0.0: 15, 1.0: 3, 2.0: 8, 3.0: 10}
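As a cross-check on these counts, here is an alternative sketch that uses boolean masks on plain NumPy arrays instead of `np.matrix` and the 1/2/3 encoding. It is not the method used above. The masks are rebuilt with slicing so the snippet runs without OpenCV, and the variable names (`both`, `actual_only`, and so on) are made up for illustration.

```python
import numpy as np

# Rebuild the two 6x6 masks from the example (rows = y, columns = x).
actual = np.zeros((6, 6))
actual[3:6, 0:3] = 1   # ground-truth object 1
actual[0:2, 4:6] = 1   # ground-truth object 2

pred = np.zeros((6, 6))
pred[3:6, 1:4] = 2     # predicted object 1
pred[0:3, 3:6] = 2     # predicted object 2

a = actual > 0         # pixels covered by any ground-truth box
p = pred > 0           # pixels covered by any predicted box

both = int(np.sum(a & p))          # corresponds to value 3 in the combined array
actual_only = int(np.sum(a & ~p))  # corresponds to value 1
pred_only = int(np.sum(~a & p))    # corresponds to value 2
neither = int(np.sum(~a & ~p))     # corresponds to value 0

print(neither, actual_only, pred_only, both)  # 15 3 8 10
```

Unlike a dictionary built from `np.unique`, every category is always present here, even when its count is zero.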
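Legend:
0: True Negative (TN), pixel covered by neither the ground truth nor a prediction
1: False Negative (FN), pixel covered by the ground truth only
2: False Positive (FP), pixel covered by a prediction only
3: True Positive (TP), pixel covered by both
Metrics:
IoU: 0.48 = 10/(3 + 8 + 10)
Precision: 0.56 = 10/(10 + 8)
Recall: 0.77 = 10/(10 + 3)
F1: 0.65 = 10/(10 + 0.5 * (3 + 8))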
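The four formulas can be wrapped in a small function. This is a minimal sketch, not anything taken from the TensorFlow Object Detection API. The name `detection_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention. Reading 3 as TP, 2 as FP, and 1 as FN follows from the formulas above.

```python
def detection_metrics(tp, fp, fn):
    """Pixel-level IoU, precision, recall and F1 from raw counts (hypothetical helper)."""
    def safe_div(num, den):
        # Assumed convention: report 0.0 instead of raising when the denominator is 0.
        return num / den if den else 0.0

    return {
        "iou": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "f1": safe_div(tp, tp + 0.5 * (fp + fn)),
    }

# Counts from the combined array; .get() covers values that never occur.
zipped = {0.0: 15, 1.0: 3, 2.0: 8, 3.0: 10}
metrics = detection_metrics(
    tp=zipped.get(3.0, 0),  # covered by ground truth and prediction
    fp=zipped.get(2.0, 0),  # covered by prediction only
    fn=zipped.get(1.0, 0),  # covered by ground truth only
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Using `.get(..., 0)` matters because `np.unique` only returns values that actually appear, so a perfect prediction would leave the keys 1.0 and 2.0 out of the dictionary.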