To create a class label in CutMix or MixUp type augmentation, we can sample a mixing coefficient beta from a Beta distribution (e.g. with np.random.beta or scipy.stats.beta) and combine two labels as follows:
label = label_one*beta + (1-beta)*label_two
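As a minimal sketch of the two-label case, assuming one-hot labels and α = 1 (the names `mix_two_labels`, `label_one` and `label_two` are illustrative, not from any library):

```python
import numpy as np

def mix_two_labels(label_one, label_two, alpha=1.0, rng=None):
    """Blend two one-hot labels with a Beta(alpha, alpha) coefficient."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed = lam * label_one + (1.0 - lam) * label_two
    return mixed, lam

# Two one-hot labels over 3 classes
label_one = np.array([1.0, 0.0, 0.0])
label_two = np.array([0.0, 0.0, 1.0])
mixed, lam = mix_two_labels(label_one, label_two, rng=np.random.default_rng(0))
print(mixed, mixed.sum())  # the entries of the mixed label sum to 1
```

The mixed label keeps weight `lam` on the first class and `1 - lam` on the second, so it stays a valid probability vector.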
But what if we have more than two images? In YOLOv4, the authors tried an interesting augmentation called Mosaic Augmentation for object detection problems. Unlike CutMix or MixUp, this augmentation creates augmented samples from 4 images. In object detection, we can compute the shift of each instance's coordinates, so it is possible to get the proper ground truth. But for plain image classification, how can we do that?
Here is a starter.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import random
(train_images, train_labels), (test_images, test_labels) = \
tf.keras.datasets.cifar10.load_data()
train_images = train_images[:10,:,:]
train_labels = train_labels[:10]
train_images.shape, train_labels.shape
((10, 32, 32, 3), (10, 1))
Here is a function we've written for this augmentation (too ugly with an inner-outer loop! Please suggest if we can do it more efficiently).
def mosaicmix(image, label, DIM, minfrac=0.25, maxfrac=0.75):
    '''image, label: batches of samples 
    '''
    xc, yc  = np.random.randint(int(DIM * minfrac), int(DIM * maxfrac), (2,))
    indices = np.random.permutation(int(image.shape[0]))
    final_imgs, final_lbs = [], []
    # Iterate over the full indices
    for j in range(len(indices)):
        # Allocate a fresh canvas per sample; a single shared array would make every appended entry identical
        mosaic_image = np.zeros((DIM, DIM, 3), dtype=np.float32)
        # Take sample j plus 3 randomly chosen samples to create a mosaic sample
        rand4indices = [j] + random.sample(list(indices), 3)
        
        # Make mosaic with 4 samples 
        for i in range(len(rand4indices)):
            if i == 0:    # top left
                x1a, y1a, x2a, y2a =  0,  0, xc, yc
                x1b, y1b, x2b, y2b = DIM - xc, DIM - yc, DIM, DIM # from bottom right        
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, 0, DIM , yc
                x1b, y1b, x2b, y2b = 0, DIM - yc, DIM - xc, DIM # from bottom left
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = 0, yc, xc, DIM
                x1b, y1b, x2b, y2b = DIM - xc, 0, DIM, DIM-yc   # from top right
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc,  DIM, DIM
                x1b, y1b, x2b, y2b = 0, 0, DIM-xc, DIM-yc    # from top left
                
            # Copy-Paste from the i-th selected sample (not from image[i])
            mosaic_image[y1a:y2a, x1a:x2a] = image[rand4indices[i]][y1b:y2b, x1b:x2b]
        # Append the mosaic sample
        final_imgs.append(mosaic_image)
        
    return final_imgs, label
The augmented samples, currently with the wrong labels.
data, label = mosaicmix(train_images, train_labels, 32)
plt.imshow(data[5]/255)

However, here are some more examples to motivate you. Data is from the Cassava Leaf competition.
[Images: two example mosaic-augmented samples from the Cassava Leaf competition data]
We already know that, in CutMix, λ is a float sampled from the beta distribution Beta(α, α). We have seen that α = 1 performs best. If we always fix α = 1, then λ is simply sampled from the uniform distribution on [0, 1].
Put simply, λ is just a floating-point number between 0 and 1.
So, for only 2 images, if we use λ for the 1st image, we can calculate the remaining portion simply as 1-λ.
But for 3 images, if we use λ for the 1st image, we cannot calculate the other 2 unknowns from that single λ. To do so, we need 2 random numbers for 3 images. In the same way, for n images we need n-1 random variables. And in all cases, the weights must sum to 1 (for example, λ + (1-λ) == 1). If the sum is not 1, the label will be wrong!
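One way to see the "n-1 random numbers for n images" idea is to cut the interval [0, 1] at n-1 uniform random points and use the gap lengths as weights. The sketch below assumes this construction (the helper name `weights_from_cuts` is illustrative); for n = 2 it reduces to λ and 1-λ.

```python
import numpy as np

def weights_from_cuts(n, rng=None):
    """Turn n-1 uniform random cut points on [0, 1] into n weights that sum to 1."""
    rng = np.random.default_rng() if rng is None else rng
    cuts = np.sort(rng.uniform(0.0, 1.0, size=n - 1))  # n-1 random numbers
    edges = np.concatenate(([0.0], cuts, [1.0]))       # add the interval ends
    return np.diff(edges)                              # n gap lengths

weights = weights_from_cuts(4, rng=np.random.default_rng(0))
print(weights, weights.sum())  # 4 non-negative weights that sum to 1
```

Because the gaps partition [0, 1], the weights are non-negative and sum to 1 by construction.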
For this purpose, the Dirichlet distribution might be helpful because it generates quantities that sum to 1. A Dirichlet-distributed random variable can be seen as a multivariate generalization of a Beta distribution.
>>> np.random.dirichlet((1, 1), 1)  # for 2 images. Equivalent to λ and (1-λ)
array([[0.92870347, 0.07129653]])  
>>> np.random.dirichlet((1, 1, 1), 1)  # for 3 images.
array([[0.38712673, 0.46132787, 0.1515454 ]])
>>> np.random.dirichlet((1, 1, 1, 1), 1)  # for 4 images.
array([[0.59482542, 0.0185333 , 0.33322484, 0.05341645]])
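The draws above can be used directly as label weights. Here is a minimal sketch, assuming one-hot labels stacked in an array of shape (n, num_classes); the function name `mix_labels_dirichlet` and the example class ids are illustrative only.

```python
import numpy as np

def mix_labels_dirichlet(labels, alpha=1.0, rng=None):
    """Blend n one-hot labels with weights drawn from Dirichlet(alpha, ..., alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels, dtype=np.float64)         # shape (n, num_classes)
    weights = rng.dirichlet(np.full(len(labels), alpha))  # shape (n,), sums to 1
    return weights @ labels, weights                      # weighted sum of the labels

num_classes = 10
class_ids = [3, 3, 7, 1]                 # four images, two of them share class 3
one_hot = np.eye(num_classes)[class_ids]
mixed_label, weights = mix_labels_dirichlet(one_hot, rng=np.random.default_rng(0))
print(mixed_label, mixed_label.sum())    # a soft label whose entries sum to 1
```

When two of the images share a class, their weights simply add up in that class's entry of the soft label.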
In CutMix, the size of the cropped part of an image is tied to λ, which also weights the corresponding labels.
So, with multiple λ values, you also need to compute each crop size accordingly.
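# let's say for 4 images
# I am not sure the proper way.
image_list = [4 images]   # placeholder: four images of shape (H, W, 3)
label_list = [4 label]    # placeholder: four one-hot labels
new_img = np.zeros((H, W, 3))
beta_list = np.random.dirichlet((1, 1, 1, 1), 1)[0]
for idx, beta in enumerate(beta_list):
    # get_cropping_params is a hypothetical helper, something like this
    x0, y0, w, h = get_cropping_params(beta, full_img)
    new_img[y0:y0+h, x0:x0+w] = image_list[idx][y0:y0+h, x0:x0+w]
    label_list[idx] = label_list[idx] * beta
new_label = sum(label_list)  # the weighted labels add up to the final soft label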
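For the 2x2 mosaic layout used by `mosaicmix` in the question, one possible approach (an assumption on my part, borrowing the CutMix rule that a label weight equals the area share of its image) is to derive the four weights from the meeting point (xc, yc) instead of sampling them. In this sketch the name `mosaic_with_labels` is illustrative and, for simplicity, each tile is copied from the same region of its source image.

```python
import numpy as np

def mosaic_with_labels(images, labels, xc, yc):
    """Build one 2x2 mosaic from 4 images plus an area-weighted soft label.

    images: array (4, DIM, DIM, C); labels: array (4, num_classes), one-hot.
    (xc, yc): the point where the four tiles meet.
    """
    dim = images.shape[1]
    # Destination boxes (x1, y1, x2, y2): top-left, top-right, bottom-left, bottom-right
    boxes = [(0, 0, xc, yc), (xc, 0, dim, yc), (0, yc, xc, dim), (xc, yc, dim, dim)]
    mosaic = np.zeros(images.shape[1:], dtype=np.float32)
    weights = np.zeros(4)
    for k, (x1, y1, x2, y2) in enumerate(boxes):
        mosaic[y1:y2, x1:x2] = images[k, y1:y2, x1:x2]      # paste tile k
        weights[k] = (x2 - x1) * (y2 - y1) / (dim * dim)    # area share of tile k
    return mosaic, weights @ labels, weights

rng = np.random.default_rng(0)
dim, num_classes = 32, 10
images = rng.uniform(0, 255, size=(4, dim, dim, 3)).astype(np.float32)
labels = np.eye(num_classes)[[0, 1, 2, 3]]
xc, yc = rng.integers(int(dim * 0.25), int(dim * 0.75), size=2)
mosaic, soft_label, weights = mosaic_with_labels(images, labels, int(xc), int(yc))
print(weights, soft_label.sum())  # the four area shares sum to 1
```

Since the four tiles cover the whole canvas without overlap, the area shares always sum to 1, so the resulting soft label is valid without any renormalization.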