See the EDITs below; the initial post has lost most of its relevance, but the question remains.
I am developing a neural network to semantically segment imagery. I have worked through various loss functions (categorical cross entropy (CCE), weighted CCE, focal loss, Tversky loss, Jaccard loss, focal Tversky loss, etc.) that attempt to handle highly skewed class representation, though none produces the desired effect. My advisor suggested creating a custom loss function that ignores false negatives for a specific class (but still penalizes false positives).
I have a 6-class problem and my network is set up to work with one-hot encoded truth data. As a result, my loss function accepts two tensors, y_true and y_pred, of shape (batch, row, col, class) (currently (8, 128, 128, 6)). To reuse the losses I have already explored, I would like to alter y_pred so that the predicted value for a specific class (the 0th class) is always correct. That is, wherever y_true is class 0, set y_pred to class 0; otherwise do nothing.
I have spent far too much time attempting to create this loss function because TensorFlow tensors are immutable. My first attempt (which I was led to by my experience with numpy) was:
def weighted_categorical_crossentropy_ignore(weights):
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        y_pred[tf.where(y_true == [1, 0, 0, 0, 0, 0])] = [1, 0, 0, 0, 0, 0]
        # Scale predictions so that the class probs of each sample sum to 1
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
        # Clip to prevent NaN's and Inf's
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        loss = y_true * K.log(y_pred) * weights
        loss = -K.sum(loss, -1)
        return loss
    return loss
Obviously I cannot alter y_pred in place, so this attempt failed. I ended up creating a few monstrosities that attempted to "build" a tensor by iterating over [batch, row, col] and performing comparisons. While these attempts did not technically fail, they never actually began training; I assume computing the loss was taking on the order of minutes.
After many more failed efforts I started performing the requisite computation in pure numpy in an SSCCE, keeping in mind that I was essentially limited to instantiating "simple" tensors (i.e. ones, zeros) and performing "simple" operations such as element-wise multiplication, addition, and reshaping. Thus I arrived at this SSCCE:
import numpy as np
from tensorflow.keras.utils import to_categorical
# Generate the "images" at random
true_flat = np.argmax(np.random.rand(1, 2, 2, 4), axis=3).astype('int')
true = to_categorical(true_flat, num_classes=4).astype('int')
pred_flat = np.argmax(np.random.rand(1, 2, 2, 4), axis=3).astype('int')
pred = to_categorical(pred_flat, num_classes=4).astype('int')
print('True:\n', true_flat)
print('Pred:\n', pred_flat)
# Create a mask representing an all "class 0" image
class_zero_label = np.array([1, 0, 0, 0])
czl_all = class_zero_label * np.ones(true.shape).astype('int')
# Mask both the truth and pred to locate class 0 pixels
czl_true_locs = czl_all * true
czl_pred_locs = czl_all * pred
# Subtract to create "addition" matrix
a  = (czl_true_locs - czl_pred_locs) * czl_true_locs
print('a:\n', a)
# Do this
m = ((a + 1) - (a * 2))
print('m - ', m.shape, ':\n', m)
# Pull the front entry from 'm' and "expand" its value
#x = (m[:, :, :, 0].flatten() * np.ones(pred.shape).astype('int')).T.reshape(pred.shape)
m_front = m[:, :, :, 0]
print('m_front - ', m_front.shape, ':\n', m_front)
#m_flat = m_front.flatten()
m_flat = m_front.reshape(m_front.shape[0], m_front.shape[1]*m_front.shape[2])
print('m_flat - ', m_flat.shape, ':\n', m_flat)
m_expand = m_flat * np.ones(pred.shape).astype('int')
print('m_expand - ', m_expand.shape, ':\n', m_expand)
m_trans = m_expand.T
m_fixT = m_trans.reshape(pred.shape)
print('m_fixT - ', m_fixT.shape, ':\n', m_fixT)
m = m_fixT
print('m:\n', m.shape)
# Perform the math as described
pred = (pred * m) + a
print('Pred:\n', np.argmax(pred, axis=3))
This SSCCE is, well, terrible and complex. Essentially my goal here was to create two matrices, the "addition" and "multiplication" matrices. The multiplication matrix is meant to "zero out" every pixel in the predicted values where the truth value is class 0. That is, no matter the pixel value (i.e. a one-hot encoded vector), it is zeroed out to [0, 0, 0, 0, 0, 0]. The addition matrix is then meant to add the vector [1, 0, 0, 0, 0, 0] to each of the zeroed-out locations. In the end this achieves the goal of setting the predicted value of every truly class 0 pixel to the correct value.
The issue is that this SSCCE does not translate fully to TensorFlow operations. The first problem is the generation of the multiplication matrix: it is not defined correctly when batch_size > 1. I thought, no matter; just to see whether it works, I will break down and tf.unstack the y_true and y_pred tensors and iterate over them. That led me to the current incarnation of my loss function:
def weighted_categorical_crossentropy_ignore(weights):
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        y_true_un = tf.unstack(y_true)
        y_pred_un = tf.unstack(y_pred)
        y_pred_new = []
        for i in range(0, y_true.shape[0]):
            yt = y_true_un[i]
            yp = y_pred_un[i]
            # Pred:
            # [[[0 3] * [[[1 0] + [[[0 1] = [[[0 0]
            #  [3 1]]]   [[1 1]]]  [[0 0]]]  [[3 1]]]
            # If we multiply pred by a tensor which zeros out only incorrect class 0 labelling,
            # then add class zero to those zeroed-out locations,
            # we can negate the effect of misclassified class 0 pixels but still punish
            # incorrectly predicted class 0 labels for other classes.
            # Create a mask representing an all "class 0" image
            class_zero_label = K.variable([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
            czl_all = class_zero_label * K.ones(yt.shape)
            # Mask both true and pred to locate class 0 pixels
            czl_true = czl_all * yt
            czl_pred = czl_all * yp
            # Subtract to create "addition matrix"
            a = czl_true - czl_pred
            # Do this.
            m = ((a + 1) - (a * 2.))
            # And this.
            x = K.flatten(m[:, :, 0])
            x = x * K.ones(yp.shape)
            x = K.transpose(x)
            x = K.reshape(x, yp.shape)
            # Voila.
            ypnew = (yp * x) + a
            y_pred_new.append(ypnew)
        y_pred_new = tf.concat(y_pred_new, 0)
        # Continue calculating weighted categorical crossentropy
        # -------------------------------------------------------
        # Scale predictions so that the class probs of each sample sum to 1
        y_pred_new /= K.sum(y_pred_new, axis=-1, keepdims=True)
        # Clip to prevent NaN's and Inf's
        y_pred_new = K.clip(y_pred_new, K.epsilon(), 1 - K.epsilon())
        loss = y_true * K.log(y_pred_new) * weights
        loss = -K.sum(loss, -1)
        return loss
    return loss
The current issue with this loss function lies in the apparent difference in behavior between numpy and TensorFlow when performing the operation
x = K.flatten(m[:, :, 0])
x = x * K.ones(yp.shape)
Which is meant to represent the behavior
m_flat = m_front.flatten()
m_expand = m_flat * np.ones(pred.shape).astype('int')
from the SSCCE.
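One possible reading of the mismatch, offered as a sketch rather than a diagnosis: in the SSCCE the flattened mask has 4 entries and np.ones(pred.shape) has shape (1, 2, 2, 4), so the product only broadcasts because rows * cols happens to equal the number of classes (2 * 2 == 4). With images of shape (128, 128, 6) that coincidence no longer holds. Keeping the class 0 channel with a trailing axis of length 1 lets broadcasting do the "expansion" for any shape, with no flatten/transpose/reshape. The helper name set_class0_correct and the toy shapes below are assumptions made for illustration.

```python
import numpy as np

def set_class0_correct(true, pred):
    """Return a copy of pred in which every truly class-0 pixel is one-hot class 0."""
    # (batch, rows, cols, 1): 1 where the truth is class 0, else 0
    mask = true[..., :1]
    # Zero the masked pixels across all channels, then add the one-hot truth back there
    return pred * (1 - mask) + true * mask

# Toy data where rows * cols != num_classes, so no shape coincidence can help
rng = np.random.default_rng(0)
num_classes = 6
true = np.eye(num_classes, dtype=int)[rng.integers(0, num_classes, size=(2, 3, 5))]
pred = np.eye(num_classes, dtype=int)[rng.integers(0, num_classes, size=(2, 3, 5))]

fixed = set_class0_correct(true, pred)
print('True:\n', np.argmax(true, axis=3))
print('Pred:\n', np.argmax(pred, axis=3))
print('Fixed:\n', np.argmax(fixed, axis=3))
```

The same expression uses only slicing, multiplication, and addition, so it should carry over to tensors unchanged, with batch_size > 1 included.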
So at this point I feel like I have delved so far into caveman coding that I can't get out of it. I have to imagine there is some simple way, akin to my initial attempt, to perform the described behavior.
So, I guess my direct question is: how do I implement
y_pred[tf.where(y_true == [1, 0, 0, 0, 0, 0])] = [1, 0, 0, 0, 0, 0]
in a custom tensorflow loss function?
EDIT: After fumbling around quite a bit more, I finally worked out how to call .numpy() on the y_true and y_pred tensors so I can use numpy operations. (Apparently calling tf.compat.v1.enable_eager_execution at the start of the program "doesn't work"; I had to pass run_eagerly=True to Model().compile(...).)
This has allowed me to implement essentially the first attempt outlined above:
def weighted_categorical_crossentropy_ignore(weights):
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        yp = y_pred.numpy()
        yt = y_true.numpy()
        yp[np.nonzero(np.all(yt == [1, 0, 0, 0, 0, 0], axis=3))] = [1, 0, 0, 0, 0, 0]
 
        # Continue calculating weighted categorical crossentropy
        # -------------------------------------------------------
        # Scale predictions so that the class probs of each sample sum to 1
        yp /= K.sum(yp, axis=-1, keepdims=True)
        # Clip to prevent NaN's and Inf's
        yp = K.clip(yp, K.epsilon(), 1 - K.epsilon())
        loss = y_true * K.log(yp) * weights
        loss = -K.sum(loss, -1)
        return loss
    return loss
However, it seems that by calling y_pred.numpy() (or by using its result thereafter) I have "destroyed" the path/flow through the network, judging by the error when attempting to .fit:
ValueError: No gradients provided for any variable: ['conv3d/kernel:0', <....>
I assume I somehow need to "remarshal" the tensor back to GPU memory? I have tried
yp = tf.convert_to_tensor(yp)
to no avail; same error. So I guess the same question still stands, but from a different motivation.
EDIT2: Well, it seems from this SO Answer that I can't actually use numpy() to marshal y_true and y_pred in order to use vanilla numpy operations. This necessarily "destroys" the network path, and thus gradients cannot be calculated.
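To make the "destroyed path" concrete, here is a minimal sketch using tf.GradientTape on a toy variable (the variable x and the doubling operation are placeholders, not part of my model). Once values pass through NumPy the tape has nothing to trace back, and converting the result back with tf.convert_to_tensor does not restore the link.

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

# Path 1: round-trip through NumPy; the tape cannot see the NumPy multiply
with tf.GradientTape() as tape:
    y = tf.convert_to_tensor(x.numpy() * 2.0)
    loss_np = tf.reduce_sum(y)
grad_np = tape.gradient(loss_np, x)

# Path 2: the same arithmetic with TensorFlow ops only
with tf.GradientTape() as tape:
    loss_tf = tf.reduce_sum(x * 2.0)
grad_tf = tape.gradient(loss_tf, x)

print('gradient via NumPy round-trip:', grad_np)   # None
print('gradient via TensorFlow ops:  ', grad_tf)
```

A None gradient for every variable is what surfaces during .fit as "No gradients provided for any variable".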
As a result, I realized that with run_eagerly=True I can wrap my y_true/y_pred in a tf.Variable and perform assignment. So I attempted to recreate the same code in pure TensorFlow:
def weighted_categorical_crossentropy_ignore(weights):
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        # yp = y_pred.numpy().copy()
        # yt = y_true.numpy().copy()
        # yp[np.nonzero(np.all(yt == [1, 0, 0, 0, 0, 0], axis=3))] = [1, 0, 0, 0, 0, 0]
        yp = K.variable(y_pred)
        yt = K.variable(y_true)
        #np.all
        x = K.all(yt == [1, 0, 0, 0, 0, 0], axis=3)
        #np.nonzero
        ne = tf.not_equal(x, tf.constant(False))
        y = tf.where(ne)
        # Perform the desired operation
        yp[y] = [1, 0, 0, 0, 0, 0]
        # Continue calculating weighted categorical crossentropy
        # -------------------------------------------------------
        # Scale predictions so that the class probs of each sample sum to 1
        #yp /= K.sum(yp, axis=-1, keepdims=True) # Cannot use /= on a tf.Variable; must use var = var / ...
        yp = yp / K.sum(yp, axis=-1, keepdims=True)
        # Clip to prevent NaN's and Inf's
        yp = K.clip(yp, K.epsilon(), 1 - K.epsilon())
        loss = y_true * K.log(yp) * weights
        loss = -K.sum(loss, -1)
        return loss
    return loss
But alas, this apparently creates the same issue as calling .numpy(): no gradients can be computed. So I am seemingly back at square one.
EDIT3: Using the solution proposed by gobrewers14 in the answer posted below, modified based on my knowledge of the problem, I have produced this loss function:
def weighted_categorical_crossentropy_ignore(weights):
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        print('y_true.shape: ', y_true.shape)
        print('y_pred.shape: ', y_pred.shape)
        # Generate modified y_pred where all truly class0 pixels are correct
        y_true_class0_indicies = tf.where(tf.math.equal(y_true, [1., 0., 0., 0., 0., 0.]))
        y_pred_updates = tf.repeat([
            [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
            repeats=y_true_class0_indicies.shape[0],
            axis=0)
        yp = tf.tensor_scatter_nd_update(y_pred, y_true_class0_indicies, y_pred_updates)
        # Continue calculating weighted categorical crossentropy
        # -------------------------------------------------------
        # Scale predictions so that the class probs of each sample sum to 1
        yp /= K.sum(yp, axis=-1, keepdims=True)
        # Clip to prevent NaN's and Inf's
        yp = K.clip(yp, K.epsilon(), 1 - K.epsilon())
        loss = y_true * K.log(yp) * weights
        loss = -K.sum(loss, -1)
        return loss
    return loss
Since the original answer assumed y_true to be of shape [8, 128, 128] (i.e. a "flat" class representation rather than a one-hot encoded representation of shape [8, 128, 128, 6]), I first print the shapes of the y_true and y_pred input tensors for sanity:
y_true.shape:  (8, 128, 128, 6)
y_pred.shape:  (8, 128, 128, 6)
For further sanity, the output shape of the network, provided by the tail of model.summary is
conv2d_18 (Conv2D)              (None, 128, 128, 6)  1542        dropout_5[0][0]                  
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 128, 128, 6)  0           conv2d_18[0][0]                  
==================================================================================================
Total params: 535,551,494
Trainable params: 535,529,478
Non-trainable params: 22,016
__________________________________________________________________________________________________
I then follow "the pattern" in the proposed solution and replace the original tf.math.equal(y_true, 0) with tf.math.equal(y_true, [1., 0., 0., 0., 0., 0.]) to handle the one-hot encoded case. From my current understanding of the proposed solution (after ~10 min of inspecting it), I assumed this should work. However, when attempting to train a model, the following exception is thrown:
InvalidArgumentError: Inner dimensions of output shape must match inner dimensions of updates shape. Output: [8,128,128,6] updates: [684584,6] [Op:TensorScatterUpdate]
Thus it seems that the production of (as I have named them) y_pred_updates yields a "collapsed" tensor with "too many" elements. I understand the motivation for using tf.repeat, but its specific use seems to be incorrect. I assume it should produce a tensor of shape (8, 128, 128, 6), based on what I understand tf.tensor_scatter_nd_update to do. Most likely this comes down to the choice of repeats and axis in the call to tf.repeat.
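A likely explanation, sketched under the assumption that the shapes are as printed above: tf.math.equal broadcasts the 6-vector against y_true and compares element by element, so the result still has shape (8, 128, 128, 6). tf.where then returns one 4-component index per matching element, including every position where a 0 in y_true lines up with a 0 in the pattern, which would account for the 684584 rows. With 4-component indices on a rank-4 tensor, each update must be a scalar, not a 6-vector. Reducing the comparison over the class axis gives one 3-component index per pixel, and the [N, 6] updates then fit. The toy shapes and variable names below are illustrative only.

```python
import tensorflow as tf

num_classes = 6
labels = tf.random.uniform([2, 4, 4], minval=0, maxval=num_classes, dtype=tf.int64)
y_true = tf.one_hot(labels, num_classes)                                   # (2, 4, 4, 6)
y_pred = tf.nn.softmax(tf.random.normal([2, 4, 4, num_classes]), axis=-1)  # (2, 4, 4, 6)

class0 = tf.constant([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Element-wise comparison: one index row per matching ELEMENT, 4 components each
elementwise_idx = tf.where(tf.math.equal(y_true, class0))

# Per-pixel comparison: one index row per matching PIXEL, 3 components each
pixel_mask = tf.reduce_all(tf.math.equal(y_true, class0), axis=-1)  # (2, 4, 4)
pixel_idx = tf.where(pixel_mask)                                    # (N, 3)

# One [1, 0, 0, 0, 0, 0] row per class-0 pixel; tf.shape also works when N is dynamic
updates = tf.tile(class0[None, :], [tf.shape(pixel_idx)[0], 1])     # (N, 6)
yp = tf.tensor_scatter_nd_update(y_pred, pixel_idx, updates)

print('element-wise indices:', elementwise_idx.shape)
print('per-pixel indices:   ', pixel_idx.shape)
print('yp:                  ', yp.shape)
```

Using tf.shape(pixel_idx)[0] instead of pixel_idx.shape[0] is a precaution: the static shape can be None when the loss is traced as a graph rather than run eagerly.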
If I understand your question correctly, you are looking for something like this:
import tensorflow as tf
# batch of true labels
y_true = tf.constant([5, 0, 1, 3, 4, 0, 2, 0], dtype=tf.int64)
# batch of class probabilities
y_pred = tf.constant(
  [
    [0.34670502, 0.04551039, 0.14020428, 0.14341979, 0.21430719, 0.10985339],
    [0.25681055, 0.14013883, 0.19890164, 0.11124421, 0.14526634, 0.14763844],
    [0.09199252, 0.21889475, 0.1170236 , 0.1929019 , 0.20311192, 0.17607528],
    [0.3246354 , 0.23257554, 0.15549366, 0.17282239, 0.00000001, 0.11447308],
    [0.16502093, 0.13163856, 0.14371352, 0.19880624, 0.23360236, 0.12721846],
    [0.27362782, 0.21408406, 0.10917682, 0.13135742, 0.10814326, 0.16361059],
    [0.20697299, 0.23721898, 0.06455399, 0.11071447, 0.18990229, 0.19063729],
    [0.10320242, 0.22173141, 0.2547973 , 0.2314068 , 0.07063974, 0.11822232]
  ], dtype=tf.float32)
# find the indices in the batch where the true label is the class 0
indices = tf.where(tf.math.equal(y_true, 0))
# create a tensor with the number of updates you want to replace in `y_pred`
updates = tf.repeat(
    [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    repeats=indices.shape[0],
    axis=0)
# insert the updates into `y_pred` at the specified indices
modified_y_pred = tf.tensor_scatter_nd_update(y_pred, indices, updates)
print(modified_y_pred)
# tf.Tensor(
#   [[0.34670502, 0.04551039, 0.14020428, 0.14341979, 0.21430719, 0.10985339],
#    [1.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000],
#    [0.09199252, 0.21889475, 0.1170236 , 0.1929019 , 0.20311192, 0.17607528],
#    [0.3246354 , 0.23257554, 0.15549366, 0.17282239, 0.00000001, 0.11447308],
#    [0.16502093, 0.13163856, 0.14371352, 0.19880624, 0.23360236, 0.12721846],
#    [1.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000],
#    [0.20697299, 0.23721898, 0.06455399, 0.11071447, 0.18990229, 0.19063729],
#    [1.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000]], 
#    shape=(8, 6), dtype=tf.float32)
This final tensor, modified_y_pred, can be used in differentiation.
EDIT:
It might be easier to do this with masks.
Example:
# these aren't normalized probabilities, but you get the point
probs = tf.random.normal([2, 4, 4, 6])
# raw labels per pixel
labels = tf.random.uniform(
    shape=[2, 4, 4],
    minval=0,
    maxval=6,
    dtype=tf.int64)
# your labels are already one-hot encoded
labels = tf.one_hot(labels, 6)
# boolean mask where classes are `0`
# converting back to int labels with argmax for purposes of
# using `tf.math.equal`. Matching on `[1, 0, 0, 0, 0, 0]` is
# potentially buggy; matching on an integer is a lot more
# explicit.
mask = tf.math.equal(tf.math.argmax(labels, -1), 0)[..., None]
# flip the mask to zero out the pixels across channels where
# labels are zero
probs *= tf.cast(tf.math.logical_not(mask), tf.float32)
# multiply the mask by the one-hot labels, and add back
# to the already masked probabilities.
probs += labels * tf.cast(mask, tf.float32)
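For completeness, here is a minimal sketch of how the mask idea could be dropped into the weighted categorical crossentropy from the question. The function name weighted_cce_ignore_class0, the literal eps = 1e-7 (standing in for K.epsilon()), and the toy tensors in the check are assumptions for illustration. Because y_true is already one-hot, its channel 0 is used directly as the mask instead of going through argmax.

```python
import numpy as np
import tensorflow as tf

def weighted_cce_ignore_class0(weights):
    """Weighted CCE in which truly class-0 pixels are treated as predicted correctly."""
    weights = tf.constant(weights, dtype=tf.float32)
    eps = 1e-7  # assumed stand-in for K.epsilon()

    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # (batch, rows, cols, 1): 1.0 where the true label is class 0
        mask = y_true[..., :1]
        # Replace predictions at class-0 pixels with the one-hot truth
        yp = y_pred * (1.0 - mask) + y_true * mask
        # Rescale so class probabilities sum to 1, then clip to avoid log(0)
        yp = yp / tf.reduce_sum(yp, axis=-1, keepdims=True)
        yp = tf.clip_by_value(yp, eps, 1.0 - eps)
        return -tf.reduce_sum(y_true * tf.math.log(yp) * weights, axis=-1)

    return loss

# Gradient check on toy data: everything stays in TensorFlow ops
labels = tf.random.uniform([2, 4, 4], minval=0, maxval=6, dtype=tf.int64)
y_true = tf.one_hot(labels, 6)
logits = tf.Variable(tf.random.normal([2, 4, 4, 6]))
loss_fn = weighted_cce_ignore_class0([1.0] * 6)

with tf.GradientTape() as tape:
    per_pixel = loss_fn(y_true, tf.nn.softmax(logits, axis=-1))
    total = tf.reduce_mean(per_pixel)
grads = tape.gradient(total, logits)

class0_pixels = labels.numpy() == 0
print('gradients exist:', grads is not None)
print('max |grad| at class-0 pixels:', np.abs(grads.numpy()[class0_pixels]).max(initial=0.0))
```

In this sketch the class 0 pixels contribute essentially nothing to the loss and receive zero gradient, while a pixel of any other class that is predicted as class 0 is still penalized.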