I am trying to implement a simple example of how to apply cross-entropy to what is supposed to be the output of my semantic segmentation CNN.
Using the PyTorch format, I would have something like this:
import torch
import numpy as np

out = np.array([[
    [
        [1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]
    ],
    [
        [0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.]
    ],
    [
        [0., 0., 0.],
        [0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]
    ],
    [
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [1., 1., 1.]
    ]
]])
out = torch.tensor(out)
So my output here has dimensions (1, 4, 4, 3): a 1-element batch, 4 channels representing the 4 possible classes, and a 4-by-3 grid in each channel storing the probability of that cell belonging to its class.
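As a sanity check on the layout described above:

print(out.shape)  # torch.Size([1, 4, 4, 3]) -> (batch, classes, height, width)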
Now my target is like this:
target = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3]
]
Please notice how in the 'out' tensor each row has a 1.0 probability of belonging to its class, resulting in a perfect match with the target.
For example, the third channel (channel 2) has 1.0 probabilities across its whole third row (row 2) and zeros everywhere else, so it matches the 2's in the third row of the target.
With this example I expect a minimal loss value between the two tensors.
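For instance, taking the argmax over the channel dimension recovers the target exactly:

print(out.argmax(dim=1))
# tensor([[[0, 0, 0],
#          [1, 1, 1],
#          [2, 2, 2],
#          [3, 3, 3]]])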
My question is: why doesn't this example produce a minimal (near-zero) loss?
This is what I have so far:
import torch
from torch.nn import CrossEntropyLoss

# predictions: (batch, classes, height, width) = (1, 4, 4, 3)
out = torch.tensor([[
    [
        [1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]
    ],
    [
        [0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.]
    ],
    [
        [0., 0., 0.],
        [0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]
    ],
    [
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [1., 1., 1.]
    ]
]])

# target class indices: (batch, height, width) = (1, 4, 3)
target = torch.tensor([[
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3]
]], dtype=torch.long)

criterion = CrossEntropyLoss()
print(criterion(out, target))
This outputs: tensor(0.7437)
Thank you in advance
Look at the description of the nn.CrossEntropyLoss function: the predictions out that you provide to nn.CrossEntropyLoss are not treated as class probabilities, but rather as logits. The loss function derives the class probabilities from out by applying softmax, which is why nn.CrossEntropyLoss will never output exactly zero loss for finite logits.
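You can verify this with a short sketch, reusing the out and target tensors from the question. At every pixel the correct class gets logit 1 and the other three get logit 0, so softmax assigns the correct class only e/(e+3) ≈ 0.475, and -log(0.475) ≈ 0.7437 is exactly the loss you observed. Scaling the logits up sharpens the softmax and drives the loss toward zero:

import torch
import torch.nn.functional as F

# softmax over the channel dimension converts logits to probabilities
probs = F.softmax(out, dim=1)
print(probs[0, :, 0, 0])  # tensor([0.4754, 0.1749, 0.1749, 0.1749]) -- not one-hot

# -log of the correct-class probability reproduces the reported loss
print(-torch.log(probs[0, 0, 0, 0]))  # tensor(0.7437)

# larger logits -> sharper softmax -> loss approaches (but never reaches) zero
criterion = torch.nn.CrossEntropyLoss()
print(criterion(out * 100., target))  # approximately tensor(0.)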