The answer to the question in the title is potentially obvious, given that ArcFace is commonly referred to as "ArcFace loss".
However, one part is confusing me:
I was reading through the following Keras implementation of Arcface loss:
https://github.com/4uiiurz1/keras-arcface
In it, note that the `model.compile` line still specifies `loss='categorical_crossentropy'`.
Further, I see a lot of sources referring to Softmax as a loss function, which I had previously understood to instead be the activation function of the output layer for many classification neural networks.
Based on these two points of confusion, my current understanding is that the loss function, i.e. how the network actually calculates the number representing the "magnitude of wrongness" for a given example, is cross entropy regardless, and that ArcFace, like softmax, is instead the activation function for the output layer.
Would this be correct? If so, why are Arcface and Softmax referred to as loss functions? If not, where might my confusion be coming from?
Based on my understanding, the two things you are confused about are as follows:
Your assumption that ArcFace is an activation function is incorrect.
ArcFace is indeed a loss function.
If you go through the research paper, the authors mention that they use the traditional softmax function as the activation of the last layer.
(You can check out the `call` function in the `metrics.py` file; its last line is `out = tf.nn.softmax(logits)`.)
This means that after applying the additive angular margin penalty, they simply pass the logits through the softmax function.
It might sound confusing: if ArcFace itself is a loss function, why is it using softmax? The answer is pretty simple: to get the probabilities of the classes.
So what they have done is apply the additive angular margin penalty, pass the resulting logits through softmax to get the class probabilities, and apply categorical cross entropy loss on top of that.
To better understand the workflow, check out the figure below, followed by a short code sketch.

[Figure: ArcFace workflow]
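To make this concrete, here is a minimal sketch of that computation. This is not the repo's exact code; the function name and variable names are illustrative, and the margin `m` and scale `s` defaults are the ones commonly used with ArcFace:

```python
import tensorflow as tf

def arcface_probs(embeddings, weights, labels_onehot, m=0.50, s=30.0):
    # L2-normalize features and class weights so their dot product
    # equals cos(theta), the angle between feature and class center.
    x = tf.nn.l2_normalize(embeddings, axis=1)      # (batch, dim)
    W = tf.nn.l2_normalize(weights, axis=0)         # (dim, n_classes)
    cos_theta = tf.matmul(x, W)                     # (batch, n_classes)

    # Additive angular margin: add m to the angle of the target class only.
    theta = tf.acos(tf.clip_by_value(cos_theta, -1.0 + 1e-7, 1.0 - 1e-7))
    cos_theta_m = tf.cos(theta + m)
    logits = tf.where(labels_onehot > 0, cos_theta_m, cos_theta)

    # Re-scale, then softmax turns the penalized logits into probabilities;
    # categorical cross entropy is applied to these as usual.
    return tf.nn.softmax(logits * s)
```

The output of this goes straight into `loss='categorical_crossentropy'`, which is why the `model.compile` line in the repo looks like any ordinary classifier.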
I feel your confusion might come from the fact that many people refer to softmax as a loss function, although it is not really a loss, and that you are mixing up softmax and categorical cross entropy. I will do my best to explain the difference between the two below.
Softmax
Softmax is just a function, not a loss. It squashes its input values into the range (0, 1) and makes sure they sum to 1, i.e. it has a nice probabilistic interpretation.
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
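For example, a quick numeric check in plain NumPy (the logit values are arbitrary):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.sum(np.exp(logits))

print(probs)        # [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```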
Cross Entropy Loss
This is actually a loss function. The general form of cross entropy loss, for ground-truth labels $t_i$ and predicted scores $s_i$ over $C$ classes, is:

$$CE = -\sum_{i=1}^{C} t_i \log(s_i)$$
It has two variants:
Binary Cross Entropy Loss
It is used for binary classification tasks.
$$BCE = -t_1 \log(s_1) - (1 - t_1) \log(1 - s_1)$$
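A quick numeric example (the target $t_1$ and predicted probability $s_1$ are made up):

```python
import numpy as np

t1, s1 = 1.0, 0.9  # target and predicted probability for the positive class
bce = -(t1 * np.log(s1) + (1 - t1) * np.log(1 - s1))

print(bce)  # 0.105
```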
Categorical Cross Entropy Loss / Softmax Loss
CCE loss is what is usually called the softmax loss. It is used for multi-class classification because of the probabilistic interpretation provided by the softmax function: softmax is applied to the logits first, and cross entropy is computed on the resulting probabilities.
$$CCE = -\sum_{i=1}^{C} t_i \log\left(\frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}}\right)$$
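This composition is also why, in Keras, applying softmax yourself and then cross entropy gives the same number as passing raw logits with `from_logits=True`. A small sketch, reusing the arbitrary logits from above:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
target = tf.constant([[1.0, 0.0, 0.0]])

# Manual: softmax first, then cross entropy on the probabilities.
probs = tf.nn.softmax(logits)
manual = -tf.reduce_sum(target * tf.math.log(probs), axis=1)

# Fused: Keras applies softmax internally when from_logits=True.
fused = tf.keras.losses.categorical_crossentropy(target, logits, from_logits=True)

print(manual.numpy(), fused.numpy())  # both ~0.417
```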