 

Is ArcFace strictly a loss function or an activation function?

The answer to the question in the title is potentially obvious, given that it is commonly referred to as "ArcFace Loss".

However, one part is confusing me:

I was reading through the following Keras implementation of Arcface loss:

https://github.com/4uiiurz1/keras-arcface

In it, note that the model.compile line still specifies loss='categorical_crossentropy'.
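Roughly, the pattern I mean looks like this (my own simplified, runnable sketch with made-up layer sizes, not the actual code from that repo):

    import tensorflow as tf

    # Toy stand-in model: the output layer applies softmax, while the loss
    # handed to compile() is still plain categorical cross-entropy.
    inputs = tf.keras.Input(shape=(128,))                         # made-up input size
    x = tf.keras.layers.Dense(64, activation='relu')(inputs)
    outputs = tf.keras.layers.Dense(10, activation='softmax')(x)  # stand-in for the ArcFace head
    model = tf.keras.Model(inputs, outputs)

    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',  # the quantity that actually gets minimised
        metrics=['accuracy'],
    )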

Further, I see a lot of sources referring to Softmax as a loss function, which I had previously understood to instead be the activation function of the output layer for many classification neural networks.

Based on these two points of confusion, my current understanding is that the loss function, i.e. how the network actually calculates the number which represents the "magnitude of wrongness" for a given example, is cross entropy regardless, and that ArcFace, like Softmax, is instead the activation function for the output layer.

Would this be correct? If so, why are Arcface and Softmax referred to as loss functions? If not, where might my confusion be coming from?

asked Oct 22 '25 by M.Brodie1221


1 Answer

Based on my understanding, the two things that you are confused about are as follows:

  1. Is ArcFace a loss function or an activation function?
  2. Is softmax a loss function or an activation function?

Is ArcFace a loss or an activation function?

Your assumption that ArcFace is an activation function is incorrect. ArcFace is indeed a loss function. If you go through the research paper, the authors mention that they use the traditional softmax function as the activation for the last layer. (You can check out the call function in the metrics.py file; the last line is out = tf.nn.softmax(logits).) It means that after applying the additive angular margin penalty, they pass the logits to the softmax function. It might sound confusing: if ArcFace itself is a loss function, why is it using softmax? The answer is pretty simple: just to get the probabilities of the classes.

So basically what they have done is: apply the additive angular margin penalty, pass the resulting logits through softmax to get the class probabilities, and apply categorical cross-entropy loss on top of that.
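To make that concrete, here is a condensed sketch of that workflow. It is my own simplified version with illustrative names (embeddings, weights, s, m are just placeholders); the actual repo wraps this in a Keras layer and handles the numerical details more carefully:

    import tensorflow as tf

    def arcface_step(embeddings, weights, labels_onehot, s=30.0, m=0.50):
        # Cosine similarity between L2-normalised embeddings and class-centre weights.
        x = tf.math.l2_normalize(embeddings, axis=1)   # (batch, dim)
        w = tf.math.l2_normalize(weights, axis=0)      # (dim, n_classes)
        cos_theta = tf.matmul(x, w)                    # (batch, n_classes)

        # Apply the additive angular margin m only to the target-class angle.
        theta = tf.acos(tf.clip_by_value(cos_theta, -1.0 + 1e-7, 1.0 - 1e-7))
        cos_theta_m = tf.cos(theta + m)
        logits = s * tf.where(labels_onehot > 0, cos_theta_m, cos_theta)

        # Softmax turns the margin-adjusted logits into class probabilities,
        # and categorical cross-entropy is the loss computed on top of them.
        probs = tf.nn.softmax(logits)
        loss = tf.keras.losses.categorical_crossentropy(labels_onehot, probs)
        return probs, loss

The only number gradient descent ever treats as "the loss" is that final cross-entropy value; the margin and the softmax are just transformations applied to the logits before it.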

To better understand the workflow, check out the image below:

[Figure: ArcFace workflow diagram]

I feel your confusion might come from the fact that many people refer to softmax as a loss function, although it is not really a loss. I have explained this in detail below.

Is Softmax a loss or an activation function?

I feel that you are a bit confused between softmax and categorical crossentropy. I will do my best to explain the differences between the two.

Softmax

Softmax is just a function, not a loss. It squashes the values into the range (0, 1) and makes sure that they sum to 1, i.e. it has a nice probabilistic interpretation.

$\sigma(z)_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$
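For instance, a tiny NumPy sketch of what softmax does (made-up numbers, purely for illustration):

    import numpy as np

    def softmax(z):
        # Subtract the max for numerical stability, exponentiate, normalise to sum to 1.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.1])
    probs = softmax(logits)
    print(probs)        # ~[0.659, 0.242, 0.099]
    print(probs.sum())  # 1.0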

Cross Entropy Loss

This is actually a loss function. The general form of the Cross Entropy loss is as follows:

$CE = -\sum_{i=1}^{C} t_i \log(s_i)$, where $t_i$ is the ground-truth value for class $i$ and $s_i$ is the predicted probability for class $i$.

It has two variants:

  1. Binary Cross Entropy Loss
  2. Categorical Cross Entropy Loss

Binary Cross Entropy Loss

It is used for binary classification tasks.

$BCE = -\left[\, t \log(s) + (1 - t)\log(1 - s) \,\right]$

Categorical Cross Entropy Loss / Softmax Loss

CCE loss computed on top of softmax outputs is what is commonly called the softmax loss. It is used for multi-class classification because of the probabilistic interpretation provided by the softmax function.

$CCE = -\sum_{i=1}^{C} t_i \log\!\left(\dfrac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}\right)$
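As a quick illustration (a toy NumPy example with made-up numbers), this is what softmax followed by categorical cross-entropy computes; note that for a one-hot target it reduces to minus the log of the probability assigned to the true class:

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes
    target = np.array([1.0, 0.0, 0.0])   # true class is index 0

    probs = softmax(logits)
    cce = -np.sum(target * np.log(probs))  # categorical cross-entropy, a.k.a. "softmax loss"
    print(cce)  # ~0.417, i.e. -log(0.659)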

answered Oct 24 '25 by Dhairya Kumar


