 

Which F1-score is used for the semantic segmentation tasks?

I have read several papers on state-of-the-art semantic segmentation models. In all of them the authors compare results using the F1-score, but none of them says whether it is the "micro" or the "macro" version.

Does anyone know which F1-score is used to report segmentation results, and why it is apparently so obvious that authors do not define it in their papers?

Sample papers:

https://arxiv.org/pdf/1709.00201.pdf

https://arxiv.org/pdf/1511.00561.pdf

Andropogon asked Oct 21 '25 at 20:10


1 Answer

There is only one F1 score: the harmonic mean of precision and recall.
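As a minimal sketch in plain Python, the definition can be computed from raw counts of true positives, false positives and false negatives. The helper name f1_from_counts is hypothetical, and returning 0 when precision and recall are both zero is an assumed convention.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # assumed convention: F1 is 0 when precision and recall are both 0
    # Harmonic mean; algebraically equal to 2*tp / (2*tp + fp + fn)
    return 2 * precision * recall / (precision + recall)


# Example: 8 true positives, 2 false positives, 4 false negatives
print(f1_from_counts(8, 2, 4))
```

With precision 0.8 and recall 2/3, the harmonic mean is pulled toward the smaller of the two values, which is why F1 penalises a model that trades one for the other.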

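Macro, micro, samples, weighted and binary are not different F1 scores but different ways of averaging it, and they only matter for multiclass/multilabel targets. In scikit-learn's f1_score function they are the values of the average parameter: if average=None, the score for each class is returned; otherwise, the parameter determines the type of averaging performed on the data: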

binary: Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.

micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.

macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

weighted: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from accuracy_score).
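To make the averaging modes concrete, here is a minimal pure-Python sketch for single-label multiclass data, such as flattened segmentation masks where every pixel has exactly one true and one predicted class. The function name f1_score_avg is hypothetical; it only covers None, micro, macro and weighted (samples and binary are left out). It is meant to mirror scikit-learn's f1_score(y_true, y_pred, average=...), which should give the same numbers for these modes with default arguments, not to replace it.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} for single-label multiclass predictions."""
    counts = {c: [0, 0, 0] for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t][0] += 1          # true positive for class t
        else:
            counts[p][1] += 1          # false positive for the predicted class
            counts[t][2] += 1          # false negative for the true class
    return {c: tuple(v) for c, v in counts.items()}


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn           # same value as 2PR / (P + R)
    return 2 * tp / denom if denom else 0.0


def f1_score_avg(y_true, y_pred, average=None):
    """Hypothetical re-implementation of the None/micro/macro/weighted modes."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = per_class_counts(y_true, y_pred, labels)
    scores = {c: f1(*counts[c]) for c in labels}
    if average is None:
        return scores                  # one score per class, no averaging
    if average == "micro":
        tp = sum(v[0] for v in counts.values())
        fp = sum(v[1] for v in counts.values())
        fn = sum(v[2] for v in counts.values())
        return f1(tp, fp, fn)          # pool the counts of all classes first
    if average == "macro":
        return sum(scores.values()) / len(scores)  # unweighted mean of class scores
    if average == "weighted":
        support = Counter(y_true)      # number of true instances per class
        total = sum(support.values())
        return sum(scores[c] * support[c] for c in labels) / total
    raise ValueError(f"unsupported average: {average!r}")


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
for mode in (None, "micro", "macro", "weighted"):
    print(mode, f1_score_avg(y_true, y_pred, average=mode))
```

On this toy input the micro score is 0.75, exactly the accuracy (6 of 8 correct). With one label per sample, every mistake is one false positive for the predicted class and one false negative for the true class, so micro F1 reduces to TP / N. Macro (about 0.739) and weighted (about 0.742) differ because the classes are imbalanced and weighted scales each class score by its support.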

The SegNet paper reports the accuracy of each class separately in Table 5, so I think they chose None (per-class scores, no averaging) in this case.
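For segmentation, "no averaging" amounts to treating every pixel as one sample and reporting one score per class. A minimal sketch, assuming integer class labels in the range 0..num_classes-1 and two same-shape masks given as nested lists; the name per_class_f1 and the masks below are hypothetical.

```python
def per_class_f1(true_mask, pred_mask, num_classes):
    """Per-class F1 over all pixels of two same-shape 2D label masks."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for row_t, row_p in zip(true_mask, pred_mask):
        for t, p in zip(row_t, row_p):   # every pixel counts as one sample
            if t == p:
                tp[t] += 1
            else:
                fp[p] += 1
                fn[t] += 1
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Classes absent from both masks get None rather than an arbitrary 0 or 1.
        scores.append(2 * tp[c] / denom if denom else None)
    return scores


# Hypothetical 3x4 masks with classes 0, 1 and 2
true_mask = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [2, 2, 2, 2]]
pred_mask = [[0, 0, 0, 1],
             [0, 0, 1, 1],
             [2, 2, 1, 2]]
print(per_class_f1(true_mask, pred_mask, num_classes=3))
```

Returning None for a class that never occurs is a design choice: it keeps a missing class from silently dragging a later macro average up or down.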

Abhi25t answered Oct 24 '25 at 16:10


