I have read some papers about state-of-the-art semantic segmentation models, and in all of them the authors use the F1-score for comparison, but they do not state whether they use the "micro" or "macro" version of it.
Does anyone know which F1-score is used to report segmentation results, and why it is apparently so obvious that authors do not define it in their papers?
Sample papers:
https://arxiv.org/pdf/1709.00201.pdf
https://arxiv.org/pdf/1511.00561.pdf
There is just one F1-score: the harmonic mean of precision and recall.
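In symbols,

$$F_1 = 2\cdot\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$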
Macro/micro/samples/weighted/binary only come into play for multiclass or multilabel targets, where they determine how the per-class scores are combined. From the scikit-learn documentation for the `average` parameter of `f1_score` (a sketch follows this list): if `None`, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

- `binary`: Only report results for the class specified by `pos_label`. This is applicable only if the targets (`y_true`, `y_pred`) are binary.
- `micro`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
- `macro`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- `weighted`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters `macro` to account for label imbalance; it can result in an F-score that is not between precision and recall.
- `samples`: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from `accuracy_score`).
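To make the options concrete, here is a minimal sketch using scikit-learn's `f1_score` on made-up multiclass labels (the numbers are purely illustrative):

```python
# Minimal sketch of the averaging options using scikit-learn's f1_score.
# The toy labels below are invented purely for illustration.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2]

print(f1_score(y_true, y_pred, average=None))        # one F1 per class: [~0.86, 0.5, ~0.67]
print(f1_score(y_true, y_pred, average='micro'))     # pools TP/FP/FN over all classes: ~0.71
print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean of per-class F1: ~0.67
print(f1_score(y_true, y_pred, average='weighted'))  # mean weighted by class support: ~0.73
```

Note that on these imbalanced labels the averages disagree, which is exactly why it matters that papers state which one they report.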
The SegNet paper reports per-class accuracies separately in Table 5, so I think they chose `average=None` (per-class scores) in this case.
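For segmentation specifically, pixel-wise F1 is just classification F1 over the flattened label maps. A minimal sketch, with invented 3x3 masks (not taken from either paper):

```python
# Hedged sketch of per-class F1 for segmentation: pixel-wise F1 is just
# classification F1 over the flattened label maps. The 3x3 masks below
# are invented and not taken from either paper.
import numpy as np
from sklearn.metrics import f1_score

gt = np.array([[0, 0, 1],
               [0, 1, 1],
               [2, 2, 2]])
pred = np.array([[0, 0, 1],
                 [0, 1, 2],
                 [2, 2, 2]])

# average=None yields one F1 per class, matching per-class tables in papers.
print(f1_score(gt.ravel(), pred.ravel(), average=None))  # [1.0, 0.8, ~0.86]
```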