 

How to compute log loss in machine learning

The following code is used to produce the probability output of a binary classification with a random forest.

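library(randomForest)

# train: predictors, train_label: factor of class labels, test: new data to score
rf <- randomForest(train, train_label, importance = TRUE, proximity = TRUE)
# type = "prob" returns one column of predicted class probabilities per class
prediction <- predict(rf, test, type = "prob")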

The result, prediction, is a matrix with one column of predicted probabilities per class.

The true labels of the test data are known (stored in test_label). Now I want to compute the logarithmic loss of this probability output. My LogLoss function is:

# actual: 0/1 labels; predicted: predicted probability that the label is 1
LogLoss <- function(actual, predicted)
{
  result <- -1/length(actual) * sum(actual * log(predicted) + (1 - actual) * log(1 - predicted))
  return(result)
}
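A minimal sketch of how a function like this can be applied to the prediction matrix. It assumes the labels are a factor, that the positive class is a column named "pos" and the negative class "neg" (hypothetical names; check colnames(prediction) and levels(test_label) for the real ones), and it uses a small made-up matrix in place of the real random forest output. Random forest probabilities can be exactly 0 or 1, so the sketch clips them first: with the unclipped formula, an observation with label 0 and probability exactly 0 gives 0 * log(0), which is NaN in R, and the whole result becomes NaN. The eps argument is an addition for illustration.

```r
# Binary log loss with probabilities clipped to [eps, 1 - eps] (eps is a hypothetical default)
LogLoss <- function(actual, predicted, eps = 1e-15) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)   # keep log() finite
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Made-up stand-ins for the real prediction matrix and test_label factor
prediction <- matrix(c(0.9, 0.1,
                       0.2, 0.8,
                       1.0, 0.0,
                       0.4, 0.6),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(NULL, c("neg", "pos")))
test_label <- factor(c("neg", "pos", "neg", "pos"), levels = c("neg", "pos"))

# Labels as 0/1 (1 = positive class) and the positive-class probability column
actual    <- as.integer(test_label == "pos")
predicted <- prediction[, "pos"]

LogLoss(actual, predicted)
```

Taking a single column works because, for two classes, the probabilities in each row sum to 1, so the positive-class column carries all the information the binary formula needs.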

How do I compute the logarithmic loss from the probability output of a binary classifier? Thank you.

asked Oct 26 '25 at 01:10 by user2405694


2 Answers

library(randomForest) 

rf <- randomForest(Species~., data = iris, importance=TRUE, proximity=TRUE)
prediction <- predict(rf, iris, type="prob")
# clip the probabilities to [1e-15, 1 - 1e-15]; otherwise log(0) returns -Inf
prediction <- apply(prediction, c(1,2), function(x) min(max(x, 1E-15), 1-1E-15)) 

# model.matrix(~ actual + 0) builds a 0/1 indicator matrix with one column per class;
# subtracting the (clipped) predictions leaves a positive value only in the true-class column
logLoss = function(pred, actual){
  -1*mean(log(pred[model.matrix(~ actual + 0) - pred > 0]))
}

logLoss(prediction, iris$Species)
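The model.matrix trick above picks, for every row, the predicted probability of the true class and averages its negative log. A sketch of the same computation with base R matrix indexing, under the assumption that the column names of the probability matrix are the class labels (as they are for predict(..., type = "prob") on a factor response). The name multiLogLoss, the eps default and the toy probabilities are hypothetical. It also uses the vectorized pmin()/pmax() instead of an element-wise apply() for the clipping.

```r
# Multiclass log loss: mean of -log(probability assigned to the true class)
multiLogLoss <- function(pred, actual, eps = 1e-15) {
  # drop any S3 class attribute so pmin()/pmax() see a plain numeric matrix
  pred <- pmin(pmax(unclass(pred), eps), 1 - eps)
  # (row, column) pairs: row i, and the column named after the true class of row i
  idx <- cbind(seq_along(actual), match(as.character(actual), colnames(pred)))
  -mean(log(pred[idx]))
}

# Toy 3-class example with made-up probabilities (each row sums to 1)
pred <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.8, 0.1,
                 0.2, 0.3, 0.5),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c")))
actual <- factor(c("a", "b", "c"))

multiLogLoss(pred, actual)
```

Matching on column names rather than relying on column order makes the lookup robust if the factor levels and the matrix columns are ever ordered differently.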
answered Oct 27 '25 at 16:10 by catastrophic-failure


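I think the logLoss formula in the other answer is a little bit wrong, at least when it is applied to a binary model's fitted probabilities and 0/1 labels.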

model <- glm(vs ~ mpg, data = mtcars, family = "binomial")

### Formula from the other answer (wrong for 0/1 labels and a probability vector)
logLoss1 <- function(pred, actual){
  -1*mean(log(pred[model.matrix(~ actual + 0) - pred > 0]))
}
logLoss1(actual = model$y, pred = model$fitted.values)
# [1] 0.4466049
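Why logLoss1 disagrees here: with a numeric 0/1 actual, model.matrix(~ actual + 0) is just a one-column matrix of the labels, and since every fitted probability lies strictly between 0 and 1, actual - pred > 0 is TRUE only where actual == 1. So the function averages -log(pred) over the positive observations only and ignores the negatives entirely. A short check of that reading, using the same mtcars model; pos_only is a hypothetical helper name.

```r
model  <- glm(vs ~ mpg, data = mtcars, family = "binomial")
pred   <- model$fitted.values
actual <- model$y

logLoss1 <- function(pred, actual){
  -1*mean(log(pred[model.matrix(~ actual + 0) - pred > 0]))
}

# Mean of -log(p) over the observations with label 1 only
pos_only <- -mean(log(pred[actual == 1]))

# Full binary log loss over all observations, for comparison
full <- -mean(actual * log(pred) + (1 - actual) * log(1 - pred))

all.equal(logLoss1(pred, actual), pos_only)
```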

### Correct formula in native R 
logLoss2 <- function(pred, actual){
  -mean(actual * log(pred) + (1 - actual) * log(1 - pred))
}
logLoss2(actual = model$y, pred = model$fitted.values)
# [1] 0.3989584

## Results from various packages to verify the correct answer

### From ModelMetrics package
ModelMetrics::logLoss(actual = model$y, pred = model$fitted.values)
# [1] 0.3989584

### From MLmetrics package
MLmetrics::LogLoss(y_pred = model$fitted.values, y_true = model$y)
# [1] 0.3989584

### From reticulate package
sklearn.metrics <- reticulate::import("sklearn.metrics")  # needs a Python installation with scikit-learn
sklearn.metrics$log_loss(y_true = model$y, y_pred = model$fitted.values)
# [1] 0.3989584

I used R version 4.1.0 (2021-05-18).

answered Oct 27 '25 at 15:10 by Alex Yahiaoui Martinez