Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

3D plot logistic regression

I have been able to make 4 plots of the predicted proba (binary outcome 0 or 1) in function of one of the 4 variables : Pregnancies, Age, BMI, glucose ; however I don't succeed in applying something similar to obtain a 3D plot (x: one var, y: other var and z: predicted proba for surface and outcome 0 or 1 for actual data) so I wish I could make a surface plot of the model together with dots for the actual coordinates

This has worked, now is it possible to do something analoguous for 3d plots or even better 4d plots (3 variables plotted according to real data value(scatter) and modeled value(surface) and predicted proba as a colour on a gradient : Using this code :

# Assuming 'final_model' is your trained logistic regression model
exclude_vars <- c("BloodPressure", "Insulin", "SkinThickness", "DiabetesPedigreeFunction")
dataset$PP_model <- predict(final_model, newx = as.matrix(dataset[, !(names(dataset) %in% c("Outcome", exclude_vars))]), type = "response")


create_logistic_plot <- function(variable, data) {
  ggplot(data, aes_string(x = variable, y = "Outcome")) +
    geom_point(aes(color = factor(Outcome)), position = position_identity(), size = 2) +
    geom_smooth(aes(y = PP_model), method = "glm", method.args = list(family = "binomial"), se = FALSE) +
    labs(title = paste("Logistic Regression Plot for", variable),
         x = variable,
         y = "Outcome/Predicted Probability",
         color = "Outcome") +
    theme_minimal()
}
# Create logistic regression plots for each variable without jitter
plots <- lapply(c("Pregnancies", "Glucose", "BMI", "Age"), function(var) create_logistic_plot(var, dataset))

# Print the plots
plots 
like image 303
Libellule Avatar asked Dec 28 '25 06:12

Libellule


1 Answers

You can certainly draw a 3D surface plot showing the marginal effect of two predictor variables simultaneously using persp. What this means is that, if you have four predictors in your model, you can see the predicted probabilities as any two of them vary, while the other two are held constant at their mean value.

The following function allows you to draw such a plot:

plot_probs <- function(model, var1, var2, n = 50, ...) {
  nm1 <- deparse(substitute(var1))
  nm2 <- deparse(substitute(var2))
  df <- model.frame(model)
  var1 <- df[[nm1]]
  var2 <- df[[nm2]]
  df <- df[-match(c(nm1, nm2), names(df))]
  preds <- list(x = seq(min(var1), max(var1), length = n),
                y = seq(min(var2), max(var2), length = n))
  predlist <- setNames(c(preds, lapply(df, mean)), c(nm1, nm2, names(df)))
  pred_df <- do.call("expand.grid", predlist)
  Probability <- matrix(predict(model, newdata = pred_df, type = "response"), 
                        nrow = n)
  persp(x = preds$x, y = preds$y, z = Probability, xlab = nm1, ylab = nm2,
        ticktype = "detailed", zlim = c(0, 1), ...)
}

So, for example, if we have the following model:

model <- glm(Outcome ~ Age + Glucose + Pregnancies + BMI, binomial, dataset)

We can call:

plot_probs(model, Glucose, BMI, theta = 45, phi = 30, col = "gold")

enter image description here

And

plot_probs(model, Pregnancies, Age, theta = -45, phi = 30, col = "lightblue")

enter image description here

Note that, in this case at least, a 4D plot would be next to useless. If you had 3 predictors on the x, y and z co-ordinates, then the probabilities would not be a surface, but a volume filling the whole space. Yes, it could be partially transparent and coloured according to probability, but it would be extremely difficult to see what is going on, and would not work at all well on the printed page. It would also need to remain a plot of marginal effects, since one of the predictors would need to be held constant to allow any predictions at all.


Data used

In the absence of a reproducible example, the following data set should approximate the one used by the OP

set.seed(1)

dataset <- data.frame(Pregnancies = rpois(200, 1) + 1,
                      BMI = round(rnorm(200, 25, 2.5), 1),
                      Age = sample(18:45, 200, TRUE),
                      Glucose = round(runif(200, 3.5, 12), 1))

p <- plogis(with(dataset, Age + Glucose + Pregnancies + BMI - 65)/5)

dataset$Outcome <- rbinom(200, 1, p)
like image 121
Allan Cameron Avatar answered Dec 30 '25 19:12

Allan Cameron



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!