3D plot logistic regression

Question

I have been able to make 4 plots of the predicted proba (binary outcome 0 or 1) in function of one of the 4 variables : Pregnancies, Age, BMI, glucose ; however I don't succeed in applying something similar to obtain a 3D plot (x: one var, y: other var and z: predicted proba for surface and outcome 0 or 1 for actual data) so I wish I could make a surface plot of the model together with dots for the actual coordinates

This has worked, now is it possible to do something analoguous for 3d plots or even better 4d plots (3 variables plotted according to real data value(scatter) and modeled value(surface) and predicted proba as a colour on a gradient : Using this code :

# Assuming 'final_model' is your trained logistic regression model
exclude_vars <- c("BloodPressure", "Insulin", "SkinThickness", "DiabetesPedigreeFunction")
dataset$PP_model <- predict(final_model, newx = as.matrix(dataset[, !(names(dataset) %in% c("Outcome", exclude_vars))]), type = "response")


create_logistic_plot <- function(variable, data) {
  ggplot(data, aes_string(x = variable, y = "Outcome")) +
    geom_point(aes(color = factor(Outcome)), position = position_identity(), size = 2) +
    geom_smooth(aes(y = PP_model), method = "glm", method.args = list(family = "binomial"), se = FALSE) +
    labs(title = paste("Logistic Regression Plot for", variable),
         x = variable,
         y = "Outcome/Predicted Probability",
         color = "Outcome") +
    theme_minimal()
}
# Create logistic regression plots for each variable without jitter
plots <- lapply(c("Pregnancies", "Glucose", "BMI", "Age"), function(var) create_logistic_plot(var, dataset))

# Print the plots
plots

Allan Cameron · Accepted Answer

You can certainly draw a 3D surface plot showing the marginal effect of two predictor variables simultaneously using persp. What this means is that, if you have four predictors in your model, you can see the predicted probabilities as any two of them vary, while the other two are held constant at their mean value.

The following function allows you to draw such a plot:

plot_probs <- function(model, var1, var2, n = 50, ...) {
  nm1 <- deparse(substitute(var1))
  nm2 <- deparse(substitute(var2))
  df <- model.frame(model)
  var1 <- df[[nm1]]
  var2 <- df[[nm2]]
  df <- df[-match(c(nm1, nm2), names(df))]
  preds <- list(x = seq(min(var1), max(var1), length = n),
                y = seq(min(var2), max(var2), length = n))
  predlist <- setNames(c(preds, lapply(df, mean)), c(nm1, nm2, names(df)))
  pred_df <- do.call("expand.grid", predlist)
  Probability <- matrix(predict(model, newdata = pred_df, type = "response"), 
                        nrow = n)
  persp(x = preds$x, y = preds$y, z = Probability, xlab = nm1, ylab = nm2,
        ticktype = "detailed", zlim = c(0, 1), ...)
}

So, for example, if we have the following model:

model <- glm(Outcome ~ Age + Glucose + Pregnancies + BMI, binomial, dataset)

We can call:

plot_probs(model, Glucose, BMI, theta = 45, phi = 30, col = "gold")

enter image description here

And

plot_probs(model, Pregnancies, Age, theta = -45, phi = 30, col = "lightblue")

enter image description here

Note that, in this case at least, a 4D plot would be next to useless. If you had 3 predictors on the x, y and z co-ordinates, then the probabilities would not be a surface, but a volume filling the whole space. Yes, it could be partially transparent and coloured according to probability, but it would be extremely difficult to see what is going on, and would not work at all well on the printed page. It would also need to remain a plot of marginal effects, since one of the predictors would need to be held constant to allow any predictions at all.

Data used

In the absence of a reproducible example, the following data set should approximate the one used by the OP

set.seed(1)

dataset <- data.frame(Pregnancies = rpois(200, 1) + 1,
                      BMI = round(rnorm(200, 25, 2.5), 1),
                      Age = sample(18:45, 200, TRUE),
                      Glucose = round(runif(200, 3.5, 12), 1))

p <- plogis(with(dataset, Age + Glucose + Pregnancies + BMI - 65)/5)

dataset$Outcome <- rbinom(200, 1, p)

3D plot logistic regression

Tags:

visualization

r

logistic-regression

Libellule

1 Answers

Allan Cameron

Recent Activity

Donate For Us

3D plot logistic regression

Tags:

visualization

r

logistic-regression

Libellule

1 Answers

Allan Cameron

Related questions

Recent Activity

Donate For Us