I have been able to make 4 plots of the predicted proba (binary outcome 0 or 1) in function of one of the 4 variables : Pregnancies, Age, BMI, glucose ; however I don't succeed in applying something similar to obtain a 3D plot (x: one var, y: other var and z: predicted proba for surface and outcome 0 or 1 for actual data) so I wish I could make a surface plot of the model together with dots for the actual coordinates
This has worked, now is it possible to do something analoguous for 3d plots or even better 4d plots (3 variables plotted according to real data value(scatter) and modeled value(surface) and predicted proba as a colour on a gradient : Using this code :
# Assuming 'final_model' is your trained logistic regression model
exclude_vars <- c("BloodPressure", "Insulin", "SkinThickness", "DiabetesPedigreeFunction")
dataset$PP_model <- predict(final_model, newx = as.matrix(dataset[, !(names(dataset) %in% c("Outcome", exclude_vars))]), type = "response")
create_logistic_plot <- function(variable, data) {
ggplot(data, aes_string(x = variable, y = "Outcome")) +
geom_point(aes(color = factor(Outcome)), position = position_identity(), size = 2) +
geom_smooth(aes(y = PP_model), method = "glm", method.args = list(family = "binomial"), se = FALSE) +
labs(title = paste("Logistic Regression Plot for", variable),
x = variable,
y = "Outcome/Predicted Probability",
color = "Outcome") +
theme_minimal()
}
# Create logistic regression plots for each variable without jitter
plots <- lapply(c("Pregnancies", "Glucose", "BMI", "Age"), function(var) create_logistic_plot(var, dataset))
# Print the plots
plots
You can certainly draw a 3D surface plot showing the marginal effect of two predictor variables simultaneously using persp. What this means is that, if you have four predictors in your model, you can see the predicted probabilities as any two of them vary, while the other two are held constant at their mean value.
The following function allows you to draw such a plot:
plot_probs <- function(model, var1, var2, n = 50, ...) {
nm1 <- deparse(substitute(var1))
nm2 <- deparse(substitute(var2))
df <- model.frame(model)
var1 <- df[[nm1]]
var2 <- df[[nm2]]
df <- df[-match(c(nm1, nm2), names(df))]
preds <- list(x = seq(min(var1), max(var1), length = n),
y = seq(min(var2), max(var2), length = n))
predlist <- setNames(c(preds, lapply(df, mean)), c(nm1, nm2, names(df)))
pred_df <- do.call("expand.grid", predlist)
Probability <- matrix(predict(model, newdata = pred_df, type = "response"),
nrow = n)
persp(x = preds$x, y = preds$y, z = Probability, xlab = nm1, ylab = nm2,
ticktype = "detailed", zlim = c(0, 1), ...)
}
So, for example, if we have the following model:
model <- glm(Outcome ~ Age + Glucose + Pregnancies + BMI, binomial, dataset)
We can call:
plot_probs(model, Glucose, BMI, theta = 45, phi = 30, col = "gold")

And
plot_probs(model, Pregnancies, Age, theta = -45, phi = 30, col = "lightblue")

Note that, in this case at least, a 4D plot would be next to useless. If you had 3 predictors on the x, y and z co-ordinates, then the probabilities would not be a surface, but a volume filling the whole space. Yes, it could be partially transparent and coloured according to probability, but it would be extremely difficult to see what is going on, and would not work at all well on the printed page. It would also need to remain a plot of marginal effects, since one of the predictors would need to be held constant to allow any predictions at all.
Data used
In the absence of a reproducible example, the following data set should approximate the one used by the OP
set.seed(1)
dataset <- data.frame(Pregnancies = rpois(200, 1) + 1,
BMI = round(rnorm(200, 25, 2.5), 1),
Age = sample(18:45, 200, TRUE),
Glucose = round(runif(200, 3.5, 12), 1))
p <- plogis(with(dataset, Age + Glucose + Pregnancies + BMI - 65)/5)
dataset$Outcome <- rbinom(200, 1, p)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With