Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Predicted regression line not same as in geom_smooth(method= "lm", formula= "y ~ x")

I made a scatter plot with ggplot2 for my data with a regression line and I noticed that the regression line is not same as the predicted values from predict(). Why is this the case? See example below:

# Make some data
set.seed(1)
n <- 100
df <- data.frame(x= rnorm(n, 100, 15), group= sample(0:1, n/2, replace= TRUE))
df$y <- df$x*.1 + df$group*2 + rnorm(n)

# Fit data and predict values
mod <- lm(y ~ x + group, df)
df_new <- setNames(data.frame(expand.grid(round(seq(min(df$x), max(df$x), 1), 0),
                                          0:1)),
                   nm= c("x", "group"))
df_pred <- cbind.data.frame(df_new, pred= predict(mod, df_new))

# Plot data with predicted values and geom_smooth()
library(ggplot2)
ggplot(df, aes(x, y, col= factor(group))) +
  geom_point() +
  geom_smooth(method= "lm", formula= "y ~ x") +
  geom_line(data=df_pred, mapping= aes(x, pred, col= factor(group)))

The resulting plot is:

Plot

One can see that the predicted line does not match the one from geom_smooth. Why?

Sidenotes: There is a similiar question but in that case the person simply mixed up variables. As far as I can tell this is not the case in my code. Further, for the generated data both lines lay almost above each other but in my real data (which I can not share) the missmatch is even bigger.

like image 911
Igor stands with Ukraine Avatar asked Dec 09 '25 08:12

Igor stands with Ukraine


1 Answers

As user20650 points already out in the comments, ggplot adds an interaction term, whereas your model mod does only account for group difference based on a additional variable + group. To get the same results either use an interaction term * group or create two different models for each group. See both ways below:

# Make some data
set.seed(1)
n <- 100
df <- data.frame(x= rnorm(n, 100, 15), group= sample(0:1, n/2, replace= TRUE))
df$y <- df$x*.1 + df$group*2 + rnorm(n)

# Fit data and predict values

# With interaction
mod <- lm(y ~ x * group, df) # <- add an interaction term
df_new <- setNames(data.frame(expand.grid(round(seq(min(df$x), max(df$x), 1), 0),
                                          0:1)),
                   nm= c("x", "group"))
df_pred <- cbind.data.frame(df_new, pred = predict(mod, df_new))


# Predict for each group
mod0 <- lm(y ~ x, df[df$group == 0, ]) 

df_new0 <- setNames(data.frame(expand.grid(round(seq(min(df$x), max(df$x), 1), 0),
                                          0)),
                   nm= c("x", "group"))


df_pred0 <- cbind.data.frame(df_new0, pred = predict(mod0, df_new0))

mod1 <- lm(y ~ x, df[df$group == 1, ])

df_new1 <- setNames(data.frame(expand.grid(round(seq(min(df$x), max(df$x), 1), 0),
                                           1)),
                    nm= c("x", "group"))

df_pred1 <- cbind.data.frame(df_new1, pred = predict(mod1, df_new1))

df_pred2 <- rbind(df_pred0, df_pred1)

# Plot data with predicted values and geom_smooth()
library(ggplot2)

ggplot(df, aes(x, y, col = factor(group))) +
  geom_point() +
  geom_smooth(method= "lm", formula= "y ~ x") +
  geom_line(data=df_pred, mapping= aes(x, pred, col = factor(group))) + # with interaction
  geom_line(data=df_pred2, mapping= aes(x, pred, col = factor(group))) # with seperate groups
like image 131
TimTeaFan Avatar answered Dec 11 '25 20:12

TimTeaFan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!