Displaying geom_smooth() trend line from a specified x value

Tags:

ggplot2

Suppose a dataset containing count data for multiple time periods and multiple groups, in the following format:

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

    group week rate
1       1    1  604
2       1    2  598
3       1    3  578
4       1    4  591
5       1    5  589
6       1    6  571
7       1    7  581
8       1    8  597
9       1    9  589
10      1   10  584

I'm interested in fitting a model-based trend line per group; however, I want this trend line to be displayed only from a certain x value onwards. To visualize the trend line using all data points (requires ggplot2, plus dplyr or magrittr for the %>% pipe):

df %>%
 ggplot(aes(x = week,
            y = rate,
            group = group,
            lty = group)) + 
 geom_line() +
 geom_point() +
 geom_smooth(method = "glm", 
             method.args = list(family = "quasipoisson"),
             se = FALSE)

Plot 1

Or to fit a model based on a specific range of values (requires ggplot2 and dplyr):

df %>%
 group_by(group) %>%
 mutate(rate2 = ifelse(week < 35, NA, rate)) %>%
 ggplot(aes(x = week,
            y = rate,
            group = group,
            lty = group)) + 
 geom_line() +
 geom_point() +
 geom_smooth(aes(y = rate2),
             method = "glm", 
             method.args = list(family = "quasipoisson"),
             se = FALSE)

Plot 2

However, I cannot find a way to fit the models using all data but display the trend line only from a specific x value (let's say 35+). Thus, I essentially want the trend line as computed for the first plot, but displayed as in the second plot, using ggplot2 and ideally only one pipeline.


asked Feb 07 '21 18:02

tmfmnk

1 Answer

I went to look at the after_stat function mentioned by @tjebo. See if the following works for you:

df %>%
  ggplot(aes(x = week,
             y = rate,
             lty = group)) + 
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm", 
              aes(group = after_stat(interaction(group, x > 35)),
                  colour = after_scale(alpha(colour, as.numeric(x > 35)))),
              method.args = list(family = "quasipoisson"),
              se = FALSE)

result

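This works by splitting the fitted points of each trend line into two groups, those in the x <= 35 region and those in the x > 35 region, and giving each new group its own colour transparency: alpha 0 (fully transparent) for x <= 35 and alpha 1 (fully opaque) for x > 35. The split is needed because a single line's colour shouldn't vary along its length. Because the regrouping happens in after_stat, i.e. after the models have been fitted, each model still uses all the data for its group. As a result, only the line segments in the x > 35 region are visible.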
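To make the regrouping concrete, here is a small base R sketch of what the two expressions evaluate to. The vectors x and group are hypothetical stand-ins for the data computed by the smoothing stat (by default it evaluates each fit at 80 points across the range of x); they are not taken from ggplot2 itself.

```r
# Hypothetical stand-ins for the computed x values and groups:
# 80 evenly spaced points per group over weeks 1 to 50
x <- rep(seq(1, 50, length.out = 80), times = 3)
group <- factor(rep(1:3, each = 80))

# Each original group is split into an x <= 35 part (FALSE)
# and an x > 35 part (TRUE), giving six groups in total
new_group <- interaction(group, x > 35)
levels(new_group)
table(new_group)

# The alpha value passed to alpha(): 0 up to week 35, 1 beyond it
alpha_value <- as.numeric(x > 35)
tapply(alpha_value, new_group, unique)
```

Every new group ends up with a single alpha value, which is why each drawn segment can be either fully hidden or fully visible.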

When used, the code triggers a warning that the after_scale modification isn't applied to the legend. I don't think that's a problem though, since we don't need it to appear in the legend anyway.
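If the warning is a concern, one alternative (a rough sketch, not part of the approach above) is to fit the models outside ggplot2 and draw only the predictions for week > 35. It assumes the default geom_smooth formula y ~ x, i.e. rate ~ week, and gives up the single-pipeline requirement. The names fits, trend and the fit column are made up for this example.

```r
# Same example data as in the question
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

# Fit one quasi-Poisson GLM per group on ALL weeks,
# then keep the fitted values only for week > 35
fits <- lapply(split(df, df$group), function(d) {
  m <- glm(rate ~ week, family = quasipoisson, data = d)
  d$fit <- predict(m, newdata = d, type = "response")
  d[d$week > 35, ]
})
trend <- do.call(rbind, fits)

# Plot the raw data plus the truncated trend lines (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = week, y = rate, lty = group)) +
    geom_line() +
    geom_point() +
    geom_line(data = trend, aes(y = fit), colour = "blue")
  print(p)
}
```

Here the trend is evaluated at the observed weeks rather than on the 80-point grid geom_smooth uses, so the drawn line may differ very slightly in smoothness.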


answered Oct 06 '22 08:10

Z.Lin
