
R detect pattern like seasonality

Tags: r, time-series

I'm looking for a package to detect patterns such as seasonality. I have a data frame with two columns: Day (Date) and Visits.

When I plot the data, I see that visits to the website are higher in the summer months than in the rest of the year, and this pattern holds over 10 years.

The problem is that I want to analyse seasonality in data from hundreds of websites.

Could you provide an example of how to detect this pattern in a time series?

user860480 asked Dec 09 '25 11:12


1 Answer

Facebook released the prophet package to simplify time series analysis. There are tons of other ways to look for seasonality, but I think prophet is the easiest to use without tweaking. I recommend reading Facebook's documentation.

First let's create a sample of your data.

library(tidyverse)
website <-
  tibble(date = seq(as.Date('2015/01/01'), as.Date('2017/01/01'), by = "day"),
         visits = round(rnorm(732, mean = 327, sd = 100)))

Let's increase the website traffic during the summer.

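library(lubridate)
# Add 10 visits to every day in June, July, and August.
# Naming the result visits overwrites the column instead of adding a new one.
website <-
  mutate(website, visits = ifelse(month(date) %in% c(6, 7, 8), visits + 10, visits))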
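
As a quick sanity check, the summer and non-summer means can be compared after the bump is applied. The following is a minimal base-R sketch that rebuilds comparable demo data on its own; the seed and the names base_visits and is_summer are assumptions made for illustration. With sd = 100, a bump of 10 is small relative to the day-to-day noise, so the two group means may differ only slightly.

```r
set.seed(123)  # assumed seed, only so the sketch is reproducible

dates <- seq(as.Date("2015-01-01"), as.Date("2017-01-01"), by = "day")
base_visits <- round(rnorm(length(dates), mean = 327, sd = 100))

# Flag June, July, and August
is_summer <- as.integer(format(dates, "%m")) %in% c(6, 7, 8)

# Apply the same +10 bump to summer days only
visits <- ifelse(is_summer, base_visits + 10, base_visits)

# The bump is exactly 10 on summer days and 0 elsewhere
bump_by_season <- tapply(visits - base_visits, is_summer, mean)
print(bump_by_season)

# Mean visits per group; the gap is noisy because sd is 100
mean_by_season <- tapply(visits, is_summer, mean)
print(mean_by_season)
```

If the bump row for TRUE does not show 10, the modification did not reach the visits column.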

Now for the prophet calculations!

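library(prophet)
# prophet expects the date column to be named ds and the value column y
website <- website %>%
  rename(ds = date, y = visits)
m <- prophet(website)
# Extend the date range one year past the end of the history, then predict
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)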

Visualize the results.

plot(m, forecast)

[Image: forecast plot produced by plot(m, forecast)]

It looks like there's more traffic in the summer, but it's hard to be certain from this plot alone. Fortunately, prophet has a function that plots the trend and each seasonal component (here, weekly and yearly) separately.

prophet_plot_components(m, forecast)

[Image: component plots produced by prophet_plot_components(m, forecast)]

See that increase in the "yearly" chart? That is the summer bump: the model estimates more website traffic in the summer than in the rest of the year.
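
With hundreds of websites, reading charts one by one is impractical, so one option is to read the yearly component from the forecast data frame instead. The following is a minimal sketch, assuming the data frame returned by predict() contains a yearly column, which it does when yearly seasonality is enabled. The seed, the larger +100 bump, and the names yearly_by_month, peak_month, and amplitude are assumptions made for illustration.

```r
library(prophet)

set.seed(1)  # assumed seed, only so the sketch is reproducible

dates <- seq(as.Date("2015-01-01"), as.Date("2017-01-01"), by = "day")
visits <- round(rnorm(length(dates), mean = 327, sd = 100))

# Assumed summer bump of +100 so the effect stands out from the noise
is_summer <- as.integer(format(dates, "%m")) %in% c(6, 7, 8)
visits[is_summer] <- visits[is_summer] + 100

df <- data.frame(ds = dates, y = visits)

# Request yearly seasonality explicitly rather than relying on auto-detection
m <- prophet(df, yearly.seasonality = TRUE, daily.seasonality = FALSE)

# Predict on the historical dates only
forecast <- predict(m, df)

# Average the fitted yearly component within each calendar month
yearly_by_month <- tapply(forecast$yearly, format(forecast$ds, "%m"), mean)
print(round(yearly_by_month, 1))

# Month with the largest average yearly effect, and the peak-to-trough spread
peak_month <- as.integer(names(which.max(yearly_by_month)))
amplitude <- diff(range(yearly_by_month))
```

Wrapping the fit in a function and applying it to each site would give one peak_month and one amplitude per website, which is easier to scan than hundreds of plots.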

Update

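In response to comments, here's a quick and easy way to test for a monthly effect within each website. It runs an ANOVA on each group. Because month is stored as a number, the model below has a single degree of freedom and tests for a linear relationship between month number and visits rather than for differences among all twelve months. This example gives website B a seasonal effect, which you can see in the statistic and p.value columns.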

First create the demo data...

library(tidyverse)
library(lubridate)
library(purrr)
library(broom)

website <-
  tibble(
    site = c(rep("A", 732), rep("B", 732), rep("C", 732)),
    date = rep(seq(
      as.Date('2015/01/01'), as.Date('2017/01/01'), by = "day"
    ), 3),
    # The same 732 draws are repeated for each site, so A and C end up identical
    visits = rep(round(rnorm(
      732, mean = 327, sd = 100
    )), 3)
  ) %>% 
  mutate(month = month(date))

website <-
  mutate(website, visits = ifelse(month %in% c(6,7,8) &
                           site == "B", visits + 1000, visits))

Now use the wonders of the tidyverse to run the test across each group...

website %>% 
  split(.$site) %>% 
  map(~ tidy(aov(visits ~ month, data = .)))

#$A
#       term  df       sumsq    meansq statistic   p.value
#1     month   1    3645.896  3645.896 0.3529069 0.5526563
#2 Residuals 730 7541662.108 10331.044        NA        NA

#$B
#       term  df     sumsq    meansq statistic    p.value
#1     month   1   1086355 1086355.5  5.426011 0.02011086
#2 Residuals 730 146155160  200212.5        NA         NA

#$C
#       term  df       sumsq    meansq statistic   p.value
#1     month   1    3645.896  3645.896 0.3529069 0.5526563
#2 Residuals 730 7541662.108 10331.044        NA        NA
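
One possible refinement is to treat month as a factor, so that each month gets its own mean and the test has 11 degrees of freedom. The following is a minimal base-R sketch under stated assumptions: the seed, the independent noise per site, and the helper name test_site are illustrative and not part of the original answer. It returns one row per site, which scales more comfortably to hundreds of websites than a list of tables.

```r
set.seed(2024)  # assumed seed, only so the sketch is reproducible

dates <- seq(as.Date("2015-01-01"), as.Date("2017-01-01"), by = "day")
sites <- c("A", "B", "C")

# Independent noise per site (an assumption; the demo above reuses one draw)
website <- do.call(rbind, lapply(sites, function(s) {
  data.frame(
    site = s,
    date = dates,
    visits = round(rnorm(length(dates), mean = 327, sd = 100))
  )
}))
website$month <- as.integer(format(website$date, "%m"))

# Give site B a summer effect, as in the demo above
is_b_summer <- website$site == "B" & website$month %in% 6:8
website$visits[is_b_summer] <- website$visits[is_b_summer] + 1000

# One-way ANOVA per site with month as a factor (12 levels, 11 df)
test_site <- function(d) {
  tab <- summary(aov(visits ~ factor(month), data = d))[[1]]
  data.frame(
    site = d$site[1],
    df = tab[["Df"]][1],
    statistic = tab[["F value"]][1],
    p.value = tab[["Pr(>F)"]][1]
  )
}

results <- do.call(rbind, lapply(split(website, website$site), test_site))
print(results)
```

When many sites are tested at once, some small p-values will appear by chance, so adjusting them (for example with p.adjust()) is worth considering.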

Note that this is not the ideal method for performing time series analysis, but it answers the specific question that you're asking.
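
A more time-series-oriented alternative, offered as a sketch rather than as this answer's method, is an STL decomposition with base R's stl(). The seasonal strength used here, 1 - Var(remainder) / Var(seasonal + remainder) floored at 0, is the measure described in Hyndman and Athanasopoulos, Forecasting: Principles and Practice. The seed, the three full years of data, the +200 bump, and the name seasonal_strength are assumptions made for illustration.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Three full years, so the monthly series covers at least two complete periods
dates <- seq(as.Date("2015-01-01"), as.Date("2017-12-31"), by = "day")
visits <- round(rnorm(length(dates), mean = 327, sd = 100))

# Assumed summer bump of +200
is_summer <- as.integer(format(dates, "%m")) %in% c(6, 7, 8)
visits[is_summer] <- visits[is_summer] + 200

# Aggregate to monthly means and build a ts object with a 12-month period
monthly <- tapply(visits, format(dates, "%Y-%m"), mean)
monthly_ts <- ts(as.numeric(monthly), start = c(2015, 1), frequency = 12)

# Decompose into seasonal, trend, and remainder components
fit <- stl(monthly_ts, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]
remainder <- fit$time.series[, "remainder"]

# Values near 1 suggest strong seasonality; values near 0 suggest little
seasonal_strength <- max(0, 1 - var(remainder) / var(seasonal + remainder))
print(seasonal_strength)
```

Computing this one number per website gives a ranking of sites by how seasonal their traffic is.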

Andrew Brēza answered Dec 11 '25 00:12


