Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Align multiple time series in R

I imported into Excel some time series, I got 20 columns and every second column contain dates. I would like to have only one date column with all data aligned and NA where not available.

2007-06-11 1509 2007-06-11 1138 2008-01-17 990
2007-06-12 1493 2007-06-13 1200 2008-01-18 800
2007-06-13 1324 2007-06-14 1122 2008-01-19 880

should give

2007-06-11 1509 1138 NA
2007-06-12 1493 NA   NA
2007-06-13 1324 1200 NA
...

Thank you!

like image 655
prre72 Avatar asked Sep 14 '25 20:09

prre72


2 Answers

Break data into pieces, then merge with the all.x = TRUE option to join the pieces (which you can do elegantly with Reduce).

dat <- read.table(sep=' ', text=
'2007-06-11 1509 2007-06-11 1138 2008-01-17 990
2007-06-12 1493 2007-06-13 1200 2008-01-18 800
2007-06-13 1324 2007-06-14 1122 2008-01-19 880')

#### break dat into appropriate pieces    
# 1) alternating column indices
col_idx <- seq(1, ncol(dat), by=2)    

# 2) each piece is two columns
dat_pieces <- lapply(col_idx, function(i) {
  dat_piece <- dat[, c(i, i+1)]
  # use common name for date columns
  names(dat_piece)[1] <- 'date'
  return(dat_piece)
})

#### merge all pieces
my_new_thing <- Reduce(
  function(a, b) merge(a, b, by = 'date', all.x =T),
  dat_pieces)

Result:

        date   V2   V4 V6
1 2007-06-11 1509 1138 NA
2 2007-06-12 1493   NA NA
3 2007-06-13 1324 1200 NA
like image 116
arvi1000 Avatar answered Sep 16 '25 10:09

arvi1000


The OP has requested to reshape the date/value pairs of the input data so that the for each date the values are listed.

For this two steps are required:

  1. Reshape multiple value columns simultaneously from wide to long format
  2. Reshape back into wide format but with one date column

The melt() function from the data.table package is able to reshape multiple values columns simultaneously. Columns 1, 3, 5, ... contain date strings while columns 2, 4, 6, ... contain numeric values.

library(data.table)
long <- melt(setDT(input), measure.vars = list(date = seq(1, ncol(raw), by = 2), 
                                               value = seq(2, ncol(raw), by = 2)))
long
   variable       date value
1:        1 2007-06-11  1509
2:        1 2007-06-12  1493
3:        1 2007-06-13  1324
4:        2 2007-06-11  1138
5:        2 2007-06-13  1200
6:        2 2007-06-14  1122
7:        3 2008-01-17   990
8:        3 2008-01-18   800
9:        3 2008-01-19   880

Now, reshape back to wide format. The rowid() function is used to create a compacted view of the values:

dcast(long, date  ~ rowid(date, prefix = "V"))
         date   V1   V2
1: 2007-06-11 1509 1138
2: 2007-06-12 1493   NA
3: 2007-06-13 1324 1200
4: 2007-06-14 1122   NA
5: 2008-01-17  990   NA
6: 2008-01-18  800   NA
7: 2008-01-19  880   NA

Alternatively, the values can be displayed as a "sparse matrix" with the original column group numbers as column headers.

dcast(long, date  ~ variable)
         date    1    2   3
1: 2007-06-11 1509 1138  NA
2: 2007-06-12 1493   NA  NA
3: 2007-06-13 1324 1200  NA
4: 2007-06-14   NA 1122  NA
5: 2008-01-17   NA   NA 990
6: 2008-01-18   NA   NA 800
7: 2008-01-19   NA   NA 880

Data

library(data.table)
input <- fread(
  "2007-06-11 1509 2007-06-11 1138 2008-01-17 990
   2007-06-12 1493 2007-06-13 1200 2008-01-18 800
   2007-06-13 1324 2007-06-14 1122 2008-01-19 880"
)
like image 38
Uwe Avatar answered Sep 16 '25 10:09

Uwe