I imported into Excel some time series, I got 20 columns and every second column contain dates. I would like to have only one date column with all data aligned and NA where not available.
2007-06-11 1509 2007-06-11 1138 2008-01-17 990
2007-06-12 1493 2007-06-13 1200 2008-01-18 800
2007-06-13 1324 2007-06-14 1122 2008-01-19 880
should give
2007-06-11 1509 1138 NA
2007-06-12 1493 NA NA
2007-06-13 1324 1200 NA
...
Thank you!
Break data into pieces, then merge
with the all.x = TRUE
option to join the pieces (which you can do elegantly with Reduce
).
dat <- read.table(sep=' ', text=
'2007-06-11 1509 2007-06-11 1138 2008-01-17 990
2007-06-12 1493 2007-06-13 1200 2008-01-18 800
2007-06-13 1324 2007-06-14 1122 2008-01-19 880')
#### break dat into appropriate pieces
# 1) alternating column indices
col_idx <- seq(1, ncol(dat), by=2)
# 2) each piece is two columns
dat_pieces <- lapply(col_idx, function(i) {
dat_piece <- dat[, c(i, i+1)]
# use common name for date columns
names(dat_piece)[1] <- 'date'
return(dat_piece)
})
#### merge all pieces
my_new_thing <- Reduce(
function(a, b) merge(a, b, by = 'date', all.x =T),
dat_pieces)
Result:
date V2 V4 V6
1 2007-06-11 1509 1138 NA
2 2007-06-12 1493 NA NA
3 2007-06-13 1324 1200 NA
The OP has requested to reshape the date/value pairs of the input data so that the for each date the values are listed.
For this two steps are required:
The melt()
function from the data.table
package is able to reshape multiple values columns simultaneously. Columns 1, 3, 5, ... contain date strings while columns 2, 4, 6, ... contain numeric values.
library(data.table)
long <- melt(setDT(input), measure.vars = list(date = seq(1, ncol(raw), by = 2),
value = seq(2, ncol(raw), by = 2)))
long
variable date value 1: 1 2007-06-11 1509 2: 1 2007-06-12 1493 3: 1 2007-06-13 1324 4: 2 2007-06-11 1138 5: 2 2007-06-13 1200 6: 2 2007-06-14 1122 7: 3 2008-01-17 990 8: 3 2008-01-18 800 9: 3 2008-01-19 880
Now, reshape back to wide format. The rowid()
function is used to create a compacted view of the values:
dcast(long, date ~ rowid(date, prefix = "V"))
date V1 V2 1: 2007-06-11 1509 1138 2: 2007-06-12 1493 NA 3: 2007-06-13 1324 1200 4: 2007-06-14 1122 NA 5: 2008-01-17 990 NA 6: 2008-01-18 800 NA 7: 2008-01-19 880 NA
Alternatively, the values can be displayed as a "sparse matrix" with the original column group numbers as column headers.
dcast(long, date ~ variable)
date 1 2 3 1: 2007-06-11 1509 1138 NA 2: 2007-06-12 1493 NA NA 3: 2007-06-13 1324 1200 NA 4: 2007-06-14 NA 1122 NA 5: 2008-01-17 NA NA 990 6: 2008-01-18 NA NA 800 7: 2008-01-19 NA NA 880
library(data.table)
input <- fread(
"2007-06-11 1509 2007-06-11 1138 2008-01-17 990
2007-06-12 1493 2007-06-13 1200 2008-01-18 800
2007-06-13 1324 2007-06-14 1122 2008-01-19 880"
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With