I have a 3x168 dataframe in R. Each row has three columns - Day, Hour, and value. The day and hour corresponds to day of the week, the hour column corresponds to the hour on that day, and the value corresponds to the value with which I am concerned.
I am hoping to transform this data such that it exists in a 24x7 matrix, with a row (or column) corresponding to a particular day, and a column (or row) corresponding to a particular hour.
What is the most efficient way to do this in R? I've been able to throw together some messy strings of commands to get something close, but I have a feeling there is a very efficient solution.
Example starting data:
> print(data)
weekday hour value
1 M 1 1.11569683
2 M 2 -0.44550495
3 M 3 -0.82566259
4 M 4 -0.81427790
5 M 5 0.08277568
6 M 6 1.36057839
...
156 SU 12 0.12842608
157 SU 13 0.44697186
158 SU 14 0.86549961
159 SU 15 -0.22333317
160 SU 16 1.75955163
161 SU 17 -0.28904472
162 SU 18 -0.78826607
163 SU 19 -0.78520233
164 SU 20 -0.19301032
165 SU 21 0.65281161
166 SU 22 0.37993619
167 SU 23 -1.58806896
168 SU 24 -0.26725907
I'd hope to get something of the type:
M .... SU
1 1.11569683
2 -0.44550495
3 -0.82566259
4 -0.81427790
5
6
.
.
.
19
20
21 0.65281161
22 0.37993619
23 -1.58806896
24 -0.26725907
You can get some actual sample data this way:
weekday <- rep(c("M","T","W","TH","F","SA","SU"),each=24)
hour <- rep(1:24,7)
value <- rnorm(24*7)
data <- data.frame(weekday=weekday, hour=hour, value=value)
Thanks!
This is pretty trivial with the reshape2 package:
# Sample data - please include some with your next question!
x <- data.frame(day = c(rep("Sunday", 24),
rep("Monday", 24),
rep("Tuesday", 24),
rep("Wednesday", 24),
rep("Thursday", 24),
rep("Friday", 24),
rep("Saturday", 24)),
hour = rep(1:24, 7),
value = rnorm(n = 24 * 7)
)
library(reshape2)
# For rows representing hours
acast(x, hour ~ day)
# For rows representing days
acast(x, day ~ hour)
# If you want to preserve the ordering of the days, just make x$day a factor
# unique(x$day) conveniently gives the right order here, but you'd always want
# check that (and make sure the factor reflects the original value - that's why
# I'm making a new variable instead of overwriting the old one)
x$day.f <- factor(x$day, levels = unique(x$day))
acast(x, hour ~ day.f)
acast(x, day.f ~ hour)
The three-column dataset you have is an example of what's called "molten data" - each row represents a single result (x$value) with one or more identifiers (here, x$day and x$hour). The little formula inside of acast lets you express how you'd like your new dataset to be configured - variable names to the left of the tilde are used to define rows, and variable names to the right to define columns. In this case, there's only one column left - x$value - so it's automatically used to fill in the result matrix.
It took me a while to wrap my brain around all of that, but it's an incredibly powerful to think about reshaping data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With