I have a data frame df that contains 'messages'. Each row is a message. Each message has a timestamp, df$message.date, stored as POSIXct in the format %Y-%m-%d %H:%M:%S. Example:
> head(df)
messageid user.id message.date
123 999 2011-07-17 17:54:27
456 888 2011-07-19 16:56:50
(Here is the dput()'ed version of the above):
df <- structure(list(messageid = c(123L, 456L), user.id = c(999L, 888L),
message.date = structure(c(1310950467, 1311119810), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("messageid", "user.id",
"message.date"), row.names = c(NA, -2L), class = "data.frame")
How do I create a data frame with the total number of messages per day? Example:
day message.count
2011-07-17 1
2011-07-18 0
2011-07-19 1
Dates with no messages should not be dropped: I want message.count set to zero for those days.
What I have done so far: I have extracted the calendar day part of message.date by doing:
df$calendar.day<-as.POSIXct(strptime(substr(df$message.date,1,10),"%Y-%m-%d",tz="CST6CDT"))
> head(df$calendar.day)
[1] "2011-07-17 CDT" "2011-07-18 CDT" "2011-07-19 CDT"
And from there I can generate a list of every single calendar date in the date range:
daterange <- seq(min(df$calendar.day), max(df$calendar.day), by="day")
Here's a fairly straightforward solution that uses sapply() to count the number of messages on each date spanned by your log.
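countMessages <- function(timeStamps) {
  # Reduce each timestamp to its calendar date (in the session's time zone)
  Dates <- as.Date(strftime(timeStamps, "%Y-%m-%d"))
  # Every date from the first message to the last, including empty days
  allDates <- seq(from = min(Dates), to = max(Dates), by = "day")
  # Count the messages that fall on each date; days with none get 0
  message.count <- sapply(allDates, FUN = function(X) sum(Dates == X))
  data.frame(day = allDates, message.count = message.count)
}
countMessages(df$message.date)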
# day message.count
# 1 2011-07-17 1
# 2 2011-07-18 0
# 3 2011-07-19 1
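If the log is long, comparing every date against every timestamp may be slower than necessary. As a rough alternative sketch (not part of the solution above), the same zero-filled counts can be obtained by tabulating a factor whose levels are the full date range, so that empty days appear as levels with a count of 0. The function name countMessagesTable and its tz argument are made up for this illustration; tz is there because the calendar day a timestamp falls on depends on the time zone used to format it.

```r
# Hypothetical helper for illustration; the name and the tz argument are assumptions.
countMessagesTable <- function(timeStamps, tz = "") {
  # Calendar date of each timestamp, formatted in the requested time zone
  Dates <- as.Date(format(timeStamps, "%Y-%m-%d", tz = tz))
  # Every date from the first message to the last, including empty days
  allDates <- seq(from = min(Dates), to = max(Dates), by = "day")
  # Using all dates as factor levels makes table() report 0 for empty days
  counts <- table(factor(as.character(Dates), levels = as.character(allDates)))
  data.frame(day = allDates, message.count = as.vector(counts))
}

# Example with timestamps created explicitly in UTC, so the result
# does not depend on the time zone of the R session
ts <- as.POSIXct(c("2011-07-17 17:54:27", "2011-07-19 16:56:50"), tz = "UTC")
countMessagesTable(ts, tz = "UTC")
#          day message.count
# 1 2011-07-17             1
# 2 2011-07-18             0
# 3 2011-07-19             1
```

With the default tz = "", the dates are taken in the session's local time zone, which should match the behaviour of strftime() in the sapply() version.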