I am working with a large dataset, an example can be shown below. For the majority of individual files I will have to process there should be more than one day's worth of data.
Date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
"05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
"07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
"07/12/2012 08:00:00")
Date <- strptime(Date, "%d/%m/%Y %H:%M")
c <- c("0","1","5","4","6","8","0","3","10","6")
c <- as.numeric(c)
df1 <- data.frame(Date,c,stringsAsFactors = FALSE)
I wish to only be left with data on a single day. This day will be chosen by having the most number of data points for that day. If for any reason two days are tied (with the maximum number of data points), I wish to select the day with the highest individual value recorded.
In the example dataframe given above, I would be left with 7th Dec. It has 4 data points (as has the 5th Dec), but it has the highest value recorded out of these two days (i.e. 10).
Here's a solution with tapply.
# count rows per day and find maximum c value
res <- with(df1, tapply(c, as.Date(Date), function(x) c(length(x), max(x))))
# order these two values in decreasing order and find the associated day
# (at top position):
maxDate <- names(res)[order(sapply(res, "[", 1),
sapply(res, "[", 2), decreasing = TRUE)[1]]
# subset data frame:
subset(df1, as.character(as.Date(Date)) %in% maxDate)
Date c
7 2012-12-07 05:00:00 0
8 2012-12-07 06:00:00 3
9 2012-12-07 07:00:00 10
10 2012-12-07 08:00:00 6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With