Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subset dataframe by most number of daily records

I am working with a large dataset, an example can be shown below. For the majority of individual files I will have to process there should be more than one day's worth of data.

Date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
          "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00", 
          "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
          "07/12/2012 08:00:00")
Date <- strptime(Date, "%d/%m/%Y %H:%M")
c <- c("0","1","5","4","6","8","0","3","10","6")
c <- as.numeric(c)
df1 <- data.frame(Date,c,stringsAsFactors = FALSE)

I wish to only be left with data on a single day. This day will be chosen by having the most number of data points for that day. If for any reason two days are tied (with the maximum number of data points), I wish to select the day with the highest individual value recorded.

In the example dataframe given above, I would be left with 7th Dec. It has 4 data points (as has the 5th Dec), but it has the highest value recorded out of these two days (i.e. 10).

like image 893
KT_1 Avatar asked Dec 06 '25 17:12

KT_1


1 Answers

Here's a solution with tapply.

# count rows per day and find maximum c value
res <- with(df1, tapply(c, as.Date(Date), function(x) c(length(x), max(x))))

# order these two values in decreasing order and find the associated day
# (at top position):
maxDate <- names(res)[order(sapply(res, "[", 1), 
                            sapply(res, "[", 2), decreasing = TRUE)[1]]

# subset data frame:
subset(df1, as.character(as.Date(Date)) %in% maxDate)

                  Date  c
7  2012-12-07 05:00:00  0
8  2012-12-07 06:00:00  3
9  2012-12-07 07:00:00 10
10 2012-12-07 08:00:00  6
like image 120
Sven Hohenstein Avatar answered Dec 08 '25 10:12

Sven Hohenstein



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!