I'm using the airquality data set available in R, and attempting to count the number of rows within the data that do not contain any NAs, while aggregating by Month.
The data looks like this:
head(airquality)
# Ozone Solar.R Wind Temp Month Day
# 1 41 190 7.4 67 5 1
# 2 36 118 8.0 72 5 2
# 3 12 149 12.6 74 5 3
# 4 18 313 11.5 62 5 4
# 5 NA NA 14.3 56 5 5
# 6 28 NA 14.9 66 5 6
As you can see, I have NAs in columns Ozone and Solar.R. I used the function complete.cases as follows:
x <- airquality[,1] # for the Ozone
y <- airquality[,2] # for the Solar.R
ok <- complete.cases(x,y)
And then to check:
nrow(airquality)
# [1] 153
sum(!ok)
# [1] 42
sum(ok)
# [1] 111
which is great.
But now I'd like to break that count down by Month (column 5), and this is where I'm running into problems: I can't get aggregate to group by the value in column 5 (Month).
I was able to get the following to run. It doesn't group by Month yet (I just wanted to make sure I could get the function to run):
aggregate(x = sum(complete.cases(airquality)), by= list(nrow(airquality)), FUN = sum)
# Group.1 x
# 1 153 111
OK... so to sort it out, I am trying to use the by argument of the aggregate function to do the grouping. I tried several ways of referring to column 5 within airquality:
- airquality[,5]
- airquality[,"Month"]
I get these errors:
aggregate(x = sum(complete.cases(airquality)), by= list(airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) :
# arguments must have same length
aggregate(x = sum(complete.cases(airquality)), by=
list(sum(complete.cases(airquality)),airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) :
# arguments must have same length
I looked further into the ?aggregate documentation, namely the by argument:
by - a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.
I looked up ?factor, but can't see how to apply it (or whether it is even necessary in this case). I also tried putting break = into it, but that didn't work.
None of the "Questions that may already have your answer" seem to apply, many of which give solutions in C# and SQL.
Edit: Expected outcome
Count Month
24 5
9 6
26 7
23 8
29 9
As an addition to the other answers, you could do it with dplyr.
library(dplyr)
# %>% replaces the old %.% operator, which current dplyr versions no longer provide
airquality %>%
  group_by(Month) %>%
  summarize(incomplete = sum(!complete.cases(Ozone, Solar.R)),
            complete = sum(complete.cases(Ozone, Solar.R)))
# Month incomplete complete
#1 5 7 24
#2 6 21 9
#3 7 5 26
#4 8 8 23
#5 9 1 29
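For comparison, here is a minimal base R sketch of the same count using aggregate. It also shows why the attempts in the question fail: sum(complete.cases(...)) collapses everything to a single number, so it can never be as long as the Month column passed to by. The names ok and counts are illustrative only and do not come from the question.

```r
# Flag rows where both Ozone and Solar.R are present (one TRUE/FALSE per row).
ok <- complete.cases(airquality[, c("Ozone", "Solar.R")])

# The length mismatch behind "arguments must have same length":
length(sum(ok))           # 1
length(airquality$Month)  # 153

# Pass the full logical vector as x; sum() then counts the TRUEs within each month.
counts <- aggregate(x = list(Count = ok),
                    by = list(Month = airquality$Month),
                    FUN = sum)
counts
#   Month Count
# 1     5    24
# 2     6     9
# 3     7    26
# 4     8    23
# 5     9    29
```

The key point is that x stays a per-row vector and the summing is left to FUN, so aggregate can do the splitting by Month itself.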