I'm sure there's a simple solution to this problem, but I'm having trouble figuring it out. I have a data frame in the following format:
Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14
"Type" is a factor with two levels, {A,B}, and each level gets at least one "Category" entry, (for simplicity, they are denoted XYZ here, but in my actual dataset there are too many to list). I would like the number of rows each Type has to match by Category:
Number Category Type Count
1 X A 10
2 X B 14
3 Y A <NA>
4 Y B 3
5 Z A 14
6 Z B <NA>
For instance, if Type A is listed in four rows of Category A, but Type B has no Category A listings, then four new rows of Category A, Type B should be created (with Count=NA). Similarly, if Type A gets four rows of Category A and Type B has two, then two new rows should be created.
I was able to find numerous answers on how to do this for missing dates in time series data using seq(), expand.grid(), and merge(), but I can't quite see how to do it in this case. I hope this is clear... Grateful for any help!
dat <- read.table(header = TRUE, text =
"Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14")
Use expand.grid to make a master list and then merge:
merge(dat, expand.grid(lapply(dat[c("Type","Category")], levels)), all.y=TRUE)
# Category Type Number Count
#1 X A 1 10
#2 X B 2 14
#3 Y A NA NA
#4 Y B 3 3
#5 Z A 4 14
#6 Z B NA NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With