I'm working with a prescription drug claims dataset. When there is a canceled claim, the data system does not just delete the observation, but creates a new observation with the same prescription number but with the days supplied shown as a negative number.
E.g.
DaysSupply RxNumber DateSupplied
1 -10 1 2018
2 10 1 2018
I want to delete paired rows of the dataset if they 1) share the same prescription number (RxNumber), 2) if they have the same prescription date (DateSupplied), and 3) if the DaysSupply are corresponding positive and negative values (e.g. +10 and -10). The prescription number is the patient-specific key in this case.
One complication is that multiple drug fills can be redeemed from one prescription number, so I want to deduplicate JUST PAIRS that match the above conditions instead of deduplicating on all rows that share the same prescription number.
I'm not sure what approach I should be taking. I've thought about using a long if statement/dedpulicate command but I'm not sure how to instruct R to deduplicate ONLY pairs that match the above conditions.
v1 <- c(-10,10,10,-8,8,-6,6,5,4)
v2 <- c(1,1,1,2,2,3,4,9,9)
v3 <- c(2018, 2018, 2018, 2018, 2017, 2016, 2016, 2015, 2014)
df <- data.frame("DaysSupply" = v1, "RxNumber" = v2, "DateSupplied" = v3)
DaysSupply RxNumber DateSupplied
1 -10 1 2018
2 10 1 2018
3 10 1 2018
4 -8 2 2018
5 8 2 2017
6 -6 3 2016
7 6 4 2016
8 5 9 2015
9 4 9 2014
What I would like as an output is:
DaysSupply RxNumber DateSupplied
3 10 1 2018
4 -8 2 2018
5 8 2 2017
6 -6 3 2016
7 6 4 2016
8 5 9 2015
9 4 9 2014
Any ideas?
A dplyr solution using your sample data.
I included some lines toward the end to make it look nicer and get the output to look the same as yours. I'm sure someone could cut a line or two out and make the duplicate removal process a little cleaner, but I got it to do what you need.
df %>%
dplyr::mutate(AbsDaysSupply = abs(DaysSupply)) %>%
dplyr::group_by(RxNumber, DateSupplied, AbsDaysSupply) %>%
dplyr::arrange(RxNumber, DateSupplied, AbsDaysSupply, DaysSupply) %>%
dplyr::mutate(sum = cumsum(DaysSupply)) %>%
dplyr::filter(!(sum <= 0 & dplyr::n() > 1)) %>%
dplyr::ungroup() %>%
dplyr::select(-AbsDaysSupply, -sum) %>%
dplyr::arrange(desc(DateSupplied), RxNumber)
# A tibble: 7 x 3
DaysSupply RxNumber DateSupplied
<dbl> <dbl> <dbl>
1 10 1 2018
2 -8 2 2018
3 8 2 2017
4 -6 3 2016
5 6 4 2016
6 5 9 2015
7 4 9 2014
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With