How to remove pairs of rows based on a set of conditions including self-referential match?

Question

I'm working with a prescription drug claims dataset. When there is a canceled claim, the data system does not just delete the observation, but creates a new observation with the same prescription number but with the days supplied shown as a negative number.

E.g.

          DaysSupply RxNumber DateSupplied
        1        -10        1         2018
        2         10        1         2018

I want to delete paired rows of the dataset if they 1) share the same prescription number (RxNumber), 2) if they have the same prescription date (DateSupplied), and 3) if the DaysSupply are corresponding positive and negative values (e.g. +10 and -10). The prescription number is the patient-specific key in this case.

One complication is that multiple drug fills can be redeemed from one prescription number, so I want to deduplicate JUST PAIRS that match the above conditions instead of deduplicating on all rows that share the same prescription number.

I'm not sure what approach I should be taking. I've thought about using a long if statement/dedpulicate command but I'm not sure how to instruct R to deduplicate ONLY pairs that match the above conditions.

v1 <- c(-10,10,10,-8,8,-6,6,5,4)
v2 <- c(1,1,1,2,2,3,4,9,9)
v3 <- c(2018, 2018, 2018, 2018, 2017, 2016, 2016, 2015, 2014)
df <- data.frame("DaysSupply" = v1, "RxNumber" = v2, "DateSupplied" = v3)

  DaysSupply RxNumber DateSupplied
1        -10        1         2018
2         10        1         2018
3         10        1         2018
4         -8        2         2018
5          8        2         2017
6         -6        3         2016
7          6        4         2016
8          5        9         2015
9          4        9         2014

What I would like as an output is:

  DaysSupply RxNumber DateSupplied
3         10        1         2018
4         -8        2         2018
5          8        2         2017
6         -6        3         2016
7          6        4         2016
8          5        9         2015
9          4        9         2014

Any ideas?

hmhensen · Accepted Answer

A dplyr solution using your sample data.

I included some lines toward the end to make it look nicer and get the output to look the same as yours. I'm sure someone could cut a line or two out and make the duplicate removal process a little cleaner, but I got it to do what you need.

df %>% 
  dplyr::mutate(AbsDaysSupply = abs(DaysSupply)) %>% 
  dplyr::group_by(RxNumber, DateSupplied, AbsDaysSupply) %>% 
  dplyr::arrange(RxNumber, DateSupplied, AbsDaysSupply, DaysSupply) %>% 
  dplyr::mutate(sum = cumsum(DaysSupply)) %>% 
  dplyr::filter(!(sum <= 0 & dplyr::n() > 1)) %>% 
  dplyr::ungroup() %>% 
  dplyr::select(-AbsDaysSupply, -sum) %>% 
  dplyr::arrange(desc(DateSupplied), RxNumber)
# A tibble: 7 x 3
  DaysSupply RxNumber DateSupplied
       <dbl>    <dbl>        <dbl>
1         10        1         2018
2         -8        2         2018
3          8        2         2017
4         -6        3         2016
5          6        4         2016
6          5        9         2015
7          4        9         2014

How to remove pairs of rows based on a set of conditions including self-referential match?

Tags:

r

duplicates

rnub

1 Answers

hmhensen

Recent Activity

Donate For Us

How to remove pairs of rows based on a set of conditions including self-referential match?

Tags:

r

duplicates

rnub

1 Answers

hmhensen

Related questions

Recent Activity

Donate For Us