Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R conditional filter

Tags:

dataframe

r

I have two dataframes in R.

Release dataframe

Date     Product
2011-01-13   A  
2011-02-15   A  
2011-01-14   B  
2011-02-15   B

Casedata dataframe

Date       Product Numberofcases 
2011-01-13 A       50
2011-01-12 A       20
2011-01-11 A       100
2011-01-10 A       120
2011-01-09 A       150
2011-01-08 A       180
2011-01-07 A       200
2011-01-06 A       220
2011-01-23 A       500
2011-01-31 A       450
2011-02-08 A       50
2011-02-09 A       1000
2011-02-10 A       1200
2011-02-11 A       1500
2011-02-12 A       1800
2011-02-13 A       2000
2011-02-14 A       2200
2011-02-15 A       5000
2011-01-31 A       4500
:::
:::
2011-01-15 B       1000

My requirement is for every product release date(from release dataframe), I should obtain the corresponding sum(numberofcases) one week prior to the release date(in the casedata dataframe). ie., for product A and release date 2011-01-13, it should be sum of all cases in the previous week (from 2011-01-06 to 2011-01-13) ie., (50+20+100+120+150+180+200+220)

Releasedate Product Numberofcasesoneweekpriorrelease
2011-01-13  A       1040
2011-02-15  A       19250
2011-01-14  B       ...
2011-02-15  B       ...

What I have tried :

beforerelease <- sqldf("select product,release.date_release,sum(numberofcasescreated) as numberofcasesbeforerelease from release left join casedata using (product) where date_case>=weekbeforerelease and date_case<=date_release group by product,date_release") 
finaldf <- merge(beforerelease,afterelease,by=c("monthyear","product"))

I am struck and it is not giving me the expected outcome. Can somebody help me ?

like image 745
Adithya Kumar Avatar asked Mar 10 '26 14:03

Adithya Kumar


1 Answers

Using the recently implemented non-equi joins feature in the current development version of data.table, v1.9.7, this can be done simply as (assuming all Date columns are of class Date):

require(data.table)
setDT(release)[, Date2 := Date-7L]
setDT(casedata)[release, on = .(Product, Date >= Date2, Date <= Date), 
                .(count = sum(Numberofcases)), by = .EACHI]

#    Product       Date       Date count
# 1:       A 2011-01-06 2011-01-13  1040
# 2:       A 2011-02-08 2011-02-15 14750
# 3:       B 2011-01-07 2011-01-14    NA
# 4:       B 2011-02-08 2011-02-15    NA
like image 179
Arun Avatar answered Mar 13 '26 03:03

Arun



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!