
Subsetting while summarizing data with ddply

Tags: r, plyr

I have data which look like:

> head(ddd)
        id affiliate_id affiliate_account_id source         Pipeline num  good   bad
1 61046463         1006                   69 29eada Contact Info Bad   1 FALSE  TRUE
2 61046770         1006                   69 344f39    Did not Reach   1  TRUE FALSE
3 61053937         1006                   69 fff384               na   1  TRUE FALSE
4 61053941         1006                   69 22d8b6          App Out   1  TRUE FALSE
5 61060137         1006                   69 29eada         No Offer   1  TRUE FALSE
6 61060221         1006                   69 3fdb4f Contact Info Bad   1 FALSE  TRUE

I am trying to summarise the data using the ddply function so that I can get something like below:

  affiliate_id affiliate_account_id lead_count good_count bad_count good_rate bad_rate
1         1006                   69        360        300        60         %        %
2         1006                 5212         64         60         4         %        %
3         1031                 5102         22          3        20         %        %
4         1035                 5211          5         15        10         %        %
5         1035                 5216         90         30        60         %        %

where the percentages (%) are the good and bad rates for that affiliate_account_id.

I can't figure out how to get the counts and rates for good and bad. Can anyone help me go from the following call to the last four columns in the table above?

ddply(ddd, .(affiliate_id, affiliate_account_id), summarise, lead_count=length(affiliate_id))
ATMathew asked Dec 02 '25 06:12


1 Answer

You can use sum() on a logical column to count its TRUE values, since TRUE is coerced to 1 and FALSE to 0:

ddply(dat, .(affiliate_id, affiliate_account_id), summarise, 
      lead_count=length(affiliate_id),
      good_count= sum(good),
      bad_count = sum(bad),
      good_rate = sum(good)/length(affiliate_id),
      bad_rate = sum(bad)/length(affiliate_id))

  affiliate_id affiliate_account_id lead_count good_count bad_count good_rate  bad_rate
1         1006                   69          3          2         1 0.6666667 0.3333333
2         1006                   70          3          2         1 0.6666667 0.3333333

where dat is your input, slightly modified to contain two different groups, since your sample has only one:

        id affiliate_id affiliate_account_id source         Pipeline num  good   bad
1 61046463         1006                   69 29eada Contact Info Bad   1 FALSE  TRUE
2 61046770         1006                   69 344f39    Did not Reach   1  TRUE FALSE
3 61053937         1006                   69 fff384               na   1  TRUE FALSE
4 61053941         1006                   70 22d8b6          App Out   1  TRUE FALSE
5 61060137         1006                   70 29eada         No Offer   1  TRUE FALSE
6 61060221         1006                   70 3fdb4f Contact Info Bad   1 FALSE  TRUE
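
If you want to reproduce this, here is a minimal sketch that rebuilds the six rows above by hand. The data.frame() call keeps only the columns the summary needs, and the name res is my own addition, not something from the question. It also uses mean() on the logical columns, which should be equivalent to sum(x)/length(x) as long as the columns contain no NA values.

```r
library(plyr)

# Rebuild the six sample rows (assumption: only the grouping columns and the
# two logical columns are needed for the summary).
dat <- data.frame(
  affiliate_id         = rep(1006, 6),
  affiliate_account_id = c(69, 69, 69, 70, 70, 70),
  good                 = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE),
  bad                  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Logicals are coerced to 0/1 in arithmetic, so sum() counts the TRUEs
# and mean() gives the proportion of TRUEs.
sum(dat$good)   # 4
mean(dat$good)  # 0.6666667

# Same grouping as above; the rates are written with mean() instead of
# sum(x) / length(x).
res <- ddply(dat, .(affiliate_id, affiliate_account_id), summarise,
             lead_count = length(good),
             good_count = sum(good),
             bad_count  = sum(bad),
             good_rate  = mean(good),
             bad_rate   = mean(bad))
res
```

If good or bad can contain NA, sum() and mean() will return NA for that group unless you pass na.rm = TRUE, so decide first whether missing values should count as leads.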
agstudy answered Dec 03 '25 22:12


