Recently I'm working with data.table in R and it is quite popular and efficient. Currently I come across a problem which I think could be solved using data.table.
I have a data set like this:
event | group_ind 
  1   | group1
  1   | group1
  1   | group1
  2   | group1
  2   | group1
  1   | group2
  1   | group2
  2   | group2
  2   | group3
  2   | group3
Now I want to know the percentage of event 1 occurs in each group. The result for this data set is obvious: 60% for event 1 in group1, 67% in group2 and 0 in group3. In reality the data set has many more observations with more than 2 event types and rows are not sorted in a certain order. I can get what I want in a very dummy way in R (by counting occurrence in event column divided by total observations in each group) but I think there should be a fancier way of doing this.
So the result I want would be like this:
 event | group_ind | percentage
   1   | group1    | 0.6
   2   | group1    | 0.4
   1   | group2    | 0.67
   2   | group2    | 0.33
   1   | group3    | 0
   2   | group3    | 100
I hope this can be done in data.table. Much appreciate for the help.
A simple solution would be just
setDT(DT)[, .(event = 1:2, percentage = tabulate(event)/.N), by = group_ind]
#    group_ind event percentage
# 1:    group1     1  0.6000000
# 2:    group1     2  0.4000000
# 3:    group2     1  0.6666667
# 4:    group2     2  0.3333333
# 5:    group3     1  0.0000000
# 6:    group3     2  1.0000000
Though a more general solution would be to use unique on event (and also pre-order it - as suggested by @EdM).
setDT(DT)[order(event), .(event = unique(event), percentage = tabulate(event)/.N), by = group_ind]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With