I have a SQLite table of user actions on a website. Each row is the same action on a web site, just different time/date, tagged with a user id. The table has more than 20Million entries. I understand how to get a count by user (i.e. A took the action 3 times, B 4, C 2, D 4, etc.) using the group by function by user id. In other words this works fine:
select count(uid) as event_count
from table
group by uid
What I want is the data for a statistical distribution which is a count of the number of users who only took 1 action, a count of users that took 2 actions, etc. Said another way: The list might look something like:
1 | 339,440
2 | 452,555
3 | 99,239
5 | 20,209
etc. ...
I could use the having event_count = n
clause and just rerun the query for every integer until all were accounted for but that seems silly. There must be a way that I can get a single list with two columns: the group size and the count of the users who all took the exact same number of actions.
As simply as adding another grouping above:
select event_count, count(*) as users_count
from
(select count(uid) as event_count
from table
group by uid) t
group by event_count
order by event_count
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With