Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I calculate retention using SQL?

Tags:

sql

mysql

I am trying to calculate the 1-day, 7-day, and 2-week retention rates but I'm stuck. A table called users has the date of the first login. A table called logins has the dates of all subsequent logins.

My thinking was to find the "max," or most recent login from the logins table and find how many days this was since the user's first login. Eventually, I would want to group by first login date and count the number of users from this "cohort" who logged into the app 1 day later, 7 days later, and 14 days later. I feel like I'm close but it's not quite there.

Below is what I have so far.

DESCRIBE users;
Field             Type         Null  Key  Default              Extra                        
----------------  -----------  ----  ---  -------------------  ---------------------------  
uid               int(11)      YES   MUL  (null)                                            
device_id         varchar(64)  YES        (null)                                            
install_ts        timestamp    NO         CURRENT_TIMESTAMP    on update CURRENT_TIMESTAMP  
firstlogin_ts     timestamp    NO         0000-00-00 00:00:00                               
firstpurchase_ts  timestamp    NO         0000-00-00 00:00:00            

DESCRIBE logins;
Field        Type         Null  Key  Default            Extra                        
-----------  -----------  ----  ---  -----------------  ---------------------------  
uid          int(11)      NO         (null)                                          
device_id    varchar(64)  NO         (null)                                          
login_ts     timestamp    NO         CURRENT_TIMESTAMP  on update CURRENT_TIMESTAMP  
login_count  int(11)      NO         (null)                                          
level        int(11)      NO         (null)                                          


SELECT logins.uid, 
   MAX(CAST(login_ts AS Date)) AS login_dt, 
   CAST(firstlogin_ts AS Date) AS firstlogin_dt,
   MAX(DATEDIFF(CAST(login_ts AS Date),
   CAST(firstlogin_ts AS Date))) AS retentionDays
FROM logins
LEFT JOIN users
ON logins.uid = users.uid
GROUP BY logins.uid
ORDER BY logins.uid, login_dt

uid     login_dt    firstlogin_dt  retentionDays  
------  ----------  -------------  -------------  
121043  2015-01-04  (null)         (null)         
121044  2015-01-04  (null)         (null)         
121045  2015-01-06  2015-01-01     5              
121046  2015-01-05  2015-01-01     4              
121047  2015-01-04  2015-01-01     3              
121049  2015-01-05  2015-01-01     4              
121050  2015-01-02  2015-01-01     1              
121054  2015-01-04  2015-01-01     3              
121055  2015-01-15  2015-01-01     14             
121056  2015-01-07  2015-01-01     6              
121057  2015-01-12  2015-01-01     11             
121058  2015-01-02  2015-01-01     1              
121060  2015-01-11  2015-01-01     10             
121063  2015-01-07  2015-01-01     6              
121065  2015-01-05  2015-01-01     4              
121066  2015-01-07  2015-01-01     6              
121067  2015-01-03  2015-01-01     2              
121069  2015-01-03  2015-01-01     2              
121070  2015-01-06  2015-01-01     5     

Any hints would be appreciated.

like image 338
user2205916 Avatar asked Oct 24 '25 02:10

user2205916


1 Answers

I'm going to assume the query in your question is correct, and you just need help to take it one step further.

It looks like what you want to do is select from the results of your first query, and count the number of users with retentionDays >= 1, the number of users with retentionDays >= 7 and the number of users with retentionDays >= 14.

Translated in SQL, that gives a query like:

SELECT
  SUM(IF(retentionDays >= 1, 1, 0)) as 1day
  SUM(IF(retentionDays >= 7, 1, 0)) as 7days
  SUM(IF(retentionDays >= 14, 1, 0)) as 14days
FROM (
  /* your previous query */
) as computedRetentionDays;

If you want a ratio instead of values, you can divide each sum by COUNT(uid).

That being said, I suggest you add a column lastlogin_ts on the users, that will make your life much easier (and this query much faster).

like image 60
rlanvin Avatar answered Oct 26 '25 16:10

rlanvin