Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add a column and populate based on other columns values

I have the following table:

CREATE TABLE trajectory(
    user_id int, 
    session_id int, 
    timestamp timestamp with time zone,
    lat double precision, 
    lon double precision
);

INSERT INTO trajectory(user_id, session_id, timestamp, lat, lon) VALUES 
(1, 25304,'2008-10-23 02:53:04+01', 39.984702, 116.318417),
(1, 25304, '2008-10-23 02:53:10+01', 39.984683, 116.31845), 
(1, 25304, '2008-10-23 02:53:15+01', 39.984686, 116.318417),
(1, 25304, '2008-10-23 02:53:20+01', 39.984688, 116.318385), 
(1, 20959,'2008-10-24 02:09:59+01', 40.008304, 116.319876),
(1, 20959,'2008-10-24 02:10:04+01', 40.008413, 116.319962), 
(1, 20959,'2008-10-24 02:10:14+01', 40.007171, 116.319458),
(2, 55305, '2008-10-23 05:53:05+01', 39.984094, 116.319236), 
(2, 55305, '2008-10-23 05:53:11+01', 39.984198, 116.319322), 
(2, 55305, '2008-10-23 05:53:21+01', 39.984224, 116.319402), 
(2, 34104, '2008-10-23 23:41:04+01', 40.013867, 116.306473),
(2, 34104, '2008-10-23 23:41:16+01', 40.013907, 116.306488);

Question:

I want to add a column trip_id to this table, based on the user_id and session_id columns, so that when the user's session id changes, I know the user is on a new trip so I add that id to the new trip column.

Required output:

user_id |session_id  |timestamp              |    lat       |   lon     | trip_id
--------|------------|-----------------------|--------------|-----------|-----------
  1     |     25304  |2008-10-23 02:53:04+01 | 39.984702    |116.318417 |       1
  1     |     25304  |2008-10-23 02:53:10+01 | 39.984683    |116.31845  |       1
  1     |     25304  |2008-10-23 02:53:15+01 | 39.984686    |116.318417 |       1
  1     |     25304  |2008-10-23 02:53:20+01 | 39.984688    |116.318385 |       1
  1     |     20959  |2008-10-24 02:09:59+01 |40.008304     |116.319876 |       2
  1     |     20959  |2008-10-24 02:10:04+01 |40.008413     |116.319962 |       2
  1     |     20959  |2008-10-24 02:10:14+01 |40.007171     |116.319458 |       2
  2     |     55305  |2008-10-23 05:53:05+01 |39.984094     |116.319236 |       1
  2     |     55305  |2008-10-23 05:53:11+01 |39.984198     |116.319322 |       1
  2     |     55305  |2008-10-23 05:53:21+01 |39.984224     |116.319402 |       1
  2     |     34104  |2008-10-23 23:41:04+01 |40.013867     |116.306473 |       2
  2     |     34104  |2008-10-23 23:41:16+01 |40.013907     |116.306488 |       2

How can I do this?

EDIT

Thanks for these great answers, but all the answers received are kind of retrieving table values, they do not modify the table. Plus, I added the timestamp column that I initially omitted thinking that would simplify my question.

like image 719
Amina Umar Avatar asked Nov 30 '25 17:11

Amina Umar


2 Answers

We can approach this with a gaps-and-island technique. Assuming that you have a column to order your dataset, say ordering_id:

select t.*, 
    count(*) 
        filter(where session_id is distinct from lag_session_id) 
        over(partition by user_id order by ordering_id) trip_id
from (
    select t.*, 
        lag(session_id) over(partition by user_id order by ordering_id) lag_session_id
    from trajectory t
) t

The idea is to bring the "preceding" session id of the same user with lag, then to compare it with the value on the current row; we can then count how many times it changed, which defines the trip id.

user_id session_id lat lon ordering_id lag_session_id trip_id
1 25304 39.984702 116.318417 1 null 1
1 25304 39.984683 116.31845 2 25304 1
1 25304 39.984686 116.318417 3 25304 1
1 25304 39.984688 116.318385 4 25304 1
1 20959 40.008304 116.319876 5 25304 2
1 20959 40.008413 116.319962 6 20959 2
1 20959 40.007171 116.319458 7 20959 2
2 55305 39.984094 116.319236 8 null 1
2 55305 39.984198 116.319322 9 55305 1
2 55305 39.984224 116.319402 10 55305 1
2 34104 40.013867 116.306473 11 55305 2
2 34104 40.013907 116.306488 12 34104 2

fiddle

like image 135
GMB Avatar answered Dec 02 '25 05:12

GMB


Yet another approach would involve:

  • selecting the minimum record of your identifier
  • applying a DENSE RANK by ordering on it.

If you don't have any ordering field, you can use "ctid" for that purpose: it's a field that stores the physical location of each row within the table. Otherwise use the timestamp field in place of "ctid".

ALTER TABLE trajectory ADD COLUMN trip_id INT;

WITH cte AS (
    SELECT ctid, 
           user_id, 
           MIN(ctid) OVER(PARTITION BY user_id, session_id) AS parts 
    FROM trajectory
), cte2 AS (
    SELECT ctid,
           DENSE_RANK() OVER(PARTITION BY user_id ORDER BY parts) AS trip_id 
    FROM cte
)
UPDATE trajectory 
SET trip_id = cte2.trip_id
FROM cte2
WHERE trajectory.ctid = cte2.ctid;

Output:

user_id session_id lat lon trip_id
1 25304 39.984702 116.318417 1
1 25304 39.984683 116.31845 1
1 25304 39.984686 116.318417 1
1 25304 39.984688 116.318385 1
1 20959 40.008304 116.319876 2
1 20959 40.008413 116.319962 2
1 20959 40.007171 116.319458 2
2 55305 39.984094 116.319236 1
2 55305 39.984198 116.319322 1
2 55305 39.984224 116.319402 1
2 34104 40.013867 116.306473 2
2 34104 40.013907 116.306488 2

Check the demo here.

like image 40
lemon Avatar answered Dec 02 '25 05:12

lemon



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!