
How to store only most recent entry in Cassandra?

I have a Cassandra table like this:

create table test(imei text,dt_time timestamp, primary key(imei, dt_time)) WITH CLUSTERING ORDER BY (dt_time DESC);

Partition Key is: imei
Clustering Key is: dt_time

Now I want to store only the most recent entry (by time) for each partition key, so that the table holds a single entry for each imei.

For example, say that for imei 98838377272 the stored dt_time is 2017-12-23 16:20:12. If a dt_time of 2017-12-23 15:20:00 then arrives for the same imei, that entry should not be inserted into the table.

But if a later time arrives, such as 2017-12-23 17:20:00, it should be inserted and the previous row should be replaced with this dt_time.

Rajendra Jangir asked Sep 03 '25 06:09


2 Answers

You can use the USING TIMESTAMP clause in your insert statement to mark data as most recent:

Marks inserted data (write time) with TIMESTAMP. Enter the time since epoch (January 1, 1970) in microseconds. By default, Cassandra uses the actual time of write.

Remove dt_time from the primary key so that only one entry is stored per imei, and then:

  1. Insert data and specify timestamp as 2017-12-23 16:20:12
  2. Insert data and specify timestamp as 2017-12-23 15:20:00

In this case, a select by imei will return the record with the most recent timestamp (from point 1), even though it was inserted first.
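
The two inserts above could be sketched in CQL roughly as follows. This assumes the table is recreated with imei as the only primary key column and that the times are in UTC; the microsecond values are computed under that assumption and would differ for another time zone.

```
-- Assumed revised schema: dt_time is a regular column, so there is one row per imei.
-- (The original table would have to be dropped first, since a primary key cannot be altered.)
CREATE TABLE test (imei text PRIMARY KEY, dt_time timestamp);

-- 1. 2017-12-23 16:20:12 UTC = 1514046012000000 microseconds since epoch
INSERT INTO test (imei, dt_time) VALUES ('98838377272', '2017-12-23 16:20:12+0000')
  USING TIMESTAMP 1514046012000000;

-- 2. 2017-12-23 15:20:00 UTC = 1514042400000000 microseconds since epoch.
--    This write carries an older write timestamp than the one above.
INSERT INTO test (imei, dt_time) VALUES ('98838377272', '2017-12-23 15:20:00+0000')
  USING TIMESTAMP 1514042400000000;

-- WRITETIME shows which write timestamp the stored value carries
SELECT imei, dt_time, WRITETIME(dt_time) FROM test WHERE imei = '98838377272';
```

Because the write timestamp is taken from dt_time itself, the application has to convert each dt_time to microseconds since epoch before issuing the insert.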

Please note that this approach only works reliably if your dt_time (which is used as the write timestamp) is not later than the current time. Cassandra keeps the value with the highest write timestamp, so a row inserted with a timestamp greater than the current time will shadow any later write that uses the default (current time) timestamp until that time is reached.

Mikhail Baksheev answered Sep 05 '25 00:09


First, to store only the last entry in the table, you need to remove dt_time from the primary key. Otherwise, a separate row will be inserted for every timestamp.

Cassandra supports so-called lightweight transactions, which allow you to check the existing data before writing.

So if you want to update the entry only when the stored dt_time is earlier than the new time, you can use something like this:

First, insert the data:

> insert into test(imei, dt_time) values('98838377272', '2017-12-23 15:20:12');

Then try to update the data with the same time (an earlier one behaves the same way):

> update test SET dt_time = '2017-12-23 15:20:12' WHERE imei = '98838377272' 
   IF dt_time < '2017-12-23 15:20:12';

 [applied] | dt_time
-----------+---------------------------------
     False | 2017-12-23 15:20:12.000000+0000

This fails, as shown by [applied] being False. An update with a later timestamp, however, is applied:

> update test SET dt_time = '2017-12-23 16:21:12' WHERE imei = '98838377272' 
   IF dt_time < '2017-12-23 16:21:12';

 [applied]
-----------
      True

There are several problems with this:

  1. It will not work if the entry doesn't exist yet. In that case, you can try INSERT ... IF NOT EXISTS before attempting the update, or pre-populate the table with the imei numbers.

  2. Lightweight transactions impose overhead on the cluster, because the data has to be read before it is written. This can put significant load on the servers and reduce throughput.
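
A minimal sketch of the workaround from point 1, assuming the table has imei as its only primary key column and using the question's 17:20:00 value as the incoming time:

```
-- Step 1: create the row only if this imei has no entry yet
INSERT INTO test (imei, dt_time) VALUES ('98838377272', '2017-12-23 17:20:00')
  IF NOT EXISTS;

-- Step 2: if step 1 returned [applied] = False, the row already exists,
-- so overwrite it only when the stored dt_time is older than the incoming one
UPDATE test SET dt_time = '2017-12-23 17:20:00'
  WHERE imei = '98838377272'
  IF dt_time < '2017-12-23 17:20:00';
```

Both statements are lightweight transactions, so the overhead described in point 2 applies to each of them. The client has to inspect the [applied] column of the first result to decide whether the second statement is needed.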

Alex Ott answered Sep 04 '25 23:09