
Cassandra where IN clause limitation

I have a table like this:

CREATE TABLE peoples(
    user_id int,
    people_id text,
    email text,
    PRIMARY KEY ((user_id), people_id)
);

Is it good practice, when I need to import new people, to check for existing rows in chunks instead of checking each row separately?

Something like this:

SELECT * FROM peoples WHERE user_id = 1 and people_id IN ('7651-ABCD', '9874-UHAG');

Then on the server side I'll check which ones exist, instead of querying each person like this:

SELECT * FROM peoples WHERE user_id = 1 and people_id = '7651-ABCD';

I need to import about 30-50 thousand people and have to know whether each person already exists for the user, so I have to do a read before each write.

Is there any limit on the IN clause? How many values is it good practice to put in an IN?

I'm using the binary protocol, so I'd prefer to issue each SELECT request using IN.

Thanks!

Rafael Mor asked Sep 04 '25 16:09



1 Answer

To answer your question directly: in general, executing many small queries is preferred over a few large ones (e.g. an IN with a lot of IDs), as it spreads the load around your cluster more evenly. But since it depends on your cluster size etc., I'd just make the chunk size configurable and test it.
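As a rough illustration of "make it configurable", here is a minimal Python sketch that splits the incoming IDs into chunks and builds one IN query per chunk. The helper names and the default chunk size are hypothetical, and the `%s` placeholders assume a client that accepts that style for non-prepared statements (the DataStax Python driver does); adapt them to whatever driver you use.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical starting point; tune it by testing against your own cluster.
DEFAULT_CHUNK_SIZE = 20


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    chunk: List[str] = []
    for item in ids:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def build_in_queries(
    user_id: int,
    people_ids: Iterable[str],
    size: int = DEFAULT_CHUNK_SIZE,
) -> List[Tuple[str, tuple]]:
    """Return one (cql, params) pair per chunk of people_ids."""
    queries = []
    for chunk in chunked(people_ids, size):
        placeholders = ", ".join(["%s"] * len(chunk))
        cql = (
            "SELECT people_id FROM peoples "
            f"WHERE user_id = %s AND people_id IN ({placeholders})"
        )
        queries.append((cql, (user_id, *chunk)))
    return queries
```

Setting `size=1` degenerates to one query per person, so the same code lets you compare both extremes and anything in between.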

However, you probably want to denormalize to fit this query. For example, you could also have a table keyed by people id that gives you the users each person is associated with, so for each person you are importing you can directly see which users are affected. Query-based modelling is normally the way to go.
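To make the denormalization idea concrete, here is one possible shape for it, sketched in Python with the CQL held as strings. The table name `users_by_people`, its columns, and the helper `import_statements` are assumptions for illustration only; the point is that each import writes to both tables so the lookup by people id stays in sync.

```python
from typing import List, Tuple

# Hypothetical lookup table: partitioned by people_id, so all users
# associated with one person live in a single partition.
CREATE_USERS_BY_PEOPLE = """
CREATE TABLE users_by_people(
    people_id text,
    user_id int,
    PRIMARY KEY ((people_id), user_id)
);
"""

INSERT_PEOPLES = (
    "INSERT INTO peoples (user_id, people_id, email) VALUES (%s, %s, %s)"
)
INSERT_USERS_BY_PEOPLE = (
    "INSERT INTO users_by_people (people_id, user_id) VALUES (%s, %s)"
)
# Single-partition read: which users already have this person?
SELECT_USERS_FOR_PERSON = (
    "SELECT user_id FROM users_by_people WHERE people_id = %s"
)


def import_statements(
    user_id: int, people_id: str, email: str
) -> List[Tuple[str, tuple]]:
    """Return the (cql, params) pairs that keep both tables in sync."""
    return [
        (INSERT_PEOPLES, (user_id, people_id, email)),
        (INSERT_USERS_BY_PEOPLE, (people_id, user_id)),
    ]
```

Each pair can be passed to your driver's execute call; whether to group the two writes in a batch is a separate decision worth testing.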

Christopher Batey answered Sep 07 '25 19:09
