Cassandra where IN clause limitation

Question

I have a table like this:

CREATE TABLE peoples(
    user_id int,
    people_id text,
    email text,
    PRIMARY KEY ((user_id), people_id)
);

Is it good practice, when I need to import new people, to check them in chunks instead of checking each row separately?

Something like this:

SELECT * FROM peoples WHERE user_id = 1 and people_id IN ('7651-ABCD', '9874-UHAG');

On the server side I would then check which of them exist, instead of querying each person individually like this:

SELECT * FROM peoples WHERE user_id = 1 and people_id = '7651-ABCD';

I need to import about 30-50 thousand people and have to know whether each person already exists for the user, so I have to do a read before each write.

Is there any limit on IN? How many values is it good practice to pass to IN?

I'm using the binary protocol, so I would prefer to issue each select request using IN.

Thanks!

Christopher Batey · Accepted Answer

To answer your question directly: in general, executing many small queries is preferred over a few large ones (e.g. an IN with a lot of IDs), as it spreads the load around your cluster more evenly. But depending on your cluster size etc., I'd just make the chunk size configurable and test it.
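As a rough illustration of "make it configurable and test it", here is a minimal Python sketch that splits the IDs to import into chunks of a tunable size and issues one IN query per chunk. It is not tied to any particular driver: `run_query` is a hypothetical caller-supplied callable that executes a statement with bound parameters and returns the matching `people_id` values, and `SELECT_EXISTING` assumes the statement is prepared so that `IN ?` binds the whole list as a single parameter.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Set, Tuple

# Assumed prepared-statement text; `IN ?` binds the whole list as one parameter.
SELECT_EXISTING = (
    "SELECT people_id FROM peoples WHERE user_id = ? AND people_id IN ?"
)

Params = Tuple[int, List[str]]


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def find_existing(
    user_id: int,
    people_ids: Sequence[str],
    chunk_size: int,
    run_query: Callable[[str, Params], Iterable[str]],
) -> Set[str]:
    """Return the subset of `people_ids` that already exists for `user_id`.

    `run_query` is a hypothetical hook: it should execute the statement with
    the given parameters and return the people_id values of the rows found.
    """
    existing: Set[str] = set()
    for chunk in chunked(people_ids, chunk_size):
        existing.update(run_query(SELECT_EXISTING, (user_id, chunk)))
    return existing
```

With `chunk_size` set to 1 this degenerates to one query per person, so the same code path can be used to compare small and large IN lists against your own cluster.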

However, you probably want to denormalize to fit this query. For example, you could also have a table keyed by people ID that gives you the users each person is associated with, so for each person you are importing you can directly see which users are affected. Query-based modelling is normally the way to go.
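One way the denormalized layout could look is sketched below in Python. The table name `users_by_people` and its columns are assumptions made for illustration, not something from the question, and the helper only builds the statements; executing them is left to whatever driver you use. The idea is that every import writes to both tables, so the lookup by people ID stays in sync with the original table.

```python
from typing import List, Tuple

# Hypothetical lookup table: partitioned by people_id, one row per user.
CREATE_USERS_BY_PEOPLE = """
CREATE TABLE users_by_people (
    people_id text,
    user_id int,
    PRIMARY KEY ((people_id), user_id)
);
"""

INSERT_PEOPLES = "INSERT INTO peoples (user_id, people_id, email) VALUES (?, ?, ?)"
INSERT_USERS_BY_PEOPLE = "INSERT INTO users_by_people (people_id, user_id) VALUES (?, ?)"

# Answers "which users already have this person?" from a single partition.
SELECT_USERS_FOR_PERSON = "SELECT user_id FROM users_by_people WHERE people_id = ?"


def dual_write_statements(
    user_id: int, people_id: str, email: str
) -> List[Tuple[str, tuple]]:
    """Build the two writes that keep both tables in sync for one person."""
    return [
        (INSERT_PEOPLES, (user_id, people_id, email)),
        (INSERT_USERS_BY_PEOPLE, (people_id, user_id)),
    ]
```

The trade-off is an extra write per imported person in exchange for a read that touches exactly one partition.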

Tags:

database

nosql

cassandra

cqlsh

Rafael Mor
