
Cassandra schema: use a set collection or multiple rows?

I'm designing a keyspace in Cassandra that will hold information about groups of users. Some info on it:

  • The data will only be accessed in two ways: requesting which users are in a given group, and updating the users in a group.
  • Reads will be much more frequent than writes.
  • Each group could contain up to 20,000 user IDs

I have two designs that I'm considering for this.

  1. Multiple rows per group: The table would have two columns of type TEXT and be keyed on PRIMARY KEY (GroupID, UserID). Reading the users in a group would be done with select * from table where GroupID = {GroupID}, which would return as many rows as there are users in the group.
  2. One row per group using the Cassandra set collection: The table would have two columns, the first (GroupID) of type TEXT and the second (UserIDs) of type SET<TEXT>, and be keyed on PRIMARY KEY (GroupID). Reading the users in a group would be done with select * from table where GroupID = {GroupID}, which would return a single row with the set of user IDs in its UserIDs column.
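
For concreteness, the two designs could be sketched in CQL roughly as follows. The table names (group_members, group_members_set), the lower-case column names, and the sample values 'g1' and 'u42' are placeholders chosen for illustration and are not part of the original question; a keyspace is assumed to be selected already.

```cql
-- Design 1: one row per (group, user).
-- group_id is the partition key, user_id is the clustering column.
CREATE TABLE IF NOT EXISTS group_members (
    group_id text,
    user_id  text,
    PRIMARY KEY (group_id, user_id)
);

-- Returns one row per user in the group.
SELECT user_id FROM group_members WHERE group_id = 'g1';

-- Design 2: one row per group, users held in a set collection.
CREATE TABLE IF NOT EXISTS group_members_set (
    group_id text PRIMARY KEY,
    user_ids set<text>
);

-- Adds one user to the set without rewriting the whole row.
UPDATE group_members_set
   SET user_ids = user_ids + {'u42'}
 WHERE group_id = 'g1';

-- Returns a single row whose user_ids column holds every member.
SELECT user_ids FROM group_members_set WHERE group_id = 'g1';
```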

I can't find a lot of documentation surrounding what would be the better design for this scenario. Any thoughts or pros and cons to either scenario?

asked Oct 14 '25 14:10 by philhan


1 Answer

For a group of up to 20k user IDs, I would absolutely avoid collections. Collections are a convenience feature, but they are not nearly as performant as a traditional CQL data model with PRIMARY KEY (GroupID, UserID), where all of a group's users are ordered within a single partition. That model is easy to reason about, easy to query (you can SELECT a single partition and page through all group members, or SELECT ... WHERE GroupID = X AND UserID = Y to determine whether a user is in the group), and very performant.
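
As a rough sketch of the access patterns described in this answer, assuming a hypothetical table named group_members with lower-case column names and made-up IDs ('g1', 'u42'), none of which come from the original posts:

```cql
-- Assumed schema: one partition per group, users ordered by user_id.
CREATE TABLE IF NOT EXISTS group_members (
    group_id text,
    user_id  text,
    PRIMARY KEY (group_id, user_id)
);

-- Updating membership touches a single row each time.
INSERT INTO group_members (group_id, user_id) VALUES ('g1', 'u42');
DELETE FROM group_members WHERE group_id = 'g1' AND user_id = 'u42';

-- Read every member of a group (a single-partition query).
SELECT user_id FROM group_members WHERE group_id = 'g1';

-- Check whether one user belongs to the group:
-- one row back means yes, no rows means no.
SELECT user_id FROM group_members
 WHERE group_id = 'g1' AND user_id = 'u42';

-- Manual paging by clustering column, resuming after the last user_id seen.
-- The page size of 1000 is an arbitrary choice for this sketch.
SELECT user_id FROM group_members
 WHERE group_id = 'g1' AND user_id > 'u42'
 LIMIT 1000;
```

Most client drivers can page through the full-partition SELECT automatically, so the explicit user_id > ... form is only needed when the application wants to control paging itself.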

answered Oct 17 '25 22:10 by Jeff Jirsa