
Cassandra DB misunderstanding partition key and primary key

Good Evening,

my problem is that my current understanding of partition and primary keys is that the partition key distributes the data between the nodes, and the primary key ALWAYS contains the partition key. I want to use a partition key to group rows that share the same partition key values, and within each of those groups I want a primary key that makes every row unique. From my first understanding of Cassandra, this would be possible if I could separate the partition key from the primary key. Is this possible?

An example to illustrate my idea:

country state unique_id
USA TEXAS 123
USA TEXAS 114

country and state should form the partition key, and unique_id should be the primary key.
If I create the primary key like this: PRIMARY KEY ((country, state, unique_id)), I can't filter without specifying unique_id, but I want to run a query such as SELECT unique_id FROM table WHERE state = 'Texas' AND country = 'USA'. If I create the primary key this way: PRIMARY KEY ((country, state)), every insert with the same country and state obviously overwrites the existing row, which is why I need the unique primary key.

Nue asked Sep 11 '25 14:09


1 Answer

The primary key always includes the partition key, which is always the first item in the primary key. The partition key can consist of multiple columns, which is why the first item in your example is wrapped in its own parentheses. I believe that in your case the primary key should be as follows:

PRIMARY KEY ((country, state),unique_id)

In this case, the partition key is the combination of country + state, and unique_id is a clustering column: inside each partition, rows are kept apart by their unique_id, which you can also use to select specific rows. The general syntax for a primary key is:

partition key, clustering column1, clustering column2, ...

where the partition key can be either:

  • column - single column
  • (column1, column2, ...) - multiple columns
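To make the difference between the key layouts more concrete, here is a small Python sketch. It is only a toy in-memory model of the semantics described above, not how Cassandra is implemented, and the names ToyTable, insert and select_partition are hypothetical. It assumes that rows with the same full primary key overwrite each other, that rows inside a partition are ordered by the clustering column, and it only models queries that restrict exactly the partition key columns.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of primary key semantics (hypothetical, not real Cassandra)."""

    def __init__(self, partition_key, clustering_columns=()):
        self.partition_key = tuple(partition_key)
        self.clustering_columns = tuple(clustering_columns)
        # partition key values -> {clustering values -> row}
        self.partitions = defaultdict(dict)

    def insert(self, row):
        pk = tuple(row[c] for c in self.partition_key)
        ck = tuple(row[c] for c in self.clustering_columns)
        # A row with the same full primary key replaces the existing one.
        self.partitions[pk][ck] = dict(row)

    def select_partition(self, **where):
        # The partition cannot be located unless every partition key
        # column is given, so anything else is rejected in this model.
        if set(where) != set(self.partition_key):
            raise ValueError("every partition key column must be restricted")
        pk = tuple(where[c] for c in self.partition_key)
        rows = self.partitions.get(pk, {})
        # Rows come back ordered by their clustering values.
        return [rows[ck] for ck in sorted(rows)]


rows = [
    {"country": "USA", "state": "TEXAS", "unique_id": 123},
    {"country": "USA", "state": "TEXAS", "unique_id": 114},
]

# PRIMARY KEY ((country, state)): the second insert overwrites the first.
overwriting = ToyTable(partition_key=("country", "state"))
# PRIMARY KEY ((country, state), unique_id): both rows survive.
suggested = ToyTable(partition_key=("country", "state"),
                     clustering_columns=("unique_id",))
# PRIMARY KEY ((country, state, unique_id)): unique_id is needed to query.
all_in_partition = ToyTable(partition_key=("country", "state", "unique_id"))

for row in rows:
    overwriting.insert(row)
    suggested.insert(row)
    all_in_partition.insert(row)

overwriting_ids = [r["unique_id"] for r in
                   overwriting.select_partition(country="USA", state="TEXAS")]
suggested_ids = [r["unique_id"] for r in
                 suggested.select_partition(country="USA", state="TEXAS")]

try:
    all_in_partition.select_partition(country="USA", state="TEXAS")
    rejected = False
except ValueError:
    rejected = True

print(overwriting_ids)  # [114]
print(suggested_ids)    # [114, 123]
print(rejected)         # True
```

The second layout is the one that matches the question: country and state pick the partition, and unique_id keeps the rows inside it distinct.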
Alex Ott answered Sep 15 '25 06:09