I am learning Cassandra. I am modeling the cassandra table for a specific use case. The use case described below -
A user can write a post. Other users can reply to the post. Users can also "up vote" or "down vote" a post. User sort the posts by date or up votes or down votes.
This is my table definition -
CREATE TABLE post.comments_by_post (
postid text,
parentpostid text,
createdon bigint,
username text,
userid text,
displayname text,
upvotes int,
downvotes int,
comment text,
PRIMARY KEY ((postid, parentpostid), createdon)
) WITH CLUSTERING ORDER BY (createdon DESC);
To increment "upvote" I have a update query -
UPDATE post.comments_by_post SET upvotes = incrementedValue where postid=1 and parentpostid = 2 ;
incrementedValue is calculated adding 1 in previous value.
incrementedValue = previousValue + 1
My question is, if i have to calculate the increment based on the previous value, which is in the table, it will cause race condition and data corruption.
Do we have better way?
I know that cassandra has counter column definition type, which can be used for such incremental values, but it requires additional table. Counter column cannot be used with normal columns which are not part of primary key.
The following table and secondary index will allow you to implement counting without the Counter table and without any locks:
CREATE TABLE votes_by_comment (
postid text,
parentpostid text,
userid text,
vote text, //can be 'up' or 'down'
PRIMARY KEY (( postid, parentpostid ), userid))
CREATE INDEX ON votes_by_comment (vote);
When a user does 'up votes':
INSERT INTO votes_by_comment (postid, parentpostid, userid, vote) VALUES ('comment1', 'post1', 'user1', 'up');
When a user does 'down votes':
INSERT INTO votes_by_comment (postid, parentpostid, userid, vote) VALUES ('comment1', 'post1', 'user1', 'down');
userid
as clustering column will allow it to avoid race condition and restrict multiple voting by one user.
To count up votes:
SELECT count(*) from votes_by_comment WHERE postid='comment1' AND parentpostid='post1' and vote='up';
The secondary index will allow it to perform select by vote
value, since the select by the secondary index will be performed within a partition key, it will have good performance.
But this approach doesn't allow you to implement ordering by votes on Cassandra side, and it should be implemented on the application side.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With