I have a keyspace populated with data that was expensive to generate. I want two copies of this data within my cluster. I would like to end up with two keyspaces: let's call them mydata and mydatabackup, both containing identical data (I don't mind if the Cassandra timestamps differ).
Is there an easy way to do this? The closest thing I can find to an answer is to use sstable2json and json2sstable, as suggested in response to a similar question. Is there a better way?
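For reference, a minimal sketch of the sstable2json / json2sstable route mentioned above, assuming 1.x-era tooling and a single column family named mytable (paths, file names and generation numbers are illustrative, not taken from the question):

    # dump one SSTable of mydata.mytable to JSON (illustrative path)
    sstable2json /var/lib/cassandra/data/mydata/mydata-mytable-hc-1-Data.db > mytable.json

    # load the JSON back as an SSTable for the mydatabackup keyspace;
    # the target column family must already exist with the same schema
    json2sstable -K mydatabackup -c mytable mytable.json \
        /var/lib/cassandra/data/mydatabackup/mydatabackup-mytable-hc-1-Data.db

    # the node will not see the new file until it is restarted or the
    # SSTable is loaded some other way (e.g. sstableloader)

This has to be repeated for every SSTable of every column family, which is why people look for a better way.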
To select a keyspace in Cassandra and perform actions on it, use the USE keyword. The CQL shell then switches to the keyspace you named; to change the current keyspace, run the same command with another name. Note: whenever you create a table in Cassandra, you start by defining the keyspace.
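For example, in an interactive cqlsh session (keyspace names taken from the question):

    cqlsh> USE mydata;
    cqlsh:mydata> -- subsequent statements now run against mydata
    cqlsh:mydata> USE mydatabackup;
    cqlsh:mydatabackup>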
Changing the replication factor for SimpleStrategy: if you want to change the replication factor of a keyspace, you can do so by executing the ALTER KEYSPACE command, which has the following syntax: ALTER KEYSPACE "KeySpace Name" WITH replication = {'class': 'Strategy name', 'replication_factor': 'No. of replicas'};
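As a concrete sketch: raising the replication factor keeps more copies of each row across the cluster, but it does not create a second, independently named keyspace like the question asks for. SimpleStrategy and a factor of 3 below are placeholders, not values from the question:

    # raise the number of replicas for the existing keyspace
    cqlsh -e "ALTER KEYSPACE mydata
              WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"

    # stream the additional replicas onto the nodes that now own them
    nodetool repair mydata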
Creating a database with multiple keyspaces allows you to use a different data model for each keyspace, or to store distinct data in distinct keyspaces. Multiple keyspaces within a single region allow an application to be built on a per-keyspace data model.
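In the asker's case, the second keyspace has to exist before any data can be copied into it. A minimal sketch, assuming the same replication settings as the original (adjust the class and factor to match your cluster; CQL 3 syntax shown, older releases use a WITH strategy_class = ... form instead):

    # create the destination keyspace with matching replication settings
    cqlsh -e "CREATE KEYSPACE mydatabackup
              WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"

    # the tables must then be recreated in mydatabackup with identical schemas
    # before any SSTables or JSON dumps from mydata can be loaded into it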
" Is there a better way?"
All Cassandra data is stored in the data/ folder (see the data_file_directories setting in cassandra.yaml). You may also want to check the saved_caches_directory and commitlog_directory settings.
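The relevant settings look like this; the paths below are the common package defaults and may differ on your install:

    # excerpt from cassandra.yaml (default package locations; adjust to yours)
    data_file_directories:
        - /var/lib/cassandra/data
    commitlog_directory: /var/lib/cassandra/commitlog
    saved_caches_directory: /var/lib/cassandra/saved_caches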
Inside the data folder, you'll have:
- one folder per keyspace
- one folder for the system keyspace
- some folders for authentication, etc.
Inside each keyspace folder, you'll have (an example listing follows below):
- *-Data.db files, which contain your actual data
- *-Filter.db files (the bloom filters)
- *-Index.db files (the row index)
- ...
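For illustration, a keyspace folder on a 1.x node might look something like this (table name and generation numbers are made up, and the exact file set varies by version):

    $ ls /var/lib/cassandra/data/mydata/
    mydata-mytable-hc-1-Data.db
    mydata-mytable-hc-1-Filter.db
    mydata-mytable-hc-1-Index.db
    mydata-mytable-hc-1-Statistics.db
    mydata-mytable-hc-1-CompressionInfo.db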
To replicate the data, do a plain copy of those folders.
In our team, ops use a crontab entry to schedule regular backups of the Cassandra data this way.
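A rough sketch of that kind of scheduled copy, assuming rsync is available and /backup/cassandra is your destination (both are assumptions, not something Cassandra provides):

    # crontab entry: copy the keyspace's data files to a backup location
    # every night at 02:00 (illustrative paths)
    0 2 * * * rsync -a /var/lib/cassandra/data/mydata/ /backup/cassandra/mydata/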
Note: sometimes you may miss live data that is still in memory (in a memtable) and not yet flushed to disk. You can trigger a full compaction before backing up the data files, but a full compaction may hurt your performance, so be careful.
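The corresponding nodetool commands: nodetool flush is the lighter-weight way to push memtables onto disk, while nodetool compact is the major compaction the note above refers to:

    # force memtables for the keyspace onto disk as SSTables
    nodetool flush mydata

    # or run a major compaction (heavier: rewrites SSTables, can hurt performance)
    nodetool compact mydata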
Better answer: use the provided tool (nodetool snapshot) to take a snapshot of your DB:
http://www.datastax.com/docs/1.0/operations/backup_restore
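A minimal example of the snapshot route; snapshots are hard links created under the data directory, so they are cheap to take:

    # take a snapshot of the keyspace on this node
    # (newer versions also accept -t <tag> to name the snapshot)
    nodetool snapshot mydata

    # the snapshot is a set of hard links inside the data directory, under a
    # snapshots/ subfolder (exact layout varies by version); copy those files
    # off the node, then remove the snapshot:
    nodetool clearsnapshot mydata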