I have one production Cassandra node and would like to create the same Cassandra setup on my local machine. As I understand it, I have the following options:
 1. Take a snapshot of each keyspace in production and use it on the local machine. (But this would take a long time, as I have many keyspaces.)
 2. Export the production Cassandra data as CSV and import it into the local Cassandra. (I have a COUNTER table, so this also causes some headaches. Correct me if I am wrong.)
My question is: "What will happen if I move the entire data directory and commitlog folder from production to local and start the local Cassandra?" Is this possible at all?
When I tried this, Cassandra threw many errors.
Adding a new node to an existing cluster in Apache Cassandra version 3 and higher is fairly easy. When a new node is added to the cluster, Cassandra automatically adjusts the token ranges each node is responsible for, so each node in the cluster ends up storing a smaller subset of the data.
In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds list in the seed_provider property. If the cluster needs a new seed node to replace the dead node, add the new node's IP address to the - seeds list of the other nodes.
Open the node's cassandra.yaml file and add the node's address to the - seeds list in the seed_provider property. Make this change on all other nodes in the cluster. Start Cassandra as a service or a stand-alone process.
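As a rough illustration of the seed list edits described above, here is a minimal Python sketch. It assumes the seeds value is a single comma-separated string of addresses, as in a stock cassandra.yaml; the function name `update_seeds` and the IP addresses are made up for the example, and the sketch only computes the new value rather than editing the file.

```python
def update_seeds(seeds, remove=None, add=None):
    """Return a new comma-separated seeds string.

    seeds  -- current value of the "- seeds" entry, e.g. "10.0.0.1,10.0.0.2"
    remove -- address of a dead node to drop (optional)
    add    -- address of a new seed node to append (optional)
    """
    # Split on commas and ignore stray whitespace or empty entries.
    addresses = [s.strip() for s in seeds.split(",") if s.strip()]
    if remove is not None:
        addresses = [a for a in addresses if a != remove]
    # Avoid listing the same address twice.
    if add is not None and add not in addresses:
        addresses.append(add)
    return ",".join(addresses)


# Hypothetical addresses, for illustration only.
current = "10.0.0.1,10.0.0.2,10.0.0.3"
print(update_seeds(current, remove="10.0.0.2", add="10.0.0.4"))
```

The resulting string would then be written to the - seeds entry on every node, so that all nodes agree on the same seed list.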
In Cassandra, all nodes communicate with each other via a gossip protocol. Gossip is the peer-to-peer messaging system that Cassandra nodes use to exchange state and location information about themselves and the other nodes they know about.
If all you're looking to do is recreate your production node on a local machine, then all you really need to do is copy everything (assuming hardware is similar).
From Production:
To your local machine (assuming fresh install)
<data_dir>/<keyspace>/<columnfamily>/
Note: These checklists are not completely thorough
Running nodetool repair isn't a bad idea in this case. However, assuming you just want to recreate the production node on a local machine (as stated in the question), it may be moot, since the snapshot already holds the current data. Running nodetool cleanup wouldn't hurt either, if repair is deemed essential.
Answering your question:
Just copying the data directory and commitlogs from production onto your local machine won't really work, as you need to recreate the keyspaces and column families to put the data in. If you did recreate them, however, then something else is at work. To get the data from one Cassandra environment to the next, the config files, the data directories (commitlogs, data, saved_cache, etc.), and the schema scripts are the most important. From there you can probably debug issues. A fresh install (or remapping the current data/commitlog/etc. directories to new directories, i.e. new_data, new_commitlog, new_saved_cache) might be the easiest way to accomplish the task.
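To make the "recreate the keyspaces and column families" step a bit more concrete, here is a small sketch that builds the cqlsh commands for dumping the schema from production and replaying it locally. It relies on cqlsh's `-e` (execute a statement) and `-f` (execute a file) options and on the `DESCRIBE SCHEMA` command; the host names and the `schema.cql` file name are placeholders, and the sketch only prints the commands rather than running them.

```python
import shlex


def export_schema_command(prod_host):
    """cqlsh invocation that prints the CQL for all non-system keyspaces."""
    return ["cqlsh", prod_host, "-e", "DESCRIBE SCHEMA"]


def import_schema_command(local_host, schema_file):
    """cqlsh invocation that replays a saved schema file on the local node."""
    return ["cqlsh", local_host, "-f", schema_file]


# Placeholder host names and file name, for illustration only.
export_cmd = export_schema_command("prod.example.internal")
import_cmd = import_schema_command("127.0.0.1", "schema.cql")

# Redirect the export into a file, then feed that file to the local node.
print(shlex.join(export_cmd) + " > schema.cql")
print(shlex.join(import_cmd))
```

Building the argument lists separately from executing them makes it easy to review the commands first, or to pass them to `subprocess.run` later if that is preferred.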
If you have one node, you can copy the /data, /saved_caches, and /commitlog folders to your local machine. You need the same version of Cassandra on both. But first, export your schema(s) from production and import them into your local machine. Then stop the local Cassandra, delete any contents of the local /commitlog folder, and copy the data from prod into local. The folder names in /data will probably be different because newer versions of C* append a UUID to the table-name folders, but it will work. You may have to run nodetool repair afterward.
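Since the table folders on the two machines can carry different UUID suffixes, one way to line them up is to match on the table name alone. The sketch below assumes the folder layout `<table>-<32 hex characters>` and simply pairs each production folder with the local folder for the same table; the helper names and the example folder names are invented for illustration, and the actual file copy is left out.

```python
import re

# Table folders look like "<table>-<id>", where the id is 32 hex characters.
_TABLE_DIR = re.compile(r"^(?P<table>.+)-(?P<id>[0-9a-f]{32})$")


def table_name(dirname):
    """Strip the trailing id from a table folder name, if there is one."""
    match = _TABLE_DIR.match(dirname)
    return match.group("table") if match else dirname


def map_table_dirs(prod_dirs, local_dirs):
    """Pair each production table folder with the local folder of the same table.

    Production folders with no local counterpart are skipped, which usually
    means the schema for that table has not been imported locally yet.
    """
    local_by_table = {table_name(d): d for d in local_dirs}
    return {
        p: local_by_table[table_name(p)]
        for p in prod_dirs
        if table_name(p) in local_by_table
    }


# Made-up folder names, for illustration only.
prod = ["users-5a1c395ee3f111e79c4f0a1b2c3d4e5f"]
local = ["users-0f9e8d7c6b5a49382716f5e4d3c2b1a0"]
for src, dst in map_table_dirs(prod, local).items():
    print(src, "->", dst)
```

With a mapping like this, the files from each production folder would be copied into the matching local folder while the local Cassandra is stopped.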