I am new to NoSQL and Cassandra. I am experimenting with settings to achieve an in-memory, cache-only solution. I read a 100000-line file line by line and use Hector to insert each line into Cassandra. I am seeing a very low throughput of around 6000 inserts per second. The whole write operation takes about 20.5 seconds, which is unacceptable for our application. We need something like 100000 inserts per second. I am testing on a Windows 7 computer with 4GB RAM.
I am doing an insert only test.
Kindly let me know where I am going wrong and how I can improve the inserts per second.
Keyspace: Keyspace1
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 177042
        Write Latency: 0.003106884242157228 ms.
        Pending Tasks: 0
                Column Family: user
                SSTable count: 3
                Space used (live): 17691
                Space used (total): 17691
                Number of Keys (estimate): 384
                Memtable Columns Count: 100000
                Memtable Data Size: 96082090
                Memtable Switch Count: 1
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 177042
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 150000
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache capacity: 150000
                Row cache size: 0
                Row cache hit rate: NaN
                Compacted row minimum size: 73
                Compacted row maximum size: 924
                Compacted row mean size: 784
I have tried a couple of methods for setting the row cache and key cache:
Through Cassandra CLI
Through NodeCmd: java org.apache.cassandra.tools.NodeCmd -p 7199 setcachecapacity Keyspace1 user 150000 150000
I wouldn't describe 6000 writes per second as "slow", but Cassandra can do much better. Note, though, that Cassandra is designed for durable writes, so it may give lower performance than memory-only caching solutions.
As sbridges says, you cannot get full performance out of Cassandra using a single client. Try using multiple client threads, or processes, or machines.
I don't think you will get 100,000 writes per second on a single node. I have only obtained around 20,000-25,000 writes per second on modest hardware (although Cassandra has got significantly faster since I did that benchmarking). 6000 per second seems about right for a single client against a single commodity node.
With a cluster of nodes, you can definitely get 100,000 per second (See http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html for a recent benchmark of 1,000,000 writes per second!)
Row cache and key cache are to help read performance, not write performance.
Also, make sure you are batching the writes (if appropriate) - this will reduce the network overhead.
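To illustrate the batching idea, here is a minimal sketch in plain Java. It is not Hector code: `BatchWriter`, `writeInBatches` and the `sendBatch` callback are hypothetical names used only for this example. The callback stands in for whatever sends one group of rows to Cassandra in a single round trip (with Hector, that would presumably mean queuing several insertions on one mutator before executing it). It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter {
    /**
     * Splits rows into groups of at most batchSize and hands each group to
     * sendBatch, so the client pays one round trip per group instead of one
     * per row. sendBatch is a hypothetical stand-in for the real client call.
     * Returns the number of batches sent.
     */
    public static int writeInBatches(List<String> rows, int batchSize,
                                     Consumer<List<String>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so the callback owns an independent list.
            sendBatch.accept(new ArrayList<>(rows.subList(start, end)));
            batches++;
        }
        return batches;
    }
}
```

The right batch size depends on row size and the cluster, so it is worth measuring a few values rather than assuming that bigger is always better.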
How many threads/processes are you using to perform inserts? Hector calls are synchronous, so if you are only using 1 thread on the client side, that may be your bottleneck.
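If the client is single-threaded, something along these lines may help. This is only a sketch under assumptions: `ParallelInserter`, `insertAll` and the `insertOne` callback are hypothetical names, and `insertOne` stands in for your existing synchronous Hector insert. It splits the lines of the file across a fixed pool of worker threads so that several inserts are in flight at once. It assumes Java 8 or later.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelInserter {
    /**
     * Divides lines into contiguous slices, one per worker, and calls
     * insertOne for every line. insertOne is a hypothetical stand-in for a
     * blocking insert call and must be safe to call from several threads.
     * Returns the number of inserts that completed without throwing.
     */
    public static int insertAll(List<String> lines, int threads,
                                Consumer<String> insertOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        int sliceSize = (lines.size() + threads - 1) / threads; // ceiling division
        for (int start = 0; start < lines.size(); start += sliceSize) {
            List<String> slice =
                lines.subList(start, Math.min(start + sliceSize, lines.size()));
            pool.submit(() -> {
                for (String line : slice) {
                    insertOne.accept(line);
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return done.get();
    }
}
```

Note that an exception thrown by `insertOne` stops that worker's slice silently in this sketch, so compare the returned count with the number of lines. Start with a handful of threads and increase while throughput keeps improving.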