I'm running a DSE 4.6.5 cluster (Cassandra 2.0.14.352) with OpsCenter 5.1.1.
Once or twice a day, one of the nodes (sometimes more) stops reporting metrics until I manually restart the datastax-agent.
The agent process is still alive before I restart it. Here's the agent log:
WARN [Thread-13] 2015-04-14 23:20:23,277 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,277 131176 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,277 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,277 131177 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,278 131178 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,278 131179 operations dropped so far.
WARN [Thread-13] 2015-04-14 23:20:23,278 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-13] 2015-04-14 23:20:23,278 131180 operations dropped so far.
ERROR [cassandra-processor-1] 2015-04-14 23:20:24,387 Error when proccessing cassandra call
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
Please note that :
To sum up: on one of the machines (it rotates between nodes), the agent stops reporting data, while on the others it works fine. Restarting the agent service fixes the issue, but shouldn't the agent recover by itself? Is this a bug? How can I work around it?
Please tell me if you need more information. Thanks.
I've seen this same thing. Two things you can try.
1) Exclude or limit the keyspaces and column families you collect metrics from. http://docs.datastax.com/en/opscenter/5.1/opsc/configure/opscControllingDataCollection_c.html?scroll=concept_ds_jlq_xk4_gk
2) Store the OpsCenter data on a separate cluster (a small one- or two-node cluster, apart from your main cluster). http://www.datastax.com/dev/blog/storing-opscenter-data-in-a-separate-cluster
Option 2 is the smarter move, honestly. You don't need large nodes, and if you store metrics on your main cluster and that cluster crashes, you're running blind.
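For reference, here is a rough sketch of what both options might look like in the per-cluster OpsCenter config file (typically /etc/opscenter/clusters/<cluster_name>.conf on package installs). The section and option names are as I remember them from the two pages linked above, so check them against the docs for your version. The keyspace names, column family names and IP addresses are placeholders I made up, not values from your setup.

```ini
# Option 1: collect fewer metrics.
# Keyspace and column family names below are hypothetical placeholders.
[cassandra_metrics]
ignored_keyspaces = system, OpsCenter, my_noisy_keyspace
ignored_column_families = my_keyspace.my_wide_cf, my_keyspace.my_other_cf

# Option 2: store OpsCenter data on a separate, small cluster.
# seed_hosts are placeholder addresses of the dedicated storage nodes.
[storage_cassandra]
seed_hosts = 10.0.0.10, 10.0.0.11
api_port = 9160
keyspace = OpsCenter_main_cluster
```

You would normally pick one option or the other rather than both. Restart opscenterd after editing the file so that the change takes effect.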