
MUTATION_REQ/RSP messages keep being dropped by Cassandra cluster

Tags:

cassandra

I have a Cassandra cluster in my development environment. Recently I ran some write-request tests and found many "MUTATION_REQ/RSP was dropped" messages in the log, such as the following:

${source_ip}:7000->/${dest_ip}:7000-SMALL_MESSAGES-d21c2c3e dropping message of type MUTATION_RSP whose timeout expired before reaching the network

MUTATION_REQ messages were dropped in last 5000 ms: 0 internal and 110 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 3971 ms

Using "nodetool tpstats", I also found that far more MUTATION_REQ than MUTATION_RSP messages were dropped:

Latencies waiting in queue (micros) per dropped message types
Message type        Dropped     50%       95%       99%                 Max
HINT_RSP            63          1131.752  2816.159  3379.391            3379.391
GOSSIP_DIGEST_SYN   0           1629.722  2816.159  4055.2690000000002  4055.2690000000002
HINT_REQ            4903        1955.666  1955.666  1955.666            1955.666
GOSSIP_DIGEST_ACK2  0           1131.752  2816.159  4055.2690000000002  4055.2690000000002
MUTATION_RSP        **6146**    1358.102  2816.159  3379.391            89970.66
MUTATION_REQ        **450775**  1358.102  3379.391  4866.323            4139110.981
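To compare dropped counts across nodes without reading the table by eye, the output can be parsed with a few lines of Python. This is only a sketch: the function name `parse_dropped` and the sample text are illustrative, and it assumes the column layout shown above (message type first, dropped count second), which may differ between Cassandra versions.

```python
def parse_dropped(tpstats_text):
    """Return {message_type: dropped_count} from the dropped-message table.

    Assumes the layout shown above: a "Message type Dropped ..." header
    followed by one row per message type. Emphasis asterisks are ignored.
    """
    dropped = {}
    in_table = False
    for line in tpstats_text.splitlines():
        fields = line.replace("*", "").split()
        if fields[:2] == ["Message", "type"]:
            in_table = True  # rows after this header belong to the table
            continue
        if not in_table or len(fields) < 2:
            continue
        name, count = fields[0], fields[1]
        if count.isdigit():  # skip rows without a numeric count
            dropped[name] = int(count)
    return dropped


# Sample rows copied from the nodetool tpstats output above.
SAMPLE = """\
Message type Dropped 50% 95% 99% Max
HINT_RSP 63 1131.752 2816.159 3379.391 3379.391
HINT_REQ 4903 1955.666 1955.666 1955.666 1955.666
MUTATION_RSP 6146 1358.102 2816.159 3379.391 89970.66
MUTATION_REQ 450775 1358.102 3379.391 4866.323 4139110.981
"""

counts = parse_dropped(SAMPLE)
worst = max(counts, key=counts.get)
print(worst, counts[worst])  # MUTATION_REQ 450775
```

Running this against the output collected from each node makes it easier to see whether the drops are concentrated on particular nodes.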

My questions are:

  1. Is it normal for a healthy cluster to have so many dropped MUTATION_REQ/RSP messages?
  2. I assume MUTATION_RSP messages are dropped on the replica node and MUTATION_REQ messages on the coordinator node. Am I correct?

Thanks

柯鴻儀 asked Oct 27 '25 13:10


1 Answer

I had the same issue and asked the same question on the Cassandra mailing list. Here is the answer I received:

First thing to check: do you have an NTP client running on all Cassandra servers? Are their clocks in sync? If you answer "yes" to both, check the server load: does any server have high CPU usage or disk utilization? Any swapping activity? If not, check the GC logs and look for long GC pauses.

My issue was incorrect clock synchronization.

So, first things first: check the clock on each node with the timedatectl utility. The output should contain two entries, NTP enabled and NTP synchronized, both set to yes. If the clocks are out of sync, force synchronization on all nodes and then run a full repair on each affected node (I ran nodetool repair -full -pr). After that you should see radically fewer dropped MUTATION_* messages: from hundreds under heavy load to one or two per day.
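The timedatectl check can be scripted so it is easy to repeat on every node. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, and the label "System clock synchronized" is an assumption about how newer systemd versions name the entry that older versions print as "NTP synchronized". Adjust the labels to whatever your nodes actually print.

```python
import subprocess

# Labels for the synchronization status line. "NTP synchronized" is the entry
# named in the answer; "System clock synchronized" is assumed for newer systemd.
SYNC_LABELS = ("NTP synchronized", "System clock synchronized")


def clock_is_synchronized(timedatectl_output):
    """Return True if the timedatectl text reports the clock as synchronized."""
    for line in timedatectl_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in SYNC_LABELS:
            return value.strip() == "yes"
    return False  # no status line found: treat as not synchronized


def check_local_node():
    """Run timedatectl on this machine and report its synchronization status."""
    result = subprocess.run(
        ["timedatectl"], capture_output=True, text=True, check=True
    )
    return clock_is_synchronized(result.stdout)
```

Calling `check_local_node()` on each Cassandra server (for example over SSH) gives a quick yes/no per node. A node that returns False is a candidate for forced synchronization followed by the repair described above.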

Azamat Hackimov answered Oct 29 '25 08:10


