
Amazon Keyspaces: "DefaultTokenFactoryRegistry" and "DefaultTopologyMonitor" warnings cause high CPU and memory usage

We have moved some of our tables from AWS RDS (MySQL) to Amazon Keyspaces to see whether we could get better performance. We put a lot of work into the migration, and we have been monitoring the system to catch inconsistencies early. During the monitoring period, we observed the following warnings, which result in high CPU and memory usage.

- DefaultTokenFactoryRegistry - [s0] Unsupported partitioner 'com.amazonaws.cassandra.DefaultPartitioner', token map will be empty.

- DefaultTopologyMonitor - [s0] Control node IPx/IPy:9142 has an entry for itself in system.peers: this entry will be ignored. This is likely due to a misconfiguration; please verify your rpc_address configuration in cassandra.yaml on all nodes in your cluster. (IPx and IPy are Cassandra node IPs.)

- Control node cassandra.{REGION}.amazonaws.com/{IP_1}:9142 has an entry for itself in system.peers: this entry will be ignored. This is likely due to a misconfiguration; please verify your rpc_address configuration in cassandra.yaml on all nodes in your cluster.

These warnings do not appear immediately after we deploy our code, or in the hours that follow. They start to appear 24-72 hours after the deployment.

What have we done so far?

  • We have tried all the connection methods described in the Amazon Keyspaces Developer Guide: https://docs.aws.amazon.com/keyspaces/latest/devguide/using_java_driver.html

  • We found an already open discussion on the AWS forums: https://forums.aws.amazon.com/thread.jspa?messageID=945795

    • We configured our client as recommended there by an Amazon employee: https://forums.aws.amazon.com/profile.jspa?userID=512911
  • We have also opened an issue on the GitHub repository of aws-sigv4-auth-cassandra-java-driver-plugin: https://github.com/aws/aws-sigv4-auth-cassandra-java-driver-plugin/issues/24

  • We have walked through the DataStax Java driver code to see what is wrong. In the DefaultTopologyMonitor class there is a check that tests whether our access point to Amazon Keyspaces, {IP_2}, which is resolved from the contact point [cassandra.{REGION}.amazonaws.com:9142], is the control node. Because this IP address [{IP_2}] exists in system.peers, the check is always triggered, and the resulting iterations and assignments consume a lot of CPU and create garbage. As far as we understand, the contact point should not be listed in system.peers. We have no way to adjust the system.peers table or to choose the control node; both are managed by Amazon Keyspaces.
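
To make the last point concrete, here is a minimal sketch of the behaviour as we understand it. It is a simplified model that uses only the JDK, not the driver's actual code: the class name `SelfPeerCheck`, the method `peersExcludingControlNode` and the 192.0.2.x addresses are all made up for illustration.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the "control node listed in system.peers" check (hypothetical names). */
class SelfPeerCheck {

    /** Counts emitted warnings; stands in for the driver's logger. */
    static int warnings = 0;

    /**
     * Returns the peer rows that are not the control node itself.
     * A row whose address equals the control node's address is skipped and a warning is counted.
     */
    static List<InetSocketAddress> peersExcludingControlNode(
            InetSocketAddress controlNode, List<InetSocketAddress> peerRows) {
        List<InetSocketAddress> kept = new ArrayList<>();
        for (InetSocketAddress peer : peerRows) {
            if (peer.equals(controlNode)) {
                warnings++;
                System.out.println("WARN control node " + controlNode
                        + " has an entry for itself in system.peers: ignored");
            } else {
                kept.add(peer);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up addresses from the documentation range 192.0.2.0/24.
        InetSocketAddress control = new InetSocketAddress("192.0.2.10", 9142);
        List<InetSocketAddress> peerRows = List.of(
                control, // the endpoint lists itself, as we observe with Keyspaces
                new InetSocketAddress("192.0.2.11", 9142));

        // Each simulated refresh walks the rows again and warns again.
        for (int refresh = 0; refresh < 3; refresh++) {
            List<InetSocketAddress> peers = peersExcludingControlNode(control, peerRows);
            System.out.println("refresh " + refresh + ": usable peers = " + peers.size());
        }
        System.out.println("warnings so far: " + warnings);
    }
}
```

The point of the sketch is that the self-entry is found again on every pass over system.peers, so the warning and the allocation of a new list repeat for as long as the entry is there, and we cannot remove it on our side.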

Even though it is possible to suppress the warnings by setting the log level to ERROR, the driver says there is a misconfiguration in cassandra.yaml, which we do not have permission to edit or view. Is there a way to avoid this warning, or a suggested solution for this issue?

datastax-java-driver {
    basic {
        contact-points = ["cassandra.eu-west-1.amazonaws.com:9142"]
        load-balancing-policy {
            class = DefaultLoadBalancingPolicy
            local-datacenter = eu-west-1
        }
        request {
            timeout = 10 seconds
            default-idempotence = true
        }
    }

    advanced {
        auth-provider = {
            class = software.aws.mcs.auth.SigV4AuthProvider
            aws-region = eu-west-1
        }

        ssl-engine-factory {
            class = DefaultSslEngineFactory
            truststore-path = "./cassandra_truststore.jks"
            truststore-password = "XXX"
            keystore-path = "./cassandra_truststore.jks"
            keystore-password = "XXX"
        }

        retry-policy {
            class = com.ABC.DEF.config.cassandra.AmazonKeyspacesRetryPolicy
            max-attempts = 5
        }

        connection {
            pool {
                local {
                    size = 9
                }
                remote {
                    size = 1
                }
            }

            init-query-timeout = 5 seconds

            max-requests-per-connection = 1024
        }

        reconnect-on-init = true

        heartbeat {
            timeout = 1 seconds
        }

        metadata {
            schema {
                enabled = false
            }
            token-map {
                enabled = false
            }
        }

        control-connection {
            timeout = 1 seconds
        }
    }
}


----------


xywz asked Dec 04 '25 06:12


1 Answer

This is indeed a non-standard, unsupported partitioner: com.amazonaws.cassandra.DefaultPartitioner. Token-aware routing won't work with AWS Keyspaces unless you write your own TopologyMonitor and TokenFactory.

I suggest that you disable token-aware routing completely, see here for instructions.

adutra answered Dec 06 '25 23:12