Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres replication slot from kafka-connect is filling up

The replication slot created by a kafka-connector connector is filling up.

I have a postgres RDS database on AWS. I put the following parameter group option on it (only showing the diff from default)

rds.logical_replication: 1

I have kafka connect running with a debezium postgres connector. This is the config (with certain values redacted, of course)

"database.dbname"        = "mydb"
"database.hostname"      = "myhostname"
"database.password"      = "mypass"
"database.port"          = "myport"
"database.server.name"   = "postgres"
"database.user"          = "myuser"
"database.whitelist"     = "my_database"
"include.schema.changes" = "false"
"plugin.name"            = "wal2json_streaming"
"slot.name"              = "my_slotname"
"snapshot.mode"          = "never"
"table.whitelist"        = "public.mytable"
"tombstones.on.delete"   = "false"
"transforms"             = "key"
"transforms.key.field"   = "id"
"transforms.key.type"    = "org.apache.kafka.connect.transforms.ExtractField$Key"

If I get the status of this connector, it appears to be fine.

curl -s http://my.kafkaconnect.url:kc_port/connectors/my-connector/status | jq

{
  "name": "my-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "some_ip"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "some_ip"
    }
  ],
  "type": "source"
}

However, the replication slot in postgres keeps getting larger and larger:

SELECT slot_name,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as replicationSlotLag,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as confirmedLag,
  active
FROM pg_replication_slots;
           slot_name           | replicationslotlag | confirmedlag | active
-------------------------------+--------------------+--------------+--------
 my_slotname                   | 20 GB              | 20 GB        | t

Why does replication keep growing? As I understand, the kafka connect connector task that is running should be reading from this replication slot, publishing it to the topic postgres. public.mytable, and then the replication slot should decrease in size. Am I missing something in this chain of actions?

like image 713
swagrov Avatar asked Oct 28 '25 10:10

swagrov


1 Answers

Please take a look at WAL Diskspace Consumption.

The most common reason why the PostgreSQL WAL backlogs is because the connector is monitoring a database or a subset of tables from your database change much more infrequent compared to the other tables or databases in your environment and therefore the connector isn't acknowledging the LSNs frequent enough to avoid the WAL backlog.

For Debezium 1.0.x and before, enable heartbeat.interval.ms.
For Debezium 1.1.0 and after, also consider enabling heartbeat.action.query as well.

like image 89
Naros Avatar answered Oct 31 '25 08:10

Naros



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!