What is the best and safest way to remove expired archive files after taking snapshots, and to remove invalid snapshot files on a regular basis?
You can use the RecordingLog class to inspect the entries (logs, snapshots) belonging to the Consensus Module (CM) and the clustered service(s).
Once you have identified which snapshots are safe to delete (according to your business requirements), you can delete the corresponding recordings from the archive and invalidate their entries in the recording log.
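To make the selection step concrete, here is a minimal sketch of one possible retention rule: keep the newest N snapshot positions and treat every older valid snapshot as a deletion candidate. It runs without Aeron on the classpath, so Entry below is a hypothetical stand-in that only mirrors the recordingId, logPosition, serviceId, type and isValid fields of RecordingLog.Entry. The class name, method names and the "keep N" rule are my assumptions, not anything prescribed by Aeron.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

public final class SnapshotRetention
{
    /** Mirrors RecordingLog.ENTRY_TYPE_SNAPSHOT. */
    public static final int ENTRY_TYPE_SNAPSHOT = 1;

    /** Hypothetical stand-in for the RecordingLog.Entry fields used in this sketch. */
    public static final class Entry
    {
        public final long recordingId;
        public final long logPosition;
        public final int serviceId; // -1 is the consensus module, 0..n are services
        public final int type;
        public final boolean isValid;

        public Entry(
            final long recordingId,
            final long logPosition,
            final int serviceId,
            final int type,
            final boolean isValid)
        {
            this.recordingId = recordingId;
            this.logPosition = logPosition;
            this.serviceId = serviceId;
            this.type = type;
            this.isValid = isValid;
        }
    }

    /** Log position of the oldest snapshot that is retained, or -1 if there are no valid snapshots. */
    public static long oldestKeptPosition(final List<Entry> entries, final int keep)
    {
        if (keep < 1)
        {
            throw new IllegalArgumentException("keep must be at least 1");
        }

        // Distinct snapshot positions; CM and service snapshots taken together share one position.
        final TreeSet<Long> positions = new TreeSet<>();
        for (final Entry entry : entries)
        {
            if (ENTRY_TYPE_SNAPSHOT == entry.type && entry.isValid)
            {
                positions.add(entry.logPosition);
            }
        }

        // Walk back from the newest position until 'keep' positions have been visited.
        long oldestKept = -1;
        final Iterator<Long> newestFirst = positions.descendingIterator();
        for (int i = 0; i < keep && newestFirst.hasNext(); i++)
        {
            oldestKept = newestFirst.next();
        }

        return oldestKept;
    }

    /** Recording ids of valid snapshots that are older than the oldest retained snapshot. */
    public static List<Long> recordingsToDelete(final List<Entry> entries, final int keep)
    {
        final long oldestKept = oldestKeptPosition(entries, keep);
        final List<Long> recordingIds = new ArrayList<>();
        for (final Entry entry : entries)
        {
            if (ENTRY_TYPE_SNAPSHOT == entry.type && entry.isValid && entry.logPosition < oldestKept)
            {
                recordingIds.add(entry.recordingId);
            }
        }

        return recordingIds;
    }
}
```

In a real tool the entries would come from RecordingLog, and the returned ids would be the recordings to delete from the archive before invalidating the matching recording log entries. The value from oldestKeptPosition is the position the CM log can later be purged up to.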
Next, purge the CM log up to the position of the oldest CM snapshot you kept. There is a snippet in the Aeron project that you can take inspiration from: io.aeron.test.cluster.TestCluster#purgeLogToLastSnapshot().
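One detail worth knowing is that the archive purges whole segment files, so the log's new start position is the snapshot position rounded down to a segment boundary, not the snapshot position itself. The sketch below reproduces that rounding in plain Java so it can run without Aeron; the class and method names are illustrative. To my knowledge, in real code the same value comes from AeronArchive.segmentFileBasePosition and is then passed to AeronArchive.purgeSegments.

```java
public final class PurgePosition
{
    /**
     * Rounds {@code position} down to the start of the segment file that contains it.
     * Both lengths are assumed to be powers of two, as Aeron term buffers and archive segments are.
     */
    public static long segmentBasePosition(
        final long startPosition,
        final long position,
        final int termBufferLength,
        final int segmentFileLength)
    {
        // Segments are aligned to the start of the term in which the recording began.
        final long startTermBasePosition = startPosition - (startPosition & (termBufferLength - 1));
        final long lengthFromBase = position - startTermBasePosition;

        // Drop the partially filled segment at the end, keeping only whole segments.
        final long wholeSegments = lengthFromBase - (lengthFromBase & (segmentFileLength - 1));

        return startTermBasePosition + wholeSegments;
    }
}
```

For example, assuming a 64 MiB term length and a 128 MiB segment length (which I believe is the archive default), a snapshot at position 200,000,000 gives a new start position of 134217728. That is the same value reported as the recording start position in the error quoted later in this thread.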
There are two system tests in the Aeron repo that demonstrate different ways to purge Aeron archive data and reclaim disk space.
The first test, shouldRecoverWhenFollowerWithInitialSnapshotAndArchivePurgeThenIsMultipleTermsBehind in the ClusterTest class, connects to the Aeron cluster and purges the log up to the latest snapshot.
The other way is shown in the StartFromTruncatedRecordingLogTest class, where the RecordingLog is inspected and unnecessary files are removed. My understanding is that in this case the cluster should be shut down, because the recording log is adjusted and then replaced.
However, once the data in the archive has been purged, it's not clear to me how a fresh cluster node without any data can join the cluster. When I try to do that, I get the following error:
io.aeron.archive.client.ArchiveException: ERROR - response for correlationId=270, error: requested replay start position=0 is less than recording start position=134217728 for recording 0