I want to process recent updates on a DynamoDB table and save the results in another one. Let's say updates from an IoT device are written to Table1 at irregular intervals, and I need to use the last N updates to compute an update in Table2 for the same device, in sync with the original updates (a kind of sliding window).
DynamoDB Triggers (Streams + Lambda) seem quite appropriate for my needs, but I did not find a clear definition of TRIM_HORIZON. From some docs I understand that it is the oldest data in Table1 (which can get huge), but in other docs it would seem that it is 24h. Or maybe it is the oldest record in the stream, which is 24h old at most?
Does anyone know the truth about TRIM_HORIZON? Would it even be possible to configure it?
The alternative I see is not to use TRIM_HORIZON, but rather to use LATEST and perform a query on Table1. But that sort of defeats the purpose of streams.
Here are the relevant aspects for you, from DynamoDB's documentation (1 and 2):
All data in DynamoDB Streams is subject to a 24 hour lifetime. You can retrieve and analyze the last 24 hours of activity for any given table
TRIM_HORIZON - Start reading at the last (untrimmed) stream record, which is the oldest record in the shard. In DynamoDB Streams, there is a 24 hour limit on data retention. Stream records whose age exceeds this limit are subject to removal (trimming) from the stream.
So, if you have a Lambda that is continuously processing stream updates, I'd suggest going with LATEST.
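For illustration, here is a minimal sketch of how the starting position could be set when wiring the stream to the Lambda with boto3. The stream ARN, function name and batch size are placeholders I made up, and the helper names are hypothetical; only `create_event_source_mapping` and its `EventSourceArn`, `FunctionName`, `StartingPosition` and `BatchSize` parameters come from the AWS SDK.

```python
from typing import Any, Dict

# Starting positions accepted for a DynamoDB stream event source.
VALID_POSITIONS = ("LATEST", "TRIM_HORIZON")


def build_mapping_params(stream_arn: str,
                         function_name: str,
                         starting_position: str = "LATEST",
                         batch_size: int = 100) -> Dict[str, Any]:
    """Build the keyword arguments for create_event_source_mapping."""
    if starting_position not in VALID_POSITIONS:
        raise ValueError(f"unsupported starting position: {starting_position}")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": starting_position,
        "BatchSize": batch_size,
    }


def create_mapping(params: Dict[str, Any]) -> Dict[str, Any]:
    """Create the mapping; needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so the helper above works without boto3
    return boto3.client("lambda").create_event_source_mapping(**params)
```

With LATEST, the function only sees records written after the mapping is created; with TRIM_HORIZON it would first work through whatever is still retained in the stream.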
Also, since you "need to use the N last updates to compute an update in Table2", you will have to query Table1 for every update, so that you can 'merge' the current update with the previous ones for that device. I don't think you can get around that by using TRIM_HORIZON either.
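A rough sketch of that query-per-update approach is below. It assumes a schema you did not describe: Table1 has a partition key `device_id` (string) and a sort key that orders updates by time, each item carries a numeric `value`, and the "computation" is a plain average. The window size, the attribute names and the function names are all hypothetical.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List

WINDOW_SIZE = 5  # the "N" from the question; arbitrary here


def device_ids_from_event(event: Dict[str, Any]) -> List[str]:
    """Distinct device ids touched by INSERT/MODIFY records, in order."""
    ids: List[str] = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # ignore REMOVE events
        device_id = record["dynamodb"]["Keys"]["device_id"]["S"]
        if device_id not in ids:
            ids.append(device_id)
    return ids


def average(items: List[Dict[str, Any]]) -> Decimal:
    """Average of the 'value' attribute; Decimal because boto3 rejects floats."""
    values = [Decimal(str(item["value"])) for item in items]
    return sum(values, Decimal(0)) / len(values)


def process_event(event: Dict[str, Any],
                  fetch_last_n: Callable[[str, int], List[Dict[str, Any]]],
                  save: Callable[[str, Decimal], None]) -> None:
    """For each updated device, re-read its last N items and store the result."""
    for device_id in device_ids_from_event(event):
        items = fetch_last_n(device_id, WINDOW_SIZE)
        if items:
            save(device_id, average(items))


def handler(event: Dict[str, Any], context: Any) -> None:
    """Lambda entry point wiring process_event to the two tables."""
    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table1, table2 = dynamodb.Table("Table1"), dynamodb.Table("Table2")

    def fetch_last_n(device_id: str, n: int) -> List[Dict[str, Any]]:
        # Newest first, capped at n items.
        resp = table1.query(
            KeyConditionExpression=Key("device_id").eq(device_id),
            ScanIndexForward=False,
            Limit=n,
        )
        return resp["Items"]

    def save(device_id: str, value: Decimal) -> None:
        table2.put_item(Item={"device_id": device_id, "window_avg": value})

    process_event(event, fetch_last_n, save)
```

Keeping the table access behind `fetch_last_n` and `save` means the windowing logic can be tried locally with fake functions before deploying anything.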