Due to PostgreSQL docs https://www.postgresql.org/docs/current/ddl-system-columns.html
xmin The identity (transaction ID) of the inserting transaction for this row version (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row).
We are using it (don't ask why, just happens) for synchronisation data and extracting (E in ETL) changes from PostgreSQL source DB, we made it with intervals scanning, in particular xmin intervals, as example we have synced xmin interval from 0 to 10002, and when we would made next sync we will search for xmin starting from 10003 in this case. If each transaction committed and visible is numbered sequentially, there is no problem, all data changes will be numerated sequentially, but if transactions are numbered at the moment they initialized, may happens next case:
and if we have made sync at 15:02, and got max xmin in target DB: 10002 in this case in next sync starting from xmin 10003 we will skip xmin 10001 and will lost changes.
So Does PostgreSQL transaction id (xmin) appears in committed version sequentially ?
There is also xmax in same doc:
xmax The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.
So we can see transaction which planning to delete row (if it will be committed), so maybe xmin also shows transaction which will change row? but this is cant be true due to xmin description:
...for this row version. (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row.)
because, as written, it must match the row version we read, what can perhaps only with dirty read (when we see uncommitted data), but this can't happen in PostgreSQL https://www.postgresql.org/docs/current/transaction-iso.html
Dirty Read: Allowed, but not in PG
Your idea is nice (except that you should take txid_snapshot_xmin as a cut-off point), but it suffers from two shortcomings:
transaction IDs are generated from a 4-byte unsigned integer counter, which will wrap around at some point. Then comparing xmin will no longer work.
To avoid that problem, PostgreSQL will at some point set the “frozen” flag on old rows, which is not visible from SQL. For frozen rows, xmin and xmax must be ignored.
So I think that cute technique is doomed.
You should look into logical decoding, particularly the wal2json plugin. That will allow you to reliably capture all changes to the data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With