From reading about TOAST I've learned that Postgres uses an LZ-family compression algorithm, which it calls PGLZ, and that it kicks in automatically for values larger than 2KB.
How does PGLZ compare to GZIP in terms of speed and compression ratio?
I'm curious whether PGLZ and GZIP are close enough in speed and compression ratio that an extra GZIP step before inserting large JSON strings into Postgres would be unnecessary or even harmful.
It's significantly faster than gzip, but has a lower compression ratio; it's optimised for low CPU cost.
There's definitely a place for gzip'ing large data before storing it in a bytea field, assuming you don't need to manipulate it directly in the DB, or don't mind having to use a function to un-gzip it first. You can do it with things like plpython or plperl if you must do it in the DB, but it's usually more convenient to just do it in the app.
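As a minimal sketch of the "do it in the app" approach, assuming psycopg2 as the driver and a hypothetical `docs` table with a `bytea` column (the table and column names here are just examples):

```python
import gzip
import json

import psycopg2  # assumed driver; any DB-API driver with bytea support works similarly


def store_document(conn, doc: dict) -> None:
    """gzip a JSON document in the app and store it in a bytea column."""
    raw = json.dumps(doc).encode("utf-8")
    compressed = gzip.compress(raw, compresslevel=6)
    with conn.cursor() as cur:
        # Hypothetical table: CREATE TABLE docs (id serial PRIMARY KEY, body bytea);
        cur.execute(
            "INSERT INTO docs (body) VALUES (%s)",
            (psycopg2.Binary(compressed),),
        )
    conn.commit()


def load_document(conn, doc_id: int) -> dict:
    """Fetch the compressed bytea value and un-gzip it back into JSON."""
    with conn.cursor() as cur:
        cur.execute("SELECT body FROM docs WHERE id = %s", (doc_id,))
        (body,) = cur.fetchone()
    return json.loads(gzip.decompress(bytes(body)).decode("utf-8"))
```

Note that once the data is stored as pre-compressed bytea, Postgres won't usefully re-compress it, and you lose the ability to query inside the JSON without decompressing it first.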
If you're going to go to the effort of doing extra compression though, consider using a stronger compression method like LZMA.
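Swapping gzip for LZMA in the sketch above is a small change if you use Python's standard lzma module (the payload here is just an illustrative stand-in for a large JSON document):

```python
import json
import lzma

# Hypothetical payload; in practice this would be your large JSON document.
raw = json.dumps({"key": "value" * 1000}).encode("utf-8")

# Higher presets trade more CPU time for a better compression ratio.
compressed = lzma.compress(raw, preset=6)
assert lzma.decompress(compressed) == raw
```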
There have been efforts to add support for gzip and/or LZMA compression to TOAST in PostgreSQL. The main problem with doing so has been the need to maintain compatibility with the on-disk format of older versions, make sure it stays compatible into the future, and so on. So far nobody has come up with an implementation that satisfies the relevant core team members. See e.g. the pluggable compression support thread. It tends to get stuck in a catch-22: pluggable support gets rejected (see that thread for why), but nobody can agree on a suitable, software-patent-safe algorithm to adopt as a new default method, or on how to change the format to handle multiple compression methods, etc.