I have a table targeting that has a column marital_status of type text[] and another column data of type jsonb. The content of these two columns is the same, just in a different format (it's just for demonstration purposes). Example data:
id | marital_status | data
----+--------------------------+---------------------------------------------------
1 | null | {}
2 | {widowed} | {"marital_status": ["widowed"]}
3 | {never_married,divorced} | {"marital_status": ["never_married", "divorced"]}
...
There are more than 690K records in the table in random combination.
EXPLAIN ANALYZE SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[]
Usually takes < 900ms without creating any indices:
Seq Scan on targeting (cost=0.00..172981.38 rows=159061 width=28) (actual time=0.017..840.084 rows=158877 loops=1)
Filter: (marital_status @> '{widowed}'::text[])
Rows Removed by Filter: 452033
Planning time: 0.150 ms
Execution time: 845.731 ms
With index it usually takes < 200ms (75% improvement):
CREATE INDEX targeting_marital_status_idx ON targeting ("marital_status");
Result:
Index Only Scan using targeting_marital_status_idx on targeting (cost=0.42..23931.35 rows=159061 width=28) (actual time=3.528..143.848 rows=158877 loops=1)"
Filter: (marital_status @> '{widowed}'::text[])
Rows Removed by Filter: 452033
Heap Fetches: 0
Planning time: 0.217 ms
Execution time: 148.506 ms
EXPLAIN ANALYZE SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb
Usually takes < 5,700ms without creating any indices (more than 6 times slower!):
Seq Scan on targeting (cost=0.00..174508.65 rows=611 width=403) (actual time=0.095..5399.112 rows=158877 loops=1)
Filter: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Rows Removed by Filter: 452033
Planning time: 0.172 ms
Execution time: 5408.326 ms
With index it usually takers < 3,700ms (35% improvement):
CREATE INDEX targeting_data_marital_status_idx ON targeting USING GIN ((data->'marital_status'));
Result:
Bitmap Heap Scan on targeting (cost=144.73..2482.75 rows=611 width=403) (actual time=85.966..3694.834 rows=158877 loops=1)
Recheck Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Rows Removed by Index Recheck: 201080
Heap Blocks: exact=33723 lossy=53028
-> Bitmap Index Scan on targeting_data_marital_status_idx (cost=0.00..144.58 rows=611 width=0) (actual time=78.851..78.851 rows=158877 loops=1)"
Index Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Planning time: 0.257 ms
Execution time: 3703.492 ms
text[] column so much more performant, even when not using indices?jsonb column only increase the performance by 35%?jsonb column?Seems like an easy question. Essentially you're asking how come,
CREATE TABLE foo ( id int, key1 text );
Is faster than
CREATE TABLE bar ( id int, jsonb foo );
@Craig answers it here in the comment
GIN indexing is generally less efficient than a b-tree, so that much is expected.
Also the null value in that schema should read
SELECT jsonb_build_object('marital_status',ARRAY[null]);
jsonb_build_object
----------------------------
{"marital_status": [null]}
(1 row)
And not {}. PostgreSQL takes numerous shortcuts to make updates to the jsonb objects quick, and to make indexing space-efficient.
If none of this make sense look at this pseudo-table.
CREATE TABLE foo ( id int, x text, y text, z text )
CREATE INDEX ON foo(x);
CREATE INDEX ON foo(y);
CREATE INDEX ON foo(z);
Here we have three btree indexes on the table. Let's look at a similar table..
CREATE TABLE bar ( id int, junk jsonb );
CREATE INDEX ON bar USING gin (junk);
INSERT INTO bar (id,junk) VALUES (1,$${"x": 10, "y": 42}$$);
For bar to be perform like foo, we'd need two btrees both of which would be individually larger than the single GIN index we have. And if you did
INSERT INTO bar (id,junk) VALUES (1,$${"x": 10, "y": 42, "z":3}$$);
We'd have to have another btree index on z, which again would be massive. You can see where I am going with this. jsonb is great, but the complexity of indexing and schema modeling doesn't parallel the database. You can't just reduce the database to a jsonb column, issue CREATE INDEX and expect the same performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With