I am doing an experiment on GIN Index for Jsonb type data. till I am having 500K rows in a table, GIN Index is working, however when I increase the data to 50Million rows in a table. The GIN Index gets created however explain analysis doesn't seem to consider the Indexes working.
postgres=# \d users
Table "public.users"
Column | Type | Collation | Nullable | Default
-----------+---------+-----------+----------+------------------------------------
_id | integer | | not null | nextval('users__id_seq'::regclass)
id | integer | | not null |
attribute | integer | | not null |
value | jsonb | | not null |
Indexes:
"users_pkey" PRIMARY KEY, btree (_id)
"user_value_gin" gin (value)
Foreign-key constraints:
"users_attribute_fkey" FOREIGN KEY (attribute) REFERENCES attribute(id)
=======================================
postgres=# explain analyze select * from users where value @> '{"v": "Rahul"}'::jsonb;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1017776.57 rows=897059 width=70) (actual time=880.003..23596.085 rows=913446 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on users (cost=0.00..927070.67 rows=373775 width=70) (actual time=451.196..22468.458 rows=304482 loops=3)
Filter: (value @> '{"v": "Rahul"}'::jsonb)
Rows Removed by Filter: 16362185
Planning Time: 105.326 ms
JIT:
Functions: 6
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 502.947 ms, Inlining 676.053 ms, Optimization 560.960 ms, Emission 94.573 ms, Total 1834.533 ms
Execution Time: 24099.549 ms
(12 rows)
My question is that is there any limitation Postgres has on GIN Index?
if yes what is the possible resolution/design change is appropriate to overcome this problem if my requirement is to have approximately 50Billion rows in a table.
Regards Praveen
You are retrieving nearly 2% of the table. It is not obvious that using an index is the fastest way to do that.
If you want to force it to use the index, you can set enable_seqscan=off and it will probably use the index then
Or, maybe you can pick a more selective thing to search on if all you want to do is see the index work 'on is own'.
Your example query would perform better with jsonb_path_ops operator class:
CREATE INDEX user_value_gin ON users USING GIN (value jsonb_path_ops);
The default jsonb_ops operator class indexes each key and value separately. The jsonb_path_ops operator class indexes each key value pair. That performs a lot better when, for example, many rows have the same keys. The resulting index is also usually smaller.
The downside of the jsonb_path_ops operator class is that it cannot be used with operators ?, ?|, ?& that would require each key to be indexed separately.
My advice is to always go with jsonb_path_ops if possible.
You should read jsonb Indexing on the official documentation for the most up to date information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With