Issue with GIN Index large data set

Question

I am doing an experiment on GIN Index for Jsonb type data. till I am having 500K rows in a table, GIN Index is working, however when I increase the data to 50Million rows in a table. The GIN Index gets created however explain analysis doesn't seem to consider the Indexes working.

postgres=# \d users
                              Table "public.users"
  Column   |  Type   | Collation | Nullable |              Default
-----------+---------+-----------+----------+------------------------------------
 _id       | integer |           | not null | nextval('users__id_seq'::regclass)
 id        | integer |           | not null |
 attribute | integer |           | not null |
 value     | jsonb   |           | not null |
Indexes:
    "users_pkey" PRIMARY KEY, btree (_id)
    "user_value_gin" gin (value)
Foreign-key constraints:
    "users_attribute_fkey" FOREIGN KEY (attribute) REFERENCES attribute(id)

=======================================

postgres=# explain analyze select * from users where value @> '{"v": "Rahul"}'::jsonb;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1017776.57 rows=897059 width=70) (actual time=880.003..23596.085 rows=913446 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on users  (cost=0.00..927070.67 rows=373775 width=70) (actual time=451.196..22468.458 rows=304482 loops=3)
         Filter: (value @> '{"v": "Rahul"}'::jsonb)
         Rows Removed by Filter: 16362185
 Planning Time: 105.326 ms
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 502.947 ms, Inlining 676.053 ms, Optimization 560.960 ms, Emission 94.573 ms, Total 1834.533 ms
 Execution Time: 24099.549 ms
(12 rows)

My question is that is there any limitation Postgres has on GIN Index?

if yes what is the possible resolution/design change is appropriate to overcome this problem if my requirement is to have approximately 50Billion rows in a table.

Regards Praveen

jjanes · Accepted Answer

You are retrieving nearly 2% of the table. It is not obvious that using an index is the fastest way to do that.

If you want to force it to use the index, you can set enable_seqscan=off and it will probably use the index then

Or, maybe you can pick a more selective thing to search on if all you want to do is see the index work 'on is own'.

user3270865 · Answer

Your example query would perform better with jsonb_path_ops operator class:

CREATE INDEX user_value_gin ON users USING GIN (value jsonb_path_ops);

The default jsonb_ops operator class indexes each key and value separately. The jsonb_path_ops operator class indexes each key value pair. That performs a lot better when, for example, many rows have the same keys. The resulting index is also usually smaller.

The downside of the jsonb_path_ops operator class is that it cannot be used with operators ?, ?|, ?& that would require each key to be indexed separately.

My advice is to always go with jsonb_path_ops if possible.

You should read jsonb Indexing on the official documentation for the most up to date information.

Issue with GIN Index large data set

Tags:

postgresql

Praveen Pandit

2 Answers

jjanes

user3270865

Recent Activity

Donate For Us

Issue with GIN Index large data set

Tags:

postgresql

Praveen Pandit

2 Answers

jjanes

user3270865

Related questions

Recent Activity

Donate For Us