
Optimize SELECT query with ORDER BY, OFFSET and LIMIT in PostgreSQL

Tags:

postgresql

This is my table schema:

Column       |          Type          |                      Modifiers                      
-------------+------------------------+------------------------------------------------------
id           | integer                | not null default nextval('message_id_seq'::regclass)
date_created | bigint                 |
content      | text                   |
user_name    | character varying(128) |
user_id      | character varying(128) |
user_type    | character varying(8)   |
user_ip      | character varying(128) |
user_avatar  | character varying(128) |
chatbox_id   | integer                | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE

This is the query:

SELECT * 
FROM message 
WHERE chatbox_id=$1 
ORDER BY date_created 
OFFSET 0 
LIMIT 20;

($1 will be replaced by the actual ID)

It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM, and then the whole system goes down. I have to temporarily back up all the current messages and truncate the table. I am not sure what is going on, because everything is fine when I have about 2 million records.

I am using PostgreSQL Server 9.1.5 with default options.


Update: the output of EXPLAIN ANALYZE

 Limit  (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
   ->  Index Scan Backward using indx_date_created on message  (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
         Filter: (chatbox_id = 25065)
 Total runtime: 0.376 ms
(4 rows)

Update: server specification

Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB

Final solution

Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested and also create a new index as A.H. suggested.
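
For reference, the configuration changes were along the lines of the sketch below. The values are illustrative for a 12GB machine, not the exact settings I ended up with:

# postgresql.conf (illustrative values; the 9.1 defaults are very conservative)
shared_buffers = 2GB            # default is only 32MB
work_mem = 16MB                 # lets ORDER BY sorts stay in memory
effective_cache_size = 8GB      # hint to the planner about the OS page cache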

asked by Tan Nguyen

2 Answers

You could try to give PostgreSQL a better index for that query. I propose something like this:

create index invent_suitable_name on message(chatbox_id, date_created);

or

create index invent_suitable_name on message(chatbox_id, date_created desc);
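
With either index in place you can check that the planner actually uses it; the chatbox_id value below is just an example:

EXPLAIN ANALYZE
SELECT *
FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
LIMIT 20;

The plan should now show an index scan on the new index and no separate Sort node, because for a fixed chatbox_id the index already returns rows in date_created order.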
answered by A.H.


Try adding an index on (chatbox_id, date_created). For this particular query it will give you the best performance.

As for the case when Postgres "starts consuming a lot of CPU and RAM", try to gather more details. It could be a bug (with the default configuration Postgres normally doesn't consume much RAM).
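
One way to gather those details is to look at what the backends are doing while the CPU is high. The column names below are the 9.1 ones (procpid and current_query were renamed in 9.2):

SELECT procpid, query_start, waiting, current_query
FROM pg_stat_activity
ORDER BY query_start;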

UPD: My guess at the reason for the bad performance:

At some point the table became too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:

  1. Index scan on chatbox_id;
  2. Ordering of returned records to get top 20.

Because of the default configuration and the large number of records returned in step 1, Postgres was forced to sort in files on disk. As a result: bad performance.
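
If this guess is right, EXPLAIN ANALYZE on a slow run should reveal it. A sketch of what to look for (the numbers are made up):

EXPLAIN ANALYZE
SELECT * FROM message WHERE chatbox_id = 25065
ORDER BY date_created LIMIT 20;

-- A bad plan contains a line like:
--   Sort Method: external merge  Disk: 10240kB
-- Raising work_mem for the session keeps moderate sorts in RAM:
SET work_mem = '32MB';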

UPD2: EXPLAIN ANALYZE shows a 0.376 ms runtime and a good plan. Can you give details about a case with bad performance?

answered by Ihor Romanchenko