I am doing a PoC to check if Postgres is the right candidate for our use cases.
I have the following workload:
Data query: the presentation layer retrieves data every 15 minutes for the last 2 weeks.
Data load: every 15 minutes, 5 million rows are loaded into a table, and I have observed that each load consumes 375 MB. Per day that is 480 million rows, with a table size of about 36 GB.
After loading data for a couple of days (approx. 1 billion rows in a table), I ran a few queries and observed that select queries do not respond for hours, e.g. select count(*) .. and select * .. Simple but heavy queries, no joins.
My requirement is to load the data every 15 minutes and keep it for a couple of months, but I have not yet got that far. Even with a couple of days of data for the above workload, select queries do not respond.
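A first diagnostic step for queries like these is to look at the actual plan and I/O behaviour rather than just waiting for the result. A sketch, assuming a table named `readings` (the real table name is not given in the question):

```sql
-- Show the executed plan, timing, and buffer (cache vs. disk) usage for one
-- of the slow queries. "readings" is a placeholder table name.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM readings;

-- For monitoring purposes, an approximate row count from the planner's
-- statistics avoids the full-table scan that count(*) requires:
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'readings';
```

On a billion-row table, `select count(*)` and unrestricted `select *` must read the whole table, so long run times are expected regardless of tuning; the plan output shows whether the time is going into disk reads.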
I wonder if Postgres has any limitation with this kind of workload, or if I have simply not tuned it right. Did I miss configuring any key parameter?
I have gone through the official PostgreSQL documentation on limits (https://www.postgresql.org/about/), and my workload does not come close to the theoretical limits specified there.
Postgres configuration: Below are the postgres parameters that I have configured.
checkpoint_completion_target | 0.9
default_statistics_target | 500
effective_cache_size | 135GB
maintenance_work_mem | 2GB
max_connections | 50
max_stack_depth | 2MB
max_wal_size | 8GB
min_wal_size | 4GB
shared_buffers | 45GB
wal_buffers | 16MB
work_mem | 471859kB
Server configuration:
Virtualized Hardware!
vCPUs: 32
RAM: 200GB
I wonder if Postgres needs dedicated physical hardware. Perhaps it can't handle this load on virtualized hardware!
I would appreciate any comments or suggestions. BR/Nag
The problem won't be PostgreSQL itself, but the hardware and how you tune the database. In fact, Yahoo, Reddit, Yandex and others use it at large scale. And from version 9.6 onwards there are parallel queries, so you can utilize your CPUs more effectively.
There are some configuration steps you can look into to get better and faster responses with this amount of data: use a multi-tenant approach, index the database, and run on a Linux-based system instead of Windows.
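On a 32-vCPU server the parallel-query defaults are conservative and worth raising. A sketch with illustrative values, not figures tuned for this specific workload:

```sql
-- Illustrative parallel-query settings for a 32-vCPU server.
-- max_parallel_workers_per_gather exists from PostgreSQL 9.6;
-- max_parallel_workers from PostgreSQL 10. Values are examples only.
ALTER SYSTEM SET max_worker_processes = 32;
ALTER SYSTEM SET max_parallel_workers = 16;
ALTER SYSTEM SET max_parallel_workers_per_gather = 8;

-- max_worker_processes requires a restart; the others take effect on reload.
SELECT pg_reload_conf();
```

With these in place, large sequential scans such as `select count(*)` can be split across several workers instead of running on a single core.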
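For an append-only time-series table of this size, declarative partitioning by time (PostgreSQL 10+) combined with a BRIN index on the timestamp column is a common pattern. The table and column names below are assumptions, since the question does not show the schema:

```sql
-- Hypothetical schema: the question does not name the table or its columns.
CREATE TABLE readings (
    ts      timestamptz NOT NULL,
    payload text
) PARTITION BY RANGE (ts);

-- One partition per day (dates are examples). Expiring old data for a
-- couple-of-months retention window becomes a cheap DROP TABLE on the
-- oldest partition instead of a massive DELETE.
CREATE TABLE readings_2023_01_01 PARTITION OF readings
    FOR VALUES FROM ('2023-01-01') TO ('2023-01-02');

-- BRIN suits data loaded in timestamp order: a tiny index that still
-- lets range scans skip most of the table.
CREATE INDEX ON readings USING brin (ts);

-- The 15-minute presentation query then only touches the partitions
-- covering the last two weeks:
SELECT * FROM readings
WHERE ts >= now() - interval '14 days';
```

Partition pruning means the every-15-minutes query scans roughly two weeks of data rather than the entire multi-month table.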