I'm running postgresql 9.3 on a machine with 32GB ram, with 0 swap. There are up to 200 clients connected. There's 1 other 4GB process running on the box. How do I interpret this error log message? How can I prevent the out of memory error? Allow swapping? Add more memory to the machine? Allow fewer client connections? Adjust a setting?
example pg_top:
last pid:  6607;  load avg:  3.59,  2.32,  2.61;       up 16+09:17:29                                                                                                                                                              20:49:51
113 processes: 1 running, 111 sleeping, 1 uninterruptable
CPU states: 22.5% user,  0.0% nice,  4.9% system, 63.2% idle,  9.4% iowait
Memory: 29G used, 186M free, 7648K buffers, 23G cached
DB activity: 2479 tps,  1 rollbs/s, 217 buffer r/s, 99 hit%,  11994 row r/s, 3820 row w/s  
DB I/O:     0 reads/s,     0 KB/s,     0 writes/s,     0 KB/s  
DB disk: 149.8 GB total, 46.7 GB free (68% used)
Swap: 
example top showing the only other significant 4GB process on the box:
top - 21:05:09 up 16 days,  9:32,  2 users,  load average: 2.73, 2.91, 2.88
Tasks: 247 total,   3 running, 244 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.1 us,  4.1 sy,  0.0 ni, 62.9 id,  9.8 wa,  0.0 hi,  0.7 si,  0.3 st
KiB Mem:  30827220 total, 30642584 used,   184636 free,     7292 buffers
KiB Swap:        0 total,        0 used,        0 free. 23449636 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                
 7407 postgres  20   0 7604928  10172   7932 S  29.6  0.0   2:51.27 postgres                                                                                                                                                               
10469 postgres  20   0 7617716 176032 160328 R  11.6  0.6   0:01.48 postgres                                                                                                                                                               
10211 postgres  20   0 7630352 237736 208704 S  10.6  0.8   0:03.64 postgres                                                                                                                                                               
18202 elastic+  20   0 8726984 4.223g   4248 S   9.6 14.4 883:06.79 java                                                                                                                                                                   
 9711 postgres  20   0 7619500 354188 335856 S   7.0  1.1   0:08.03 postgres                                                                                                                                                               
 3638 postgres  20   0 7634552 1.162g 1.127g S   6.6  4.0   0:50.42 postgres
postgresql.conf:
max_connections = 1000                  # (change requires restart)
shared_buffers = 7GB                    # min 128kB
work_mem = 40MB                         # min 64kB
maintenance_work_mem = 1GB              # min 1MB
effective_cache_size = 20GB
....
log:
ERROR:  out of memory
DETAIL:  Failed on request of size 67108864.
STATEMENT:  SELECT  "package_texts".* FROM "package_texts"  WHERE "package_texts"."id" = $1 LIMIT 1
TopMemoryContext: 798624 total in 83 blocks; 11944 free (21 chunks); 786680 used
  TopTransactionContext: 8192 total in 1 blocks; 7328 free (0 chunks); 864 used
  Prepared Queries: 253952 total in 5 blocks; 136272 free (18 chunks); 117680 used
  Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 18880 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 568 free (0 chunks); 456 used
      ExecutorState: 32928 total in 3 blocks; 15616 free (5 chunks); 17312 used
        printtup: 34002024 total in 2 blocks; 7056 free (7 chunks); 33994968 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 11744 used
  CacheMemoryContext: 1372624 total in 24 blocks; 38832 free (0 chunks); 1333792 used
    CachedPlanSource: 7168 total in 3 blocks; 3080 free (1 chunks); 4088 used
      CachedPlanQuery: 7168 total in 3 blocks; 2992 free (1 chunks); 4176 used
    CachedPlanSource: 15360 total in 4 blocks; 7128 free (5 chunks); 8232 used
      CachedPlanQuery: 15360 total in 4 blocks; 3320 free (1 chunks); 12040 used
    CachedPlanSource: 3072 total in 2 blocks; 552 free (0 chunks); 2520 used
      CachedPlanQuery: 7168 total in 3 blocks; 1592 free (1 chunks); 5576 used
    CachedPlanSource: 3072 total in 2 blocks; 536 free (0 chunks); 2536 used
... 2 Thousand snipped lines of CachedPlans ...
    CachedPlanSource: 15360 total in 4 blocks; 7128 free (5 chunks); 8232 used
      CachedPlanQuery: 15360 total in 4 blocks; 3320 free (1 chunks); 12040 used
    CachedPlanSource: 7168 total in 3 blocks; 3880 free (3 chunks); 3288 used
      CachedPlanQuery: 7168 total in 3 blocks; 4032 free (1 chunks); 3136 used
    CachedPlanSource: 7168 total in 3 blocks; 3936 free (3 chunks); 3232 used
      CachedPlanQuery: 7168 total in 3 blocks; 4032 free (1 chunks); 3136 used
    CachedPlanSource: 7168 total in 3 blocks; 3080 free (1 chunks); 4088 used
      CachedPlanQuery: 7168 total in 3 blocks; 2992 free (1 chunks); 4176 used
    CachedPlanSource: 7168 total in 3 blocks; 3872 free (2 chunks); 3296 used
      CachedPlanQuery: 7168 total in 3 blocks; 4032 free (1 chunks); 3136 used
    pg_toast_17305_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    index_package_raises_on_natural_key: 3072 total in 2 blocks; 1648 free (1 chunks); 1424 used
    index_package_extensions_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    index_package_mixins_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    index_package_mixins_on_includes_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    package_texts_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    index_package_file_objects_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    index_package_symbols_on_natural_key: 3072 total in 2 blocks; 1136 free (1 chunks); 1936 used
    index_package_symbols_on_full_name: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    index_package_symbols_on_alias_for_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    package_symbols_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_toast_17313_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    index_packages_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    packages_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    index_package_files_on_natural_key: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    package_files_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_toast_2619_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    index_projects_on_user_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    index_projects_on_type: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    index_projects_on_name_and_type: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    index_projects_on_claim_ticket: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    ruby_gem_metadata_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_constraint_contypid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_constraint_conrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_attrdef_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_range_rngtypid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
  MdSmgr: 24576 total in 2 blocks; 13984 free (0 chunks); 10592 used
  ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  hba parser context: 7168 total in 3 blocks; 304 free (1 chunks); 6864 used
  LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (6 chunks); 32 used
If I'm reading the output of your top correctly, it's not taken at a point when you're out of memory.
The actual error seems fine - it's not requesting a huge amount of memory so presumably the machine was out of memory at that point.
Let's take a quick look at your settings:
max_connections = 1000                  # (change requires restart)
work_mem = 40MB                         # min 64kB
So - you are of the opinion that you can support 1000 concurrent queries each using say 10 + 40MB (some might use multiples of 40MB but let's be reasonable). So - this is suggesting to me that your machine has > 500 cores and say 100GB of RAM. That's not the case.
So - take your number of cores and double it - that's a reasonable value for the max number of connections. That will allow you one query on each core while another is waiting for I/O. Then, place a connection pooler in front of the DB if you need to (pgbouncer / Java's connection pooling).
Then, you might even consider increasing work_mem if you need to.
Oh - perfectly reasonable to run without swap enabled. Once you start swapping you are in a world of pain anyway as regards database usage.
Edit: expand on work_mem vs shared
If in doubt, always refer to the documentation.
The shared_buffers value is, as the name suggests shared between backends. The work_mem is not only per backend, it's actually per sort. So - one query might use three or four times that amount if it is doing sorts on three subqueries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With