One of our production servers hit the OOM killer at 2020-12-26 20:05:30, but I cannot figure out the cause. The Zabbix graph showed about 9 GB of available memory when the OOM killer killed my Java process. Please help me figure out why this happened, thank you.

Here is the log from /var/log/messages (there were a lot of "kernel: TCP: too many orphaned sockets" messages before this):
Dec 26 20:05:30 myhost kernel: systemd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Dec 26 20:05:30 myhost kernel: systemd cpuset=/ mems_allowed=0
Dec 26 20:05:30 myhost kernel: CPU: 4 PID: 1 Comm: systemd Not tainted 3.10.0-1062.9.1.el7.x86_64 #1
Dec 26 20:05:30 myhost kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
Dec 26 20:05:30 myhost kernel: Call Trace:
Dec 26 20:05:30 myhost kernel: [<ffffffffae37ac23>] dump_stack+0x19/0x1b
Dec 26 20:05:30 myhost kernel: [<ffffffffae375ce9>] dump_header+0x90/0x229
Dec 26 20:05:30 myhost kernel: [<ffffffffadd06142>] ? ktime_get_ts64+0x52/0xf0
Dec 26 20:05:30 myhost kernel: [<ffffffffaddc1714>] oom_kill_process+0x254/0x3e0
Dec 26 20:05:30 myhost kernel: [<ffffffffadd32e71>] ? cpuset_mems_allowed_intersects+0x21/0x30
Dec 26 20:05:30 myhost kernel: [<ffffffffaddc11bd>] ? oom_unkillable_task+0xcd/0x120
Dec 26 20:05:30 myhost kernel: [<ffffffffaddc1266>] ? find_lock_task_mm+0x56/0xc0
Dec 26 20:05:30 myhost kernel: [<ffffffffaddc1f66>] out_of_memory+0x4b6/0x4f0
Dec 26 20:05:30 myhost kernel: [<ffffffffaddc8a6f>] __alloc_pages_nodemask+0xacf/0xbe0
Dec 26 20:05:30 myhost kernel: [<ffffffffadc98f6d>] copy_process+0x1dd/0x1a50
Dec 26 20:05:30 myhost kernel: [<ffffffffadc9a991>] do_fork+0x91/0x330
Dec 26 20:05:30 myhost kernel: [<ffffffffae388a26>] ? trace_do_page_fault+0x56/0x150
Dec 26 20:05:30 myhost kernel: [<ffffffffadc9acb6>] SyS_clone+0x16/0x20
Dec 26 20:05:30 myhost kernel: [<ffffffffae38e2b4>] stub_clone+0x44/0x70
Dec 26 20:05:30 myhost kernel: [<ffffffffae38dede>] ? system_call_fastpath+0x25/0x2a
Dec 26 20:05:30 myhost kernel: Mem-Info:
Dec 26 20:05:30 myhost kernel: active_anon:13196525 inactive_anon:589 isolated_anon:35#012 active_file:950 inactive_file:1072 isolated_file:93#012 unevictable:0 dirty:0 writeback:30 unstable:0#012 slab_reclaimable:209382 slab_unreclaimable:237991#012 mapped:612 shmem:724 pagetables:36600 bounce:0#012 free:1721704 free_pcp:226 free_cma:0
Dec 26 20:05:30 myhost kernel: Node 0 DMA free:15904kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 26 20:05:30 myhost kernel: lowmem_reserve[]: 0 2810 63223 63223
Dec 26 20:05:30 myhost kernel: Node 0 DMA32 free:242744kB min:3000kB low:3748kB high:4500kB active_anon:1444208kB inactive_anon:24kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3111608kB managed:2878104kB mlocked:0kB dirty:0kB writeback:0kB mapped:20kB shmem:40kB slab_reclaimable:571840kB slab_unreclaimable:419836kB kernel_stack:65344kB pagetables:2092kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 26 20:05:30 myhost kernel: lowmem_reserve[]: 0 0 60412 60412
Dec 26 20:05:30 myhost kernel: Node 0 Normal free:6628168kB min:64560kB low:80700kB high:96840kB active_anon:51341892kB inactive_anon:2332kB active_file:3800kB inactive_file:4288kB unevictable:0kB isolated(anon):140kB isolated(file):372kB present:62914560kB managed:61862736kB mlocked:0kB dirty:0kB writeback:120kB mapped:2428kB shmem:2856kB slab_reclaimable:265688kB slab_unreclaimable:532128kB kernel_stack:167552kB pagetables:144308kB unstable:0kB bounce:0kB free_pcp:904kB local_pcp:104kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 26 20:05:30 myhost kernel: lowmem_reserve[]: 0 0 0 0
Dec 26 20:05:30 myhost kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
Dec 26 20:05:30 myhost kernel: Node 0 DMA32: 8570*4kB (UEM) 6220*8kB (UEM) 1155*16kB (UEM) 528*32kB (UEM) 647*64kB (UEM) 268*128kB (UEM) 76*256kB (UEM) 31*512kB (EM) 12*1024kB (UEM) 0*2048kB 0*4096kB = 242744kB
Dec 26 20:05:30 myhost kernel: Node 0 Normal: 1280224*4kB (EM) 188481*8kB (EM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6628744kB
Dec 26 20:05:30 myhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 26 20:05:30 myhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 26 20:05:30 myhost kernel: 3295 total pagecache pages
Dec 26 20:05:30 myhost kernel: 0 pages in swap cache
Dec 26 20:05:30 myhost kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 26 20:05:30 myhost kernel: Free swap = 0kB
Dec 26 20:05:30 myhost kernel: Total swap = 0kB
Dec 26 20:05:30 myhost kernel: 16510540 pages RAM
Dec 26 20:05:30 myhost kernel: 0 pages HighMem/MovableOnly
Dec 26 20:05:30 myhost kernel: 321353 pages reserved
Dec 26 20:05:30 myhost kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Dec 26 20:05:30 myhost kernel: [ 532] 0 532 24185 100 53 0 0 systemd-journal
Dec 26 20:05:30 myhost kernel: [ 555] 0 555 11277 235 22 0 -1000 systemd-udevd
Dec 26 20:05:30 myhost kernel: [ 788] 0 788 13882 114 27 0 -1000 auditd
Dec 26 20:05:30 myhost kernel: [ 820] 0 820 58999 1724 26 0 0 CmsGoAgent.linu
Dec 26 20:05:30 myhost kernel: [ 836] 999 836 153093 1667 59 0 0 polkitd
Dec 26 20:05:30 myhost kernel: [ 837] 81 837 14597 203 32 0 -900 dbus-daemon
Dec 26 20:05:30 myhost kernel: [ 885] 0 885 6703 199 18 0 0 systemd-logind
Dec 26 20:05:30 myhost kernel: [ 892] 0 892 6477 53 18 0 0 atd
Dec 26 20:05:30 myhost kernel: [ 894] 0 894 31593 202 20 0 0 crond
Dec 26 20:05:30 myhost kernel: [ 896] 0 896 27527 34 10 0 0 agetty
Dec 26 20:05:30 myhost kernel: [ 898] 0 898 27527 34 11 0 0 agetty
Dec 26 20:05:30 myhost kernel: [ 948] 0 948 832809 11158 156 0 0 exe
Dec 26 20:05:30 myhost kernel: [ 1512] 0 1512 25724 515 48 0 0 dhclient
Dec 26 20:05:30 myhost kernel: [ 1587] 0 1587 73789 4313 97 0 0 salt-minion
Dec 26 20:05:30 myhost kernel: [ 1588] 0 1588 143550 3328 95 0 0 tuned
Dec 26 20:05:30 myhost kernel: [ 1592] 0 1592 29699 3019 20 0 0 node_exporter
Dec 26 20:05:30 myhost kernel: [ 1595] 0 1595 259453 450 342 0 0 rsyslogd
Dec 26 20:05:30 myhost kernel: [ 1756] 0 1756 28230 261 56 0 -1000 sshd
Dec 26 20:05:30 myhost kernel: [ 1825] 0 1825 285329 12268 162 0 0 salt-minion
Dec 26 20:05:30 myhost kernel: [ 1827] 0 1827 95073 5869 100 0 0 salt-minion
Dec 26 20:05:30 myhost kernel: [ 2149] 1001 2149 1572340 335768 806 0 0 java
Dec 26 20:05:30 myhost kernel: [11492] 997 11492 20486 213 40 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11493] 997 11493 20512 534 40 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11494] 997 11494 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11495] 997 11495 20542 266 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11496] 997 11496 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11497] 997 11497 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11498] 997 11498 20542 279 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11499] 997 11499 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11500] 997 11500 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11501] 997 11501 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11502] 997 11502 20542 262 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11503] 997 11503 20542 261 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [11504] 997 11504 20520 241 41 0 0 zabbix_agentd
Dec 26 20:05:30 myhost kernel: [ 8595] 0 8595 8132 372 19 0 0 AliYunDunUpdate
Dec 26 20:05:30 myhost kernel: [ 9080] 0 9080 44295 11441 88 0 0 AliYunDun
Dec 26 20:05:30 myhost kernel: [58297] 38 58297 7491 158 19 0 0 ntpd
Dec 26 20:05:30 myhost kernel: [21537] 1001 21537 28726 472 12 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [21600] 1001 21600 3817967 615094 1819 0 0 java
Dec 26 20:05:30 myhost kernel: [29546] 1001 29546 28726 471 12 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [32310] 1001 32310 28726 471 12 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [32360] 1001 32360 4532386 1934665 5006 0 0 java
Dec 26 20:05:30 myhost kernel: [37926] 1001 37926 28726 471 13 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [37965] 1001 37965 3686958 387894 1320 0 0 java
Dec 26 20:05:30 myhost kernel: [40686] 1001 40686 28726 471 11 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [40735] 1001 40735 4597914 1960388 5065 0 0 java
Dec 26 20:05:30 myhost kernel: [41223] 1001 41223 28726 471 13 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [41468] 1001 41468 4596501 1862812 4877 0 0 java
Dec 26 20:05:30 myhost kernel: [43679] 1001 43679 28726 471 13 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [43836] 1001 43836 4596108 1924911 4999 0 0 java
Dec 26 20:05:30 myhost kernel: [45896] 1001 45896 28726 471 13 0 0 app_watchdog.sh
Dec 26 20:05:30 myhost kernel: [45988] 1001 45988 4595346 1896739 4951 0 0 java
Dec 26 20:05:30 myhost kernel: [56357] 1001 56357 4535046 1940786 5046 0 0 java
Dec 26 20:05:30 myhost kernel: [ 3720] 0 3720 201068 1471 13 0 0 aliyun-service
Dec 26 20:05:30 myhost kernel: [ 3822] 0 3822 4472 122 13 0 0 assist_daemon
Dec 26 20:05:30 myhost kernel: [28130] 0 28130 45614 263 46 0 0 crond
Dec 26 20:05:30 myhost kernel: [37497] 1001 37497 28325 91 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [37502] 1001 37502 28325 93 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [37506] 1001 37506 43958 188 41 0 0 scp
Dec 26 20:05:30 myhost kernel: [37507] 1001 37507 44682 252 43 0 0 ssh
Dec 26 20:05:30 myhost kernel: [54640] 1001 54640 28325 95 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [54645] 1001 54645 28325 97 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [54649] 1001 54649 43958 188 40 0 0 scp
Dec 26 20:05:30 myhost kernel: [54650] 1001 54650 44682 262 41 0 0 ssh
Dec 26 20:05:30 myhost kernel: [ 6703] 1001 6703 28325 99 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [ 6708] 1001 6708 28325 101 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [ 6711] 1001 6711 43958 200 42 0 0 scp
Dec 26 20:05:30 myhost kernel: [ 6713] 1001 6713 44682 253 43 0 0 ssh
Dec 26 20:05:30 myhost kernel: [11353] 1001 11353 28325 101 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [11358] 1001 11358 28325 103 11 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: [11361] 1001 11361 43958 188 41 0 0 scp
Dec 26 20:05:30 myhost kernel: [11362] 1001 11362 44682 248 44 0 0 ssh
Dec 26 20:05:30 myhost kernel: [23855] 0 23855 45614 263 46 0 0 crond
Dec 26 20:05:30 myhost kernel: [23858] 1001 23858 28325 88 12 0 0 app_log_bak.sh
Dec 26 20:05:30 myhost kernel: Out of memory: Kill process 40735 (java) score 121 or sacrifice child
Dec 26 20:05:30 myhost kernel: Killed process 40735 (java), UID 1001, total-vm:18391656kB, anon-rss:7841552kB, file-rss:0kB, shmem-rss:0kB
Host info: Linux myhost 3.10.0-1062.9.1.el7.x86_64 #1
free -lm gives:
total used free shared buff/cache available
Mem: 63239 41399 354 2 21484 21317
Low: 63239 62884 354
High: 0 0 0
Swap: 0 0 0
Swap is disabled on this host intentionally.
It seems that your system was suffering from heavy memory fragmentation, probably because you're running an old kernel without any swap.
The important lines are as follows:
systemd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
...
Node 0 Normal: 1280224*4kB (EM) 188481*8kB (EM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6628744kB
The last digit of gfp_mask is zero, so this allocation has to be satisfied from the Normal memory zone. The order=2 tells you that the amount to be allocated is pagesize * 2^order, i.e. 4096 bytes * 4 = 16 kB of physically contiguous memory.
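As a quick sanity check (not from the original log), you can reproduce that arithmetic on the box itself:

# an order-2 allocation needs 2^order = 4 physically contiguous pages
echo $(( $(getconf PAGESIZE) * 4 ))   # prints 16384, i.e. 16 kB with 4 kB pages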
The "buddyinfo" line (you can check the current status with cat /proc/buddyinfo) that tells how much free memory you have with different block sizes. It turns out that you had 1280224 blocks of 4 kB each, 188481 blocks of 8 kB each and zero bigger blocks. The request was for 16 kB block and you had zero available so the system entered OOM mode. (Your system is probably running some software that causes heavy fragmentation because this is not the usual state with so much total free memory.)
In addition, the total free RAM in the Normal zone was 6628744 kB, not the 9 GB you were led to believe. The difference is probably caused by your memory statistics logger, which cannot capture short spikes in memory usage.
If you had even a small amount of swap, I think the kernel could have used it to move pages out and merge some of those small free fragments into bigger blocks, but you removed that possibility by having no swap at all. I'd recommend using a small zram swap area so that performance is not limited by an SSD or HDD if swapping does occur.
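A minimal sketch of such a zram swap device, assuming the zram module is available on your kernel (the 2 GB size and the swap priority are only examples):

modprobe zram num_devices=1
echo 2G > /sys/block/zram0/disksize    # compressed, RAM-backed swap; size is an example
mkswap /dev/zram0
swapon --priority 100 /dev/zram0       # prefer it over any disk-backed swap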
As an alternative, you could try running a more modern kernel that supports memory compaction (CONFIG_COMPACTION=y) and configure your system as follows:
/proc/sys/vm/extfrag_threshold: at least 500, preferably more
/proc/sys/vm/watermark_scale_factor: at least 100, preferably more
/proc/sys/vm/zone_reclaim_mode: at least 1, and I'd recommend 3 or 7 (note that if you're running hard realtime you cannot change this, but then you shouldn't be running out of RAM either)
You could also consider setting /proc/sys/vm/watermark_boost_factor to a value greater than zero but read the documentation first.
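If your kernel exposes these knobs (the watermark ones only exist on newer kernels, per the note above), one way to apply and persist them is via sysctl; the values below simply mirror the suggestions and still need tuning for your workload, and the file name is only an example:

# apply at runtime
sysctl -w vm.extfrag_threshold=500
sysctl -w vm.watermark_scale_factor=100
sysctl -w vm.zone_reclaim_mode=3
# persist across reboots
cat > /etc/sysctl.d/90-vm-fragmentation.conf <<'EOF'
vm.extfrag_threshold = 500
vm.watermark_scale_factor = 100
vm.zone_reclaim_mode = 3
EOF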
The virtual memory manager configuration options are documented at https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysctl/vm.rst (ignore the header claiming it documents version 2.6.29 only; it actually describes the latest Linux kernel).
Or, if this only happens after the system has been running for a long time, the fragmentation could be reduced with a cron job that periodically runs, as root, echo 1 > /proc/sys/vm/compact_memory.
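For example, a root cron entry along these lines would trigger compaction once an hour on a kernel built with CONFIG_COMPACTION (the file name and interval are only examples):

# /etc/cron.d/compact-memory  -- trigger manual memory compaction hourly
0 * * * * root echo 1 > /proc/sys/vm/compact_memory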
If the software you're running supports huge pages, you probably want to set /proc/sys/vm/nr_overcommit_hugepages to cover about 90% of your RAM. In practice, on Linux you can simply run
echo "$(getconf _PHYS_PAGES) * 0.9 / 512" | bc > /proc/sys/vm/nr_overcommit_hugepages
because huge pages are 2 MB each, which is 512 times the default 4 kB page, and you want the limit to be 90% of the physical pages available. Using huge pages for processes that need lots of shared RAM (e.g. PostgreSQL) reduces memory fragmentation and reduces the RAM wasted on page tables. Note that this is a runtime change, so it has to be done after each reboot and before starting the services that use huge pages.
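One way to have that applied automatically at each boot is via sysctl.d, which systemd applies early in the boot sequence; this is only a sketch, the file name is an example, and the arithmetic is the same 90%-of-RAM-in-2-MB-pages calculation as above:

# compute the huge page count once and persist it (re-run if the machine is ever resized)
pages=$(echo "$(getconf _PHYS_PAGES) * 0.9 / 512" | bc)
sysctl -w vm.nr_overcommit_hugepages="$pages"
echo "vm.nr_overcommit_hugepages = $pages" > /etc/sysctl.d/90-hugepages.conf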