I was load testing a Tomcat server. The server has 10G of physical memory and 2G of swap space. The heap size (-Xms and -Xmx) was previously set to 3G, and the server worked fine. Since I still saw a lot of free memory and the performance was not good, I increased the heap size to 7G and ran the load test again. This time physical memory was eaten up very quickly, and the system started consuming swap space. Later, Tomcat crashed after running out of swap space.
I included -XX:+HeapDumpOnOutOfMemoryError when starting Tomcat, but I didn't get any heap dump. When I checked /var/log/messages, I saw kernel: Out of memory: Kill process 2259 (java) score 634 or sacrifice child.
To provide more info, here's what I saw from the Linux top command with the heap size set to 3G and to 7G:
xms&xmx = 3G (which worked fine):
Before starting tomcat:
Mem:  10129972k total,  1135388k used,  8994584k free,    19832k buffers
Swap:  2097144k total,        0k used,  2097144k free,    56008k cached
After starting tomcat:
Mem:  10129972k total,  3468208k used,  6661764k free,    21528k buffers
Swap:  2097144k total,        0k used,  2097144k free,   143428k cached
PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2257 tomcat    20   0 5991m 1.9g  19m S 352.9 19.2   3:09.64 java
After starting load for 10 min:
Mem:  10129972k total,  6354756k used,  3775216k free,    21960k buffers
Swap:  2097144k total,        0k used,  2097144k free,   144016k cached
PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2257 tomcat    20   0 6549m 3.3g  10m S 332.1 34.6  16:46.87 java
xms&xmx = 7G (which caused tomcat crash):
Before starting tomcat:
Mem:  10129972k total,  1270348k used,  8859624k free,    98504k buffers
Swap:  2097144k total,        0k used,  2097144k free,    74656k cached
After starting tomcat:
Mem:  10129972k total,  6415932k used,  3714040k free,    98816k buffers
Swap:  2097144k total,        0k used,  2097144k free,   144008k cached
PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2310 tomcat    20   0  9.9g 3.5g  10m S  0.3 36.1   3:01.66 java
After starting load for 10 min (right before tomcat was killed):
Mem:  10129972k total,  9960256k used,   169716k free,      164k buffers
Swap:  2097144k total,  2095056k used,     2088k free,     3284k cached
PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2310 tomcat    20   0 10.4g 5.3g  776 S  9.8 54.6  14:42.56 java
Java and JVM Version:
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
Tomcat Version:
6.0.36
Linux Server:
Red Hat Enterprise Linux Server release 6.4 (Santiago)
So my questions are:
Why does top RES show that java is using 5.3G of memory when much more memory is consumed?
I have been investigating and searching for a while, but still cannot find the root cause of this issue. Thanks a lot!
Why would this issue happen? When JVM runs out of memory why is there no OutOfMemoryException thrown?
It is not the JVM that has run out of memory. It is the Host Operating System that has run out of memory-related resources, and is taking drastic action. The OS has no way of knowing that the process (in this case the JVM) is capable of shutting down in an orderly fashion when told "No" in response to a request for more memory. It HAS to hard-kill something or else there is a serious risk of the entire OS hanging.
Anyway, the reason you are not seeing OOMEs is that this is not an OOME situation. In reality, the JVM has already been given too much memory by the OS, and there is no way to take it back. That's the problem the OS has to deal with by hard-killing processes.
And why does it go straight to using swap?
It uses swap because the total virtual memory demand of the entire system won't fit in physical memory. This is NORMAL behaviour for a UNIX / Linux operating system.
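As a rough illustration of that point, the sketch below parses the two summary lines that top printed right before Tomcat was killed and adds up what was left. The function name `parse_top_summary` is made up for this example, and the line format is assumed to match the older top output quoted in the question.

```python
import re

def parse_top_summary(line):
    """Parse a 'Mem:' or 'Swap:' summary line from top into {label: kilobytes}."""
    # Each field looks like '10129972k total'; the leading 'Mem:' / 'Swap:' is ignored.
    return {label: int(kb) for kb, label in re.findall(r"(\d+)k (\w+)", line)}

# The two lines from the question, taken right before the java process was killed.
mem = parse_top_summary("Mem:  10129972k total,  9960256k used,   169716k free,      164k buffers")
swap = parse_top_summary("Swap:  2097144k total,  2095056k used,     2088k free,     3284k cached")

capacity_kb = mem["total"] + swap["total"]   # everything the kernel can hand out
free_kb = mem["free"] + swap["free"]         # what is still unclaimed

print(f"RAM + swap capacity: {capacity_kb / 1024**2:.1f} GiB")
print(f"Still free:          {free_kb / 1024:.0f} MiB")
```

With those figures, roughly 168 MiB remained out of about 11.7 GiB of RAM plus swap, and the buffers and cache that could be reclaimed were only a few megabytes. That is the state in which the kernel starts killing processes.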
Why top RES shows that java is using 5.3G memory, there's much more memory consumed
The RES numbers can be a little misleading. They refer to the amount of physical memory that the process currently has resident; pages that have been pushed out to swap are not counted. The VIRT number is more relevant to your problem. It says your JVM is using 10.4g of virtual memory, which is more than the available physical memory on your system.
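If you want to see these numbers for a single process without top, Linux exposes them in /proc/<pid>/status. The following is a minimal sketch, assuming a kernel that reports the VmSize, VmRSS and VmSwap fields (VmSwap may be missing on older kernels). The helper name `parse_proc_status` and the sample values are hypothetical and are not taken from the server in the question.

```python
def parse_proc_status(text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kilobytes}."""
    fields = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        parts = value.split()
        # Memory lines look like 'VmRSS:   5557452 kB'; skip everything else.
        if name.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[name] = int(parts[0])
    return fields

# Hypothetical sample for illustration only.
sample = (
    "Name:   java\n"
    "VmSize: 10905190 kB\n"
    "VmRSS:   5557452 kB\n"
    "VmSwap:  2000000 kB\n"
)

usage = parse_proc_status(sample)
for name in ("VmSize", "VmRSS", "VmSwap"):
    print(f"{name:7s} {usage[name] / 1024**2:5.1f} GiB")

# On a live Linux host, something like:
#   parse_proc_status(open("/proc/2310/status").read())
```

VmSize corresponds to what top shows as VIRT and VmRSS to RES. VmSwap is the part of the process that has been swapped out, which RES does not include.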
As the other answer says, the fact that you are worried about not getting an OOME is itself a concern. Even if you did get one, it would be unwise to try to do anything with it. An OOME is liable to cause collateral damage to your application or container that is hard to detect and harder to recover from. That's why OutOfMemoryError is an Error, not an Exception.
Recommendations:
Don't try to use significantly more virtual memory than you have physical memory, especially with Java. When a JVM is running a full garbage collection, it will touch most of its VM pages, multiple times in random order. If you have over-allocated your memory significantly this is liable to cause thrashing which kills performance for the entire system.
Do increase your system's swap space. (But that might not help ...)
Don't try to recover from OOMEs.
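The first recommendation can be turned into a quick sanity check before picking a heap size. This is only a back-of-the-envelope sketch: the function `ram_headroom_kb` is made up for this example, and the 3.4 GiB non-heap figure is an assumption derived from the top output in the question (VIRT minus the configured heap is about 3.4 GiB in both runs). VIRT overstates what the JVM actually touches, so treat this as a pessimistic estimate rather than a measurement.

```python
def ram_headroom_kb(total_kb, baseline_used_kb, heap_gib, non_heap_gib):
    """Physical memory left over (in kB) if the JVM touches all of its pages.

    A negative result means the JVM cannot fit in RAM and will spill into swap.
    """
    jvm_kb = int((heap_gib + non_heap_gib) * 1024**2)
    return total_kb - baseline_used_kb - jvm_kb

TOTAL_KB = 10129972      # 'Mem: total' from top in the question
NON_HEAP_GIB = 3.4       # assumption: VIRT minus heap, as observed in both runs

# Baseline 'used' figures are the 'Before starting tomcat' values from each run.
headroom_3g = ram_headroom_kb(TOTAL_KB, 1135388, heap_gib=3, non_heap_gib=NON_HEAP_GIB)
headroom_7g = ram_headroom_kb(TOTAL_KB, 1270348, heap_gib=7, non_heap_gib=NON_HEAP_GIB)

print(f"3G heap: {headroom_3g / 1024**2:+.1f} GiB headroom")
print(f"7G heap: {headroom_7g / 1024**2:+.1f} GiB headroom")
```

Under these assumptions the 3G configuration leaves a couple of GiB spare, while the 7G configuration comes out about 2 GiB short, which is roughly the size of the swap space that was exhausted.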