Our application needs a very large amount of memory because it processes very large data sets, so we increased the maximum heap size to 12GB (-Xmx).
Environment details:
OS - Linux 2.6.18-164.11.1.el5    
JBoss - 5.0.0.GA
VM Version - 16.0-b13 Sun JVM
JDK - 1.6.0_18
We have the same environment and configuration in QA and prod. In QA the maximum PS Old Gen (heap memory) is 8.67GB, whereas in prod it is only 8GB.
In prod, for one particular job, Old Gen usage reaches 8GB and stays there; the web URL becomes inaccessible and the server goes down. In QA usage also reaches the maximum (8.67GB), but a full GC runs and usage drops back to about 6.5GB, so it does not hang there.
We couldn't figure out a solution because the environment and configuration are the same on both boxes.
I have 3 questions:
1. I understand that 2/3 of the max heap is allocated to the old/tenured generation. If so, why is it 8GB on one box and 8.67GB on the other?
2. How do I set a valid ratio between the new and tenured generations for this heap size (12GB)?
3. Why does a full GC run on one box and not on the other?
Any help would be really appreciated. Thanks.
Please let me know if you need further details on the environment or configuration.
The Tenured generation is used for the longer lived objects. Another GC process (CMS) runs when it becomes full to remove any unused objects.
When the eden space becomes full, a minor GC takes place. During a minor GC, objects that survive in the eden space are moved to a survivor space.
Heap usage climbs steadily until a garbage collection frees memory again. Heap usage stays high when garbage collection cannot keep up; one indicator is that a collection fails to bring heap usage back down to around 30%.
Eden Space: The pool from which memory is initially allocated for most objects. Survivor Space: The pool containing objects that have survived the garbage collection of the Eden space.
For your specific questions:
To set the ratio between the new and tenured generations, use an option such as -XX:NewRatio=3.
It sounds like you need more memory for prod. If the request finishes on QA, then perhaps that extra 0.67GB is all that it needs. That doesn't seem to leave you much headroom though. Are you running the same test on QA as will happen on prod?
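One way to see how each box actually sized its generations is to ask the JVM itself through the standard java.lang.management API. The following is only a minimal sketch: the class name HeapPools and the helper gb are made up for illustration, and it reports on the JVM it runs in, so it would have to be started with the same options as the server (or called from inside the application) to be comparable between QA and prod.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper class; prints the size of every heap memory pool.
final class HeapPools {

    /** Formats a byte count as GB with two decimals; negative means "undefined". */
    static String gb(long bytes) {
        if (bytes < 0) {
            return "undefined";
        }
        double value = bytes / (1024.0 * 1024.0 * 1024.0);
        return String.format(Locale.ROOT, "%.2f GB", value);
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // Pool names depend on the collector in use, e.g. "PS Old Gen".
            System.out.println(pool.getName()
                    + ": used=" + gb(usage.getUsed())
                    + " committed=" + gb(usage.getCommitted())
                    + " max=" + gb(usage.getMax()));
        }
    }
}
```

Comparing the max values printed on both boxes should show whether the generations really are sized differently or whether only the monitoring tool reports them differently.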
Since you're using 12GB you must be using 64-bit. You can save the memory overhead of 64-bit addressing by using the -XX:+UseCompressedOops option. It typically saves 40% memory, so your 12GB will go a lot further.
Depending on what you're doing the concurrent collector might be better as well, particularly to reduce long GC pause times. I'd recommend trying these options as I've found them to work well:
-Xmx12g -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68
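To make the effect of these ratios concrete, here is a rough back-of-the-envelope calculation as a small Java sketch. It assumes the usual HotSpot meaning of the flags (NewRatio is old:young, SurvivorRatio is eden:one survivor space) and ignores the alignment and rounding the JVM applies, so treat the results as approximations. The class and method names are made up for illustration.

```java
// Hypothetical helper; approximates generation sizes from the ratio flags.
final class HeapLayout {

    /** Young generation size: heap / (NewRatio + 1). */
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** Old (tenured) generation size: whatever the young generation leaves. */
    static long oldSize(long heap, int newRatio) {
        return heap - youngSize(heap, newRatio);
    }

    /** One survivor space: young / (SurvivorRatio + 2), as there are two of them. */
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    /** Eden: the young generation minus both survivor spaces. */
    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    /** Old gen occupancy at which CMS starts a cycle, given the fraction in percent. */
    static long cmsTrigger(long old, int occupancyFraction) {
        return old * occupancyFraction / 100;
    }

    public static void main(String[] args) {
        long heapMb = 12L * 1024; // -Xmx12g, expressed in MB
        long young = youngSize(heapMb, 4);   // -XX:NewRatio=4
        long old = oldSize(heapMb, 4);
        System.out.println("young MB    = " + young);
        System.out.println("old MB      = " + old);
        System.out.println("survivor MB = " + survivorSize(young, 8)); // -XX:SurvivorRatio=8
        System.out.println("eden MB     = " + edenSize(young, 8));
        System.out.println("CMS starts at old MB = " + cmsTrigger(old, 68));
    }
}
```

With the options above this works out to roughly 2.4GB young (about 1.92GB eden plus two survivor spaces of about 0.24GB each) and 9.6GB old, with CMS starting a cycle once the old generation is about 6.5GB full. By the same arithmetic, a NewRatio of 2 gives the 2/3 split mentioned in the question, which is 8GB of a 12GB heap.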
You need to get some more data in order to know what is going on; only then will you know what needs to be fixed. To my mind that means:
Get detailed information about what the garbage collector is doing. These parameters are a good start (substitute your preferred path and file name for gc.log):
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -verbose:gc
Repeat the run, scan through the GC log for the period when it hangs, and post back with that output.
Consider watching the output using visualgc, which is part of jvmstat (it requires jstatd running on the server). It is a very easy way to see how the various generations in the heap are sized (though perhaps not for 6 hours!).
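If attaching external tools to the server is awkward, the same kind of information can also be sampled from inside the JVM with the standard GarbageCollectorMXBean API. This is only a sketch: the class name GcStats and its methods are made up for illustration, and in practice you would call snapshot() periodically from a background thread in the application rather than from a standalone main.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; reports cumulative GC counts and times per collector.
final class GcStats {

    /** Builds one report line for a collector. */
    static String describe(String name, long count, long timeMs) {
        return name + ": collections=" + count + " totalTimeMs=" + timeMs;
    }

    /** Returns one line per collector registered in this JVM. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means undefined.
            lines.add(describe(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

If the count for the old generation collector stops increasing while the old generation sits at its maximum, that would suggest no full GC is running at all, which is worth knowing before changing any options.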
I also strongly recommend doing some reading so that you know what all these switches refer to; otherwise you'll be blindly trying things with no real understanding of why one thing helps and another doesn't. I'd start with the Oracle Java 6 GC tuning page.
I'd only suggest changing options once you have baselined performance. Having said that, CompressedOops is very likely to be an easy win; note that it has been on by default since 6u23.
Finally, you should consider upgrading the JVM; 6u18 is getting on a bit and performance keeps improving.
"Each job takes 3 hours to complete, and about 6 jobs run one after another. The last job reaches the 8GB max while running and hangs in prod."
Are these jobs related at all? This really sounds like a gradual memory leak if they're not working on the same data set. If heap usage keeps going up and up and eventually blows, then you have a memory leak. You should consider using -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/some/dir to catch a heap dump if/when it blows (though note that with a 12G heap it will be a big file, so make sure you have the disk space). You can then use jhat to look at what was on the heap at the time.
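A crude way to check for that "up and up" pattern between jobs is to record how much of the old generation is still in use right after a collection and see whether that number keeps rising. The sketch below is only a heuristic under stated assumptions: the class LeakCheck and its methods are made up for illustration, and matching pools by "Old Gen" or "Tenured" in the name is an assumption about how the collector names its pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper; samples old gen usage as measured after the last GC.
final class LeakCheck {

    /** Bytes used in the old generation after its most recent collection, or -1 if unknown. */
    static long oldGenUsedAfterLastGc() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean looksLikeOldGen = name.contains("Old Gen") || name.contains("Tenured");
            if (pool.getType() == MemoryType.HEAP && looksLikeOldGen) {
                MemoryUsage usage = pool.getCollectionUsage(); // null if unsupported
                if (usage != null) {
                    return usage.getUsed();
                }
            }
        }
        return -1;
    }

    /** Rough heuristic: true if every sample is strictly larger than the one before. */
    static boolean keepsGrowing(long[] samples) {
        if (samples.length < 2) {
            return false; // not enough data to say anything
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("old gen used after last GC: " + oldGenUsedAfterLastGc());
    }
}
```

Taking one sample at the end of each job and passing the values to keepsGrowing would give a first hint; if the post-GC figure rises after every job even though the jobs are unrelated, a heap dump is the next step.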