
My JBoss server hits 100% SYS CPU on Linux; what can cause this?

We've been debugging this JBoss server problem for quite a while. After about 10 hours of work, the server goes into 100% CPU panic attacks and just stalls. During this time you cannot run any new programs, so you can't even kill -quit to get a stack trace. These high 100% SYS CPU loads last 10-20 seconds and repeat every few minutes.

We suspect it has something to do with the GC, but we cannot confirm it with a smaller program. We are running on 32-bit i386, RHEL5, and Java 1.5.0_10 with -client and the ParNew GC.

Here's what we have tried so far:

  1. We limited the CPU affinity so we can actually use the server when the high load hits. With strace we see an endless loop of SIGSEGV followed by sigreturn.

  2. We tried to reproduce this with a Java program. It's true that SYS CPU% climbs high with WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace consumed a lot of user CPU%, which is why we never reached 100% SYS CPU.

  3. We know that after 10 hours of stress, GC goes crazy and full GC sometimes takes 5 seconds. So we assume it has something to do with memory.

  4. jstack during that period showed all threads as blocked. pstack occasionally showed a MarkSweep stack trace during that time, so we can't be sure about this either. Sending SIGQUIT yielded nothing: Java dumped the stack trace only AFTER the SYS% load period was over.
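
One way to check the GC suspicion from item 3 is to sample the collectors' cumulative counts and times from inside the JVM and watch how they grow over the 10 hours of stress. The following is only a minimal sketch under assumptions: the class name GcSampler and the one-second interval are made up for illustration, and it relies on the java.lang.management API that ships with Java 5. It will not log anything while the machine is stalled, but it can show whether collection time is climbing before a stall.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of JBoss: samples cumulative GC statistics.
public class GcSampler {

    /** Sums the cumulative collection time (ms) over all collectors. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "not supported" for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Builds a one-line summary with per-collector count and time. */
    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append("; ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples, one second apart (interval chosen arbitrarily).
        for (int i = 0; i < 5; i++) {
            System.out.println(System.currentTimeMillis() + " " + snapshot());
            Thread.sleep(1000);
        }
    }
}
```

In a real server this would run on a background thread and write to a log, so the growth in full-GC time can be lined up with the timestamps of the SYS CPU spikes.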

We're now trying to reproduce this problem with a small fragment of code so we can ask Sun.

If you know what's causing it, please let us know. We are clueless, so any idea is welcome :)

Thanks for your time.

gilm asked Dec 05 '25 17:12


2 Answers

Thanks to everybody for helping out.

Eventually we upgraded (only half of the Java servers) to JDK 1.6, and the problem disappeared. Just don't use 1.5.0_10 :)

We managed to reproduce these problems just by accessing null pointers (this drives up SYS CPU time rather than user CPU time, and stalls the entire Linux machine).
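
A reproduction along these lines might look like the sketch below. It is an assumption about the shape of the test, not the code that was actually used: the class NullDerefLoop, the Holder type, and the iteration count are hypothetical. The idea is that HotSpot typically implements null checks implicitly, letting the dereference fault and handling the resulting SIGSEGV, which would be consistent with the SIGSEGV/sigreturn loop seen in strace.

```java
// Hypothetical reproduction sketch: dereference null in a tight loop
// and swallow the resulting NullPointerException each time.
public class NullDerefLoop {

    static final class Holder {
        int value;
    }

    /** Always returns null; kept as a method so the null is not a local constant. */
    static Holder lookup(int i) {
        return null;
    }

    /** Dereferences null 'iterations' times and returns how many NPEs were caught. */
    public static long countNullDereferences(int iterations) {
        long caught = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                Holder h = lookup(i);
                caught += h.value; // throws NullPointerException before the add happens
            } catch (NullPointerException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        // Iteration count is arbitrary; raise it to keep the load going longer.
        long n = countNullDereferences(10000000);
        System.out.println("caught " + n + " NullPointerExceptions");
    }
}
```

Running several of these threads while watching top or vmstat should show whether the time lands in SYS or in user CPU on a given JDK build.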

Again, thanks to everyone.

gilm answered Dec 08 '25 06:12


If you're certain that GC is the problem (and it does sound like it based on your description), then adding the -XX:+HeapDumpOnOutOfMemoryError flag to your JBoss settings might help (in JBOSS_HOME/bin/run.conf).
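
After editing run.conf it can be worth confirming that the flag really reached the JVM. The sketch below is one possible check, assuming it is run inside the server's JVM (for example from a small test servlet); the class name HeapDumpFlagCheck is hypothetical. It only inspects the arguments reported by RuntimeMXBean.getInputArguments(), so options supplied some other way may not show up.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: checks whether the heap-dump flag is among the JVM arguments.
public class HeapDumpFlagCheck {

    static final String ENABLE = "-XX:+HeapDumpOnOutOfMemoryError";
    static final String DISABLE = "-XX:-HeapDumpOnOutOfMemoryError";

    /** Returns true if the last occurrence of the flag in jvmArgs enables it. */
    public static boolean isEnabled(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (ENABLE.equals(arg)) {
                enabled = true;
            } else if (DISABLE.equals(arg)) {
                enabled = false; // a later "-" form overrides an earlier "+" form
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Heap dump on OOM enabled: " + isEnabled(jvmArgs));
    }
}
```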

You can read more about this flag here. It was originally added in Java 6, but was later back-ported to Java 1.5.0_07.

Basically, you will get a "dump file" if an OutOfMemoryError occurs, which you can then open in various profiling tools. We've had good luck with the Eclipse Memory Analyzer.

This won't give you any "free" answers, but if you truly have a memory leak, then this will help you find it.

Matt Solnit answered Dec 08 '25 07:12