I would like to stop Cassandra from dumping hprof files as I do not require the use of them.
I also have very limited disk space (50GB out of 100 GB is used for data), and these files swallow up all the disk space before I can say "stop".
How should I go about it?
Is there a shell script that I could use to erase these files from time to time?
This happens because Cassandra starts with the -XX:+HeapDumpOnOutOfMemoryError Java option, which is useful if you want to analyze why the JVM ran out of memory. Also, if you are getting lots of heap dumps, that indicates you should probably tune the memory available to Cassandra.
I haven't tried it, but to disable this option, comment out the following line in $CASSANDRA_HOME/conf/cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
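If you would rather script that edit than do it by hand, here is a minimal sketch. The function name disable_heap_dump is made up for illustration, and it assumes the option appears on a line beginning with JVM_OPTS= as shown above. It keeps a .bak copy of the file so the change is easy to revert.

```sh
#!/bin/sh
# Hypothetical helper: comment out the HeapDumpOnOutOfMemoryError line in a
# cassandra-env.sh file, keeping a .bak copy of the original next to it.
disable_heap_dump() {
    env_file="$1"
    # Keep a backup so the change can be reverted.
    cp "$env_file" "$env_file.bak" || return 1
    # Prefix '#' to any uncommented JVM_OPTS line that sets the option.
    # Lines that are already commented out do not match and are left alone.
    sed 's/^\([[:space:]]*\)\(JVM_OPTS=.*-XX:+HeapDumpOnOutOfMemoryError\)/\1#\2/' \
        "$env_file.bak" > "$env_file"
}
```

You would call it as disable_heap_dump "$CASSANDRA_HOME/conf/cassandra-env.sh" and then restart Cassandra so the new options take effect.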
Optionally, you may comment out this block as well, though I don't think that is really required. I believe this block exists in version 1.0 and later; I can't find it in 0.7.3.
# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
    JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
fi
Let me know if this worked.
Update
...I guess it is JVM throwing it out when Cassandra crashes / shuts down. Any way to prevent that one from happening?
If you want to disable JVM heap dumps altogether, see the related question "How to disable creating Java heap dump after VM crashes?"
I'll admit I haven't used Cassandra, but from what I can tell, it shouldn't be dumping any hprof files unless you enable it with a JVM option, or the program experiences an OutOfMemoryError. So try looking there.
In terms of a shell script, if the files are being dumped to a specific location, you can use this command to delete all *.hprof files there:
find /my/location/ -name '*.hprof' -delete
This uses the -delete action of find, which deletes every file that matches the search. The pattern is quoted so that the shell passes it to find unexpanded. Look at the man page for find for more search options if you need to narrow it down further.
You can use cron to run a script at a given time, which would satisfy your "time to time" requirement. Most Linux distros have cron installed and work off a crontab file. You can find out more about crontab by running man crontab.
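Putting the find command and cron together, a cleanup script might look roughly like this. It is only a sketch: the function name purge_hprof, the optional minimum-age argument, and the install path in the crontab comment are all assumptions for illustration. Note that -mmin and -delete are extensions found in GNU and BSD find rather than part of POSIX.

```sh
#!/bin/sh
# Hypothetical cleanup helper: delete *.hprof files under a directory.
# Usage: purge_hprof DIR [MIN_AGE_MINUTES]
purge_hprof() {
    dump_dir="$1"
    min_age="${2:-0}"
    # Refuse to run against a missing directory.
    [ -d "$dump_dir" ] || return 1
    if [ "$min_age" -gt 0 ]; then
        # Only remove dumps last modified more than min_age minutes ago.
        find "$dump_dir" -type f -name '*.hprof' -mmin "+$min_age" -delete
    else
        find "$dump_dir" -type f -name '*.hprof' -delete
    fi
}

# Run the cleanup when the script is invoked with arguments.
if [ "$#" -gt 0 ]; then
    purge_hprof "$@"
fi

# Example crontab entry (script path is an assumption), running hourly:
# 0 * * * * /usr/local/bin/purge-hprof.sh /my/location
```

Passing a minimum age avoids deleting a dump that the JVM is still in the middle of writing.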
Even if you update cassandra-env.sh to point to a different heap dump path, it will still not work. The reason is that the upstart script /etc/init.d/cassandra contains this line, which sets the default heap dump path:
start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
    -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2
I'm not an upstart expert, but what I did was simply remove the parameter that creates the duplicate. Another odd observation: when you check the Cassandra process via ps aux, you'll see some parameters written twice. If you source cassandra-env.sh and print $JVM_OPTS, those variables look fine.
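To spot the doubled parameters without reading the whole ps line by eye, something like the following may help. It is a rough sketch: duplicate_jvm_opts is a made-up name, and it compares option names only (anything after "=" is dropped), so two HeapDumpPath settings with different file names still count as a duplicate.

```sh
#!/bin/sh
# Hypothetical helper: read a Java command line on stdin and print each
# -X / -XX option name that occurs more than once.
duplicate_jvm_opts() {
    tr -s ' ' '\n' |      # one argument per line
        grep '^-X' |      # keep only -X... and -XX:... options
        sed 's/=.*//' |   # drop values so differing paths still match
        sort | uniq -d    # print names that appear at least twice
}
```

For example, ps -o args= -p "$(cat "$PIDFILE")" | duplicate_jvm_opts, assuming $PIDFILE points at the PID file used by the init script.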