Memory profiling on Google Cloud Dataflow

What would be the best way to debug memory issues in a Dataflow job?

My job was failing with a GC OOM error, but when I profile it locally I cannot reproduce the exact scenarios and data volumes.

I'm running it now on 'n1-highmem-4' machines, and I don't see the error anymore, but the job is very slow, so obviously using a machine with more RAM is not the solution :)

Thanks for any advice, G

G B asked Jan 18 '26 20:01



1 Answer

Please use the options --dumpHeapOnOOM and --saveHeapDumpsToGcsPath (see the docs).
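
As a minimal sketch, these flags can be passed like any other pipeline option when launching a job with the Beam Java SDK. The class name, project, region, and GCS path below are placeholders, and the exact flag names and availability depend on your SDK version:

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Hypothetical launcher class, for illustration only.
public class LaunchWithHeapDumps {
  public static void main(String[] args) {
    // Example launch arguments:
    //   --runner=DataflowRunner --project=my-project --region=us-central1
    //   --dumpHeapOnOOM=true
    //   --saveHeapDumpsToGcsPath=gs://my-bucket/heap-dumps
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation()
            .as(DataflowPipelineOptions.class);

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline here ...
    pipeline.run();
  }
}

With --dumpHeapOnOOM=true the worker writes a heap dump when it hits an OutOfMemoryError, and --saveHeapDumpsToGcsPath tells it where in GCS to upload that dump so you can download it and open it in a heap analyzer.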

This will only help if one of your workers actually OOMs. Additionally, if a worker is not OOMing but you still observe high memory usage, you can run jmap -dump against the harness process on the worker to obtain a heap dump at runtime, as sketched below.
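
A rough sequence for that (the worker VM name, zone, and output path are placeholders; on newer SDK versions the harness runs inside a container, so the exact process layout can differ) could look like:

  # SSH onto the worker VM (Dataflow workers are ordinary GCE instances)
  gcloud compute ssh <worker-vm-name> --zone=<zone>
  # Find the PID of the Java harness process
  jps -l
  # Take a heap dump (jmap generally has to run as the same user as the target process)
  jmap -dump:format=b,file=/tmp/heap.hprof <PID>

You can then copy the .hprof file off the worker and inspect it with a heap analyzer such as Eclipse MAT or VisualVM.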

jkff answered Jan 21 '26 07:01



