
Clojure error - GC overhead limit exceeded

I'm trying to randomly sample a large FASTQ file and write the sampled reads to standard out. I keep getting 'GC overhead limit exceeded' errors and I'm not sure what I'm doing wrong. I've tried increasing -Xmx in Leiningen, but that didn't help. Here is my code:

(ns fastq-sample.core
  (:gen-class)
  (:use clojure.java.io))

(def n-read-pair-lines 8)

(defn sample? [sample-rate]
  (> sample-rate (rand)))

;
; Agent for writing the reads asynchronously
;

(def wtr (agent (writer *out*)))

(defn write-out [r]
  (letfn [(write [out msg] (.write out msg) out)]
    (send wtr write r)))

(defn write-close []
  (send wtr #(.close %))
  (await wtr))

;
; Main
;

(defn reads [file]
  (->>
    (input-stream file)
    (java.util.zip.GZIPInputStream.)
    (reader)
    (line-seq)))

(defn -main [fastq-file sample-rate-str]
  (let [sample-rate (Float. sample-rate-str)
        in-reads    (partition n-read-pair-lines (reads fastq-file))]
    (doseq [x (filter (fn [_] (sample? sample-rate)) in-reads)]
      (write-out (clojure.string/join "\n" x)))
    (write-close)
    (shutdown-agents)))
Michael Barton asked Jan 20 '26 21:01



1 Answer

This is the same symptom I often get when I try to merge an infinite sequence into a single data structure such as a map or vector. It very often means that memory is tight and the garbage collector cannot keep up with the demand for new objects. Most likely the wtr agent is growing too large for memory. You may want to avoid storing the printed results in the agent by changing

(write [out msg] (.write out msg) out)

to

(write [out msg] (.write out msg))
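As a further option, here is a minimal sketch that drops the agent and writes each sampled record synchronously. It assumes the memory pressure comes from `send` returning immediately, so that pending writes (each holding its joined string) can queue up faster than the agent drains them; that is a plausible reading, not something confirmed for this program. It also avoids changing what the agent action returns: `Writer.write` returns nothing (nil in Clojure), so an action without the trailing `out` would leave the agent holding nil instead of the writer. The namespace `fastq-sample.sketch` and the function `write-sampled!` are hypothetical names, and the trailing newline after each record is an addition that is not in the original code.

```clojure
(ns fastq-sample.sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str])
  (:import (java.io Writer)
           (java.util.zip GZIPInputStream)))

(def n-read-pair-lines 8)

(defn sample? [sample-rate]
  (> sample-rate (rand)))

(defn write-sampled!
  "Hypothetical helper: walks `lines` in groups of `n-read-pair-lines` and
  writes each sampled group to `out` immediately, so no queue of pending
  writes builds up."
  [^Writer out sample-rate lines]
  (doseq [record (partition n-read-pair-lines lines)
          :when  (sample? sample-rate)]
    ;; Assumption: each record should end with a newline so that
    ;; consecutive records do not run together.
    (.write out (str (str/join "\n" record) "\n"))))

(defn -main [fastq-file sample-rate-str]
  (let [sample-rate (Double/parseDouble sample-rate-str)]
    ;; with-open closes the gzip reader once sampling has finished.
    (with-open [rdr (io/reader (GZIPInputStream. (io/input-stream fastq-file)))]
      (let [^Writer out (io/writer *out*)]
        (write-sampled! out sample-rate (line-seq rdr))
        ;; Flush rather than close, so standard out stays usable.
        (.flush out)))))
```

Because the write happens inside `doseq`, reading cannot run ahead of writing, and each record becomes garbage as soon as it has been written. The cost is that reading and writing no longer overlap.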
Arthur Ulfeldt answered Jan 22 '26 10:01
