I have a Kemal-based RESTful web service that returns very large chunks of JSON data (10 to 17 MB each), produced by calling the to_json method on a large Hash structure.
According to GC warning messages, my code "may lead to memory leaks", and my own measurements show that memory is "leaking" while the application runs.
So I think it would be good to free the memory allocated for the Hash and its JSON string representation by hand, but I don't know how to do this: my experiments with the poorly documented GC.free method were not successful, and I don't know in which direction to continue my investigation.
Please tell me: what can I do to avoid memory leaks?
You can look at a not very fresh but still generally accurate version of my very simple application (it was actually developed inside a closed corporate network segment) here: https://github.com/DRVTiny/Druid/blob/master/src/druid_mp.cr
Code that leads to memory leaks:
get "/service/:serviceid" do |env|
  if (svcid = env.params.url["serviceid"]) && svcid.is_a?(String) && svcid =~ /^s?\d+$/
    druid.svc_branch_get((svcid[0] == 's' ? svcid[1..-1] : svcid).to_i).to_json
  else
    halt env, status_code: 404, response: %q({"error": "Wrong service identificator"})
  end
rescue ex
  halt env, status_code: 503, response: {"error": "Unhandled exception #{ex.message}"}.to_json
end
P.S. I've added an after_all hook that runs GC.collect after each user request. I don't know, maybe this can solve my problem (but I think this is not the right way at all).
UPD: After I added GC.collect to the after_all Kemal hook, the memory leaks disappeared. But a global GC.collect is probably too slow and, as far as I know, it blocks all fibers and socket.accept(). Please let me know if I'm mistaken.
Yes, you shouldn't call GC.collect after each request.
Apart from improvements to the GC (which will come eventually), the easiest fix is to avoid the unnecessary string allocation. Judging by your example code, you don't need the result of the to_json call in memory as a string. You can serialise it directly to the IO stream instead, like to_json(env.response). This is faster overall and doesn't allocate additional memory, completely avoiding the issue with releasing memory.
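As a rough illustration of the difference, here is a minimal sketch using only the standard library. The small Hash is a hypothetical stand-in for whatever druid.svc_branch_get returns, and IO::Memory stands in for env.response, so this shows the idea rather than your actual handler:

```crystal
require "json"

# Hypothetical stand-in for the large Hash returned by druid.svc_branch_get.
data = {"serviceid" => 42, "children" => [1, 2, 3]}

# Allocating variant: the whole JSON document is built as a String first.
as_string = data.to_json

# Streaming variant: JSON is written straight into an IO.
# Here the IO is an in-memory buffer (an assumption for the demo);
# in the Kemal handler it would be env.response.
io = IO::Memory.new
data.to_json(io)

# Both variants produce the same bytes.
puts io.to_s == as_string
```

In the route itself this would mean replacing the trailing .to_json with .to_json(env.response), so the 10 to 17 MB string is never built in one piece.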