I have an application that stores uploaded CSV files using the Paperclip gem.
Once a file is uploaded, I would like to stream its data into code that reads it line by line and loads it into a data-staging table in Postgres.
I've gotten this far in my efforts (where data_file.upload is a Paperclip CSV Attachment):
io = StringIO.new(Paperclip.io_adapters.for(data_file.upload).read, 'r')
Even though the above works, the problem is that, as you can see, it loads the entire file into memory as one honkin' Ruby String, and the garbage-collection pressure from huge Strings like that is notoriously bad for app performance.
Instead, I want a Ruby IO object that supports calls like io.gets, so that the IO handles buffering and cleanup and the whole file never sits in memory as one huge String.
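Roughly, this is the shape of consuming code I'm hoping to end up with (purely hypothetical; with_upload_io is a made-up name for whatever helper ends up yielding a streaming IO):

# Hypothetical helper: yields a real IO over the attachment,
# not one giant in-memory String.
data_file.with_upload_io do |io|
  while line = io.gets            # let the IO handle the buffering
    # ... stage the line into the Postgres data-staging table ...
  end
end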
Thanks in advance for any suggestions!
With some help (from StackOverflow, of course), I was able to suss this out myself.
In my Paperclip-backed ActiveRecord model, I now have the following:
require 'open-uri'  # Kernel#open needs this in order to open the S3 URL

# Done this way so we get auto-closing of the File object
def yielding_upload_as_readable_file
  # It's quite annoying that there's not one method that works for both filesystem and S3 storage
  open(filesystem_storage? ? upload.path : upload.url) { |file| yield file }
end

def filesystem_storage?
  Paperclip::Attachment.default_options[:storage] == :filesystem
end
... and I consume yielding_upload_as_readable_file in another model like so:
data_file.yielding_upload_as_readable_file do |file|
  while line = file.gets
    next if line.strip.size == 0
    # ... process line ...
  end
end
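For the final step, getting each line into the Postgres data-staging table, one way to keep the whole pipeline streaming is PostgreSQL's COPY via the pg gem's copy_data / put_copy_data. This is just a sketch: it assumes the raw PG::Connection obtained from ActiveRecord, and staging_rows and its columns are made-up names.

conn = ActiveRecord::Base.connection.raw_connection   # PG::Connection

data_file.yielding_upload_as_readable_file do |file|
  conn.copy_data("COPY staging_rows (col_a, col_b, col_c) FROM STDIN WITH (FORMAT csv)") do
    while line = file.gets
      next if line.strip.size == 0
      conn.put_copy_data(line)   # stream each CSV line straight into Postgres
    end
  end
end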