
How do I get a Ruby IO stream for a Paperclip Attachment?

I have an application that stores uploaded CSV files using the Paperclip gem.

Once uploaded, I would like to be able to stream the data from the uploaded file into code that reads it line-by-line and loads it into a data-staging table in Postgres.

I've gotten this far in my efforts, where data_file.upload is a Paperclip CSV attachment:

io = StringIO.new(Paperclip.io_adapters.for(data_file.upload).read, 'r')

Even though the line above works, the problem is that, as you can see, it loads the entire file into memory as one honkin' Ruby String, and large short-lived Strings put a lot of pressure on Ruby's garbage collector, which hurts app performance.

Instead, I want a Ruby IO object that supports use of e.g., io.gets so that the IO object handles buffering and cleanup, and the whole file doesn't sit as one huge string in memory.

Thanks in advance for any suggestions!

asked Nov 01 '25 00:11 by aec


1 Answer

With some help (from StackOverflow, of course), I was able to suss this out myself.

In my Paperclip ActiveRecord model, I now have the following:

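require 'open-uri'

# Done this way so we get auto-closing of the IO object when the block returns
def yielding_upload_as_readable_file(&block)
  # It's quite annoying that there's not 1 method that works for both filesystem and S3 storage.
  # Kernel#open no longer opens URLs as of Ruby 3.0, so choose the opener explicitly.
  if filesystem_storage?
    File.open(upload.path, 'r', &block)
  else
    # open-uri buffers the download (a Tempfile for larger bodies) before yielding it
    URI.open(upload.url, &block)
  end
end

def filesystem_storage?
  Paperclip::Attachment.default_options[:storage] == :filesystem
end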

... and I consume it in another model like so:

data_file.yielding_upload_as_readable_file do |file|
  # each_line reads one line at a time, so the whole file is never held in memory
  file.each_line do |line|
    next if line.strip.empty?
    # ... process line ...
  end
end
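
For what it's worth, here is a minimal, self-contained sketch of what the "process line" step might look like when feeding a staging table in batches. The names each_csv_batch and demo_batches are hypothetical (not part of Paperclip or of my app), and a Tempfile stands in for the uploaded file. It assumes no CSV field contains an embedded newline; if yours might, wrapping the IO in CSV.new(io) and iterating rows is probably the safer route.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: reads any IO line by line and yields parsed CSV rows
# in small batches, so only one batch is held in memory at a time.
def each_csv_batch(io, batch_size: 2)
  batch = []
  io.each_line do |line|
    next if line.strip.empty?        # skip blank lines, as in the loop above
    batch << CSV.parse_line(line)    # assumes one CSV record per physical line
    if batch.size >= batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty?    # flush the final partial batch
end

# Demo only: a Tempfile plays the role of the uploaded attachment.
def demo_batches
  Tempfile.create(['upload', '.csv']) do |file|
    file.write("a,1\n\nb,2\nc,3\n")
    file.rewind
    batches = []
    each_csv_batch(file) { |rows| batches << rows }
    batches
  end
end
```

In the real model, the IO yielded by yielding_upload_as_readable_file would be passed in place of the Tempfile, and each batch handed to whatever bulk insert the staging table uses.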
answered Nov 04 '25 12:11 by aec


