I have an application that stores uploaded CSV files using the Paperclip gem.
Once a file is uploaded, I would like to stream its data into code that reads it line by line and loads it into a data-staging table in Postgres.
I've gotten this far in my efforts (where data_file.upload is a Paperclip CSV Attachment):
io = StringIO.new(Paperclip.io_adapters.for(data_file.upload).read, 'r')
Even though the above works, the problem is that, as you can see, it loads the entire file into memory as one honkin' Ruby String, and the garbage-collection pressure from huge Strings like that is notoriously bad for app performance.
Instead, I want a Ruby IO object that supports calls like io.gets, so that the IO handles buffering and cleanup and the whole file never sits in memory as one huge String.
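Roughly, this is the shape of consuming code I'm hoping to end up with (purely hypothetical; with_upload_io is a made-up name for whatever helper ends up yielding a streaming IO):

# Hypothetical helper: yields a real IO over the attachment,
# not one giant in-memory String.
data_file.with_upload_io do |io|
  while line = io.gets            # let the IO handle the buffering
    # ... stage the line into the Postgres data-staging table ...
  end
end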
Thanks in advance for any suggestions!
With some help (from StackOverflow, of course), I was able to suss this out myself.
In my Paperclip-backed ActiveRecord model, I now have the following:
require 'open-uri'  # Kernel#open needs this in order to open the S3 URL

# Done this way so we get auto-closing of the File object
def yielding_upload_as_readable_file
  # It's quite annoying that there's not one method that works for both filesystem and S3 storage
  open(filesystem_storage? ? upload.path : upload.url) { |file| yield file }
end

def filesystem_storage?
  Paperclip::Attachment.default_options[:storage] == :filesystem
end
... and I consume yielding_upload_as_readable_file in another model like so:
data_file.yielding_upload_as_readable_file do |file|
  while line = file.gets
    next if line.strip.size == 0
    # ... process line ...
  end
end
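For the final step, getting each line into the Postgres data-staging table, one way to keep the whole pipeline streaming is PostgreSQL's COPY via the pg gem's copy_data / put_copy_data. This is just a sketch: it assumes the raw PG::Connection obtained from ActiveRecord, and staging_rows and its columns are made-up names.

conn = ActiveRecord::Base.connection.raw_connection   # PG::Connection

data_file.yielding_upload_as_readable_file do |file|
  conn.copy_data("COPY staging_rows (col_a, col_b, col_c) FROM STDIN WITH (FORMAT csv)") do
    while line = file.gets
      next if line.strip.size == 0
      conn.put_copy_data(line)   # stream each CSV line straight into Postgres
    end
  end
end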