I have a CSV file that I'm uploading which runs into an issue when importing rows into the database:
Encoding::UndefinedConversionError ("\xCC" from ASCII-8BIT to UTF-8)
What would be the most efficient way to ensure each column is properly encoded for being placed in the database or ignored?
The most basic approach is to go through each row and each field and force the encoding on every string, but that seems incredibly inefficient. What would be a better way to handle this?
Currently it's just uploaded as a parameter (:csv_file). I then access it as follows:
CSV.parse(csv_file.download) within the model.
I'm assuming there's a way to force the encoding when CSV.parse is called on the activestorage file but not sure how. Any ideas?
Thanks!
The latest version of ActiveStorage (6.0.0.rc1) adds an API for downloading the file to a temp file, which you can then read from. I'm assuming that Ruby will then read the file using the correct encoding.
https://edgeguides.rubyonrails.org/active_storage_overview.html#downloading-files
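If you don't want to upgrade to the RC of Rails 6 (like I don't), you can use this method to relabel the downloaded string as UTF-8. Note that it only changes the encoding tag and leaves the bytes untouched, so a byte order mark that may be present in your file stays in the string and has to be stripped separately: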
wrongly_encoded_string = active_record_model.attachment.download
correctly_encoded_string = wrongly_encoded_string.bytes.pack("c*").force_encoding("UTF-8")
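For the CSV case in the question, the same idea can be applied once to the whole downloaded string instead of to every field. The following is only a sketch under assumptions: the raw string is a hand-built stand-in for what attachment.download returns, and the column names and values are made up for illustration. It relabels the bytes as UTF-8, drops a leading BOM, replaces any invalid byte sequences, and then parses.

```ruby
require "csv"

# Hypothetical stand-in for the binary string returned by `attachment.download`:
# a UTF-8 byte order mark followed by CSV data, tagged as ASCII-8BIT.
bom = "\xEF\xBB\xBF".b
raw = bom + "name,city\nZo\u00EB,Krak\u00F3w\n".b

# Relabel the bytes as UTF-8 on a copy. No bytes are changed here.
text = raw.dup.force_encoding(Encoding::UTF_8)

# Remove a leading BOM if there is one, then replace any byte sequences
# that are not valid UTF-8 so later conversions do not raise.
text = text.delete_prefix("\uFEFF").scrub("?")

# Parse the cleaned string once; every field inherits the UTF-8 encoding.
rows = CSV.parse(text, headers: true)
```

If the file was not saved as UTF-8 in the first place (a stray "\xCC" can be a sign of that), relabeling will not recover the original characters; in that case the string would need to be transcoded from its real encoding, for example with String#encode, which is outside this sketch.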
Coming back in 2023 to commend @eric-parshall's answer. My application stores full email bodies (raw HTML, usually) as ActiveStorage file attachments, and for quite a while I had issues when pulling those bodies with .download to render them. Because .download returns a binary string, the body always came in with an encoding of ASCII-8BIT, and forcing the encoding back still came with occasional errors.
While Eric mentioned it years ago prior to Rails 6 being live, this solution (and the link, somehow 🤣) still works great. Here's a versioned link for the Rails 7.1 docs in case Eric's link ever dies:
https://guides.rubyonrails.org/v7.1/active_storage_overview.html#downloading-files
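And the idea Eric posits is totally correct. If you use the .open API to save the attachment to a temp file and then use File.read, your encoding woes will likely fade, because File.read tags the string with Ruby's default external encoding (UTF-8 in a typical setup) instead of ASCII-8BIT. It looks like this: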
eml = Email.last
eml.update!(
  full_body: {
    io: StringIO.new("This is a test!"),
    filename: "full_body_test.html",
    content_type: "text/plain"
  }
)
# Before, using `#download`
content = eml.full_body.download
content.encoding #=> #<Encoding:ASCII-8BIT>
# After, using `#open`
content = nil
Email.last.full_body.open do |file|
  content = File.read file
end
content.encoding #=> #<Encoding:UTF-8>
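The difference between the two reads can be reproduced without Rails. This is a rough sketch, assuming a plain Tempfile as a stand-in for the file that .open yields; the file name and contents are made up. It also passes the encoding to File.read explicitly, since a bare File.read relies on Encoding.default_external, which depends on the environment.

```ruby
require "tempfile"

content_binary = nil
content_text = nil

# Hypothetical stand-in for the temp file that `full_body.open` yields.
Tempfile.create(["full_body_test", ".html"]) do |file|
  file.binmode
  file.write("caf\u00E9".b) # UTF-8 bytes on disk
  file.flush

  # Binary read: the same kind of string `#download` hands back.
  content_binary = File.binread(file.path)

  # Text read with an explicit encoding, so the result does not depend
  # on the process-wide default external encoding.
  content_text = File.read(file.path, encoding: "UTF-8")
end

content_binary.encoding # ASCII-8BIT (binary)
content_text.encoding   # UTF-8
```

Reading as text only tags the string; it does not validate the bytes. Calling valid_encoding? on the result is a cheap way to check whether the file really contains UTF-8 before rendering or importing it.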