In Rails, we read some text files as ISO-8859-1. Sometimes the files come in as UTF-8 with a BOM. I am trying to determine whether a file is UTF-8 with a BOM and, if so, re-read it as bom|UTF-8.
I tried the following, but the comparison doesn't seem to work correctly:
# file is saved as UTF-8 with BOM using Sublime Text 2
> string = File.read(file, encoding: 'ISO-8859-1')
# this doesn't work, though it is supposed to
> string.start_with?("\xef\xbb\xbf".force_encoding("UTF-8"))
> false
# it works if I try this
> string.start_with?('ï»¿')
> true
The purpose is to read the file as UTF-8 with BOM if the file has the byte order mark at the start, and I want to avoid the string.start_with?('ï»¿') approach.
string.start_with?("\u00ef\u00bb\u00bf")
From the official Ruby documentation:
\xnn: hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])
\unnnn: Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
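The difference between the two escapes can be seen by inspecting the resulting strings. The snippet below is only an illustrative sketch: the constant names are made up for this example, and it assumes the default UTF-8 source encoding. It also assumes the string read from the file ends up transcoded to UTF-8 (likely the case in Rails, which normally sets Encoding.default_internal to UTF-8), which would explain why the \x version does not match.

```ruby
# "\x" escapes insert raw bytes. In a UTF-8 source file these three bytes
# form ONE character: U+FEFF, the byte order mark itself.
BOM_CHAR = "\xEF\xBB\xBF"

# "\u" escapes insert code points. This is THREE characters (ï, », ¿),
# which is what the BOM bytes look like when decoded as ISO-8859-1.
BOM_AS_LATIN1 = "\u00ef\u00bb\u00bf"

p BOM_CHAR.length       # => 1
p BOM_CHAR.bytes        # => [239, 187, 191]
p BOM_CHAR == "\uFEFF"  # => true

p BOM_AS_LATIN1.length  # => 3
p BOM_AS_LATIN1.bytes   # => [195, 175, 194, 187, 194, 191]

# Decoding the raw BOM bytes as ISO-8859-1 and transcoding to UTF-8
# gives the same three characters as the "\u" literal.
transcoded = "\xEF\xBB\xBF".b.force_encoding('ISO-8859-1').encode('UTF-8')
p transcoded == BOM_AS_LATIN1  # => true
```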
That said, to write a Unicode character in a string literal, one should use the \uXXXX notation. It is safe, and we can rely on this version.
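An alternative that avoids comparing decoded characters altogether is to look at the first three raw bytes of the file before choosing an encoding. This is a minimal sketch, not the approach from the answer above; utf8_bom? and read_text are hypothetical helper names, and ISO-8859-1 is assumed as the fallback because that is what the question uses.

```ruby
# The three BOM bytes as a binary (ASCII-8BIT) string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true when the file starts with the UTF-8 BOM bytes.
# File.binread reads raw bytes, so no encoding conversion is involved.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Hypothetical helper: read as BOM-stripped UTF-8 when a BOM is present,
# otherwise fall back to ISO-8859-1 (the assumed default from the question).
def read_text(path)
  encoding = utf8_bom?(path) ? 'BOM|UTF-8' : 'ISO-8859-1'
  File.read(path, encoding: encoding)
end
```

With this, read_text(file) decides the encoding in one place, and the rest of the code never needs to compare against a literal like 'ï»¿'.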