 

Ruby: Check for Byte Order Marker

In Rails, we are reading some text files as ISO-8859-1. Sometimes the files come in as UTF-8 with a BOM. I am trying to detect whether a file is UTF-8 with a BOM and, if so, re-read it as bom|UTF-8.

I tried the following, but it doesn't seem to compare correctly:

# file is saved as UTF-8 with BOM using Sublime Text 2

> string = File.read(file, encoding: 'ISO-8859-1')

# this doesn't work, although I expected it to
> string.start_with?("\xef\xbb\xbf".force_encoding("UTF-8"))
=> false

# it works if I try this
> string.start_with?('ï»¿')
=> true

The goal is to read the file as UTF-8 (via bom|UTF-8) when it starts with a Byte Order Mark, and I want to avoid hard-coding the string.start_with?('ï»¿') check.
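One way to sidestep the encoding comparison entirely is to peek at the raw bytes before decoding anything. The sketch below assumes that checking the first three bytes of the file is enough; `read_text` and `UTF8_BOM` are hypothetical names, not part of Ruby or Rails. It re-opens the file with the "r:bom|utf-8" mode only when those bytes are EF BB BF, and otherwise falls back to ISO-8859-1.

```ruby
require "tempfile"

# Hypothetical constant: the UTF-8 byte order mark as raw (binary) bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: decode as UTF-8 (BOM stripped) when the file starts
# with a UTF-8 BOM, otherwise decode it as ISO-8859-1.
def read_text(path)
  # Peek at the first three bytes without any decoding or transcoding.
  # binread may return nil for an empty file, hence the fallback.
  head = File.binread(path, 3) || ""
  if head == UTF8_BOM
    # "bom|utf-8" makes Ruby consume the BOM and tag the text as UTF-8.
    File.read(path, mode: "r:bom|utf-8")
  else
    File.read(path, encoding: "ISO-8859-1")
  end
end

# Usage: a BOM-prefixed UTF-8 file ...
Tempfile.create("bom") do |f|
  f.binmode
  f.write("\xEF\xBB\xBFcaf\xC3\xA9".b)
  f.flush
  text = read_text(f.path)
  p text.encoding  # #<Encoding:UTF-8>
  p text           # "café" (BOM removed)
end

# ... and a plain Latin-1 file without a BOM.
Tempfile.create("latin1") do |f|
  f.binmode
  f.write("caf\xE9".b)
  f.flush
  text = read_text(f.path)
  p text.encoding          # #<Encoding:ISO-8859-1>
  p text.encode("UTF-8")   # "café"
end
```

Because the check runs on bytes, it gives the same answer whether or not Encoding.default_internal later transcodes the decoded text.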

Saim asked Oct 21 '25 04:10


1 Answer

string.start_with?("\u00ef\u00bb\u00bf")

From Ruby official documentation:

\xnn      hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])

\unnnn  Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])

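In other words, \xnn inserts raw bytes, so "\xef\xbb\xbf" tagged as UTF-8 is the single character U+FEFF (the BOM itself). A file read as ISO-8859-1 decodes those same three bytes as the three characters U+00EF, U+00BB and U+00BF ("ï»¿"), and in Rails, where Encoding.default_internal is UTF-8 by default, that text is then transcoded to UTF-8. To match those characters, use the \uXXXX notation, which names code points rather than bytes; it is safe and can be relied on.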
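To see why the two literals behave differently, here is a small sketch that rebuilds the BOM bytes by hand. It assumes the Rails setting of Encoding.default_internal being UTF-8 and simulates that transcoding step with String#encode instead of reading a real file; the variable names are illustrative only.

```ruby
# The three BOM bytes exactly as they sit on disk.
raw = "\xEF\xBB\xBF".b

# \xnn escapes are bytes: tagged as UTF-8 they form ONE character, U+FEFF.
bom_char = "\xEF\xBB\xBF".dup.force_encoding("UTF-8")
p bom_char.length          # 1
p bom_char == "\uFEFF"     # true

# Decoding the same bytes as ISO-8859-1 yields THREE characters:
# U+00EF, U+00BB, U+00BF.
latin1 = raw.dup.force_encoding("ISO-8859-1")
p latin1.chars.map(&:ord)  # [239, 187, 191]

# Assumption: default_internal is UTF-8 (as Rails configures it), so
# File.read would transcode; String#encode simulates that step here.
as_utf8 = latin1.encode("UTF-8")
p as_utf8.start_with?(bom_char)              # false: U+FEFF is not "ï»¿"
p as_utf8.start_with?("\u00ef\u00bb\u00bf")  # true: \u escapes name those code points

# Without transcoding, convert the needle into the string's own encoding.
p latin1.start_with?("\u00ef\u00bb\u00bf".encode("ISO-8859-1"))  # true
```

The last line covers plain Ruby outside Rails: when the string is still tagged ISO-8859-1, Ruby will not compare it against a UTF-8 needle containing non-ASCII characters (it raises Encoding::CompatibilityError), so the needle is converted first.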

Aleksei Matiushkin answered Oct 23 '25 19:10


