Ruby: Check for Byte Order Marker

Question

In Rails, we read some text files as ISO-8859-1. Sometimes the files come in as UTF-8 with a BOM. I am trying to determine whether a file is UTF-8 with a BOM and, if so, re-read it as bom|UTF-8.

I tried the following, but it doesn't seem to compare correctly:

# file is saved as UTF-8 with BOM using Sublime Text 2

> string = File.read(file, encoding: 'ISO-8859-1')

# this doesn't work, although it is supposed to
> string.start_with?("\xef\xbb\xbf".force_encoding("UTF-8"))
> false

# it works if I try this
> string.start_with?('ï»¿')
> true

The goal is to read the file as UTF-8 with BOM when it starts with the Byte Order Marker, and I want to avoid the string.start_with?('ï»¿') approach.

Aleksei Matiushkin · Accepted Answer

string.start_with?("\u00ef\u00bb\u00bf")

From the official Ruby documentation:

\xnn hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])

\unnnn Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
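A small sketch of the difference may help here. It assumes a UTF-8 source file, and the constant names are only illustrative. The same three hex pairs produce three raw bytes under \x but three separate characters under \u, and the latter is what the BOM bytes turn into when they are decoded as ISO-8859-1:

```ruby
# \x inserts raw bytes: these three bytes are the UTF-8 BOM,
# which is a single character (U+FEFF) when viewed as UTF-8.
BOM_BYTES = "\xEF\xBB\xBF".b.force_encoding(Encoding::UTF_8)

# \u inserts code points: three characters (ï, », ¿),
# each of which takes two bytes in UTF-8.
LATIN1_LOOKALIKE = "\u00EF\u00BB\u00BF"

p BOM_BYTES.bytes          # [239, 187, 191]
p BOM_BYTES.length         # 1
p LATIN1_LOOKALIKE.bytes   # [195, 175, 194, 187, 194, 191]
p LATIN1_LOOKALIKE.length  # 3

# Decoding the BOM bytes as ISO-8859-1 and converting to UTF-8
# yields exactly the \u version.
DECODED = "\xEF\xBB\xBF".b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
p DECODED == LATIN1_LOOKALIKE  # true
```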

That said, to embed a Unicode character in a string literal, one should use the \uXXXX notation. It is safe, and we can reliably use this version.
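As an alternative that avoids comparing decoded characters at all, one could check the first three raw bytes and only then choose the encoding. This is a rough sketch rather than part of the answer above: read_text and UTF8_BOM are hypothetical names, and the fallback to ISO-8859-1 simply mirrors the question.

```ruby
# The three bytes of a UTF-8 BOM, as a binary (ASCII-8BIT) string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: reads a file as BOM-aware UTF-8 when it starts
# with a UTF-8 BOM, and as ISO-8859-1 otherwise.
def read_text(path)
  # binread returns raw bytes (or nil for an empty file), so the
  # comparison does not depend on any default encoding settings.
  if File.binread(path, 3) == UTF8_BOM
    # "bom|utf-8" tells Ruby to skip the BOM while reading.
    File.read(path, encoding: "bom|utf-8")
  else
    File.read(path, encoding: "ISO-8859-1")
  end
end
```

Usage would be along the lines of string = read_text(file). In the BOM case the returned string no longer contains the marker.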

Tags:

ruby

ruby-on-rails

encoding

byte-order-mark

Saim
