 

How do I take Nokogiri-scraped HTML and output it as UTF-8 to the terminal?

I'm very new to programming, and am writing a small practice program in Ruby 1.9.3 that uses Nokogiri to query the Canadian parliamentary website with a postal code, and then prints the name of the corresponding Member of Parliament and their riding to the terminal.

My code fetches the page and isolates the MP's name and riding just fine, but the accented characters come out garbled in the shell. I want them displayed correctly as UTF-8.

I know the shell can handle UTF-8 because:

irb> riding = "St-Jérôme"
=> "St-Jérôme"
irb> puts riding
St-Jérôme
=> nil

The code I'm using to fetch the page:

page = Nokogiri::HTML(open("http://parl.gc.ca/ParlInfo/Compilations/HouseOfCommons/MemberByPostalCode.aspx?PostalCode=#{postalcode}"))

This is a sample of the output when I run puts page:

<span id="ctl00_cphContent_repMP_ctl00_grdConstituencyAddress_ctl02_Label12">St-J&Atilde;&copy;r&Atilde;&acute;me</span>

So "St-Jérôme" becomes "St-J&Atilde;&copy;r&Atilde;&acute;me" in the HTML output, or "St-JÃ©rÃ´me" when the extracted text is printed to the terminal.
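For reference, here is a minimal pure-Ruby sketch of how this kind of garbling arises. It assumes (the question does not confirm this) that the page's UTF-8 bytes were decoded as ISO-8859-1: "é" is stored as the two bytes C3 A9, which Latin-1 reads as the two characters "Ã" and "©".

```ruby
# encoding: utf-8
# Sketch: reproduce the garbling by reading UTF-8 bytes as ISO-8859-1.
# Assumption: the page was decoded as Latin-1 instead of UTF-8.
original = "St-Jérôme"

# Relabel the raw UTF-8 bytes as Latin-1 (no bytes change), then transcode
# to UTF-8 so that every single byte becomes its own Latin-1 character.
garbled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts original.bytes.map { |b| format("%02X", b) }.join(" ")
# prints 53 74 2D 4A C3 A9 72 C3 B4 6D 65
puts garbled
# prints St-JÃ©rÃ´me
```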

Is there a method to convert the text once it's stored in a string variable? Or is there an option I can set in Nokogiri so that it parses the page as UTF-8?
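If the text has already been garbled this way, one way to repair it in a string variable is to turn each character back into its Latin-1 byte and then relabel those bytes as UTF-8. This is a minimal sketch: the helper name fix_mojibake is hypothetical, and it only works under the assumption that the text is UTF-8 that was misread as ISO-8859-1. Getting the encoding right at parse time is more reliable.

```ruby
# encoding: utf-8
# Hypothetical helper: undo UTF-8 text that was mis-decoded as ISO-8859-1.
# Returns the input unchanged when the repair does not apply.
def fix_mojibake(text)
  # Each garbled character maps back to exactly one Latin-1 byte...
  latin1 = text.encode("ISO-8859-1")
  # ...and those bytes are the original UTF-8 sequence.
  repaired = latin1.force_encoding("UTF-8")
  repaired.valid_encoding? ? repaired : text
rescue Encoding::UndefinedConversionError
  # Text containing characters outside Latin-1 cannot be mojibake of this kind.
  text
end

puts fix_mojibake("St-JÃ©rÃ´me")
# prints St-Jérôme
```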

I searched Google and Stack Overflow for a long time and haven't found anything that is both relevant and that I understand; again, I'm very new at this. If this is a duplicate, please point me in the right direction.

Many thanks.

asked Feb 02 '26 04:02 by Nicholas


1 Answer

Try

page = Nokogiri::HTML(open("http://parl.gc.ca/ParlInfo/Compilations/HouseOfCommons/MemberByPostalCode.aspx?PostalCode=#{postalcode}"), nil, "UTF-8")

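instead. The third argument to Nokogiri::HTML is the encoding to use when parsing (the second is the document's base URL, which can be nil). Passing "UTF-8" makes Nokogiri decode the page as UTF-8 instead of relying on its own detection, which in this case misread the UTF-8 bytes as a single-byte encoding such as ISO-8859-1 (hence "Ã©" for "é"). That should fix the garbled characters.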

answered Feb 03 '26 22:02 by BadgerPriest


