I'm very new to programming, and am writing a small practice program in Ruby 1.9.3 that uses Nokogiri to query the Canadian parliamentary website with a postal code, and then prints the name of the corresponding Member of Parliament and their riding to the terminal.
My code fetches the page and isolates the MP's name/riding just fine, but displays UTF-8 characters as plain ASCII in the shell. I want the UTF-8 characters to be displayed instead.
I know the shell can handle UTF-8 because:
irb> riding = "St-Jérôme"
=> "St-Jérôme"
irb> puts riding
St-Jérôme
=> nil
The code I'm using to fetch the page:
page = Nokogiri::HTML(open("http://parl.gc.ca/ParlInfo/Compilations/HouseOfCommons/MemberByPostalCode.aspx?PostalCode=#{postalcode}"))
This is a sample of what this code returns when I type puts page:
<span id="ctl00_cphContent_repMP_ctl00_grdConstituencyAddress_ctl02_Label12">St-JÃ©rÃ´me</span>
So "St-Jérôme" comes out garbled as "St-JÃ©rÃ´me" in the terminal.
Maybe there's a method to convert it while it's stored as a string variable? Or maybe there's an option I can set in Nokogiri which will pull it down as UTF-8 instead of ASCII?
I searched Google and Stack Overflow for a long time and haven't found anything that is both relevant and that I can understand; again, I'm very new at this. If this is a duplicate, please point me in the right direction.
Many thanks.
Try
page = Nokogiri::HTML(open("http://parl.gc.ca/ParlInfo/Compilations/HouseOfCommons/MemberByPostalCode.aspx?PostalCode=#{postalcode}"), nil, "UTF-8")
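instead. The second argument (nil here) is the document URL and the third is the encoding, so this tells Nokogiri to parse the page as UTF-8 instead of guessing, which should resolve the issue.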
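As for converting a string that has already been garbled: a minimal sketch using only core String methods, assuming the text was UTF-8 that got misread as ISO-8859-1 (the usual cause of "Ã©" showing up where "é" should be). The helper name repair_mojibake is made up for illustration; it is not part of Ruby or Nokogiri. The accented string is written with \u escapes so the snippet does not depend on the source file's encoding.

```ruby
# Hypothetical helper, not part of Nokogiri or the Ruby standard library.
# Assumption: the input was UTF-8 text whose bytes were misread as ISO-8859-1.
def repair_mojibake(garbled)
  # Map each character back to its single ISO-8859-1 byte,
  # then relabel those bytes as the UTF-8 they were all along.
  garbled.encode("ISO-8859-1").force_encoding("UTF-8")
end

riding = "St-J\u00E9r\u00F4me" # "St-Jérôme"

# Reproduce the problem: treat the UTF-8 bytes as ISO-8859-1 and transcode.
garbled = riding.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts garbled # prints St-JÃ©rÃ´me

repaired = repair_mojibake(garbled)
puts repaired # prints St-Jérôme
```

This only works when the assumption holds; if the string contains characters that do not exist in ISO-8859-1, encode raises Encoding::UndefinedConversionError. Passing the encoding to Nokogiri up front, as above, is the cleaner fix because the text never gets garbled in the first place.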