I can't remove whitespace from a string.
My HTML is:
<p class='your-price'>
Cena pro Vás: <strong>139 <small>Kč</small></strong>
</p>
My code is:
# encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "
gsub, strip, etc. don't work. Why, and how do I fix this?
val.class                   # => String
val.dump                    # => "\"139\\u{a0}\""   (!)
val.encoding                # => #<Encoding:UTF-8>
__ENCODING__                # => #<Encoding:UTF-8>
Encoding.default_external   # => #<Encoding:UTF-8>
I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.
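strip only removes ASCII whitespace, and the character you've got here is a Unicode non-breaking space (U+00A0, the \u{a0} shown by dump). For the same reason, gsub(" ", "") finds nothing to replace: a literal " " matches only the ordinary ASCII space (U+0020).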
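A minimal offline sketch of the problem, assuming the scraped text is exactly "139" followed by U+00A0 (which is what the dump output indicates). The variable names are illustrative, and no network access or Mechanize is needed:

```ruby
# Offline stand-in for the scraped text: "139" followed by U+00A0 (no-break space).
val = "139\u00a0"

# String#strip only trims ASCII whitespace (plus NUL), so U+00A0 is left in place.
stripped = val.strip

# A plain " " pattern matches only U+0020, so gsub finds nothing to replace.
no_spaces = val.gsub(" ", "")

# Inspect the code points to see what is really left at the end of the string.
p stripped.codepoints.to_a   # => [49, 51, 57, 160]
p no_spaces == val           # => true
```

Looking at the code points (160 is U+00A0) is often quicker than guessing from how the string prints, since the no-break space looks exactly like a normal space.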
Removing the character is easy. You can use gsub by providing a regex with the character code:
gsub(/\u00a0/, '')
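A short sketch of that fix applied to the same value. The String#delete line is an alternative not mentioned above, and the Integer() conversion is only an illustrative assumption about what you might do with the price next:

```ruby
# Scraped value with a trailing no-break space (U+00A0), as in the question.
val = "139\u00a0"

# Remove every U+00A0 by matching its code point in a regex.
cleaned = val.gsub(/\u00a0/, '')

# String#delete with the same character is an equivalent, regex-free alternative.
deleted = val.delete("\u00a0")

# Integer() is stricter than to_i: it raises ArgumentError on leftover junk
# instead of silently ignoring it, so a missed character would surface here.
price = Integer(cleaned)

p cleaned              # => "139"
p deleted == cleaned   # => true
p price                # => 139
```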
You could also call
gsub(/[[:space:]]/, '')
to remove all Unicode whitespace. For details, check the Regexp documentation.
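The sketch below contrasts [[:space:]] with \s, which in Ruby regexps matches only ASCII whitespace and therefore misses U+00A0. The last expression is a hypothetical strip-like variant (not a built-in method) that trims Unicode whitespace only at the ends instead of deleting it everywhere:

```ruby
val = "139\u00a0"

# \s is ASCII-only in Ruby regexps, so it does not match U+00A0.
with_backslash_s = val.gsub(/\s/, '')

# The POSIX bracket [[:space:]] is Unicode-aware and does match U+00A0.
with_posix_space = val.gsub(/[[:space:]]/, '')

# Strip-like behaviour: remove Unicode whitespace only at the start and end.
trimmed = "\u00a0 139 \u00a0".gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

p with_backslash_s == val   # => true
p with_posix_space          # => "139"
p trimmed                   # => "139"
```

Anchoring with \A and \z keeps any whitespace inside the value (for example "1 390") intact, which may matter for prices with thousands separators.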