Hpricot's inner_text doesn't handle HTML entities correctly
Hpricot's inner_text method does not decode HTML entities correctly; you will see question marks in the output instead. To fix this, replace calls to Hpricot's inner_text with a call to the following method (or monkey patch Hpricot):
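require 'rubygems'
require 'htmlentities'

# Returns the text content of an Hpricot node with HTML entities decoded.
def inner_text(node)
  # Strip tags; [^>]* also matches tags that span a line break.
  text = node.innerHTML.gsub(%r{<[^>]*>}, "").strip
  # Turn entities such as &amp; or &eacute; into real characters.
  HTMLEntities.new.decode(text)
end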
Remember to install the htmlentities gem:
sudo gem install htmlentities
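
The monkey patch option mentioned above could look roughly like the sketch below. To keep it runnable without any gems it makes two loud assumptions: StubElem is a hypothetical stand-in for whichever Hpricot class you would actually reopen, and the standard library's CGI.unescapeHTML is swapped in for the htmlentities gem. CGI.unescapeHTML only decodes a handful of named entities (amp, lt, gt, quot, apos) plus numeric ones, so real code should keep HTMLEntities.new.decode.

```ruby
require 'cgi/escape'  # standard library; swapped in for the htmlentities gem in this sketch

# Hypothetical stand-in for an Hpricot element, so the sketch runs without Hpricot.
class StubElem
  def initialize(html)
    @html = html
  end

  def inner_html
    @html
  end

  # Placeholder for the original behaviour that the patch replaces.
  def inner_text
    @html.gsub(/<[^>]*>/, "")
  end
end

# The monkey patch: reopen the class and override inner_text.
class StubElem
  def inner_text
    text = inner_html.gsub(/<[^>]*>/, "").strip
    CGI.unescapeHTML(text)
  end
end

node = StubElem.new("<p>Fish &amp; <b>chips</b> &#233;</p>")
puts node.inner_text
```

Patching the class means existing calls to inner_text pick up the fix without being rewritten, at the cost of changing behaviour for every caller in the process.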