Hpricot's inner_text doesn't handle HTML entities correctly

Tagged hpricot, inner_text, problem, bug  Languages ruby

Hpricot's inner_text method is broken: it doesn't handle HTML entities correctly, so you'll see question marks in the output where the entities should be. To fix this, replace calls to Hpricot's inner_text with a call to the following method (or monkey-patch Hpricot):

require 'rubygems'
require 'htmlentities'

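def inner_text(node)
  # Strip the tags from the node's markup and trim surrounding whitespace
  text = node.innerHTML.gsub(%r{<.*?>}, "").strip
  # Decode the HTML entities with the htmlentities gem
  HTMLEntities.new.decode(text)
end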

Remember to install the htmlentities gem:

sudo gem install htmlentities
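
To see the two steps (strip the tags, then decode the entities) without installing anything, here is a rough standalone sketch. It is not the method above: FakeNode is a hypothetical stand-in for an Hpricot node that only models innerHTML, inner_text_sketch is a made-up name, and CGI.unescapeHTML from the standard library is swapped in for the htmlentities gem. CGI.unescapeHTML only knows a handful of named entities (&amp;, &lt;, &gt;, &quot;) plus numeric references, so for real pages the gem is still the better choice.

```ruby
require 'cgi'

# Hypothetical stand-in for an Hpricot node; only innerHTML is modelled.
FakeNode = Struct.new(:innerHTML)

# Assumed helper name, used here for illustration only.
def inner_text_sketch(node)
  # Drop anything that looks like a tag, then trim surrounding whitespace.
  text = node.innerHTML.gsub(%r{<.*?>}, "").strip
  # Decode entities with the stdlib instead of the htmlentities gem.
  CGI.unescapeHTML(text)
end

node = FakeNode.new(' <b>Fish &amp; chips</b> &lt; <i>&quot;5&quot;</i> ')
puts inner_text_sketch(node)
```

Because the regex removes anything between a < and the next >, this is a quick fix rather than a real HTML parser; it is fine for simple markup but can misfire on attribute values that contain a > character.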