Hpricot's inner_text doesn't handle HTML entities correctly

Ruby posted 7 months ago by christian

Hpricot’s inner_text method is broken: it doesn’t handle HTML entities correctly, so you’ll see question marks in the output where the decoded characters should be. To fix this, replace calls to Hpricot’s inner_text with a call to the following method (or monkey-patch Hpricot):

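   require 'rubygems'
   require 'htmlentities'

   # Returns the node's text with tags stripped and HTML entities decoded.
   def inner_text(node)
     # The m flag lets . match newlines, so tags that span lines are removed too.
     text = node.innerHTML.gsub(%r{<.*?>}m, "").strip
     HTMLEntities.new.decode(text)
   end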

Remember to install the htmlentities gem:

   sudo gem install htmlentities
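
To see what the helper does without installing anything, here is a minimal sketch of the same strip-then-decode approach using only Ruby's standard library. It rests on a few assumptions: FakeNode is a hypothetical stand-in for an Hpricot node (anything that responds to innerHTML), inner_text_stdlib is a made-up name, and CGI.unescapeHTML is swapped in for HTMLEntities. CGI.unescapeHTML only decodes the basic named entities (such as &amp;amp; and &amp;lt;) plus numeric references, so it is not a full replacement for the htmlentities gem.

```ruby
require 'cgi/escape'

# Hypothetical stand-in for an Hpricot node: anything that responds to innerHTML.
FakeNode = Struct.new(:innerHTML)

# Same idea as inner_text above, but decoding with the standard library.
# Assumption: CGI.unescapeHTML replaces HTMLEntities.new.decode here; it does
# not know named entities such as &eacute;, only the basic ones and numeric forms.
def inner_text_stdlib(node)
  # Strip tags (the m flag lets . match newlines), then trim whitespace.
  text = node.innerHTML.gsub(/<.*?>/m, "").strip
  CGI.unescapeHTML(text)
end

node = FakeNode.new("<p>Caf&#233; &amp; <b>cr&#232;me</b></p>\n")
puts inner_text_stdlib(node)  # prints: Café & crème
```

With real Hpricot nodes, the gem-based inner_text above is the better choice, since it also decodes the full set of named entities.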

Tagged hpricot, inner_text, problem, bug