Detecting file/data encoding with Ruby and the chardet RubyGem

Ruby posted over 5 years ago by christian

You can use the chardet gem to guess the character encoding of an arbitrary string. Detection is heuristic, so each result comes with a confidence score.

Install the chardet gem by issuing the following command:

   $ sudo gem install chardet

Then in irb:

   require 'rubygems'           # only needed on Ruby 1.8
   require 'UniversalDetector'
   # chardet returns a hash with the guessed encoding and a confidence score
   p UniversalDetector::chardet('Ascii text')
   p UniversalDetector::chardet('åäö')

The output from this example is:

   {"encoding"=>"ascii", "confidence"=>1.0}
   {"encoding"=>"utf-8", "confidence"=>0.87625}
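
A common next step is to use the detection result to convert the data to UTF-8. The following is only a rough sketch, assuming Ruby 2.0 or later and a result hash shaped like the ones shown above. The helper name to_utf8 and the 0.5 confidence threshold are made up for illustration and are not part of the chardet gem. The sketch takes the hash as an argument, so it runs without the gem installed.

```ruby
# Hypothetical helper (not part of the chardet gem): reinterpret raw bytes
# using a detection hash with "encoding" and "confidence" keys.
def to_utf8(raw, detection, min_confidence = 0.5)
  name = detection["encoding"]
  # Refuse to guess when detection failed or confidence is too low.
  return nil if name.nil? || detection["confidence"].to_f < min_confidence

  begin
    source = Encoding.find(name)
  rescue ArgumentError
    return nil # Ruby does not know an encoding by this name
  end

  # dup leaves the caller's string untouched; force_encoding only relabels bytes.
  text = raw.dup.force_encoding(source)
  return nil unless text.valid_encoding?

  text.encode(Encoding::UTF_8)
rescue EncodingError
  nil # the bytes could not be converted to UTF-8
end

# Example: the bytes of "åäö" in ISO-8859-1, with a hand-written detection hash.
latin1_bytes = "\xE5\xE4\xF6".b
converted = to_utf8(latin1_bytes, { "encoding" => "ISO-8859-1", "confidence" => 0.9 })
p converted
```

Returning nil for low-confidence or unknown encodings leaves it to the caller to decide on a fallback instead of silently producing garbled text.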

For Python users there exists an identical library…

Tagged detect, charset, encoding, ruby, chardet