Detecting file/data encoding with Ruby and the chardet RubyGem

You can use the chardet gem to detect the charset of an arbitrary string.

Install the chardet gem by issuing the following command:

$ sudo gem install chardet

Then in irb:

require 'rubygems'          # only needed on Ruby 1.8; harmless on later versions
require 'UniversalDetector' # library file provided by the chardet gem
p UniversalDetector::chardet('Ascii text')
p UniversalDetector::chardet('åäö')

The output from this example is:

{"encoding"=>"ascii", "confidence"=>1.0}
{"encoding"=>"utf-8", "confidence"=>0.87625}

An equivalent library is available for Python users...