Detecting file/data encoding with Ruby and the chardet RubyGem

You can use the chardet gem to detect the charset of an arbitrary string.

Install the chardet gem by issuing the following command:

$ sudo gem install chardet

Then in irb:

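require 'rubygems'            # only needed on Ruby 1.8; harmless on later versions
require 'UniversalDetector'

# chardet returns a hash with the guessed encoding and a confidence score
p UniversalDetector.chardet('Ascii text')
p UniversalDetector.chardet('åäö')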

The output from this example is:

{"encoding"=>"ascii", "confidence"=>1.0}
{"encoding"=>"utf-8", "confidence"=>0.87625}

Python users have an equivalent library…
