detect snippets

Detecting file/data encoding with Ruby and the chardet RubyGem

Tagged detect, charset, encoding, ruby, chardet  Languages ruby

You can use the chardet gem to detect the charset of an arbitrary string.

Install the chardet gem by issuing the following command:

$ sudo gem install chardet

Then in irb:

require 'rubygems'
require 'UniversalDetector'
p UniversalDetector::chardet('Ascii text')
p UniversalDetector::chardet('åäö')

The output from this example is:

{"encoding"=>"ascii", "confidence"=>1.0}
{"encoding"=>"utf-8", "confidence"=>0.87625}
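
Once you have a detection result, the usual next step is converting the data to a known encoding. The following is only a minimal sketch of that step, assuming Ruby 1.9 or later (for String#encode) and a result hash shaped like the output above. The to_utf8 helper and its min_confidence threshold are hypothetical names chosen for illustration; they are not part of the chardet gem. The result hash is built by hand here so the example runs without the gem installed.

```ruby
# Hypothetical helper (not part of chardet): convert raw bytes to UTF-8
# using a detection result shaped like {"encoding"=>..., "confidence"=>...}.
# min_confidence is an assumed cut-off; tune it for your own data.
def to_utf8(raw, detected, min_confidence = 0.5)
  name = detected && detected["encoding"]
  return nil if name.nil? || detected["confidence"].to_f < min_confidence

  begin
    source = Encoding.find(name) # raises ArgumentError for names Ruby does not know
  rescue ArgumentError
    return nil
  end

  # Relabel the bytes with the detected encoding, then transcode to UTF-8.
  raw.dup.force_encoding(source)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Hand-built result standing in for what the detector would return.
latin1_bytes = "caf\xE9".b
puts to_utf8(latin1_bytes, { "encoding" => "ISO-8859-1", "confidence" => 0.9 })
```

The helper returns nil when the confidence is too low or when Ruby does not recognise the reported encoding name, so the caller can decide how to handle uncertain input.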

For Python users there exists an identical library...

How to detect traffic from the most common search spiders with Ruby

Tagged spider, web crawler, bot, search, user agent, detect  Languages ruby

This snippet detects traffic from the following bots, which is enough for me: msnbot, Yahoo! Slurp and Googlebot.

The code:

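# to_s guards against requests that send no User-Agent header (nil)
user_agent = request.user_agent.to_s.downcase
# detect returns the first matching substring, or nil if none matches
@bot = ['msnbot', 'yahoo! slurp', 'googlebot'].detect { |bot| user_agent.include?(bot) }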

When Googlebot visits your site, @bot will be set to 'googlebot'. For visitors that match none of the listed strings, @bot will be nil.

If you need to detect more bots than these, then the user-agents.org site contains a list of various user agents for both bots and browsers.
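
If you do extend the list, it can help to pull the check out of the controller into a plain method. This is just a sketch of one way to do it; detect_bot and DEFAULT_BOTS are names I made up for the example, and 'examplecrawler' is a placeholder rather than a real bot.

```ruby
# Lowercase substrings to look for; the defaults match the snippet above.
DEFAULT_BOTS = ['msnbot', 'yahoo! slurp', 'googlebot'].freeze

# Returns the first bot substring found in the user agent, or nil if none match.
def detect_bot(user_agent, bots = DEFAULT_BOTS)
  ua = user_agent.to_s.downcase # to_s guards against a missing User-Agent header
  bots.detect { |bot| ua.include?(bot) }
end

googlebot_ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
puts detect_bot(googlebot_ua).inspect
puts detect_bot(nil).inspect

# Extra entries must be lowercase, because the user agent is downcased first.
puts detect_bot('ExampleCrawler/1.0', DEFAULT_BOTS + ['examplecrawler']).inspect
```

In a Rails controller this would be called as @bot = detect_bot(request.user_agent).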