How to detect traffic from the most common search spiders with Ruby
Ruby posted about 1 month ago by christian
This snippet detects traffic from the following bots, which is enough for me:
- Google – Googlebot/2.1 (+http://www.googlebot.com/bot.html)
- Google Image – Googlebot-Image/1.0 (+http://www.googlebot.com/bot.html)
- MSN Live – msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)
- Yahoo – Mozilla/5.0 (compatible; Yahoo! Slurp;)
The code:
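    # Downcase so the match is case-insensitive; to_s guards against a missing User-Agent header
    user_agent = request.user_agent.to_s.downcase
    # @bot is the first matching token, or nil when none matches
    @bot = ['msnbot', 'yahoo! slurp', 'googlebot'].detect { |bot| user_agent.include?(bot) }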
When Googlebot visits your site, @bot will be set to 'googlebot'. For a user agent that matches none of the strings, @bot is nil.
If you need to detect more bots than these, the user-agents.org site lists user agents for both bots and browsers.
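To try the check outside a controller, the same logic can be wrapped in a plain method. This is only a minimal sketch: the names BOT_TOKENS and detect_bot are made up for illustration, and the sample strings are the user agents listed above plus an ordinary browser string.

```ruby
# Tokens are lowercase because the User-Agent is downcased before matching.
BOT_TOKENS = ['msnbot', 'yahoo! slurp', 'googlebot'].freeze

# Hypothetical helper: returns the first matching token, or nil when none matches.
# to_s covers requests that send no User-Agent header at all.
def detect_bot(user_agent)
  ua = user_agent.to_s.downcase
  BOT_TOKENS.detect { |token| ua.include?(token) }
end

p detect_bot('Googlebot-Image/1.0 (+http://www.googlebot.com/bot.html)') # => "googlebot"
p detect_bot('Mozilla/5.0 (compatible; Yahoo! Slurp;)')                  # => "yahoo! slurp"
p detect_bot('Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0')            # => nil
p detect_bot(nil)                                                        # => nil
```

To cover more bots, append further lowercase tokens to the array. Note that a substring match like this trusts whatever the client sends, so it is fine for statistics but not for access control.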
Detecting file/data encoding with Ruby and the chardet RubyGem
Ruby posted 5 months ago by christian
You can use the chardet gem to detect the charset of an arbitrary string.
Install the chardet gem by issuing the following command:
    $ sudo gem install chardet
Then in irb:
    require 'rubygems'
    require 'UniversalDetector'
    p UniversalDetector::chardet('Ascii text')
    p UniversalDetector::chardet('åäö')
The output from this example is:
    {"encoding"=>"ascii", "confidence"=>1.0}
    {"encoding"=>"utf-8", "confidence"=>0.87625}
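A common next step is to convert the data to UTF-8 once its encoding is known. The sketch below is one possible way to do that using only Ruby's built-in String methods. It does not call the gem itself: it takes a hash shaped like the output above. The helper name to_utf8 and the 0.5 confidence cutoff are assumptions made for illustration, and it only works when the detected name is one that Ruby's Encoding class recognises.

```ruby
# Hypothetical helper: transcode raw bytes to UTF-8 using a detection result
# shaped like {"encoding"=>"utf-8", "confidence"=>0.87625}.
# The 0.5 confidence cutoff is an arbitrary assumption.
def to_utf8(raw, detection, min_confidence = 0.5)
  name = detection['encoding']
  return nil if name.nil? || detection['confidence'].to_f < min_confidence

  # dup leaves the caller's string untouched; force_encoding only relabels the bytes.
  labelled = raw.dup.force_encoding(name)
  return nil unless labelled.valid_encoding?

  labelled.encode('UTF-8')
rescue ArgumentError, EncodingError
  # Encoding name unknown to Ruby, or bytes that cannot be converted.
  nil
end

latin1_bytes = "\xE5\xE4\xF6".b # the bytes of "åäö" in ISO-8859-1
p to_utf8(latin1_bytes, { 'encoding' => 'ISO-8859-1', 'confidence' => 0.9 })
p to_utf8(latin1_bytes, { 'encoding' => 'ISO-8859-1', 'confidence' => 0.2 })
```

Returning nil for low-confidence or unconvertible input leaves the fallback decision to the caller, for example treating the data as binary or asking the user.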
For Python users there exists an identical library…