How to use Ruby and SimpleRSS to parse RSS and Atom feeds
This script is an example of how to use the SimpleRSS gem to parse an RSS feed.
The script can easily be modified to support conditional GETs. It also detects the feed’s character encoding and converts the feed to UTF-8.
    require 'iconv'
    require 'net/http'
    require 'net/https'
    require 'rubygems'
    require 'simple-rss'

    url = URI.parse('http://hbl.fi/rss.xml')

    http = Net::HTTP.new(url.host, url.port)

    http.open_timeout = http.read_timeout = 10 # Set open and read timeout to 10 seconds
    http.use_ssl = (url.scheme == "https")

    headers = {
      'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12',
      # Placeholders: for conditional GETs, replace these with the values saved from the previous response
      'If-Modified-Since' => 'store in a database and set on each request',
      'If-None-Match' => 'store in a database and set on each request'
    }

    # Use request_uri (not path) so that any query string is included
    response = http.get(url.request_uri, headers)
    body = response.body

    encoding = body.scan(
      /^<\?xml [^>]*encoding="([^\"]*)"[^>]*\?>/
    ).flatten.first

    # encoding is nil when the XML declaration is missing or has no encoding attribute
    if encoding.nil? || encoding.empty?
      if response["Content-Type"] =~ /charset=([\w\d-]+)/
        encoding = $1.downcase
        puts "Feed #{url} is #{encoding} according to Content-Type header"
      else
        puts "Unable to detect content encoding for #{url}, using default."
        encoding = "ISO-8859-1"
      end
    else
      puts "Feed #{url} is #{encoding} according to XML"
    end

    # Use 'UTF-8//IGNORE', if this throws an exception
    ic = Iconv.new('UTF-8', encoding)
    body = ic.iconv(body)

    feed = SimpleRSS.parse(body)

    feed.items.each do |item|
      puts item.title
    end
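The iconv library was removed from Ruby's standard library in Ruby 2.0, so on newer Rubies the conversion step has to be done differently. The following is a minimal sketch of the same idea using String#encode. The method names detect_feed_encoding and to_utf8 are made up for this example, and the ISO-8859-1 fallback is simply carried over from the script above.

```ruby
# Hypothetical helpers for illustration; not part of SimpleRSS or Net::HTTP.

# Returns the encoding named in the XML declaration, else the charset from
# the Content-Type header value, else the given default.
def detect_feed_encoding(body, content_type = nil, default = 'ISO-8859-1')
  # Look at the raw bytes so that invalid byte sequences cannot raise here
  prolog = body.b[0, 200]
  if prolog =~ /<\?xml [^>]*encoding=["']([^"']+)["'][^>]*\?>/
    return $1
  end
  if content_type.to_s =~ /charset=["']?([\w-]+)/i
    return $1
  end
  default
end

# Tags the bytes with their source encoding, then transcodes to UTF-8.
# Bytes that cannot be converted become "?" instead of raising.
# Raises ArgumentError if Ruby does not know the encoding name.
def to_utf8(body, encoding)
  body.dup.force_encoding(encoding).encode(
    'UTF-8', invalid: :replace, undef: :replace, replace: '?'
  )
end

raw = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\xE9</title>".b
puts to_utf8(raw, detect_feed_encoding(raw))
```

The Content-Type value is passed in as a plain string, so the helper can be tried without making an HTTP request.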
Example of how to fetch a URL with Net::HTTP and Ruby
    require 'net/http'
    require 'net/https'

    url = URI.parse('http://www.google.com/yo?query=yahoo')

    http = Net::HTTP.new(url.host, url.port)

    http.open_timeout = http.read_timeout = 10 # Set open and read timeout to 10 seconds
    http.use_ssl = (url.scheme == "https")

    headers = {
      'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12',
      'If-Modified-Since' => '',
      'If-None-Match' => ''
    }

    # Note to self, use request_uri not path: http://www.ruby-doc.org/core/classes/URI/HTTP.html#M004934
    response, body = http.get(url.request_uri, headers)

    puts response.code
    puts response.message

    response.each {|key, val| puts key + ' = ' + val}
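The two conditional headers above are left empty. A rough sketch of how they could be filled in for a conditional GET: remember the ETag and Last-Modified values from the previous response, send them back on the next request, and skip further work when the server answers 304 Not Modified. The method names and the plain hash used as a cache are assumptions for illustration; a real script would store the values in a database. The sketch assumes Ruby 1.9 or later, where http.get returns only the response object.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds request headers from validators saved
# from an earlier response. Empty or missing values are left out.
def conditional_headers(cache)
  headers = {}
  headers['If-None-Match']     = cache[:etag]          unless cache[:etag].to_s.empty?
  headers['If-Modified-Since'] = cache[:last_modified] unless cache[:last_modified].to_s.empty?
  headers
end

# Hypothetical helper: fetches url and returns [body, new_cache].
# body is nil when the server replies 304 Not Modified.
def fetch_if_changed(url, cache = {})
  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = http.read_timeout = 10

  response = http.get(uri.request_uri, conditional_headers(cache))
  return [nil, cache] if response.is_a?(Net::HTTPNotModified)

  new_cache = { :etag => response['ETag'], :last_modified => response['Last-Modified'] }
  [response.body, new_cache]
end
```

A caller would keep the returned cache hash between runs and pass it back in on the next fetch.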
Recursively add files to ClearCase
This script adds all files in the current directory to ClearCase.
Save the following script as add_recursively.rb in the directory you want to add to ClearCase:
    %x{cleartool ls -view_only -r -s . > view_private_files.txt}

    lines = File.readlines('view_private_files.txt').collect{|line| %Q{"#{line.chomp}"} }

    # Work around command line length limit in Windows
    while lines.size > 0
      %x{cleardlg /addtosrc #{lines.slice!(0..100).join(' ')}}
    end
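The batching done by the loop above can also be written with each_slice, which leaves the original array untouched. A small sketch follows. The helper name and the batch size are illustrative only, and the right batch size depends on how long the paths are.

```ruby
# Hypothetical helper: quotes each path and groups the quoted paths into
# space-separated argument strings of at most batch_size paths each.
def quoted_batches(paths, batch_size = 100)
  paths.map { |path| %Q{"#{path.chomp}"} }
       .each_slice(batch_size)
       .map { |batch| batch.join(' ') }
end

quoted_batches(["a.txt\n", "dir/b c.txt\n", "d.rb\n"], 2).each do |args|
  # Printed here rather than executed, so the sketch runs without ClearCase
  puts "cleardlg /addtosrc #{args}"
end
```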
Next, open a command-line window and execute the script:
    cd clearcase_vob
    ruby add_recursively.rb
ClearCase sucks; use Mercurial or git instead…
Using backgroundrb to execute tasks asynchronously in Rails
Draft…
Planning on using BackgrounDRb? Take a long look at the alternatives first
Ask yourself: do you really need a complex solution like BackgrounDRb? Most likely you don’t, so use a simple daemonized process instead. See this snippet about the daemons gem for more information.
Heck, even a simple Ruby script run by cron every 5 minutes will be more stable than BackgrounDRb and require less work.
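For comparison, the cron approach can be as small as the sketch below. It is only an illustration: the crontab entry, the script name and the with_exclusive_lock helper are made up for this example. The lock file is there because a job started every 5 minutes can still be running when the next one starts.

```ruby
require 'tmpdir'

# Assumed crontab entry (the path and script name are placeholders):
#   */5 * * * * cd /path/to/rails_project && ruby update_feeds.rb

# Runs the block only if no other process holds the lock file.
# Returns true if the block ran, false if another run is in progress.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    # LOCK_NB makes flock return false immediately instead of waiting
    return false unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end
end

ran = with_exclusive_lock(File.join(Dir.tmpdir, 'update_feeds.lock')) do
  # The real work (for example updating the feeds) would go here
  puts "Updating feeds..."
end
puts "Previous run still in progress, skipping." unless ran
```

The lock is released when the file is closed, so a crashed run does not leave a stale lock behind.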
Even if you really need to process a lot of data asynchronously in the background, I wouldn’t recommend BackgrounDRb. It’s riddled with bugs and unstable in production, so use the BJ plugin instead.
Anyway, continue reading if you want to use BackgrounDRb…
Installing the prerequisites:
    $ sudo gem install chronic packet
Installing backgroundrb
    $ cd rails_project
    $ git clone git://gitorious.org/backgroundrb/mainline.git vendor/plugins/backgroundrb
You can also get the latest stable version from the Subversion repository:
    svn co http://svn.devjavu.com/backgroundrb/trunk vendor/plugins/backgroundrb
Set up backgroundrb
    rake backgroundrb:setup
Create a worker
    ./script/generate worker feeds_worker

    class FeedsWorker < BackgrounDRb::MetaWorker
      set_worker_name :feeds_worker

      def create(args = nil)
        # this method is called when the worker is loaded for the first time
        logger.info "Created feeds worker"
      end

      # data defaults to nil so the method can also be called without arguments
      def update(data = nil)
        logger.info "Updating #{Feed.count} feeds."

        thread_pool.defer do
          # Measure inside the deferred block; timing the defer call itself
          # would only measure how long it takes to queue the work
          seconds = Benchmark.realtime do
            Feed.update_all()
          end

          logger.info "Update took #{'%.5f' % seconds} seconds."
        end
      end
    end
Starting backgroundrb
First configure backgroundrb by opening config/backgroundrb.yml in your editor:
    :backgroundrb:
      :ip: 0.0.0.0

    :development:
      :backgroundrb:
        :port: 11111 # use port 11111
        :log: foreground # foreground mode, print log messages on console

    :production:
      :backgroundrb:
        :port: 22222 # use port 22222
Next, start backgroundrb in development mode:
    ./script/backgroundrb -e development &
Call your worker
From the command line:
    $ script/console
    Loading development environment (Rails 2.0.2)
    >> MiddleMan.worker(:feeds_worker).update()
When things go wrong
Asynchronous programming is complex, so expect bugs…
Rule #1: know who you’re calling.
If you give your MiddleMan the wrong name of your worker, he’ll just spit this crap at you:
    You have a nil object when you didn't expect it!
    The error occurred while evaluating nil.send_request
    /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_master.rb:44:in `ask_worker'
    /Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:104:in `process_work'
    /Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:35:in `receive_data'
    /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_parser.rb:29:in `call'
    /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_parser.rb:29:in `extract'
    /Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:31:in `receive_data'
For example, this command would generate the above-mentioned error:
    MiddleMan.worker(:illegal_worker).update()
It’s always nice to see a cryptic error message such as this; it really deserves an award.
Check for bugs and bug fixes
Going to production
Starting the daemon:
    ./script/backgroundrb -e production start
Configuring your task to run periodically
The following example makes backgroundrb call the FeedsWorker’s update method once every 15 minutes:
    :production:
      :backgroundrb:
        :port: 22222 # use port 22222
        :lazy_load: true # do not load models eagerly
        :debug_log: false # disable log workers and other logging
      # Cron based scheduling
      :schedules:
        :feeds_worker:
          :update:
            :trigger_args: * */15 * * * *
            :data: "Hello world"
At the time of writing, the cron scheduler seems to be broken, so I prefer hard-coding the interval in the worker’s create method:
    def create(args = nil)
      # update takes one argument (data), so pass nil explicitly
      add_periodic_timer(15.minutes) { update(nil) }
    end
If using Vlad or Capistrano, it’s also a good idea to fix script/backgroundrb by changing these lines:
    pid_file = "#{RAILS_HOME}/../../shared/pids/backgroundrb_#{CONFIG_FILE[:backgroundrb][:port]}.pid"
    SERVER_LOGGER = "#{RAILS_HOME}/../../shared/log/backgroundrb_server_#{CONFIG_FILE[:backgroundrb][:port]}.log"
Resources
Detecting file/data encoding with Ruby and the chardet RubyGem
You can use the chardet gem to detect the charset of an arbitrary string.
Install the chardet gem by issuing the following command:
    $ sudo gem install chardet
Then in irb:
    require 'rubygems'
    require 'UniversalDetector'
    p UniversalDetector::chardet('Ascii text')
    p UniversalDetector::chardet('åäö')
The output from this example is:
    {"encoding"=>"ascii", "confidence"=>1.0}
    {"encoding"=>"utf-8", "confidence"=>0.87625}
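The detection result can then drive a conversion to UTF-8. Here is a minimal sketch, assuming Ruby 1.9 or later (for String#encode) and a result hash shaped like the output above. The method name, the 0.5 confidence threshold and the ISO-8859-1 fallback are arbitrary choices for illustration and are not part of the chardet gem.

```ruby
# Hypothetical helper. detection is a hash shaped like chardet's result,
# for example {"encoding"=>"utf-8", "confidence"=>0.87625}.
def utf8_from_detection(data, detection, min_confidence = 0.5, fallback = 'ISO-8859-1')
  name = detection['encoding']
  # Distrust weak guesses and fall back to a fixed encoding instead
  name = fallback if name.nil? || detection['confidence'].to_f < min_confidence

  begin
    source = Encoding.find(name)
  rescue ArgumentError
    # The detector may report a name that Ruby does not know
    source = Encoding.find(fallback)
  end

  # Tag the bytes with the source encoding, then transcode to UTF-8
  data.dup.force_encoding(source).encode('UTF-8', :invalid => :replace, :undef => :replace)
end

latin1 = "caf\xE9".b
puts utf8_from_detection(latin1, { 'encoding' => 'ISO-8859-1', 'confidence' => 0.9 })
```

Passing the hash in as an argument keeps the sketch runnable without the gem installed; in practice it would come from UniversalDetector::chardet.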
For Python users there exists an identical library…