How to use Ruby and SimpleRSS to parse RSS and Atom feeds
This script is an example of how to use the SimpleRSS gem to parse an RSS feed.
The script can easily be modified to support conditional GETs. It also detects the feed’s character encoding and converts the feed to UTF-8.
```ruby
require 'rubygems'
require 'iconv'      # in the standard library up to Ruby 1.9; a separate gem since Ruby 2.0
require 'net/http'
require 'net/https'
require 'simple-rss'

url = URI.parse('http://hbl.fi/rss.xml')

http = Net::HTTP.new(url.host, url.port)

http.open_timeout = http.read_timeout = 10 # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == "https")

headers = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12'
  # For conditional GETs, store the Last-Modified and ETag values of the previous
  # response in a database and send them back on each request:
  # 'If-Modified-Since' => last_modified,
  # 'If-None-Match'     => etag
}

# Use request_uri rather than path so that any query string is sent as well
response = http.get(url.request_uri, headers)
body = response.body

# Look for encoding="..." in the XML declaration
encoding = body.scan(
  /^<\?xml [^>]*encoding="([^\"]*)"[^>]*\?>/
).flatten.first

# scan returns no match (nil) when the declaration has no encoding attribute
if encoding.nil? || encoding.empty?
  if response["Content-Type"] =~ /charset=([\w\d-]+)/
    encoding = $1.downcase
    puts "Feed #{url} is #{encoding} according to Content-Type header"
  else
    puts "Unable to detect content encoding for #{url}, using default."
    encoding = "ISO-8859-1"
  end
else
  puts "Feed #{url} is #{encoding} according to XML"
end

# Use 'UTF-8//IGNORE' as the target encoding if this raises an exception
ic = Iconv.new('UTF-8', encoding)
body = ic.iconv(body)

feed = SimpleRSS.parse(body)

feed.items.each do |item|
  puts item.title
end
```
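Iconv was removed from Ruby's standard library in Ruby 2.0. On newer Rubies the same detect-and-convert step can be sketched with `String#encode`. The helper names `detect_encoding` and `to_utf8` are hypothetical, and the ISO-8859-1 fallback simply mirrors the script above.

```ruby
# Hypothetical helpers: the encoding detection and UTF-8 conversion from the
# script above, rewritten for Rubies where String#encode replaces Iconv.

# Returns the feed's encoding: the XML declaration wins, then the charset
# parameter of the Content-Type header, then ISO-8859-1 as a default.
def detect_encoding(body, content_type)
  # Match on a binary copy so stray high bytes cannot break the regexp
  declared = body.b[/^<\?xml [^>]*encoding=["']([^"']+)["']/, 1]
  return declared if declared && !declared.empty?
  return Regexp.last_match(1).downcase if content_type.to_s =~ /charset=["']?([\w-]+)/i
  'ISO-8859-1'
end

# Converts body from the given encoding to UTF-8, dropping invalid or
# unmappable bytes (roughly what 'UTF-8//IGNORE' did with Iconv).
def to_utf8(body, encoding)
  source = body.dup.force_encoding(encoding)
  if source.encoding == Encoding::UTF_8
    source.scrub('') # same encoding: just remove invalid byte sequences
  else
    source.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '')
  end
end
```

In the script, the Iconv lines would become `body = to_utf8(body, detect_encoding(body, response['Content-Type']))`. An unknown encoding name makes `force_encoding` raise `ArgumentError`, so a real feed reader may want to rescue that and fall back to the default.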
Example of how to fetch a URL with Net::HTTP and Ruby
```ruby
require 'net/http'
require 'net/https'

url = URI.parse('http://www.google.com/yo?query=yahoo')

http = Net::HTTP.new(url.host, url.port)

http.open_timeout = http.read_timeout = 10 # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == "https")

headers = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12',
  'If-Modified-Since' => '',
  'If-None-Match' => ''
}

# Use request_uri, not path, so the query string (?query=yahoo) is sent too:
# http://www.ruby-doc.org/core/classes/URI/HTTP.html#M004934
response = http.get(url.request_uri, headers)

puts response.code
puts response.message

response.each { |key, val| puts key + ' = ' + val }
```
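The `If-Modified-Since` and `If-None-Match` headers only pay off when they carry the `Last-Modified` and `ETag` values of an earlier response. A minimal sketch of a conditional GET, assuming those values are kept in a plain Hash (a stand-in for the database mentioned in the feed script); `conditional_headers` and `remember_validators` are hypothetical names.

```ruby
require 'net/http'

# Builds the conditional request headers from what the previous response returned.
def conditional_headers(cache)
  headers = {}
  headers['If-Modified-Since'] = cache[:last_modified] if cache[:last_modified]
  headers['If-None-Match']     = cache[:etag]          if cache[:etag]
  headers
end

# Stores the validators of a successful response and reports whether there is
# a new body to process (false for 304 Not Modified).
def remember_validators(cache, response)
  case response
  when Net::HTTPNotModified
    false
  when Net::HTTPSuccess
    cache[:last_modified] = response['Last-Modified']
    cache[:etag]          = response['ETag']
    true
  else
    raise "Unexpected response: #{response.code} #{response.message}"
  end
end
```

In the fetch example, `conditional_headers(cache)` would be merged into `headers` before `http.get`, and `remember_validators(cache, response)` called afterwards. Only a `true` result means the body needs to be parsed again.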
Recursively add files to ClearCase
This script recursively adds all view-private files in the current directory to ClearCase.
Save the following script as add_recursively.rb in the directory you want to add to ClearCase:
```ruby
# List objects that belong only to this view (-view_only), recursively (-r),
# one short path per line (-s). Capturing the output directly avoids a
# temporary file that would itself show up as a view-private file.
lines = %x{cleartool ls -view_only -r -s .}.each_line.map { |line| %Q{"#{line.chomp}"} }

# Work around command line length limit in Windows
while lines.size > 0
  %x{cleardlg /addtosrc #{lines.slice!(0..100).join(' ')}}
end
```
Next, open a command line window and execute the script:
```shell
cd clearcase_vob
ruby add_recursively.rb
```
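The `slice!` loop keeps each `cleardlg` call under the Windows command-line length limit by passing at most 101 quoted paths at a time. The same batching can be sketched with `each_slice`, which leaves the list untouched. The helper name and the batch size of 100 are assumptions, and the sketch only builds the command strings so they can be inspected before running them.

```ruby
# Hypothetical helper: quotes each path and groups the paths into batches so
# that no single command line grows too long.
def addtosrc_commands(paths, batch_size = 100)
  paths.map { |path| %Q{"#{path}"} }
       .each_slice(batch_size)
       .map { |batch| "cleardlg /addtosrc #{batch.join(' ')}" }
end
```

Running them is then `addtosrc_commands(paths).each { |cmd| system(cmd) }`.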
ClearCase sucks; use Mercurial or git instead…
Using backgroundrb to execute tasks asynchronously in Rails
Draft…
Planning on using BackgrounDRb? Take a long look at the alternatives first
Ask yourself: do you really need a complex solution like BackgrounDRb? Most likely you don’t, so use a simple daemonized process instead; see this snippet about the daemons gem for more information.
Heck, even a simple Ruby script run by cron every 5 minutes will be more stable than BackgrounDRb and require less work.
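A script started by cron every 5 minutes can overlap with itself when one run takes longer than the interval. A minimal sketch that guards against this with `File#flock` on a lock file; the helper name and the lock file path are hypothetical.

```ruby
# Runs the block only if no other process holds the lock on lock_path.
# Returns true if the block ran, false if another run still holds the lock.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |file|
    # LOCK_NB makes flock return false immediately instead of waiting
    return false unless file.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end # closing the file releases the lock
end
```

The cron job (e.g. `*/5 * * * * ruby /path/to/update_feeds.rb`, path hypothetical) would wrap its work in `with_exclusive_lock('/tmp/update_feeds.lock') { ... }`; a run that finds the lock taken simply does nothing.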
Even if you really need to process a lot of data asynchronously in the background, I wouldn’t recommend BackgrounDRb: it’s riddled with bugs and unstable in production, so use the BJ plugin instead.
Anyway, continue reading if you want to use BackgrounDRb…
Installing the prerequisites:
```shell
$ sudo gem install chronic packet
```
Installing backgroundrb
```shell
$ cd rails_project
$ git clone git://gitorious.org/backgroundrb/mainline.git vendor/plugins/backgroundrb
```
You can also get the latest stable version from the Subversion repository:
```shell
svn co http://svn.devjavu.com/backgroundrb/trunk vendor/plugins/backgroundrb
```
Set up backgroundrb
```shell
rake backgroundrb:setup
```
Create a worker
```shell
./script/generate worker feeds_worker
```
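```ruby
require 'benchmark'

class FeedsWorker < BackgrounDRb::MetaWorker
  set_worker_name :feeds_worker

  def create(args = nil)
    # this method is called when the worker is loaded for the first time
    logger.info "Created feeds worker"
  end

  # data is optional so update can also be called without arguments,
  # e.g. MiddleMan.worker(:feeds_worker).update() or from a periodic timer
  def update(data = nil)
    logger.info "Updating #{Feed.count} feeds."

    thread_pool.defer do
      # Benchmark inside the deferred block: defer returns immediately, so a
      # benchmark wrapped around it would only measure the hand-off.
      seconds = Benchmark.realtime do
        # Assumes Feed defines its own update_all class method that refreshes
        # every feed; ActiveRecord's built-in update_all needs an updates argument.
        Feed.update_all()
      end
      logger.info "Update took #{'%.5f' % seconds} seconds."
    end
  end
end
```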
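`thread_pool.defer` queues its block for a pool thread and returns right away, so a `Benchmark.realtime` wrapped around the `defer` call times only the hand-off, not the feed update. A plain-Ruby sketch of the difference, assuming a `Thread` as a stand-in for BackgrounDRb's thread pool and `sleep` as a stand-in for `Feed.update_all`:

```ruby
require 'benchmark'

# Stand-in for the slow work (Feed.update_all in the worker).
def slow_update
  sleep 0.2
end

# Timing around the hand-off: measures only how long it takes to start the thread.
worker = nil
outside = Benchmark.realtime { worker = Thread.new { slow_update } }
worker.join

# Timing inside the background thread: measures the update itself.
inside = nil
Thread.new { inside = Benchmark.realtime { slow_update } }.join

puts format('outside: %.3fs, inside: %.3fs', outside, inside)
```

The first figure stays close to zero no matter how slow the work is, which is why the timing belongs inside the deferred block.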
Starting backgroundrb
First configure backgroundrb by opening config/backgroundrb.yml in your editor:
```yaml
:backgroundrb:
  :ip: 0.0.0.0

:development:
  :backgroundrb:
    :port: 11111     # use port 11111
    :log: foreground # foreground mode, print log messages on console

:production:
  :backgroundrb:
    :port: 22222     # use port 22222
```
Next, start backgroundrb in development mode:
```shell
./script/backgroundrb -e development &
```
Call your worker
From the command line:
```
$ script/console
Loading development environment (Rails 2.0.2)
>> MiddleMan.worker(:feeds_worker).update()
```
When things go wrong
Asynchronous programming is complex, so expect bugs…
Rule #1: know who you’re calling.
If you give MiddleMan the wrong worker name, it’ll just spit this crap at you:
```
You have a nil object when you didn't expect it!
The error occurred while evaluating nil.send_request
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_master.rb:44:in `ask_worker'
/Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:104:in `process_work'
/Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:35:in `receive_data'
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_parser.rb:29:in `call'
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_parser.rb:29:in `extract'
/Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:31:in `receive_data'
```
For example, this command generates the error above:
```ruby
MiddleMan.worker(:illegal_worker).update()
```
It’s always nice to see a cryptic error message such as this; it really deserves an award.
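The trace shows packet's `ask_worker` calling `send_request` on nil, which suggests the worker is looked up by name and the missing entry is used unchecked. A hypothetical plain-Ruby illustration (a Hash standing in for packet's internal worker registry, not packet's real code) of why a checked lookup with `Hash#fetch` gives a far clearer error:

```ruby
# Hypothetical stand-ins for a running worker and the registry of workers.
FakeWorker = Struct.new(:name) do
  def send_request(data)
    "#{name} got #{data.inspect}"
  end
end
FAKE_WORKERS = { feeds_worker: FakeWorker.new(:feeds_worker) }

# Unchecked lookup: a wrong name yields nil, and the failure surfaces as
# NoMethodError on nil, far away from the actual mistake.
def ask_worker_unchecked(name, data)
  FAKE_WORKERS[name].send_request(data)
end

# Checked lookup: a wrong name fails immediately, naming the culprit.
def ask_worker_checked(name, data)
  worker = FAKE_WORKERS.fetch(name) do
    raise ArgumentError, "unknown worker #{name.inspect}; known workers: #{FAKE_WORKERS.keys.inspect}"
  end
  worker.send_request(data)
end
```

With the checked version, `ask_worker_checked(:illegal_worker, nil)` reports `unknown worker :illegal_worker` instead of a nil error.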
Check for bugs and bug fixes
Going to production
Starting the daemon:
```shell
./script/backgroundrb -e production start
```
Configuring your task to run periodically
The following example makes backgroundrb call the FeedsWorker’s update method once every 15 minutes:
```yaml
:production:
  :backgroundrb:
    :port: 22222      # use port 22222
    :lazy_load: true  # do not load models eagerly
    :debug_log: false # disable log workers and other logging
  # Cron based scheduling
  :schedules:
    :feeds_worker:
      :update:
        :trigger_args: "0 */15 * * * *" # sec min hour day month weekday
        :data: "Hello world"
```
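In YAML an unquoted value starting with `*` is read as an alias, so a cron expression like `* */15 * * * *` has to be quoted or the file will not parse. A quick sketch for checking the schedule section before starting the daemon, using Ruby's bundled YAML library; the `permitted_classes:` keyword assumes Psych 3.1 or later (Ruby 2.6+), and `load_config` is a hypothetical helper.

```ruby
require 'yaml'

# Parses a backgroundrb.yml-style config. Keys such as :schedules: become
# symbols, so Symbol must be permitted explicitly for safe_load.
def load_config(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

quoted = <<~YAML
  :production:
    :backgroundrb:
      :port: 22222
    :schedules:
      :feeds_worker:
        :update:
          :trigger_args: "0 */15 * * * *"
          :data: "Hello world"
YAML

unquoted = quoted.sub('"0 */15 * * * *"', '* */15 * * * *')

config = load_config(quoted)
puts config[:production][:schedules][:feeds_worker][:update][:trigger_args]

begin
  load_config(unquoted)
rescue Psych::Exception => e
  puts "unquoted trigger_args rejected: #{e.class}"
end
```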
At the time of writing, the cron scheduler seems to be broken, so I prefer hard-coding the interval in the worker’s create method:
```ruby
def create(args = nil)
  # update takes a data argument, so pass nil explicitly
  add_periodic_timer(15.minutes) { update(nil) }
end
```
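If you deploy with Vlad or Capistrano, it’s also a good idea to change these lines in script/backgroundrb so that the pid file and the server log go into the shared directory, which survives between releases: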
```ruby
pid_file = "#{RAILS_HOME}/../../shared/pids/backgroundrb_#{CONFIG_FILE[:backgroundrb][:port]}.pid"
SERVER_LOGGER = "#{RAILS_HOME}/../../shared/log/backgroundrb_server_#{CONFIG_FILE[:backgroundrb][:port]}.log"
```
Resources
How to use Vlad the Deployer with git, nginx, mongrel, mongrel_cluster and Rails
This is a draft…
Installing Vlad the Deployer
```shell
gem install vlad
```
Configuring Vlad the Deployer
Add this to the end of your Rakefile:
```ruby
begin
  require 'rubygems'
  require 'vlad'
  Vlad.load :scm => :git
rescue LoadError => e
  puts "Unable to load Vlad #{e}."
end
```
Note that we’re telling Vlad to use git. This snippet gives you a quick introduction to using git with Rails.
Creating the deployment recipe
Vlad reads the deployment recipe from config/deploy.rb. If you’re uncertain what these variables mean, have a look at the docs. This folder is also worth a look, and don’t forget to take a peek at the Vlad source code.
```ruby
#
# General configuration
#
set :ssh_flags, '-p 666'
set :application, 'xxx.com'
set :domain, '127.0.0.1'
set :deploy_to, '/var/www/xxx.com'
set :repository, '/var/lib/git/repositories/xxx.com/.git/'

#
# Mongrel configuration
#
set :mongrel_clean, true
set :mongrel_command, 'sudo mongrel_rails'
set :mongrel_group, 'www-data'
set :mongrel_port, 9000
set :mongrel_servers, 3

#set :mongrel_address, '127.0.0.1'
#set(:mongrel_conf) { "#{shared_path}/mongrel_cluster.conf" }
#set :mongrel_config_script, nil
#set :mongrel_environment, 'production'
#set :mongrel_log_file, nil
#set :mongrel_pid_file, nil
#set :mongrel_prefix, nil
#set :mongrel_user, 'mongrel'

#
# Customize Vlad to our needs
#
namespace :vlad do
  #
  # Add an after_update hook
  #
  remote_task :update do
    Rake::Task['vlad:after_update'].invoke
  end

  #
  # The after_update hook, which is run after vlad:update
  #
  remote_task :after_update do
    # Link to shared resources, if you have them in .gitignore
    # run "ln -s #{deploy_to}/shared/system/database.yml #{deploy_to}/current/config/database.yml"
  end

  #
  # Deploys a new version of your application
  #
  remote_task :deploy => [:update, :migrate, :start_app]
end
```
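The after_update hook works because defining a Rake task a second time appends its block to the existing task instead of replacing it, and Vlad's `remote_task` is built on Rake tasks. The mechanism can be sketched with plain Rake tasks standing in for Vlad's remote tasks; the `DEPLOY_LOG` array is purely illustrative.

```ruby
require 'rake'
include Rake::DSL

DEPLOY_LOG = []

namespace :vlad do
  # Stand-ins for Vlad's built-in tasks
  task(:update)    { DEPLOY_LOG << :update }
  task(:migrate)   { DEPLOY_LOG << :migrate }
  task(:start_app) { DEPLOY_LOG << :start_app }

  # Re-opening :update appends a second action, which fires the hook
  task :update do
    Rake::Task['vlad:after_update'].invoke
  end

  task(:after_update) { DEPLOY_LOG << :after_update }

  # Prerequisites run in the order listed
  task :deploy => [:update, :migrate, :start_app]
end

Rake::Task['vlad:deploy'].invoke
p DEPLOY_LOG
```

Invoking `vlad:deploy` runs update, then the hook, then migrate and start_app, in that order.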
Set up the server
```shell
$ rake vlad:setup
```
This creates the necessary folders and the mongrel_cluster configuration file.
Deploy the application
Now deploy the application with vlad:deploy, which is a custom rake task that we added to the deployment recipe:
```shell
$ rake vlad:deploy
```
Copying your SSH public key to the remote server
Vlad uses ssh to execute commands on the remote server and rsync to copy the build to it, which means you’ll quickly grow tired of typing your password each time a command is run.
Copying your public SSH key to the remote server solves this problem; this snippet explains how to do exactly that.