
How to use Ruby and SimpleRSS to parse RSS and Atom feeds

Ruby posted 8 months ago by christian

This script is an example of how to use the SimpleRSS gem to parse an RSS feed.

The script can easily be modified to support conditional GETs. It also detects the feed’s character encoding and converts the feed to UTF-8.

```ruby
require 'net/http'
require 'uri'
require 'rubygems'
require 'simple-rss'

url = URI.parse('http://hbl.fi/rss.xml')

http = Net::HTTP.new(url.host, url.port)

http.open_timeout = http.read_timeout = 10  # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == 'https')

headers = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12'
}

# For conditional GETs, store the Last-Modified and ETag values of the previous
# response (e.g. in a database) and send them back on each request:
# headers['If-Modified-Since'] = stored_last_modified
# headers['If-None-Match']     = stored_etag

# Use request_uri rather than path so that any query string is kept
response = http.get(url.request_uri, headers)

if response.is_a?(Net::HTTPNotModified)
  puts "Feed #{url} has not changed since the last request"
  exit
end

body = response.body

# Look for the encoding in the XML declaration, e.g. <?xml version="1.0" encoding="utf-8"?>
encoding = body[/^<\?xml [^>]*encoding=["']([^"']*)["'][^>]*\?>/, 1]

if encoding.nil? || encoding.empty?
  if response['Content-Type'].to_s =~ /charset=([\w-]+)/
    encoding = $1.downcase
    puts "Feed #{url} is #{encoding} according to Content-Type header"
  else
    puts "Unable to detect content encoding for #{url}, using default."
    encoding = 'ISO-8859-1'
  end
else
  puts "Feed #{url} is #{encoding} according to XML"
end

# Convert to UTF-8 (String#encode replaces Iconv, which was removed from Ruby 2.0).
# invalid/undef: :replace substitute unconvertible characters instead of raising.
body = body.encode('UTF-8', encoding, invalid: :replace, undef: :replace)

feed = SimpleRSS.parse(body)

feed.items.each do |item|
  puts item.title
end
```
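Supporting conditional GETs boils down to remembering the Last-Modified and ETag response headers and sending them back as If-Modified-Since and If-None-Match; a server that supports them answers 304 Not Modified when nothing has changed. The sketch below assumes a plain Hash (CACHE) standing in for the database; CACHE, conditional_headers, remember_validators and fetch_if_changed are hypothetical names, not part of SimpleRSS or Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Hypothetical in-memory store standing in for a database table keyed by feed URL
CACHE = {}

# Builds the conditional GET headers from validators saved on a previous fetch
def conditional_headers(url)
  cached = CACHE[url.to_s] || {}
  headers = {}
  headers['If-Modified-Since'] = cached[:last_modified] if cached[:last_modified]
  headers['If-None-Match']     = cached[:etag]          if cached[:etag]
  headers
end

# Saves the validators the server sent so the next request can reuse them
def remember_validators(url, response)
  CACHE[url.to_s] = {
    last_modified: response['Last-Modified'],
    etag:          response['ETag']
  }
end

# Fetches the feed body, or returns nil when the server answers 304 Not Modified
def fetch_if_changed(url)
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = (url.scheme == 'https')
  http.open_timeout = http.read_timeout = 10
  response = http.get(url.request_uri, conditional_headers(url))
  return nil if response.is_a?(Net::HTTPNotModified)
  remember_validators(url, response)
  response.body
end
```

On the first call no validators are stored, so a normal GET is sent; later calls only download the feed when the server reports a change.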

Tagged rss, atom, parse, ruby, simplerss, encoding, utf-8

Example of how to fetch a URL with Net::HTTP and Ruby

Ruby posted 8 months ago by christian

```ruby
require 'net/http'
require 'uri'

url = URI.parse('http://www.google.com/yo?query=yahoo')

http = Net::HTTP.new(url.host, url.port)

http.open_timeout = http.read_timeout = 10  # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == 'https')

headers = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12'
  # Add 'If-Modified-Since' and 'If-None-Match' here for conditional GETs
}

# Note to self, use request_uri not path (path drops the query string):
# http://www.ruby-doc.org/core/classes/URI/HTTP.html#M004934
response = http.get(url.request_uri, headers)
body = response.body

puts response.code
puts response.message

response.each_header { |key, val| puts "#{key} = #{val}" }
```
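The difference between path and request_uri is easy to check in isolation. A small sketch using the URL from the example above plus a bare host name (example.com is only an illustration):

```ruby
require 'uri'

url = URI.parse('http://www.google.com/yo?query=yahoo')

# path drops the query string, so a GET for it would request a different resource
puts url.path         # prints "/yo"
# request_uri keeps the query string, which is what Net::HTTP#get needs
puts url.request_uri  # prints "/yo?query=yahoo"

bare = URI.parse('http://example.com')
# With no path at all, path is empty while request_uri falls back to "/"
puts bare.path.inspect  # prints ""
puts bare.request_uri   # prints "/"
```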

Tagged net, http, ruby, example, headers

Recursively add files to ClearCase

Ruby posted 8 months ago by christian

This script adds all view-private files in the current directory and its subdirectories to ClearCase.

Save the following script as add_recursively.rb in the directory you want to add to ClearCase:

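```ruby
# List all view-private files below the current directory, one per line
files = %x{cleartool ls -view_only -r -s .}.lines.map(&:chomp)

# Don't add this script itself to source control
files.reject! { |file| File.basename(file) == File.basename(__FILE__) }

# Quote each path in case it contains spaces
quoted = files.map { |file| %Q{"#{file}"} }

# Work around the command line length limit in Windows by adding 100 files at a time
quoted.each_slice(100) do |batch|
  %x{cleardlg /addtosrc #{batch.join(' ')}}
end
```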

Next, open a command line window and run the script:

```
cd clearcase_vob
ruby add_recursively.rb
```
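A fixed batch of 100 files only keeps the command short when the paths are short. A sketch that batches by total length instead, assuming a budget of about 8,000 characters (cmd.exe's documented limit is 8191); batches_by_length and quoted_names are hypothetical names, not part of ClearCase.

```ruby
# Splits names into batches whose space-joined length stays within max_length.
# A single name longer than max_length still gets a batch of its own.
def batches_by_length(names, max_length)
  batches = []
  current = []
  length = 0

  names.each do |name|
    # +1 accounts for the space that join(' ') puts before every name but the first
    added = current.empty? ? name.length : name.length + 1
    if !current.empty? && length + added > max_length
      batches << current
      current = []
      added = name.length
      length = 0
    end
    current << name
    length += added
  end

  batches << current unless current.empty?
  batches
end

# Example use in add_recursively.rb, where quoted_names holds the quoted paths:
#   prefix = 'cleardlg /addtosrc '
#   batches_by_length(quoted_names, 8000 - prefix.length).each do |batch|
#     %x{#{prefix}#{batch.join(' ')}}
#   end
```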

ClearCase sucks; use Mercurial or git instead…

Tagged add, recursive, clearcase, ruby, script

Using backgroundrb to execute tasks asynchronously in Rails

Ruby posted 8 months ago by christian

Draft…

Planning on using BackgrounDRb? Take a long look at the alternatives first

Ask yourself: do you really need a complex solution like BackgrounDRb? Most likely you don’t, so use a simple daemonized process instead; see this snippet about the daemons gem for more information.

Heck, even a simple Ruby script run by cron every 5 minutes will be more stable than BackgrounDRb and require less work.
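A minimal sketch of such a cron-driven script, assuming a lock file in the system temp directory so that a slow run is never overlapped by the next one. The file name update_feeds.rb, the lock path and run_exclusively are hypothetical; File#flock with LOCK_NB returns false instead of waiting when another process holds the lock.

```ruby
# update_feeds.rb: hypothetical crontab entry running it every 5 minutes:
#   */5 * * * * ruby /path/to/update_feeds.rb
require 'tmpdir'

# Runs the block only if no other run holds the lock on lock_path.
# Returns true if the block ran, false if another run was in progress.
def run_exclusively(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |file|
    # LOCK_NB makes flock return false instead of blocking
    return false unless file.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end # closing the file releases the lock
end

if __FILE__ == $0
  lock = File.join(Dir.tmpdir, 'update_feeds.lock')
  ran = run_exclusively(lock) do
    # Replace with the real work, e.g. loading Rails and updating the feeds
    puts 'Updating feeds...'
  end
  puts 'Previous run still in progress, skipping.' unless ran
end
```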

Even if you really need to process a lot of data asynchronously in the background, I wouldn’t recommend BackgrounDRb: it’s riddled with bugs and unstable in production. Use the BJ plugin instead.

Anyway, continue reading if you want to use BackgrounDRb…

Installing the prerequisites:

```
$ sudo gem install chronic packet
```

Installing backgroundrb

```
$ cd rails_project
$ git clone git://gitorious.org/backgroundrb/mainline.git vendor/plugins/backgroundrb
```

You can also get the latest stable version from the Subversion repository:

```
svn co http://svn.devjavu.com/backgroundrb/trunk vendor/plugins/backgroundrb
```

Set up backgroundrb

```
rake backgroundrb:setup
```

Create a worker

```
./script/generate worker feeds_worker
```

```ruby
require 'benchmark'

class FeedsWorker < BackgrounDRb::MetaWorker
  set_worker_name :feeds_worker

  def create(args = nil)
    # this method is called when the worker is loaded for the first time
    logger.info "Created feeds worker"
  end

  # data defaults to nil so update can also be called without arguments
  def update(data = nil)
    # defer hands the block to the thread pool and returns without waiting,
    # so the timing has to happen inside the deferred block
    thread_pool.defer do
      logger.info "Updating #{Feed.count} feeds."

      seconds = Benchmark.realtime do
        Feed.update_all()
      end

      logger.info "Update took #{'%.5f' % seconds} seconds."
    end
  end
end
```
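Wrapping Benchmark.realtime around the call that schedules the work only measures how long the scheduling takes, assuming thread_pool.defer returns without waiting for the block, as its name suggests. The sketch below shows the effect with plain Ruby threads standing in for BackgrounDRb's thread pool; slow_job is a hypothetical placeholder for the real feed update.

```ruby
require 'benchmark'

# Hypothetical stand-in for a slow job such as updating every feed
def slow_job
  sleep 0.2
end

# Timing the call that hands the work to another thread only measures how long
# it takes to start that thread, because Thread.new does not wait for the block
worker = nil
outside = Benchmark.realtime do
  worker = Thread.new { slow_job }
end

# Timing inside the thread measures the job itself
inside = nil
timed = Thread.new do
  inside = Benchmark.realtime { slow_job }
end

[worker, timed].each(&:join)
puts format('timed outside: %.3fs, timed inside: %.3fs', outside, inside)
```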

Starting backgroundrb

First, configure backgroundrb by opening config/backgroundrb.yml in your editor:

```yaml
:backgroundrb:
  :ip: 0.0.0.0

:development:
  :backgroundrb:
    :port: 11111     # use port 11111
    :log: foreground # foreground mode, print log messages to the console

:production:
  :backgroundrb:
    :port: 22222      # use port 22222
```

Next, start backgroundrb in development mode:

```
./script/backgroundrb -e development &
```

Call your worker

From the command line:

```
$ script/console
Loading development environment (Rails 2.0.2)
>> MiddleMan.worker(:feeds_worker).update()
```

When things go wrong

Asynchronous programming is complex, so expect bugs…

Rule #1: know who you’re calling.

If you give MiddleMan the wrong worker name, it’ll just spit this crap at you:

```
You have a nil object when you didn't expect it!
The error occurred while evaluating nil.send_request
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_master.rb:44:in `ask_worker'
/Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:104:in `process_work'
/Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:35:in `receive_data'
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_parser.rb:29:in `call'
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.5/lib/packet/packet_parser.rb:29:in `extract'
/Users/christian/Documents/Projects/xxx/vendor/plugins/backgroundrb/server/lib/master_worker.rb:31:in `receive_data'
```

For example, this command generates the error above:

```ruby
MiddleMan.worker(:illegal_worker).update()
```

It’s always nice to see such a cryptic error message; it really deserves an award.
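One way to get a readable error instead is to check the worker name before calling MiddleMan. A sketch, assuming you keep your own list of worker names; KNOWN_WORKERS and call_worker are hypothetical helpers, not part of BackgrounDRb, and the check only catches names missing from that list, it does not ask the running server.

```ruby
# Hypothetical list of the names registered with set_worker_name
KNOWN_WORKERS = [:feeds_worker]

# Calls a worker method through MiddleMan, but fails with a clear message
# instead of the "nil.send_request" error above when the name is wrong.
# middleman defaults to BackgrounDRb's MiddleMan and can be swapped out in tests.
def call_worker(name, method, *args, middleman: MiddleMan)
  unless KNOWN_WORKERS.include?(name)
    raise ArgumentError, "Unknown worker #{name.inspect}; known workers: #{KNOWN_WORKERS.inspect}"
  end
  middleman.worker(name).public_send(method, *args)
end

# Usage from script/console or a controller:
#   call_worker(:feeds_worker, :update)
```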

Check for bugs and bug fixes

git mainline commits

Going to production

Starting the daemon:

```
./script/backgroundrb -e production start
```

Configuring your task to run periodically

The following example makes backgroundrb call the FeedsWorker’s update method once every 15 minutes:

```yaml
:production:
  :backgroundrb:
    :port: 22222      # use port 22222
    :lazy_load: true  # do not load models eagerly
    :debug_log: false # disable log workers and other logging
# Cron based scheduling
:schedules:
  :feeds_worker:
    :update:
      # Quoted, because an unquoted leading * is read as a YAML alias.
      # Fields: sec min hour day month weekday (second 0 of every 15th minute)
      :trigger_args: "0 */15 * * * *"
      :data: "Hello world"
```
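In YAML an unquoted value that starts with * is read as an alias, so a schedule entry like the one above may stop backgroundrb.yml from loading at all. A quick check before restarting the daemon is to load the file with Ruby's own YAML library. A sketch, assuming Ruby 2.6 or later (for the permitted_classes keyword) and that it is run from the Rails root; load_backgroundrb_config is a hypothetical helper name.

```ruby
require 'yaml'

# Loads a backgroundrb.yml-style config; keys such as :schedules: become
# Ruby symbols, so Symbol must be permitted. Raises Psych::SyntaxError on bad YAML.
def load_backgroundrb_config(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

if __FILE__ == $0 && File.exist?('config/backgroundrb.yml')
  config = load_backgroundrb_config(File.read('config/backgroundrb.yml'))
  p config[:schedules]
end
```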

At the time of writing, the cron scheduler seems to be broken, so I prefer hard-coding the interval in the worker’s create method:

```ruby
def create(args = nil)
  # pass nil explicitly, since update takes a data argument
  add_periodic_timer(15.minutes) { update(nil) }
end
```

If you use Vlad or Capistrano, it’s also a good idea to fix script/backgroundrb so that its pid and log files go into the shared directory, which is kept across deployments, by changing these lines to:

```ruby
pid_file = "#{RAILS_HOME}/../../shared/pids/backgroundrb_#{CONFIG_FILE[:backgroundrb][:port]}.pid"
SERVER_LOGGER = "#{RAILS_HOME}/../../shared/log/backgroundrb_server_#{CONFIG_FILE[:backgroundrb][:port]}.log"
```

Resources

Backgroundrb homepage

Backgroundrb best practices

Backgroundrb scheduling

Debugging backgroundrb

Backgroundrb’s README

topfunky’s messaging article

Tagged backgroundrb, rails, ruby, distributed, messaging

How to use Vlad the Deployer with git, nginx, mongrel, mongrel_cluster and Rails

Ruby posted 8 months ago by christian

This is a draft…

Installing Vlad the Deployer

```
gem install vlad
```

Configuring Vlad the Deployer

Add this to the end of your Rakefile:

```ruby
begin
  require 'rubygems'
  require 'vlad'
  Vlad.load :scm => :git
rescue LoadError => e
  puts "Unable to load Vlad #{e}."
end
```

Note that we’re telling Vlad to use git. This snippet gives you a quick introduction to using git with Rails.

Creating the deployment recipe

If you’re uncertain what these variables mean, have a look at the docs. This folder is also worth a look, and don’t forget to take a peek at the Vlad source code. Save the recipe as config/deploy.rb, which is where Vlad looks for it by default:

```ruby
#
# General configuration
#
set :ssh_flags,             '-p 666'
set :application,           'xxx.com'
set :domain,                '127.0.0.1'
set :deploy_to,             '/var/www/xxx.com'
set :repository,            '/var/lib/git/repositories/xxx.com/.git/'

#
# Mongrel configuration
#
set :mongrel_clean,         true
set :mongrel_command,       'sudo mongrel_rails'
set :mongrel_group,         'www-data'
set :mongrel_port,          9000
set :mongrel_servers,       3

#set :mongrel_address,       '127.0.0.1'
#set(:mongrel_conf)          { "#{shared_path}/mongrel_cluster.conf" }
#set :mongrel_config_script, nil
#set :mongrel_environment,   'production'
#set :mongrel_log_file,      nil
#set :mongrel_pid_file,      nil
#set :mongrel_prefix,        nil
#set :mongrel_user,          'mongrel'

#
# Customize Vlad to our needs
#
namespace :vlad do
  #
  # Add an after_update hook
  #
  remote_task :update do
    Rake::Task['vlad:after_update'].invoke
  end

  #
  # The after_update hook, which is run after vlad:update
  #
  remote_task :after_update do
    # Link to shared resources, if you have them in .gitignore
    # run "ln -s #{deploy_to}/shared/system/database.yml #{deploy_to}/current/config/database.yml"
  end

  #
  # Deploys a new version of your application
  #
  remote_task :deploy => [:update, :migrate, :start_app]
end
```
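Declaring remote_task :update a second time works as a hook because Rake adds the new block to the existing task’s actions instead of replacing it, and Vlad’s remote tasks are built on Rake tasks. A plain-Rake sketch of the same idea; the task bodies are placeholders that only record the order in which they run.

```ruby
require 'rake'
include Rake::DSL

steps = []

# First definition, standing in for Vlad's own vlad:update
task :update do
  steps << :update
end

task :after_update do
  steps << :after_update
end

# Second definition: Rake appends this block to the existing :update task
task :update do
  Rake::Task[:after_update].invoke
end

Rake::Task[:update].invoke
p steps  # prints [:update, :after_update]
```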

Set up the server

```
$ rake vlad:setup
```

This will create the necessary directories and the mongrel_cluster configuration file.

Deploy the application

Now deploy the application with vlad:deploy, which is a custom rake task that we added to the deployment recipe:

```
$ rake vlad:deploy
```

Copying your SSH public key to the remote server

Vlad uses ssh to execute commands on the remote server and rsync to copy the build to it, which means you’ll quickly grow tired of typing your password each time a command is run.

You can solve this by copying your public SSH key to the remote server; this snippet explains how to do exactly that.

Tagged vlad, deployer, deploy, capistrano, nginx, mongrel, mongrel_cluster