
Parsing feeds with Ruby and the FeedTools gem

Ruby posted 8 months ago by christian

This is an example of how to use the FeedTools gem to parse a feed. FeedTools supports Atom, RSS, and other feed formats.

The only negative thing about FeedTools is that the project is abandoned. The author said so in a comment from March 2008: “I’ve effectively abandoned it, so I’m really not going to go taking on huge code reorganization efforts.”

Installing

    $ sudo gem install feedtools

Fetching and parsing a feed

Easy…

    require 'rubygems'
    require 'feed_tools'

    # Fetch and parse the feed in one step
    feed = FeedTools::Feed.open('http://www.slashdot.org/index.rss')

    # Feed-level metadata
    puts feed.title
    puts feed.link
    puts feed.description

    # Each entry in the feed
    feed.items.each do |item|
      puts item.title
      puts item.link
      puts item.content
    end

Feed autodiscovery

Give FeedTools the URL of a regular web page and it finds the feed for you. Here it discovers the Slashdot feed:

    puts FeedTools::Feed.open('http://www.slashdot.org').href
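Feed autodiscovery generally relies on link elements with rel="alternate" in the page's HTML. The sketch below illustrates that idea with plain Ruby and a regular expression. It is not FeedTools' implementation, and `discover_feed_urls`, `FEED_TYPES` and the sample HTML are hypothetical names made up for this example.

```ruby
# MIME types that commonly mark a feed link (assumption for this sketch).
FEED_TYPES = %w[application/rss+xml application/atom+xml].freeze

# Returns the href of every <link rel="alternate"> tag that points at a feed.
def discover_feed_urls(html)
  html.scan(/<link\b[^>]*>/i).map do |tag|
    # Turn the tag's quoted attributes into a hash with lowercase keys.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).map { |k, v| [k.downcase, v] }.to_h
    next unless attrs['rel'].to_s.downcase.split.include?('alternate')
    next unless FEED_TYPES.include?(attrs['type'].to_s.downcase)
    attrs['href']
  end.compact
end

sample = '<link rel="alternate" type="application/rss+xml" href="http://example.com/index.rss">'
puts discover_feed_urls(sample)
```

A regular expression is good enough for a quick check like this, but a real HTML parser is more robust against unquoted attributes and other markup quirks.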

Helpers

FeedTools can also clean up your dirty XML/HTML:

    require 'rubygems'
    require 'feed_tools'
    require 'feed_tools/helpers/feed_tools_helper'

    html = '<p>Some <b>unclosed markup'   # sample input
    FeedTools::HtmlHelper.tidy_html(html)

Database cache

FeedTools can also store the fetched feeds for you:

    FeedTools.configurations[:tidy_enabled] = false
    FeedTools.configurations[:feed_cache] = "FeedTools::DatabaseFeedCache"

The schema contains all you need:

    -- Example MySQL schema
    CREATE TABLE `cached_feeds` (
      `id`              int(10) unsigned NOT NULL auto_increment,
      `href`            varchar(255) default NULL,
      `title`           varchar(255) default NULL,
      `link`            varchar(255) default NULL,
      `feed_data`       longtext default NULL,
      `feed_data_type`  varchar(20) default NULL,
      `http_headers`    text default NULL,
      `last_retrieved`  datetime default NULL,
      `time_to_live`    int(10) unsigned NULL,
      `serialized`      longtext default NULL,
      PRIMARY KEY  (`id`)
    )

There’s even a Rails migration file included.

Feed updater

There’s also a feed updater tool that can fetch feeds in the background, but I haven’t had time to look at it yet.

    sudo gem install feedupdater

Character set/encoding bug

As always, there are bugs that you need to be aware of, and FeedTools is no different. There’s an encoding bug: FeedTools encodes everything to ISO-8859-1 instead of UTF-8, which should be the default encoding.

To fix it use the following code:

    require 'iconv'

    # Iconv.new takes (to, from): this converts from UTF-8 to ISO-8859-1
    ic = Iconv.new('ISO-8859-1', 'UTF-8')
    feed.description = ic.iconv(feed.description)
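Iconv was removed from Ruby's standard library in Ruby 2.0, so on newer Rubies the same conversion can be written with String#encode. This is a minimal sketch, not part of FeedTools: `utf8_to_latin1` and `repair_double_encoding` are hypothetical helper names, and the second helper assumes the symptom is double-encoded text (UTF-8 bytes that were misread as ISO-8859-1 and encoded to UTF-8 a second time), which the post does not confirm.

```ruby
# Same conversion as Iconv.new('ISO-8859-1', 'UTF-8').iconv(text):
# transcode a UTF-8 string into ISO-8859-1 bytes.
def utf8_to_latin1(text)
  text.encode('ISO-8859-1', 'UTF-8')
end

# Assumption: the text was double-encoded. Converting back to ISO-8859-1
# recovers the original UTF-8 bytes, which are then tagged as UTF-8.
def repair_double_encoding(text)
  utf8_to_latin1(text).force_encoding('UTF-8')
end

puts repair_double_encoding("caf\u00C3\u00A9")   # prints "café"
```

String#encode raises an error if the text contains characters that ISO-8859-1 cannot represent, so wrap the call in a rescue if your feeds mix encodings.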

You can also try this patch (note that patch reads the patch file from standard input):

    cd /usr/local/lib/ruby/gems/1.8/gems/
    wget http://n0life.org/~julbouln/feedtools_encoding.patch
    patch -p1 < feedtools_encoding.patch

The character encoding bug is discussed on this page: http://sporkmonger.com/2005/08/11/tutorial

Time estimation

By default, FeedTools tries to estimate when a feed item was published if the feed doesn’t say. This annoys me because it produces weird publish dates, so it’s usually a good idea to disable it with the timestamp_estimation_enabled option:

    FeedTools.reset_configurations
    FeedTools.configurations[:tidy_enabled] = false
    FeedTools.configurations[:feed_cache] = nil
    # 15.minutes comes from ActiveSupport (Rails)
    FeedTools.configurations[:default_ttl] = 15.minutes
    FeedTools.configurations[:timestamp_estimation_enabled] = false

Configuration options

To see a list of available configuration options run the following code:

    require 'pp'
    pp FeedTools.configurations

Tagged feedtools, rss, atom, parser, ruby, content encoding, utf-8, iso-8859-1

Installing/compiling and using git with Ruby on Rails (on Mac OS X Leopard and Debian Linux)

Shell Script (Bash) posted 8 months ago by christian

Git is a good alternative to Mercurial, and of course to SVN or CVS if you’re still using stone age tools. In this post I’ll show you how to compile, install and use git with Rails.

Installing git on Mac OS X

First compile and install git:

    cd /usr/local/src
    wget http://kernel.org/pub/software/scm/git/git-1.5.4.4.tar.bz2
    tar jxvf git-1.5.4.4.tar.bz2
    cd git-1.5.4.4
    make prefix=/usr/local all
    make prefix=/usr/local test && echo $?
    sudo make prefix=/usr/local install

Installing git on Debian

On Debian, install git with the following command:

    $ sudo apt-get install git-core

Note that the package name is git-core, not git.

If you want the latest and greatest version, you first need to install the dependencies (note that you can leave out tk and expat):

    sudo apt-get install curl
    sudo apt-get install libcurl3
    sudo apt-get install libcurl3-dev
    sudo apt-get install tk8.4
    sudo apt-get install cpio expat
    sudo apt-get install zlib1g
    sudo apt-get install build-essential
    sudo apt-get install zlib1g-dev
    sudo apt-get install asciidoc
    sudo apt-get install xmlto

Then download and extract the source as in the Mac OS X section, change to the source directory, and compile and install:

    NO_EXPAT=yes NO_SVN_TESTS=yes NO_IPV6=yes NO_TCLTK=yes make -j2 prefix=/usr all
    NO_EXPAT=yes NO_SVN_TESTS=yes NO_IPV6=yes NO_TCLTK=yes make -j2 prefix=/usr install

Configuring git

Run these commands to tell git your name and email:

    git config --global user.name "u name"
    git config --global user.email x@x.com

Otherwise, you might get this error:

    *** Environment problem:
    *** Your name cannot be determined from your system services (gecos).
    *** You would need to set GIT_AUTHOR_NAME and GIT_COMMITTER_NAME
    *** environment variables; otherwise you won't be able to perform
    *** certain operations because of "empty ident" errors.
    *** Alternatively, you can use user.name configuration variable.

    fatal: empty ident  <........@........com> not allowed
    fatal: The remote end hung up unexpectedly

If you like colorized command output, execute these commands:

    git config --global color.diff auto
    git config --global color.status auto
    git config --global color.branch auto

Using git

If all goes well, change to your project directory and run the following command:

    git init

This creates the git repository, so we’re ready to start adding files to it. But first we need to create the .gitignore file, which tells git to ignore certain files completely:

    cat <<EOF > .gitignore
    config/database.yml
    db/*.sqlite3
    log/*.log
    tmp/**/*
    .DS_Store
    doc/api
    doc/app
    EOF

By default git doesn’t track empty directories (sucks if you ask me), so we’ll create an empty .gitignore file in every empty directory with the find and touch commands:

    find . \( -type d -empty \) -and \( -not -regex '\./\.git.*' \) -exec touch {}/.gitignore \;
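The same idea can be sketched in Ruby, which may be handy inside a Rake task. This is only an illustration under stated assumptions: `empty_directories` and `add_placeholders` are hypothetical helper names, and unlike the regex above this version skips only directories named exactly .git.

```ruby
require 'find'
require 'fileutils'

# Returns every empty directory under root, skipping anything inside .git.
def empty_directories(root)
  dirs = []
  Find.find(root) do |path|
    if File.basename(path) == '.git'
      Find.prune                      # do not descend into the repository data
    elsif File.directory?(path) && (Dir.entries(path) - %w[. ..]).empty?
      dirs << path
    end
  end
  dirs
end

# Creates an empty placeholder file in each empty directory so git tracks it.
def add_placeholders(root, name = '.gitignore')
  empty_directories(root).each { |dir| FileUtils.touch(File.join(dir, name)) }
end
```

Calling `add_placeholders('.')` from the project root has the same effect as the find command.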

Importing files

We’re now ready to start adding and committing files, so without thinking execute:

    git add .
    git commit -m 'initial import'

This adds and commits all files that are in the current folder.

Using remote repositories

If you’re like me you’ll want to use a remote repository, so let’s continue by creating the repository folder on the remote server (note that commands are executed on the remote server from now on):

    mkdir -p /var/lib/git/repositories/project_name

We want the folder to be accessible by users belonging to the git group only:

    addgroup git
    chown root:git /var/lib/git/repositories/project_name
    chmod 770 /var/lib/git/repositories/project_name

Now add yourself (or the user you’ll be using to connect to the remote server) to the git group:

    usermod -a -G git your_username

Alternatively create a new user:

    useradd -g git your_username

Now we’re finally ready to copy the local repository to the remote server with the scp command (note that commands are executed locally again from now on):

    scp -rp .git user@server:/var/lib/git/repositories/project_name

To let git know that this repository exists we’ll use the git remote command:

    git remote add project_name ssh://server/var/lib/git/repositories/project_name

This adds the information to .git/config, which is worth a quick look.

Note that if you’re using a non-standard SSH port you need to add the following to your ~/.ssh/config file:

    Host server
      Port 1234

Commit files and push them to the remote server

Now change a file, then commit and push the changes to the remote server (the -a flag stages modified tracked files before committing):

    git commit -a -m "Me be sleepy"
    git push project_name

If you get an error like this, it means you need to install git on the remote server:

    $ git push project_name
    username@server's password: 
    sh: git-receive-pack: command not found
    fatal: The remote end hung up unexpectedly

That’s all…

Miscellaneous problems

error: unable to create temporary sha1 filename ./objects/obj_FUu2jb: Permission denied

Resources

http://jointheconversation.org/railsgit

http://devblog.michaelgalero.com/2007/12/17/my-git-notes-for-rails/

http://railscasts.com/episodes/96

http://groups.google.com/group/rails-oceania/browse_thread/thread/2c8611dc93917952/e175f72310823547

http://www.kernel.org/pub/software/scm/git/docs/tutorial.html

http://scie.nti.st/2007/11/14/hosting-git-repositories-the-easy-and-secure-way

Tagged git, osx, mac, compile, ruby, rails, remote, linux

Installing Rails, mongrel and mongrel_cluster on Debian

Shell Script (Bash) posted 8 months ago by christian

DRAFT …

Install RubyGems

    wget http://rubyforge.org/frs/download.php/29548/rubygems-1.0.1.tgz
    tar zxvf rubygems-1.0.1.tgz
    cd rubygems-1.0.1
    ruby setup.rb

Install Rails

    gem install rails

Install sqlite3 (optional)

    apt-get install sqlite3 libsqlite3-dev
    gem install sqlite3-ruby

Install mongrel and mongrel_cluster

    $ gem install mongrel mongrel_cluster

    $ mongrel_rails cluster::configure -e production \
      -p 8000 \
      -a 127.0.0.1 \
      -N 3 \
      -c /var/www/xyz/current

    $ mongrel_rails cluster::start

    $ useradd -g www-data -d /var/www mongrel

Surviving reboots

    sudo mkdir /etc/mongrel_cluster
    sudo ln -s /var/www/xyz/config/mongrel_cluster.yml /etc/mongrel_cluster/xyz.yml
    sudo cp /usr/local/lib/ruby/gems/1.8/gems/mongrel_cluster-1.0.5/resources/mongrel_cluster /etc/init.d/
    sudo chmod +x /etc/init.d/mongrel_cluster
    sudo /usr/sbin/update-rc.d -f mongrel_cluster defaults
    mongrel_cluster_ctl status

Stale pids

If your mongrels crash, or if you kill them, mongrel_cluster won’t start them again because it believes the processes are still running. Instead it complains and does nothing:

    ** !!! PID file tmp/pids/mongrel.8000.pid already exists.  Mongrel could be running already.  Check your log/mongrel.8000.log for errors.
    ** !!! Exiting with error.  You must stop mongrel and clear the .pid before I'll attempt a start.

To fix this, simply add the --clean switch to the start command in the /usr/local/lib/ruby/gems/1.8/gems/mongrel_cluster-1.0.5/resources/mongrel_cluster startup script:

    mongrel_cluster_ctl start -c $CONF_DIR --clean
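If you would rather check by hand whether a PID file is really stale before clearing it, a small Ruby sketch along these lines can help. It does not show how mongrel_cluster implements --clean. `stale_pid_file?` and `remove_stale_pid_files` are hypothetical helper names, and the default glob is simply taken from the path in the error message above.

```ruby
# Returns true when pid_file names a process that is no longer running.
def stale_pid_file?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid)   # signal 0 only checks that the process exists
  false
rescue Errno::ESRCH
  true                   # no such process: the PID file is stale
rescue Errno::EPERM
  false                  # the process exists but belongs to another user
end

# Deletes every stale PID file matching the pattern and returns their paths.
def remove_stale_pid_files(pattern = 'tmp/pids/mongrel.*.pid')
  Dir.glob(pattern).select { |file| stale_pid_file?(file) }.each { |file| File.delete(file) }
end
```

Run `remove_stale_pid_files` from the application directory. A PID file whose content is not a number raises an ArgumentError instead of being silently deleted.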

Tagged rails, ruby, debian, install, sqlite3, mongrel, mongrel_cluster

Compiling Ruby with OpenSSL, Zlib and Readline support on Debian

Ruby posted 8 months ago by christian

DRAFT … From http://blog.fiveruns.com/2008/3/3/compiling-ruby-rubygems-and-rails-on-ubuntu

Install pre-requisites

    apt-get -y install build-essential libssl-dev libreadline5-dev zlib1g-dev

Download and install

    cd /usr/local/src
    wget http://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
    tar zxvf ruby-1.8.6.tar.gz
    cd ruby-1.8.6

    ./configure --prefix=/usr/local --with-openssl-dir=/usr --with-readline-dir=/usr --with-zlib-dir=/usr
    make
    make install

    # Verify that all three extensions load
    ruby -ropenssl -rzlib -rreadline -e "puts :success"

Tagged ruby, readline, ssl, zlib, debian

Scraping Yahoo! Finance with Ruby and Hpricot

CSS posted 9 months ago by christian

This code extracts the numbers from the Fund operations table on the BLV fund’s Profile page at Yahoo! Finance.

    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    require 'pp'

    page = Hpricot(open('http://finance.yahoo.com/q/pr?s=BLV'))

    # Collect the contents of every data cell in the Fund operations table
    fund_operations = []
    page.search("//table[@class='yfnc_datamodoutline1']").each do |table|
      table.search("//td[@class='yfnc_datamoddata1']").each do |data|
        fund_operations << data.inner_html
      end
    end

    pp fund_operations

The output from this script is:

    ["N/A", "N/A", "55%", "72", "85.05M", "1.71B"]
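The scraped values are still strings. If you need real numbers, a small converter along these lines could follow the scraping step. It is a sketch based only on the formats visible in the output above: `parse_cell` and `SUFFIXES` are hypothetical names, and the K suffix is an assumption because it does not appear in this output.

```ruby
# Multipliers for magnitude suffixes ("K" is assumed, "M" and "B" appear above).
SUFFIXES = { 'K' => 1e3, 'M' => 1e6, 'B' => 1e9 }.freeze

# Converts a cell such as "85.05M" or "55%" to a Float, or nil for "N/A".
def parse_cell(text)
  match = text.strip.match(/\A(-?\d+(?:\.\d+)?)([KMB%]?)\z/)
  return nil unless match

  value = match[1].to_f
  case match[2]
  when '%' then value / 100.0            # percentages become fractions
  when ''  then value                    # plain number
  else value * SUFFIXES[match[2]]        # scale by the magnitude suffix
  end
end

cells = ["N/A", "N/A", "55%", "72", "85.05M", "1.71B"]
p cells.map { |cell| parse_cell(cell) }
```

Returning nil for "N/A" keeps missing values distinguishable from zero.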

Note that you could also use Scrubyt for this. Here’s a snippet that explains how to use Scrubyt to scrape web pages: Scraping Google search results with Scrubyt and Ruby

Tagged yahoo, finance, ruby, hpricot