Parsing feeds with Ruby and the FeedTools gem
This is an example of how to use the FeedTools gem to parse a feed. FeedTools supports Atom, RSS, and other feed formats.
The only negative thing about FeedTools is that the project is abandoned. The author said so in a comment from March 2008: “I’ve effectively abandoned it, so I’m really not going to go taking on huge code reorganization efforts.”
Installing
    $ sudo gem install feedtools
Fetching and parsing a feed
Easy…
    require 'rubygems'
    require 'feed_tools'
    feed = FeedTools::Feed.open('http://www.slashdot.org/index.rss')

    puts feed.title
    puts feed.link
    puts feed.description

    for item in feed.items
      puts item.title
      puts item.link
      puts item.content
    end
Feed autodiscovery
Pass the URL of the site instead of the feed, and FeedTools finds the Slashdot feed for you:
    puts FeedTools::Feed.open('http://www.slashdot.org').href
Helpers
FeedTools can also clean up your dirty XML/HTML:
    require 'feed_tools'
    require 'feed_tools/helpers/feed_tools_helper'

    # html is a string containing the markup to clean up
    FeedTools::HtmlHelper.tidy_html(html)
Database cache
FeedTools can also store the fetched feeds for you:
    FeedTools.configurations[:tidy_enabled] = false
    FeedTools.configurations[:feed_cache] = "FeedTools::DatabaseFeedCache"
The schema contains all you need:
    -- Example MySQL schema
    CREATE TABLE `cached_feeds` (
      `id` int(10) unsigned NOT NULL auto_increment,
      `href` varchar(255) default NULL,
      `title` varchar(255) default NULL,
      `link` varchar(255) default NULL,
      `feed_data` longtext default NULL,
      `feed_data_type` varchar(20) default NULL,
      `http_headers` text default NULL,
      `last_retrieved` datetime default NULL,
      `time_to_live` int(10) unsigned NULL,
      `serialized` longtext default NULL,
      PRIMARY KEY (`id`)
    )
There’s even a Rails migration file included.
Feed updater
There’s also a feed updater tool that can fetch feeds in the background, but I haven’t had time to look at it yet.
    sudo gem install feedupdater
Character set/encoding bug
As always, there are bugs you need to be aware of, and FeedTools is no different. There’s an encoding bug: FeedTools encodes everything to ISO-8859-1 instead of UTF-8, which should be the default encoding.
To fix it use the following code:
    require 'iconv'

    # Iconv.new takes the target encoding first, then the source encoding
    ic = Iconv.new('ISO-8859-1', 'UTF-8')
    feed.description = ic.iconv(feed.description)
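Iconv was removed from Ruby's standard library in Ruby 2.0, so on a newer Ruby the same conversion can be sketched with String#encode. The helper name utf8_to_latin1 is made up for this example, and replacing unconvertible characters with "?" is an assumption of the sketch, not something FeedTools does.

```ruby
# Hypothetical helper (not part of FeedTools): converts UTF-8 text to
# ISO-8859-1, the same direction as Iconv.new('ISO-8859-1', 'UTF-8').
def utf8_to_latin1(text)
  # Characters that have no ISO-8859-1 equivalent become '?' (assumption).
  text.encode('ISO-8859-1', 'UTF-8',
              invalid: :replace, undef: :replace, replace: '?')
end

converted = utf8_to_latin1("caf\u00e9")
puts converted.encoding       # ISO-8859-1
puts converted.bytes.inspect  # [99, 97, 102, 233]
```

With a parsed feed this would be used as feed.description = utf8_to_latin1(feed.description).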
You can also try this patch:
    cd /usr/local/lib/ruby/gems/1.8/gems/
    wget http://n0life.org/~julbouln/feedtools_encoding.patch
    patch -p1 < feedtools_encoding.patch
The character encoding bug is discussed on this page: http://sporkmonger.com/2005/08/11/tutorial
Time estimation
By default FeedTools tries to estimate when a feed item was published if the date isn’t available from the feed. This annoys me and creates weird publish dates, so it’s usually a good idea to disable it with the timestamp_estimation_enabled option:
    FeedTools.reset_configurations
    FeedTools.configurations[:tidy_enabled] = false
    FeedTools.configurations[:feed_cache] = nil
    # 15.minutes comes from ActiveSupport (e.g. inside Rails)
    FeedTools.configurations[:default_ttl] = 15.minutes
    FeedTools.configurations[:timestamp_estimation_enabled] = false
Configuration options
To see a list of available configuration options run the following code:
    require 'pp'
    pp FeedTools.configurations
Installing/compiling and using git with Ruby on Rails (on Mac OS X Leopard and Debian Linux)
Git is a good alternative to Mercurial, and of course to SVN or CVS if you’re still using stone age tools. In this post I’ll show you how to compile, install and use git with Rails.
Installing git on Mac OS X
First compile and install git:
    cd /usr/local/src
    wget http://kernel.org/pub/software/scm/git/git-1.5.4.4.tar.bz2
    tar jxvf git-1.5.4.4.tar.bz2
    cd git-1.5.4.4
    make prefix=/usr/local all
    make prefix=/usr/local test && echo $?
    sudo make prefix=/usr/local install
Installing git on Debian
On Debian, install git by executing the following command:
    $ sudo apt-get install git-core
Note that the package name is git-core, not git.
If you want the latest and greatest version, you first need to install the dependencies (note that you can leave out tk and expat):
    sudo apt-get install curl
    sudo apt-get install libcurl3
    sudo apt-get install libcurl3-dev
    sudo apt-get install tk8.4
    sudo apt-get install cpio expat
    sudo apt-get install zlib
    sudo apt-get install build-essential
    sudo apt-get install zlib1g-dev
    sudo apt-get install asciidoc
    sudo apt-get install xmlto
Then compile and install:
    NO_EXPAT=yes NO_SVN_TESTS=yes NO_IPV6=yes NO_TCLTK=yes make -j2 prefix=/usr all
    NO_EXPAT=yes NO_SVN_TESTS=yes NO_IPV6=yes NO_TCLTK=yes make -j2 prefix=/usr install
Configuring git
Run these commands to tell git your name and email:
    git config --global user.name "u name"
    git config --global user.email x@x.com
Otherwise, you might get this error:
    *** Environment problem:
    *** Your name cannot be determined from your system services (gecos).
    *** You would need to set GIT_AUTHOR_NAME and GIT_COMMITTER_NAME
    *** environment variables; otherwise you won't be able to perform
    *** certain operations because of "empty ident" errors.
    *** Alternatively, you can use user.name configuration variable.

    fatal: empty ident <........@........com> not allowed
    fatal: The remote end hung up unexpectedly
If you like colorized command output execute these commands:
    git config --global color.diff auto
    git config --global color.status auto
    git config --global color.branch auto
Using git
If all goes well, change to your project directory and run the following command:
    git init
This creates the git repository, so we’re now ready to start adding files to it, but first we need to create the git ignore file, which tells git to ignore certain files completely:
    cat <<EOF > .gitignore
    config/database.yml
    db/*.sqlite3
    log/*.log
    tmp/**/*
    .DS_Store
    doc/api
    doc/app
    EOF
Git doesn’t track empty directories (sucks if you ask me), so we’ll create a dummy file in every empty directory with the find and touch commands:
    find . \( -type d -empty \) -and \( -not -regex '\./\.git.*' \) -exec touch {}/.gitignore \;
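The same idea can be sketched in Ruby, which may be handy where find lacks the -empty test. This is only an illustration: the method name touch_empty_dirs is made up here, it needs Ruby 2.4 or newer for Dir.empty?, and it skips any directory named .git at every depth, which is slightly broader than the find command.

```ruby
require 'find'
require 'fileutils'

# Hypothetical helper: drops an empty placeholder file into every empty
# directory below root, skipping anything inside a .git directory.
# Returns the list of directories that received a placeholder.
def touch_empty_dirs(root, placeholder = '.gitignore')
  empty_dirs = []
  Find.find(root) do |path|
    next unless File.directory?(path) && !File.symlink?(path)
    Find.prune if File.basename(path) == '.git'   # do not descend into .git
    empty_dirs << path if Dir.empty?(path)
  end
  # Touch after the walk so the traversal never sees the new files.
  empty_dirs.each { |dir| FileUtils.touch(File.join(dir, placeholder)) }
  empty_dirs
end
```

Calling touch_empty_dirs('.') from the project root should have roughly the same effect as the find command.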
Importing files
We’re now ready to start adding and committing files, so without thinking execute:
    git add .
    git commit -m 'initial import'
This adds and commits all files that are in the current folder.
Using remote repositories
If you’re like me you’ll want to use a remote repository, so let’s continue the exercise by creating the repository folder on the remote server (Note that commands are executed on the remote server from now on):
    mkdir /var/lib/git/repositories/project_name
We want the folder to be accessible by users belonging to the git group only:
    addgroup git
    chown root:git /var/lib/git/repositories/project_name
    chmod 770 /var/lib/git/repositories/project_name
Now add yourself—or the user you’ll be using to connect to the remote server—to the git group:
    usermod -a -G git your_username
Alternatively create a new user:
    useradd -g git your_username
Now we’re finally ready to copy the local repository to the remote server, which is done with the scp command (Note that commands are executed locally again from now on):
    scp -rp .git user@server:/var/lib/git/repositories/project_name
To let git know that this repository exists we’ll use the git remote command:
    git remote add project_name ssh://server/var/lib/git/repositories/project_name
This adds the information to .git/config, which might be good to have a quick look at.
Note that if you’re using a non-standard SSH port you need to add the following to your ~/.ssh/config file:
    Host server
      Port 1234
Commit files and push them to the remote server
Now change a file, then commit and push the changes to the remote server:
    # -a stages the modified tracked files before committing
    git commit -a -m "Me be sleepy"
    git push project_name
If you get an error such as this, it means you need to install git on the remote server:
    $ git push project_name
    username@server's password:
    sh: git-receive-pack: command not found
    fatal: The remote end hung up unexpectedly
That’s all…
Miscellaneous problems
error: unable to create temporary sha1 filename ./objects/obj_FUu2jb: Permission denied
Resources
http://jointheconversation.org/railsgit
http://devblog.michaelgalero.com/2007/12/17/my-git-notes-for-rails/
http://railscasts.com/episodes/96
http://groups.google.com/group/rails-oceania/browse_thread/thread/2c8611dc93917952/e175f72310823547
http://www.kernel.org/pub/software/scm/git/docs/tutorial.html
http://scie.nti.st/2007/11/14/hosting-git-repositories-the-easy-and-secure-way
Installing Rails, mongrel and mongrel_cluster on Debian
DRAFT …
Install RubyGems
    wget http://rubyforge.org/frs/download.php/29548/rubygems-1.0.1.tgz
    tar zxvf rubygems-1.0.1.tgz
    cd rubygems-1.0.1
    ruby setup.rb
Install Rails
    gem install rails
Install sqlite3 (optional)
    apt-get install sqlite3 libsqlite3-dev
    gem install sqlite3-ruby
Install mongrel and mongrel_cluster
    $ gem install mongrel mongrel_cluster

    $ mongrel_rails cluster::configure -e production \
        -p 8000 \
        -a 127.0.0.1 \
        -N 3 \
        -c /var/www/xyz/current

    $ mongrel_rails cluster::start

    $ useradd -g www-data -d /var/www mongrel
Surviving reboots
    sudo mkdir /etc/mongrel_cluster
    sudo ln -s /var/www/xyz/config/mongrel_cluster.yml /etc/mongrel_cluster/xyz.yml
    sudo cp /usr/local/lib/ruby/gems/1.8/gems/mongrel_cluster-1.0.5/resources/mongrel_cluster /etc/init.d/
    sudo chmod +x /etc/init.d/mongrel_cluster
    sudo /usr/sbin/update-rc.d -f mongrel_cluster defaults
    mongrel_cluster_ctl status
Stale pids
If your mongrels crash or if you kill them, mongrel_cluster won’t start them again because it believes the processes are still running. Instead it complains and does nothing:
    ** !!! PID file tmp/pids/mongrel.8000.pid already exists. Mongrel could be running already. Check your log/mongrel.8000.log for errors.
    ** !!! Exiting with error. You must stop mongrel and clear the .pid before I'll attempt a start.
To fix this, simply add the --clean switch to the /usr/local/lib/ruby/gems/1.8/gems/mongrel_cluster-1.0.5/resources/mongrel_cluster startup script:
    mongrel_cluster_ctl start -c $CONF_DIR --clean
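What such a cleanup amounts to can be sketched in a few lines of Ruby: read each pid file, ask the kernel whether that process still exists, and delete the file if it does not. This is not mongrel_cluster's own code; the method names and the default glob pattern are assumptions made for illustration.

```ruby
# Returns true if a process with this pid exists. Signal 0 sends nothing,
# it only checks whether the process can be signalled.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # process exists but belongs to another user
end

# Hypothetical helper: deletes pid files whose process is gone and
# returns the list of files it removed.
def remove_stale_pid_files(pattern = 'tmp/pids/mongrel.*.pid')
  Dir.glob(pattern).select do |file|
    pid = File.read(file).to_i
    stale = pid <= 0 || !process_alive?(pid)
    File.delete(file) if stale
    stale
  end
end
```

Run from the application directory, remove_stale_pid_files would leave the pid files of running mongrels alone and remove the rest.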
Compiling Ruby with OpenSSL, Zlib and Readline support on Debian
DRAFT … From http://blog.fiveruns.com/2008/3/3/compiling-ruby-rubygems-and-rails-on-ubuntu
Install pre-requisites
    apt-get -y install build-essential libssl-dev libreadline5-dev zlib1g-dev
Download and install
    cd /usr/local/src
    wget http://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
    tar zxvf ruby-1.8.6.tar.gz
    cd ruby-1.8.6

    ./configure --prefix=/usr/local --with-openssl-dir=/usr --with-readline-dir=/usr --with-zlib-dir=/usr

    make
    make install

    # Prints "success" only if all three extensions load
    ruby -ropenssl -rzlib -rreadline -e "puts :success"
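The one-liner at the end stops at the first library that fails to load. If you want to know exactly which extensions are missing, a small script along these lines might help. The method name missing_extensions is made up for this sketch.

```ruby
# Hypothetical helper: tries to require each library and returns the
# names of those that could not be loaded.
def missing_extensions(names)
  names.reject do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

missing = missing_extensions(%w[openssl zlib readline])
puts missing.empty? ? 'success' : "missing: #{missing.join(', ')}"
```

If an extension shows up as missing, its development package was most likely not installed when Ruby was compiled.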
Scraping Yahoo! Finance with Ruby and Hpricot
This code extracts the numbers from the Fund operations table on the BLV fund’s Profile page at Yahoo! Finance.
    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    require 'pp'

    page = Hpricot(open('http://finance.yahoo.com/q/pr?s=BLV'))

    # Collect the contents of every data cell in the Fund operations table
    fund_operations = []
    page.search( "//table[@class='yfnc_datamodoutline1']" ).each do |row|
      row.search( "//td[@class='yfnc_datamoddata1']").each do |data|
        fund_operations << data.inner_html
      end
    end

    pp fund_operations
The output from this script is:
    ["N/A", "N/A", "55%", "72", "85.05M", "1.71B"]
Note that you could also use Scrubyt for this. Here’s a snippet that explains how to use Scrubyt to scrape web pages: Scraping Google search results with Scrubyt and Ruby