rss snippets

Valid RSS 2.0 Feed Template for Rails

Tagged ruby, rails, rss2.0, feed, rss, template, example, atom  Languages ruby

If you prefer Atom to RSS, use the atom_feed_helper.

Here's the template, modify it to fit your needs. I know there are plugins and other ways of doing this, but I hate code that gets too abstract:

<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://xxxxxxx" rel="self" type="application/rss+xml" />
    <title>Code Snippets - Aktagon</title>
    <link>http://snippets.aktagon.com/</link>
    <description>Share your code with the world. Allow others to review and comment.</description>
    <language>en-us</language>
    <pubDate><%= @snippets[0].created_at.rfc822 %></pubDate>
    <lastBuildDate><%= @snippets[0].created_at.rfc822 %></lastBuildDate>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <generator>Aktagon Snippets</generator>
    <% for snippet in @snippets %>
    <item>
      <title><![CDATA[<%= snippet.title %>]]></title>
      <link><%= snippet_url(snippet) %></link>
      <description><![CDATA[<%= snippet.rendered_body %>]]></description>
      <pubDate><%= snippet.created_at.rfc822 %></pubDate>
      <guid><%= snippet_url(snippet) %></guid>
      <% for tag in snippet.tags %>
        <category domain="http://snippets.aktagon.com/snippets"><![CDATA[<%= tag.name %>]]></category>
      <% end %>
    </item>
    <% end %>
  </channel>
</rss>
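
One thing to watch with the CDATA wrappers in the template: a title or body that itself contains ]]> ends the section early and breaks the feed. A minimal sketch of a workaround follows; cdata is a hypothetical helper name, not something Rails provides.

```ruby
# Hypothetical helper (not part of Rails): wraps text in a CDATA section.
# A literal "]]>" inside the text would close the section early, so it is
# split across two adjacent CDATA sections.
def cdata(text)
  "<![CDATA[#{text.to_s.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

puts cdata('plain title')
puts cdata('a ]]> b')
```

In the template this could replace the hand-written wrappers. On Rails versions that escape ERB output by default, the result would probably also need to be marked as safe.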

Remember to serve the feed with the correct HTTP headers, in particular Content-Type: application/rss+xml.
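
A minimal sketch of the headers a feed response might carry. feed_headers is a hypothetical helper, and the Last-Modified and ETag values are an assumption about what you want to expose so that readers can do conditional GETs.

```ruby
require 'time'
require 'digest/md5'

# Hypothetical helper: builds response headers for a rendered feed.
# `body` is the feed XML, `updated_at` the time of the newest item.
def feed_headers(body, updated_at)
  {
    'Content-Type'  => 'application/rss+xml; charset=utf-8',
    'Last-Modified' => updated_at.httpdate,
    'ETag'          => %("#{Digest::MD5.hexdigest(body)}")
  }
end

headers = feed_headers('<rss version="2.0"></rss>', Time.utc(2008, 3, 1, 12, 0, 0))
headers.each { |name, value| puts "#{name}: #{value}" }
```

In a Rails controller these values would be copied onto the response before rendering the template.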

It also helps to have an auto-discovery tag inside the head tag:

<link rel="alternate" type="application/rss+xml" title="RSS feed" href="http://<%= request.host %>/rss/" />

Parsing feeds with Ruby and the FeedTools gem

Tagged feedtools, rss, atom, parser, ruby, content encoding, utf-8, iso-8859-1  Languages ruby

This is an example of how to use the FeedTools gem to parse a feed. FeedTools supports Atom, RSS, and other feed formats.

The only drawback of FeedTools is that the project is abandoned. The author said so in a comment from March 2008: “I’ve effectively abandoned it, so I’m really not going to go taking on huge code reorganization efforts.”

Installing

$ sudo gem install feedtools

Fetching and parsing a feed

Easy...

require 'rubygems'
require 'feed_tools'
feed = FeedTools::Feed.open('http://www.slashdot.org/index.rss')

puts feed.title
puts feed.link
puts feed.description

for item in feed.items
  puts item.title
  puts item.link
  puts item.content
end

Feed autodiscovery

FeedTools finds the Slashdot feed for you.

puts FeedTools::Feed.open('http://www.slashdot.org').href
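
Autodiscovery relies on the link tag shown in the template snippet. The sketch below illustrates the idea with a simple regex scan. It is not FeedTools' implementation, and discover_feeds and the example.com URL are made up for the illustration.

```ruby
# Illustration only: a regex-based scan for feed auto-discovery tags.
# FeedTools' own implementation is not shown here and may differ.
def discover_feeds(html)
  feed_types = ['application/rss+xml', 'application/atom+xml']
  html.scan(/<link\b[^>]*>/i).map do |tag|
    # Collect the tag's attributes into a hash with lowercase keys
    attrs = {}
    tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/) { |k, v| attrs[k.downcase] = v }
    next unless attrs['rel'].to_s.downcase == 'alternate'
    next unless feed_types.include?(attrs['type'].to_s.downcase)
    attrs['href']
  end.compact
end

html = <<-HTML
<head>
  <link rel="stylesheet" href="/style.css" />
  <link rel="alternate" type="application/rss+xml" title="RSS feed" href="http://example.com/rss/" />
</head>
HTML

puts discover_feeds(html)
```

A real implementation would use an HTML parser and resolve relative href values against the page URL.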

Helpers

FeedTools can also clean up your dirty XML/HTML:

require 'feed_tools'
require 'feed_tools/helpers/feed_tools_helper'

FeedTools::HtmlHelper.tidy_html(html)

Database cache

FeedTools can also store the fetched feeds for you:

FeedTools.configurations[:tidy_enabled] = false
FeedTools.configurations[:feed_cache] = "FeedTools::DatabaseFeedCache"

The schema contains all you need:

-- Example MySQL schema
  CREATE TABLE cached_feeds (
    id              int(10) unsigned NOT NULL auto_increment,
    href            varchar(255) default NULL,
    title           varchar(255) default NULL,
    link            varchar(255) default NULL,
    feed_data       longtext default NULL,
    feed_data_type  varchar(20) default NULL,
    http_headers    text default NULL,
    last_retrieved  datetime default NULL,
    time_to_live    int(10) unsigned NULL,
    serialized      longtext default NULL,
    PRIMARY KEY  (id)
  );

There's even a Rails migration file included.
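
The last_retrieved and time_to_live columns are what a cache needs to decide whether a stored feed is stale. A minimal sketch of that check, assuming time_to_live is a number of seconds. cache_expired? is a hypothetical helper, not FeedTools' API.

```ruby
# Hypothetical helper, not part of FeedTools. Assumes time_to_live is a
# number of seconds and last_retrieved is a Time.
def cache_expired?(last_retrieved, time_to_live, now = Time.now)
  # A row that was never fetched or has no TTL is treated as stale
  return true if last_retrieved.nil? || time_to_live.nil?
  now - last_retrieved >= time_to_live
end

now = Time.utc(2008, 3, 1, 12, 0, 0)
puts cache_expired?(now - 1200, 900, now)  # fetched 20 minutes ago, 15 minute TTL
puts cache_expired?(now - 60, 900, now)    # fetched 1 minute ago, 15 minute TTL
```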

Feed updater

There's also a feed updater tool that can fetch feeds in the background, but I haven't had time to look at it yet.

$ sudo gem install feedupdater

Character set/encoding bug

As always, there are bugs you need to be aware of, and FeedTools is no different. There's an encoding bug: FeedTools encodes everything to ISO-8859-1 instead of UTF-8, which should be the default encoding.

To fix it use the following code:

require 'iconv'
# Iconv.new takes the target encoding first, then the source encoding
ic = Iconv.new('ISO-8859-1', 'UTF-8')
feed.description = ic.iconv(feed.description)

You can also try this patch.

cd /usr/local/lib/ruby/gems/1.8/gems/
wget http://n0life.org/~julbouln/feedtools_encoding.patch
patch -p1 < feedtools_encoding.patch

The character encoding bug is discussed on this page: http://sporkmonger.com/2005/08/11/tutorial

Time estimation

By default FeedTools tries to estimate when a feed item was published if the feed doesn't provide a date. This annoys me and creates weird publish dates, so it's usually a good idea to disable it with the timestamp_estimation_enabled option:

FeedTools.reset_configurations
FeedTools.configurations[:tidy_enabled] = false
FeedTools.configurations[:feed_cache] = nil
FeedTools.configurations[:default_ttl] = 15.minutes  # 15.minutes comes from ActiveSupport and equals 900
FeedTools.configurations[:timestamp_estimation_enabled] = false

Configuration options

To see a list of available configuration options run the following code:

require 'pp'
pp FeedTools.configurations

How to use Ruby and SimpleRSS to parse RSS and Atom feeds

Tagged rss, atom, parse, ruby, simplerss, encoding, utf-8  Languages ruby

This script is an example of how to use the SimpleRSS gem to parse an RSS feed.

The script can easily be modified to support conditional GETs. It also detects the feed's character encoding and converts the feed to UTF-8.

require 'iconv'
require 'net/http'
require 'net/https'
require 'rubygems'
require 'simple-rss'

url = URI.parse('http://hbl.fi/rss.xml')

http = Net::HTTP.new(url.host, url.port)

http.open_timeout = http.read_timeout = 10  # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == "https")

headers = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12'
  # For conditional GETs, store the ETag and Last-Modified values of the
  # previous response in a database and send them back on each request
  # (stored_etag and stored_last_modified are hypothetical names):
  # 'If-None-Match'     => stored_etag,
  # 'If-Modified-Since' => stored_last_modified
}

# request_uri includes the query string and is never empty, unlike path
response = http.get(url.request_uri, headers)
body = response.body

# Look for the encoding in the XML declaration first
encoding = body.scan(
/^<\?xml [^>]*encoding="([^\"]*)"[^>]*\?>/
).flatten.first

if encoding.nil? || encoding.empty?
    if response["Content-Type"] =~ /charset=([\w\d-]+)/
        encoding = $1.downcase
        puts "Feed #{url} is #{encoding} according to Content-Type header"
    else
        puts "Unable to detect content encoding for #{url}, using default."
        encoding = "ISO-8859-1"
    end
else
    puts "Feed #{url} is #{encoding} according to XML"
end

# If this raises an exception, use 'UTF-8//IGNORE' as the target encoding to skip invalid characters
ic = Iconv.new('UTF-8', encoding)
body = ic.iconv(body)

feed = SimpleRSS.parse(body)

for item in feed.items
  puts item.title
end
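
Iconv was removed from Ruby's standard library in Ruby 2.0. On newer Rubies the same conversion can be done with String#encode. The sketch below assumes the encoding has already been detected as in the script. to_utf8 is a hypothetical helper name.

```ruby
# Hypothetical helper: converts raw feed bytes from `encoding` to UTF-8
# without Iconv. The invalid/undef options replace unconvertible bytes
# with "?" instead of raising an exception.
def to_utf8(body, encoding)
  body.dup.force_encoding(encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end

latin1 = "Hyv\xE4\xE4 p\xE4iv\xE4\xE4".b  # ISO-8859-1 bytes, as read from the network
utf8 = to_utf8(latin1, 'ISO-8859-1')
puts utf8
puts utf8.encoding
```

Replacing bad bytes is a design choice. It keeps the feed parseable at the cost of a few question marks, which is roughly the trade-off 'UTF-8//IGNORE' makes with Iconv.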

How to parse RSS/Atom feeds with the ROME Java library

Tagged rome, java, atom, rss, feed, parse  Languages java

This is a simple example of how to use the ROME library to parse feeds:

import com.sun.syndication.io.*;
import com.sun.syndication.feed.synd.*;
import java.net.URL;
import java.util.*;

public class RomeParserTest {

    public static void main(String args[]) {
        try {
            SyndFeedInput sfi = new SyndFeedInput();

            String urls[] = {
                "...", 
                "..." 
            };
            
            for(String url:urls) {
                SyndFeed feed = sfi.build(new XmlReader(new URL(url)));

                List entries = feed.getEntries();

                System.out.println(feed.getTitle());            
                System.out.println(entries.size());
            }
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}

How to parse an RSS or Atom feed with Python and the Universal Feed Parser library

Tagged universal, feed, parser, atom, rss, python  Languages python

This example uses the Universal Feed Parser, one of the best and fastest parsers for Python.

Feed Parser is a lot faster than feed_tools for Ruby and it's about as fast as the ROME Java library according to my simple benchmark.

Feed Parser uses less memory and about as much CPU as ROME, but I didn't test this with a long-running process, so don't take my word for it.

import time
import feedparser

start = time.time()

feeds = [
    'http://..', 
    'http://'
]

for url in feeds:
  options = {
    'agent'   : '..',
    'etag'    : '..',
    'modified': feedparser._parse_date('Sat, 29 Oct 1994 19:43:31 GMT'),
    'referrer' : '..'
  }

  feed = feedparser.parse(url, **options)

  print(len(feed.entries))
  print(feed.feed.title)

end = time.time()

print('fetch took %0.3f s' % (end-start))

Parsing feeds with Ruby and rFeedParser

Tagged rfeedparser, ruby, rss, parse, feed  Languages ruby

rFeedParser is a Ruby version of the feedparser Python library, which is probably the best (not fastest) feed parser.

To install it, follow the instructions on the project's GitHub page.

require 'rubygems'
require 'rfeedparser'
require 'benchmark'


seconds = Benchmark.realtime do

    body = File.read('example-feed.xml')
    
    for num in (1..500)
        feed = FeedParser.parse(body) # Can be URL, string, data.
    end
    
end

puts "#{seconds.round} seconds elapsed."

rFeedParser has one problem: in my simple test it was about 3-4 times slower than feed-normalizer and the Python Universal Feed Parser (feedparser.org).

How to do an HTTP conditional GET with Feedzirra (update an existing feed)

Tagged feedzirra, conditional-get, rss, atom, feed  Languages ruby

This snippet explains how to do conditional GETs with Feedzirra 0.0.17:

# First create a dummy parser, any type of parser will do
f = Feedzirra::Parser::RSS.new

# Set the required Feedzirra values with data from your database
f.feed_url = feed_from_db.url
f.etag = feed_from_db.etag
f.last_modified = feed_from_db.last_modified_at

# Set the last entry. This step is important. 
# This allows Feedzirra to detect if a feed that doesn't support last modified and etag has been updated.
last_entry = Feedzirra::Parser::RSSEntry.new

# Do we have a last entry in the database? If so let Feedzirra know
if feed_from_db.items.last
  last_entry.url = feed_from_db.items.last.link
end

# Without this Feedzirra will return an empty array or some other surprise
f.entries << last_entry

# Update the feed
Feedzirra::Feed.update f
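
For reference, this is roughly what a conditional GET looks like at the HTTP level. It is a sketch with plain Net::HTTP, not Feedzirra's code, and conditional_headers and fetch_if_changed are hypothetical helper names.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the validator headers from values saved
# after the previous fetch. Either value may be nil.
def conditional_headers(etag, last_modified)
  headers = {}
  headers['If-None-Match'] = etag if etag
  headers['If-Modified-Since'] = last_modified if last_modified
  headers
end

# Returns nil when the server answers 304 Not Modified, otherwise the
# response, whose ETag and Last-Modified headers should be stored for
# the next request.
def fetch_if_changed(url, etag, last_modified)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  response = http.get(uri.request_uri, conditional_headers(etag, last_modified))
  response.is_a?(Net::HTTPNotModified) ? nil : response
end

p conditional_headers('"abc123"', 'Sat, 29 Oct 1994 19:43:31 GMT')
```

Servers that ignore both headers always return the full feed, which is why comparing against the last known entry, as in the Feedzirra snippet, is still needed.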

How to parse RSS/Atom feeds with Scala and the Rome library

Tagged scala, feed, atom, rss, parse  Languages scala

This snippet shows how to parse feeds with Scala and the Rome library:

import com.sun.syndication.io._
import com.sun.syndication.feed.synd._
import java.net.URL

object FeedParser {
  def main(args: Array[String]): Unit = {
    try {
      val sfi = new SyndFeedInput()

      val urls = List("http://hbl.fi/rss.xml")
      
      urls.foreach(url => {
        val feed = sfi.build(new XmlReader(new URL(url)))

        val entries = feed.getEntries()

        println(feed.getTitle())
        println(entries.size())
      })
    } catch {
      case e: Exception => throw new RuntimeException(e)
    }
    
  }
}

Also see: https://gist.github.com/585235/bf328d90d094305121cec0ba2a646ce0093fa654

How to parse XML feeds with jQuery

Tagged atom, rss, feed, parse, jquery, internet explorer  Languages javascript

This snippet fetches an RSS feed with $.ajax and collects its items into an array:

$.ajax({
    type: 'GET',
    url: '/some/good/stuff.xml',
    dataType: 'xml',
    error: function(xhr) {
        alert('Failed to fetch or parse feed');
    },
    success: function(xml) {
        var channel = $('channel', xml).eq(0);
        var items = [];
        $('item', xml).each( function() {
            var item = {};
            item.title = $(this).find('title').eq(0).text();
            item.link = $(this).find('link').eq(0).text();
            item.description = $(this).find('description').eq(0).text();
            item.updated = $(this).find('pubDate').eq(0).text();
            item.id = $(this).find('guid').eq(0).text();
            items.push(item);
        });
        console.dir(items);
    }
});

Your friend Internet Explorer

For IE 6 and better (worse?) the feed must return the right content type, so make sure the response contains this header:

Content-Type: text/xml

If this header is not set, the jQuery Ajax error handler is called and the feed is not parsed.

How to fetch delicious data with Hpricot and OpenURI

Tagged delicious, hpricot, ruby, rss, feed  Languages ruby

The code:

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'ostruct'

class Delicious
  class << self
    def tag(username, name, count = 15)
      links = []
      url = "http://feeds.delicious.com/v2/rss/#{username}/#{name}?count=#{count}"
      feed = Hpricot(open(url))  # on Ruby 3.0+ use URI.open(url)

      feed.search("item").each do |i|
        item = OpenStruct.new
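        # Hpricot parses the feed as HTML, where <link> is an empty element,
        # so the URL ends up in the node that follows it
        item.link = i.at('link').next.to_s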
        item.title = i.at('title').innerHTML
        item.description  = i.at('description').innerHTML rescue nil

        links << item
      end

      links
    end
  end
end

Usage:

# Return last 15 items tagged with business and news from jebus's account:
Delicious.tag 'jebus', 'business+news', 15

Returns an array of items.
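
Hpricot is no longer maintained, so here is a sketch of the same extraction with REXML, which ships with Ruby. Because REXML parses the input as XML, link keeps its text and no sibling lookup is needed. The feed below is made up, and parse_items is a hypothetical helper that returns hashes instead of OpenStruct objects.

```ruby
require 'rexml/document'

# Hypothetical helper: returns one hash per <item> in an RSS document.
# Missing child elements yield nil values.
def parse_items(xml)
  doc = REXML::Document.new(xml)
  items = []
  doc.elements.each('rss/channel/item') do |node|
    text = lambda { |name| (el = node.elements[name]) && el.text }
    items << {
      :link        => text.call('link'),
      :title       => text.call('title'),
      :description => text.call('description')
    }
  end
  items
end

xml = <<-XML
<rss version="2.0">
  <channel>
    <title>Made-up feed</title>
    <item>
      <title>First link</title>
      <link>http://example.com/first</link>
    </item>
  </channel>
</rss>
XML

items = parse_items(xml)
puts items.first[:title]
puts items.first[:link]
```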