
How to use Ruby and SimpleRSS to parse RSS and Atom feeds

Tagged ruby, utf-8, rss, atom, encoding, parse, simplerss  Languages ruby

This script is an example of how to use the SimpleRSS gem to parse an RSS feed.

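The script can easily be modified to support conditional GETs: the If-Modified-Since and If-None-Match headers are already stubbed in and only need real values saved from a previous response. It also detects the feed's character encoding and converts the feed to UTF-8.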
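As a rough sketch of the conditional GET mentioned above, the helpers below build the request headers from validators saved on a previous fetch and treat a 304 response as "nothing new". The names `conditional_headers` and `fetch_if_changed` and the `cache` hash are hypothetical (they are not part of the original script or of SimpleRSS), and an in-memory hash stands in for the database.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build request headers from validators saved on the
# previous fetch. A header with no stored value is left out entirely.
def conditional_headers(cache)
  headers = {}
  headers['If-None-Match']     = cache[:etag]          if cache[:etag]
  headers['If-Modified-Since'] = cache[:last_modified] if cache[:last_modified]
  headers
end

# Hypothetical helper: fetch the feed and return its body, or nil when the
# server answers 304 Not Modified. New validators are stored in +cache+.
def fetch_if_changed(url, cache)
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = (url.scheme == 'https')
  http.open_timeout = http.read_timeout = 10

  response = http.get(url.request_uri, conditional_headers(cache))
  return nil if response.is_a?(Net::HTTPNotModified)

  # Remember the validators so the next request can be conditional.
  cache[:etag]          = response['ETag']
  cache[:last_modified] = response['Last-Modified']
  response.body
end
```

On the first call the cache is empty, so a plain GET is sent; later calls with the same `cache` send the stored values, and a `nil` result means the feed has not changed and parsing can be skipped.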

require 'iconv'
require 'net/http'
require 'net/https'
require 'rubygems'
require 'simple-rss'

url = URI.parse('')

http = Net::HTTP.new(url.host, url.port)

http.open_timeout = http.read_timeout = 10  # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == "https")

headers = {
  'User-Agent'          => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20080201 Firefox/',
  'If-Modified-Since'   => 'store in a database and set on each request',
  'If-None-Match'       => 'store in a database and set on each request'
}

# request_uri keeps the query string and is never empty, unlike url.path
response = http.get(url.request_uri, headers)
body = response.body

# Look for the encoding in the XML declaration first
encoding = body.scan(/^<\?xml [^>]*encoding="([^\"]*)"[^>]*\?>/).flatten.first.to_s

if encoding.empty?
    if response["Content-Type"] =~ /charset=([\w\d-]+)/
        encoding = $1.downcase
        puts "Feed #{url} is #{encoding} according to Content-Type header"
    else
        puts "Unable to detect content encoding for #{url}, using default."
        encoding = "ISO-8859-1"
    end
else
    puts "Feed #{url} is #{encoding} according to XML"
end

# If this raises Iconv::IllegalSequence, use 'UTF-8//IGNORE' as the target encoding instead
ic = Iconv.new('UTF-8', encoding)
body = ic.iconv(body)

feed = SimpleRSS.parse(body)

feed.items.each do |item|
  puts item.title
end
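
Iconv was removed from Ruby's standard library in Ruby 2.0 (it is still available as the separate `iconv` gem), so on a current Ruby the detection and conversion steps could be sketched with `String#encode` instead. This is an alternative under that assumption, not the original author's method; `detect_encoding` and `to_utf8` are hypothetical names, and the ISO-8859-1 fallback mirrors the script above.

```ruby
# Hypothetical helper: look for an encoding in the XML declaration first,
# then in the Content-Type header, and fall back to ISO-8859-1.
def detect_encoding(body, content_type)
  # Match against a binary copy so invalid byte sequences cannot raise.
  declared = body.b[/<\?xml [^>]*encoding=["']([^"']+)["']/, 1]
  return declared if declared
  return $1 if content_type.to_s =~ /charset=["']?([\w-]+)/i
  'ISO-8859-1'
end

# Hypothetical helper: reinterpret +body+ as +encoding+ and transcode it to
# UTF-8, substituting '?' for bytes that cannot be converted.
def to_utf8(body, encoding)
  body.dup.force_encoding(encoding)
      .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end
```

With these in place, `body = to_utf8(body, detect_encoding(body, response['Content-Type']))` would stand in for the Iconv lines before the body is handed to `SimpleRSS.parse`. An encoding name that Ruby does not recognise makes `force_encoding` raise `ArgumentError`, so a real script would probably want to rescue that and use the default.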