How to use Ruby and SimpleRSS to parse RSS and Atom feeds
Ruby posted 7 months ago by christian
This script is an example of how to use the SimpleRSS gem to parse an RSS feed.
The script can easily be modified to support conditional GETs. It also detects the feed’s character encoding and converts the feed to UTF-8.
require 'iconv'
require 'net/http'
require 'net/https'
require 'rubygems'
require 'simple-rss'

url = URI.parse('http://hbl.fi/rss.xml')

http = Net::HTTP.new(url.host, url.port)
http.open_timeout = http.read_timeout = 10 # Set open and read timeout to 10 seconds
http.use_ssl = (url.scheme == "https")

headers = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12'
  # For conditional GETs, store these in a database and set them on each request:
  # 'If-Modified-Since' => the Last-Modified value of the previous response
  # 'If-None-Match'     => the ETag value of the previous response
}

# Fetch the feed; request_uri keeps any query string that url.path would drop
response = http.get(url.request_uri, headers)
body = response.body

# Look for the encoding declared in the XML prolog
encoding = body.scan(
  /^<\?xml [^>]*encoding=["']([^"']*)["'][^>]*\?>/
).flatten.first

# scan returns nothing when the prolog has no encoding attribute, so check for nil too
if encoding.nil? || encoding.empty?
  if response["Content-Type"] =~ /charset=([\w\d-]+)/
    encoding = $1.downcase
    puts "Feed #{url} is #{encoding} according to Content-Type header"
  else
    puts "Unable to detect content encoding for #{url}, using default."
    encoding = "ISO-8859-1"
  end
else
  puts "Feed #{url} is #{encoding} according to XML"
end

# Use 'UTF-8//IGNORE', if this throws an exception
ic = Iconv.new('UTF-8', encoding)
body = ic.iconv(body)

feed = SimpleRSS.parse(body)

feed.items.each do |item|
  puts item.title
end
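The conditional GET mentioned above could look roughly like the sketch below. It is not part of the original script: the helper names `conditional_headers` and `fetch_if_changed` are made up for illustration, and how the ETag and Last-Modified values are stored between runs (a database, a file) is left to the caller. It assumes a Ruby version where `http.get` returns only the response object.

```ruby
require 'net/http'
require 'uri'

# Build the validator headers from values saved after the previous fetch.
# Both arguments are hypothetical cache fields; nil means "nothing stored yet".
def conditional_headers(etag, last_modified)
  headers = {}
  headers['If-None-Match'] = etag if etag
  headers['If-Modified-Since'] = last_modified if last_modified
  headers
end

# Fetch the feed only if it has changed since the last request.
# Returns [body, etag, last_modified]; body is nil on 304 Not Modified.
def fetch_if_changed(url, etag = nil, last_modified = nil)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = http.read_timeout = 10

  response = http.get(uri.request_uri, conditional_headers(etag, last_modified))

  case response
  when Net::HTTPNotModified
    # Nothing new: keep the validators we already have.
    [nil, etag, last_modified]
  when Net::HTTPSuccess
    # New content: hand back the validators to store for the next request.
    [response.body, response['ETag'], response['Last-Modified']]
  else
    raise "Unexpected response #{response.code} for #{url}"
  end
end
```

A caller would pass in the stored values, skip parsing when the returned body is nil, and otherwise save the returned ETag and Last-Modified values before handing the body to the encoding detection and `SimpleRSS.parse` steps shown in the script.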