How to do an HTTP conditional GET with Feedzirra (update an existing feed)

Ruby posted 5 months ago by christian

This snippet shows how to do conditional GETs with Feedzirra 0.0.17, using the ETag, Last-Modified value and last entry stored in your database:

    # First create a dummy parser, any type of parser will do
    f = Feedzirra::Parser::RSS.new

    # Set the required Feedzirra values with data from your database
    f.feed_url = feed_from_db.url
    f.etag = feed_from_db.etag
    f.last_modified = feed_from_db.last_modified_at

    # Set the last entry. This step is important.
    # This allows Feedzirra to detect if a feed that doesn't support
    # last modified and etag has been updated.
    last_entry = Feedzirra::Parser::RSSEntry.new

    # Do we have a last entry in the database? If so let Feedzirra know
    if feed_from_db.items.last
      last_entry.url = feed_from_db.items.last.link
    end

    # Without this Feedzirra will return an empty array or some other surprise
    f.entries << last_entry

    # Update the feed
    Feedzirra::Feed.update f
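
Why the last stored entry matters can be sketched in a few lines of Python. This only illustrates the general idea (walk a newest-first list of entries until the last known URL shows up), and it is not Feedzirra's actual code; `find_new_entries` and its arguments are hypothetical names.

```python
def find_new_entries(fetched_urls, last_known_url):
    """Return the URLs that come before last_known_url.

    Assumes fetched_urls is ordered newest first, as most feeds are.
    If last_known_url is None or never appears, every entry counts as new.
    """
    new_entries = []
    for url in fetched_urls:
        if url == last_known_url:
            break  # everything from here on was already seen
        new_entries.append(url)
    return new_entries
```

Without a last known URL there is nothing to stop at, which is why a feed with no ETag or Last-Modified support needs the stored entry to tell old items from new ones.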

Tagged feedzirra, conditional-get, rss, atom, feed

Atom feed template for Rails (Builder)

Ruby posted 10 months ago by christian

    atom_feed(:url => formatted_posts_url(:atom)) do |feed|
      feed.title(@category.name)
      feed.updated(@posts.first ? @posts.first.created_at : Time.now.utc)

      for post in @posts
        feed.entry(post) do |entry|
          entry.title(post.title)
          entry.content(post.body_html, :type => 'html')
          entry.updated post.updated_at

          for tag in post.tags
            entry.category :term => url_for(tag), :label => tag.name
          end
        end
      end
    end

Tagged atom, builder, category, tags

How to parse an RSS or Atom feed with Python and the Universal Feed Parser library

Python posted about 1 year ago by christian

This example uses the Universal Feed Parser, one of the best and fastest parsers for Python.

According to my simple benchmark, Feed Parser is a lot faster than feed_tools for Ruby and about as fast as the ROME Java library.

Feed Parser uses less memory than ROME and about as much CPU, but I didn’t test this with a long-running process, so don’t take my word for it.

    import time
    from email.utils import parsedate

    import feedparser

    start = time.time()

    feeds = [
        'http://..',
        'http://'
    ]

    for url in feeds:
        options = {
            'agent'   : '..',
            'etag'    : '..',
            # parsedate returns a time tuple, which feedparser accepts here;
            # it replaces the private feedparser._parse_date helper
            'modified': parsedate('Sat, 29 Oct 1994 19:43:31 GMT'),
            'referrer': '..'
        }

        feed = feedparser.parse(url, **options)

        print(len(feed.entries))
        # The title is missing when the fetch failed or nothing was returned
        print(feed.feed.get('title', ''))

    end = time.time()

    print('fetch took %0.3f s' % (end - start))
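
What the etag and modified options amount to on the wire can be sketched with the standard library alone. This is a minimal illustration of an HTTP conditional GET, not how feedparser implements it; `conditional_headers` and `fetch_if_changed` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers for a conditional GET."""
    headers = {}
    if etag:
        headers['If-None-Match'] = etag
    if last_modified:
        headers['If-Modified-Since'] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None on 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Store the new validators and send them with the next request
            return (response.read(),
                    response.headers.get('ETag'),
                    response.headers.get('Last-Modified'))
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Nothing changed: keep the validators we already have
            return None, etag, last_modified
        raise
```

The values returned by one call are meant to be saved (for example in a database) and passed back in on the next call, so an unchanged feed costs only a 304 response with no body.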

Tagged universal, feed, parser, atom, rss, python

How to parse an RSS or Atom feed with the ROME Java library

Java posted about 1 year ago by christian

This is a simple example of how to use the ROME library to parse feeds:

    import com.sun.syndication.io.*;
    import com.sun.syndication.feed.synd.*;
    import java.net.URL;
    import java.util.*;

    public class RomeParserTest {

        public static void main(String[] args) {
            try {
                SyndFeedInput sfi = new SyndFeedInput();

                String[] urls = {
                    "...",
                    "..."
                };

                for (String url : urls) {
                    SyndFeed feed = sfi.build(new XmlReader(new URL(url)));

                    List<?> entries = feed.getEntries();

                    System.out.println(feed.getTitle());
                    System.out.println(entries.size());
                }
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }
    }

Tagged rome, java, atom, rss, feed, parse

How to use Ruby and SimpleRSS to parse RSS and Atom feeds

Ruby posted about 1 year ago by christian

This script is an example of how to use the SimpleRSS gem to parse an RSS feed.

The script can easily be modified to support conditional GETs. It also detects the feed’s character encoding and converts the feed to UTF-8.

    # On Ruby 2.0 and later, Iconv is no longer in the standard library
    # and has to be installed as the iconv gem
    require 'iconv'
    require 'net/http'
    require 'net/https'
    require 'rubygems'
    require 'simple-rss'

    url = URI.parse('http://hbl.fi/rss.xml')

    http = Net::HTTP.new(url.host, url.port)

    http.open_timeout = http.read_timeout = 10  # Set open and read timeout to 10 seconds
    http.use_ssl = (url.scheme == "https")

    headers = {
      'User-Agent'          => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12',
      'If-Modified-Since'   => 'store in a database and set on each request',
      'If-None-Match'       => 'store in a database and set on each request'
    }

    # request_uri keeps the query string and is never empty, unlike path.
    # Ruby 1.9 and later return only the response, so read the body from it.
    response = http.get(url.request_uri, headers)
    body = response.body

    # Look for an encoding in the XML declaration first
    encoding = body.scan(
      /^<\?xml [^>]*encoding="([^\"]*)"[^>]*\?>/
    ).flatten.first

    # encoding is nil when the feed has no XML declaration
    if encoding.nil? || encoding.empty?
      if response["Content-Type"] =~ /charset=([\w\d-]+)/
        encoding = $1.downcase
        puts "Feed #{url} is #{encoding} according to Content-Type header"
      else
        puts "Unable to detect content encoding for #{url}, using default."
        encoding = "ISO-8859-1"
      end
    else
      puts "Feed #{url} is #{encoding} according to XML"
    end

    # Use 'UTF-8//IGNORE', if this throws an exception
    ic = Iconv.new('UTF-8', encoding)
    body = ic.iconv(body)

    feed = SimpleRSS.parse(body)

    for item in feed.items
      puts item.title
    end
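
The encoding detection order used above (XML declaration first, then the Content-Type header, then a default) can be sketched in Python as follows. This is a rough equivalent under stated assumptions, not a port of the script: `detect_encoding` and `to_utf8` are hypothetical names, and undecodable bytes are dropped, roughly like the 'UTF-8//IGNORE' option mentioned in the comment.

```python
import re

# Encoding declared in the XML prolog, e.g. <?xml version="1.0" encoding="utf-8"?>
XML_DECLARATION = re.compile(
    rb'^<\?xml [^>]*encoding=["\']([^"\']*)["\'][^>]*\?>')
# Charset parameter of a Content-Type header, e.g. text/xml; charset=UTF-8
CHARSET = re.compile(r'charset=([\w-]+)', re.IGNORECASE)


def detect_encoding(body, content_type=None, default='ISO-8859-1'):
    """Pick an encoding: XML declaration, then Content-Type, then default."""
    match = XML_DECLARATION.search(body)
    if match and match.group(1):
        return match.group(1).decode('ascii', 'replace')
    if content_type:
        match = CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    return default


def to_utf8(body, encoding):
    """Re-encode the raw feed bytes as UTF-8, dropping undecodable bytes."""
    return body.decode(encoding, errors='ignore').encode('utf-8')
```

A call such as `to_utf8(body, detect_encoding(body, content_type))` then yields bytes that can be handed to a parser expecting UTF-8.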

Tagged rss, atom, parse, ruby, simplerss, encoding, utf-8