How to parse an RSS or Atom feed with Python and the Universal Feed Parser library

Python posted 2 months ago by christian

This example uses the Universal Feed Parser, one of the best and fastest feed parsers for Python.

According to my simple benchmark, Feed Parser is a lot faster than feed_tools for Ruby and about as fast as the ROME Java library.

Feed Parser uses less memory than ROME and about as much CPU, but I didn’t test this with a long-running process, so don’t take my word for it.

    import time
    import feedparser

    start = time.time()

    # Feed URLs to fetch (placeholders)
    feeds = [
        'http://..',
        'http://'
    ]

    for url in feeds:
        # Optional HTTP settings passed through to feedparser.parse()
        options = {
            'agent'   : '..',
            'etag'    : '..',
            # Recent feedparser releases accept the date as a string
            'modified': 'Sat, 29 Oct 1994 19:43:31 GMT',
            'referrer': '..'
        }

        feed = feedparser.parse(url, **options)

        print(len(feed.entries))
        # The title is missing when the fetch or parse fails
        print(feed.feed.get('title', ''))

    end = time.time()

    print('fetch took %0.3f s' % (end - start))
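
The agent, etag, modified and referrer options correspond to ordinary HTTP request headers. The sketch below uses only the standard library to show roughly which headers they map to; `build_conditional_request` is a hypothetical helper written for illustration, and Feed Parser's own HTTP code may differ in detail. A server that supports conditional GET answers with 304 Not Modified when the feed has not changed since the given ETag or date.

```python
import urllib.request


def build_conditional_request(url, etag=None, modified=None, agent=None, referrer=None):
    """Build (but do not send) a GET request with conditional-GET headers.

    Hypothetical helper for illustration; the parameter names mirror the
    option names used in the snippet above.
    """
    request = urllib.request.Request(url)
    if agent:
        request.add_header("User-Agent", agent)
    if referrer:
        # The HTTP header name is historically misspelled "Referer".
        request.add_header("Referer", referrer)
    if etag:
        # Ask for the body only if the entity tag no longer matches.
        request.add_header("If-None-Match", etag)
    if modified:
        # Ask for the body only if the feed changed after this date.
        request.add_header("If-Modified-Since", modified)
    return request


if __name__ == "__main__":
    # example.invalid is a placeholder host; nothing is sent over the network.
    req = build_conditional_request(
        "http://example.invalid/feed",
        etag='"abc123"',
        modified="Sat, 29 Oct 1994 19:43:31 GMT",
    )
    for name, value in req.header_items():
        print("%s: %s" % (name, value))
```

In practice the etag and modified values would come from the previous response for the same feed, so that an unchanged feed is not downloaded again.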

Tagged universal, feed, parser, atom, rss, python

How to parse an RSS or Atom feed with the ROME Java library

Java posted 2 months ago by christian

This is a simple example of how to use the ROME library to parse feeds:

    import com.sun.syndication.io.*;
    import com.sun.syndication.feed.synd.*;
    import java.net.URL;
    import java.util.*;

    public class RomeParserTest {

        public static void main(String[] args) {
            try {
                SyndFeedInput sfi = new SyndFeedInput();

                // Feed URLs to fetch (placeholders)
                String[] urls = {
                    "...",
                    "..."
                };

                for (String url : urls) {
                    // XmlReader detects the character encoding of the feed
                    SyndFeed feed = sfi.build(new XmlReader(new URL(url)));

                    List<?> entries = feed.getEntries();

                    System.out.println(feed.getTitle());
                    System.out.println(entries.size());
                }
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }
    }

Tagged rome, java, atom, rss, feed, parse

Valid RSS 2.0 Feed Template for Rails

HTML (Rails) posted 10 months ago by christian

Note that because of a bug the pubDate and lastBuildDate tags are displayed in lowercase on this site…

    <?xml version="1.0"?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
      <channel>
        <atom:link href="http://xxxxxxx" rel="self" type="application/rss+xml" />
        <title>Code Snippets - Aktagon</title>
        <link>http://snippets.aktagon.com/</link>
        <description>Share your code with the world. Allow others to review and comment.</description>
        <language>en-us</language>
        <pubDate><%= @snippets[0].created_at.rfc822 %></pubDate>
        <lastBuildDate><%= @snippets[0].created_at.rfc822 %></lastBuildDate>
        <docs>http://blogs.law.harvard.edu/tech/rss</docs>
        <generator>Aktagon Snippets</generator>
        <% for snippet in @snippets %>
        <item>
          <title><![CDATA[<%= snippet.title %>]]></title>
          <link><%= snippet_url(snippet) %></link>
          <description><![CDATA[<%= snippet.rendered_body %>]]></description>
          <pubDate><%= snippet.created_at.rfc822 %></pubDate>
          <guid><%= snippet_url(snippet) %></guid>
          <% for tag in snippet.tags %>
          <category domain="http://snippets.aktagon.com/snippets"><![CDATA[<%= tag.name %>]]></category>
          <% end %>
        </item>
        <% end %>
      </channel>
    </rss>

Well, it’s supposed to be valid, but the syntax highlighting seems to process the link tag, so it’s not…
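
Since the point of the template is valid output, a quick sanity check on the rendered feed can catch the most obvious mistakes. The following is a minimal sketch using only the Python standard library. `check_rss` is a hypothetical helper, and it only checks a few rules from the RSS 2.0 specification (the version attribute, the required channel elements, and that each item has a title or a description), so it is not a substitute for a full feed validator.

```python
import xml.etree.ElementTree as ET

# Elements that every RSS 2.0 channel must contain.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def check_rss(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    if root.tag != "rss":
        return ["root element is <%s>, expected <rss>" % root.tag]

    problems = []
    if root.get("version") != "2.0":
        problems.append('rss element lacks version="2.0"')

    channel = root.find("channel")
    if channel is None:
        return problems + ["missing <channel>"]

    for name in REQUIRED_CHANNEL_ELEMENTS:
        if channel.find(name) is None:
            problems.append("channel is missing <%s>" % name)

    # Every item needs at least a title or a description.
    for index, item in enumerate(channel.findall("item"), start=1):
        if item.find("title") is None and item.find("description") is None:
            problems.append("item %d has neither <title> nor <description>" % index)

    return problems


if __name__ == "__main__":
    # Made-up sample feed; in practice pass the text rendered by the template.
    sample = (
        '<rss version="2.0"><channel>'
        "<title>Example</title>"
        "<link>http://example.invalid/</link>"
        "<description>Sample feed</description>"
        "<item><title>First</title></item>"
        "</channel></rss>"
    )
    print(check_rss(sample))
```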

Tagged ruby, rails, rss2.0, feed, rss, template, example