How to parse RSS/Atom feeds with Scala and the ROME library
This snippet shows how to parse feeds with Scala and the ROME library:
import com.sun.syndication.io._
import com.sun.syndication.feed.synd._
import java.net.URL

object FeedParser {
  def main(args: Array[String]): Unit = {
    try {
      val sfi = new SyndFeedInput()

      val urls = List("http://hbl.fi/rss.xml")

      urls.foreach { url =>
        // Fetch the feed; XmlReader detects its character encoding
        val feed = sfi.build(new XmlReader(new URL(url)))

        val entries = feed.getEntries()

        println(feed.getTitle())
        println(entries.size())
      }
    } catch {
      // Match on Exception; a bare `case e` would also catch fatal Throwables
      case e: Exception => throw new RuntimeException(e)
    }
  }
}
How to do an HTTP conditional GET with Feedzirra (update an existing feed)
This snippet shows how to do a conditional GET with Feedzirra 0.0.17:
# First create a dummy parser, any type of parser will do
f = Feedzirra::Parser::RSS.new

# Set the required Feedzirra values with data from your database
f.feed_url = feed_from_db.url
f.etag = feed_from_db.etag
f.last_modified = feed_from_db.last_modified_at

# Set the last entry. This step is important.
# This allows Feedzirra to detect if a feed that doesn't support last modified and etag has been updated.
last_entry = Feedzirra::Parser::RSSEntry.new

# Do we have a last entry in the database? If so let Feedzirra know
if feed_from_db.items.last
  last_entry.url = feed_from_db.items.last.link
end

# Without this Feedzirra will return an empty array or some other surprise
f.entries << last_entry

# Update the feed
Feedzirra::Feed.update f
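For comparison, the HTTP mechanics behind a conditional GET can be sketched with Python's standard library alone. This is a minimal sketch and not how Feedzirra implements it: the function names `build_conditional_request` and `fetch_if_changed` are hypothetical, and the `etag` and `last_modified` arguments are assumed to be the `ETag` and `Last-Modified` header values saved from the previous response.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators saved from the last fetch."""
    request = urllib.request.Request(url)
    if etag:
        # The server compares this value with the feed's current ETag.
        request.add_header("If-None-Match", etag)
    if last_modified:
        # An HTTP date string, e.g. the previous Last-Modified header value.
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified), or None if the feed is unchanged."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: nothing new to parse
            return None
        raise
```

A 304 response carries no body, so the caller keeps the feed it already has. A server that supports neither header always returns the full feed, which is why the Feedzirra snippet also hands over the last known entry.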
How to build an Atom feed with a Rails builder template
atom_feed(:url => formatted_posts_url(:atom)) do |feed|
  feed.title(@category.name)
  feed.updated(@posts.first ? @posts.first.created_at : Time.now.utc)

  @posts.each do |post|
    feed.entry(post) do |entry|
      entry.title(post.title)
      entry.content(post.body_html, :type => 'html')
      entry.updated post.updated_at

      post.tags.each do |tag|
        entry.category :term => url_for(tag), :label => tag.name
      end
    end
  end
end
How to parse an RSS or Atom feed with Python and the Universal Feed Parser library
This example uses the Universal Feed Parser, one of the best and fastest parsers for Python.
According to my simple benchmark, Feed Parser is a lot faster than feed_tools for Ruby and about as fast as the ROME Java library.
Feed Parser uses less memory and about as much CPU as ROME, but I didn’t test this with a long-running process, so don’t take my word for it.
import time
import feedparser

start = time.time()

feeds = [
    'http://..',
    'http://'
]

for url in feeds:
    # Values for a conditional GET; pass what the previous fetch returned
    options = {
        'agent': '..',
        'etag': '..',
        # feedparser 5 and later accept an HTTP date string here,
        # so the private _parse_date helper is not needed
        'modified': 'Sat, 29 Oct 1994 19:43:31 GMT',
        'referrer': '..'
    }

    feed = feedparser.parse(url, **options)

    print(len(feed.entries))
    print(feed.feed.title)

end = time.time()

print('fetch took %0.3f s' % (end - start))
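The speed and memory comparison above could be reproduced on the Python side with a small helper like the one below. It is a sketch under assumptions, not the benchmark used for the numbers in this post: `measure` is a hypothetical name, and `tracemalloc` only sees allocations made through Python's memory allocators, so the figures are not directly comparable with JVM or Ruby measurements. Tracing also slows the call down, so treat the timing as a rough upper bound.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) once; return (result, seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Stand-in workload; swap in a call such as feedparser.parse(url).
    result, seconds, peak = measure(sorted, range(100000, 0, -1))
    print("took %0.3f s, peak %d bytes" % (seconds, peak))
```

Calling `measure(feedparser.parse, url)` for each feed would give per-feed timings instead of the single total printed by the script above.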
How to parse an RSS or Atom feed with the ROME Java library
This is a simple example of how to use the ROME library to parse feeds:
import com.sun.syndication.io.*;
import com.sun.syndication.feed.synd.*;
import java.net.URL;
import java.util.*;

public class RomeParserTest {

    public static void main(String[] args) {
        try {
            SyndFeedInput sfi = new SyndFeedInput();

            String[] urls = {
                "...",
                "..."
            };

            for (String url : urls) {
                // XmlReader detects the feed's character encoding
                SyndFeed feed = sfi.build(new XmlReader(new URL(url)));

                List<?> entries = feed.getEntries();

                System.out.println(feed.getTitle());
                System.out.println(entries.size());
            }
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}