How to parse XML with Python's built-in ElementTree parser

Python posted 7 months ago by christian

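    from xml.etree.ElementTree import fromstring

    # Namespace URI exactly as declared in the document's xmlns attribute
    namespace = 'https://xxx.com/xxx'
    element = fromstring(xml)  # xml: a string holding the XML document

    # Every tag must be qualified as {namespace}Tag, otherwise nothing matches
    device = element.find('.//{%s}Device' % namespace)
    detail = device.find('.//{%s}Details' % namespace)
    series = device.findall('.//{%s}Series' % namespace)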

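Watch out for namespaces: ElementTree stores each tag as {namespace-uri}localname, so a search for a bare tag name such as 'Device' returns nothing when the document declares a default namespace.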
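As an alternative to formatting the URI into every path, find() and findall() also accept a prefix-to-URI mapping. The sketch below is only an illustration: the sample document and the prefix 'd' are made up, and the namespace URI is the placeholder from the snippet above.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical sample document; only the namespace URI comes from the snippet.
XML = """\
<Root xmlns="https://xxx.com/xxx">
  <Device>
    <Details>example</Details>
    <Series>A</Series>
    <Series>B</Series>
  </Device>
</Root>"""

# Map a prefix of our own choosing to the namespace URI.
NS = {'d': 'https://xxx.com/xxx'}

root = fromstring(XML)

# A bare tag name does not match, because the parsed tag is '{uri}Device'.
assert root.find('.//Device') is None

# With the mapping, 'd:Device' expands to '{https://xxx.com/xxx}Device'.
device = root.find('.//d:Device', NS)
detail = device.find('d:Details', NS)
series = device.findall('d:Series', NS)

print(device.tag)
print(detail.text)
print([s.text for s in series])
```

The prefix in the mapping is local to the query, so it does not have to match any prefix used inside the document itself.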

Tagged elementtree, python, xml, parse

How to parse OPML with Ruby

Ruby posted about 1 year ago by christian

This example demonstrates how to parse OPML with Ruby.

First install the gem.

    gem install opml

Then run this code:

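    require 'pp'
    require 'rubygems' # only needed on Ruby 1.8
    require 'opml'

    # Parse the OPML file and dump the whole structure
    opml = Opml.new(File.read('opml.xml'))
    pp opml

    # Print selected attributes of the first outline
    puts opml.outlines[0].attributes['xml_url']
    puts opml.outlines[0].attributes['html_url']
    puts opml.outlines[0].attributes['title']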
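For comparison, the same lookup can be sketched in Python with nothing but the standard library, since OPML is plain XML. The sample document and its URLs are placeholders, and the attribute names xmlUrl, htmlUrl and title are the ones OPML itself uses; that the gem's xml_url and html_url keys correspond to them is an assumption.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical OPML document; the feed URLs are placeholders.
OPML = """\
<opml version="2.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Example" title="Example Feed" type="rss"
             xmlUrl="https://example.com/feed.xml"
             htmlUrl="https://example.com/"/>
  </body>
</opml>"""


def read_outlines(opml_text):
    """Return one dict of attributes per <outline> element, in document order."""
    root = fromstring(opml_text)
    # iter() walks the whole tree, so nested outlines are included as well.
    return [dict(outline.attrib) for outline in root.iter('outline')]


outlines = read_outlines(OPML)
first = outlines[0]
print(first['xmlUrl'])
print(first['htmlUrl'])
print(first['title'])
```

Outlines that are folders rather than feeds usually lack xmlUrl, so real code would use first.get('xmlUrl') instead of indexing.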

Tagged opml, xml, parse, ruby

A simple and easy-to-use PHP XML parser

PHP posted over 2 years ago by christian

The PHP XML parser:

    class XML
    {
    	public static function parse($data, $handler, $encoding = "UTF-8")
    	{
    		$parser = xml_parser_create($encoding);

    		// Array callables already carry the handler object,
    		// so xml_set_object() is not needed.
    		xml_set_element_handler($parser,
    			array($handler, 'start'),
    			array($handler, 'end')
    		);

    		xml_set_character_data_handler(
    			$parser,
    			array($handler, 'content')
    		);

    		// The third argument marks $data as the final (and only) chunk.
    		$result = xml_parse($parser, $data, true);

    		if(!$result)
    		{
    			// Collect the error details before the parser is released.
    			$error_string = xml_error_string(xml_get_error_code($parser));
    			$error_line   = xml_get_current_line_number($parser);
    			$error_column = xml_get_current_column_number($parser);

    			$message = sprintf("XML error '%s' at line %d column %d", $error_string, $error_line, $error_column);

    			// Release the parser on the error path too.
    			xml_parser_free($parser);

    			throw new Exception($message);
    		}

    		xml_parser_free($parser);
    	}
    }

A result handler:

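    class ResultHandler
    {
    	public $tag;

    	// Called once per opening tag. Tag names arrive upper-cased
    	// because case folding is enabled by default.
    	function start ($parser, $tagName, $attributes = null)
    	{
    		echo "start\n";
    		$this->tag .= $tagName; # Use .= to work around bug...
    	}

    	// Called once per closing tag.
    	function end ($parser, $tagName)
    	{
    		echo "end\n";
    		$this->tag = null;
    	}

    	// Called for each piece of character data; one element's text
    	// can arrive in several pieces.
    	function content ($parser, $content)
    	{
    		echo "$this->tag: $content\n";
    	}
    }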

Then in your code:

    $xml = "<a>bah</a>";
    XML::parse($xml, new ResultHandler());

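Note that PHP's XML parser splits character data at HTML/XML entities. The entity is not treated as a tag and the start tag handler still runs only once, but the character data handler (content above) is called three times for this element: once for “really ”, once for “&” and once for “  bad parser”:

    <data>really &amp;  bad parser</data>

This looks like a bug, but the parser is free to deliver an element's text in several chunks, so the handler has to concatenate them itself. You can…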
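One common way to cope with chunked character data is to buffer the pieces and use them only when the element ends. The sketch below shows that idea with Python's expat binding rather than PHP; the class name BufferingHandler and the parse() helper are made up for illustration, and the buffer is deliberately simple, so it only suits elements without child elements.

```python
import xml.parsers.expat


class BufferingHandler:
    """Collects character data per element and reports it when the element ends."""

    def __init__(self):
        self.chunks = []    # text pieces received for the current element
        self.calls = 0      # how many times the content handler fired
        self.results = []   # (tag, full text) pairs, one per closed element

    def start(self, name, attrs):
        # A new element begins: start with an empty buffer.
        self.chunks = []

    def content(self, data):
        # May fire several times for one element, so append instead of assigning.
        self.calls += 1
        self.chunks.append(data)

    def end(self, name):
        # The element is complete: join the pieces into the full text.
        self.results.append((name, ''.join(self.chunks)))
        self.chunks = []


def parse(data, handler):
    """Feed a complete document to expat, dispatching events to the handler."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = handler.start
    parser.EndElementHandler = handler.end
    parser.CharacterDataHandler = handler.content
    parser.Parse(data, True)  # True: this is the final chunk of input


handler = BufferingHandler()
parse('<data>really &amp;  bad parser</data>', handler)
print(handler.results)
print(handler.calls)
```

However many times the content handler fires, the joined text is complete by the time the end handler runs, so the number of calls no longer matters to the caller.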

Tagged php, xml, parser, simple