
Detecting file/data encoding with Ruby and the chardet RubyGem

Ruby posted 8 months ago by christian

You can use the chardet gem to detect the charset of an arbitrary string.

Install the chardet gem by issuing the following command:

    $ sudo gem install chardet

Then in irb:

    require 'rubygems'
    require 'UniversalDetector'
    p UniversalDetector::chardet('Ascii text')
    p UniversalDetector::chardet('åäö')

The output from this example is:

    {"encoding"=>"ascii", "confidence"=>1.0}
    {"encoding"=>"utf-8", "confidence"=>0.87625}
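If you’re on Ruby 1.9 or later and only need to tell plain ASCII, UTF-8 and “something else” apart, a rough check is possible with the standard library alone. This is a different and much cruder technique than chardet’s statistical detection: it only tests whether the bytes are 7-bit or valid UTF-8, and it returns no confidence score. The guess_encoding name is just my own, for illustration.

```ruby
# encoding: utf-8

# Rough stdlib-only heuristic (NOT chardet's algorithm): returns "ascii",
# "utf-8", or nil when the bytes are not valid UTF-8 (e.g. ISO-8859-1 text).
def guess_encoding(data)
  raw = data.dup.force_encoding(Encoding::BINARY)
  return 'ascii' if raw.bytes.all? { |b| b < 0x80 }
  return 'utf-8' if raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  nil
end

p guess_encoding('Ascii text')       # => "ascii"
p guess_encoding('åäö')              # => "utf-8"
p guess_encoding("\xE5\xE4\xF6".b)   # => nil (ISO-8859-1 bytes for åäö)
```

Bytes that happen to be valid UTF-8 aren’t proof the text is UTF-8, so treat this as a quick sanity check rather than a replacement for chardet.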

For Python users, there is an equivalent library…

Tagged detect, charset, encoding, ruby, chardet

Parsing feeds with Ruby and the FeedTools gem

Ruby posted 8 months ago by christian

This is an example of how to use the FeedTools gem to parse a feed. FeedTools supports Atom, RSS and other feed formats.

The only downside of FeedTools is that the project has been abandoned. In a comment from March 2008 the author wrote: “I’ve effectively abandoned it, so I’m really not going to go taking on huge code reorganization efforts.”

Installing

    $ sudo gem install feedtools

Fetching and parsing a feed

Easy…

    require 'rubygems'
    require 'feed_tools'

    # Fetch and parse the feed
    feed = FeedTools::Feed.open('http://www.slashdot.org/index.rss')

    puts feed.title
    puts feed.link
    puts feed.description

    # Loop over the feed's entries
    feed.items.each do |item|
      puts item.title
      puts item.link
      puts item.content
    end

Feed autodiscovery

Point FeedTools at an ordinary web page and it finds the feed for you. Here it discovers the Slashdot feed from the home page:

    puts FeedTools::Feed.open('http://www.slashdot.org').href
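To give an idea of what feed autodiscovery involves, here is a minimal sketch that looks for link tags with rel="alternate" and an RSS or Atom type, then resolves their href against the page URL. It is not FeedTools’ own implementation, and discover_feeds and FEED_TYPES are hypothetical names. It uses regular expressions for brevity, so it will miss unusual markup that a real HTML parser would handle.

```ruby
require 'uri'

FEED_TYPES = %w(application/rss+xml application/atom+xml).freeze

# Rough sketch of feed autodiscovery (not FeedTools' code): collect
# <link rel="alternate" type="application/rss+xml" href="..."> tags and
# resolve each href against the page URL. Regex-based for brevity.
def discover_feeds(html, page_url)
  feeds = []
  html.scan(/<link\b[^>]*>/i).each do |tag|
    # Gather the tag's attributes, with lowercased names
    attrs = {}
    tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/) { |name, value| attrs[name.downcase] = value }
    next unless attrs['rel'].to_s.downcase.split.include?('alternate')
    next unless FEED_TYPES.include?(attrs['type'].to_s.downcase)
    next if attrs['href'].to_s.empty?
    # Relative hrefs are resolved against the page URL
    feeds << URI.join(page_url, attrs['href']).to_s
  end
  feeds
end
```

You could, for example, fetch a page with Net::HTTP.get(URI('http://www.slashdot.org')) and pass the returned HTML together with the same URL to discover_feeds.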

Helpers

FeedTools can also clean up dirty XML/HTML:

    require 'rubygems'
    require 'feed_tools'
    require 'feed_tools/helpers/feed_tools_helper'

    # html is a String holding the markup you want cleaned up
    FeedTools::HtmlHelper.tidy_html(html)

Database cache

FeedTools can also store the fetched feeds for you:

    FeedTools.configurations[:tidy_enabled] = false
    FeedTools.configurations[:feed_cache] = "FeedTools::DatabaseFeedCache"

The schema contains all you need:

    -- Example MySQL schema
    CREATE TABLE `cached_feeds` (
      `id`              int(10) unsigned NOT NULL auto_increment,
      `href`            varchar(255) default NULL,
      `title`           varchar(255) default NULL,
      `link`            varchar(255) default NULL,
      `feed_data`       longtext default NULL,
      `feed_data_type`  varchar(20) default NULL,
      `http_headers`    text default NULL,
      `last_retrieved`  datetime default NULL,
      `time_to_live`    int(10) unsigned NULL,
      `serialized`      longtext default NULL,
      PRIMARY KEY (`id`)
    );

There’s even a Rails migration file included.

Feed updater

There’s also a feed updater tool that can fetch feeds in the background, but I haven’t had time to look at it yet.

    $ sudo gem install feedupdater

Character set/encoding bug

As always, there are bugs you need to be aware of, and FeedTools is no different. It has an encoding bug: FeedTools encodes everything as ISO-8859-1 instead of UTF-8, which should be the default encoding.

To work around it, use the following code:

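    require 'iconv'  # Ruby 1.8/1.9 only; removed from the standard library in Ruby 2.0

    # Iconv.new(to, from): convert UTF-8 to ISO-8859-1
    ic = Iconv.new('ISO-8859-1', 'UTF-8')
    feed.description = ic.iconv(feed.description)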
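Iconv is deprecated as of Ruby 1.9.3 and gone from the standard library in Ruby 2.0. Assuming the symptom you’re seeing is double-encoded text (UTF-8 bytes read as ISO-8859-1 and re-encoded, so “å” shows up as “Ã¥”), here is a sketch of the same UTF-8 to ISO-8859-1 conversion using String#encode. fix_double_encoding is a hypothetical helper name; it returns the input untouched when the conversion fails or the result isn’t valid UTF-8.

```ruby
# encoding: utf-8

# Assumes double-encoded text: UTF-8 bytes that were read as ISO-8859-1
# and re-encoded to UTF-8, so "å" shows up as "Ã¥".
def fix_double_encoding(str)
  # Same direction as the Iconv call: UTF-8 -> ISO-8859-1 (one byte per char)...
  latin1 = str.encode('ISO-8859-1', 'UTF-8')
  # ...then reinterpret those bytes as the UTF-8 they originally were.
  fixed = latin1.force_encoding('UTF-8')
  fixed.valid_encoding? ? fixed : str
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # Characters outside ISO-8859-1 (e.g. "€") mean it wasn't double-encoded.
  str
end

broken = 'åäö'.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts broken                        # => Ã¥Ã¤Ã¶
puts fix_double_encoding(broken)   # => åäö
```

With FeedTools you would then write feed.description = fix_double_encoding(feed.description). Some legitimate strings (such as a literal “Ã¥”) can’t be told apart from double-encoded ones, so only apply it to fields you know are affected.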

Alternatively, you can try this patch:

    cd /usr/local/lib/ruby/gems/1.8/gems/
    wget http://n0life.org/~julbouln/feedtools_encoding.patch
    # patch reads the diff from standard input
    patch -p1 < feedtools_encoding.patch

The character encoding bug is discussed on this page: http://sporkmonger.com/2005/08/11/tutorial

Time estimation

By default, when a feed item doesn’t include its publication date, FeedTools tries to estimate one. This annoys me because it produces weird publish dates, so it’s usually a good idea to disable it with the timestamp_estimation_enabled option:

    FeedTools.reset_configurations
    FeedTools.configurations[:tidy_enabled] = false
    FeedTools.configurations[:feed_cache] = nil
    FeedTools.configurations[:default_ttl] = 15.minutes  # Numeric#minutes comes from ActiveSupport
    FeedTools.configurations[:timestamp_estimation_enabled] = false

Configuration options

To list the available configuration options, run:

    require 'pp'  # needed for pp on Ruby 1.8
    pp FeedTools.configurations

Tagged feedtools, rss, atom, parser, ruby, content encoding, utf-8, iso-8859-1

Compiling Ruby with OpenSSL, Zlib and Readline support on Debian

Ruby posted 8 months ago by christian

DRAFT. Adapted from http://blog.fiveruns.com/2008/3/3/compiling-ruby-rubygems-and-rails-on-ubuntu

Install pre-requisites

    apt-get -y install build-essential libssl-dev libreadline5-dev zlib1g-dev

Download and install

    cd /usr/local/src
    wget http://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
    tar zxvf ruby-1.8.6.tar.gz
    cd ruby-1.8.6

    ./configure --prefix=/usr/local --with-openssl-dir=/usr --with-readline-dir=/usr --with-zlib-dir=/usr
    make
    make install

    # Verify that all three extensions load
    ruby -ropenssl -rzlib -rreadline -e "puts :success"
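The one-liner above stops at the first extension that fails to load. If you’d rather see all three results at once, a small Ruby script can try each require separately. check_extensions is a hypothetical name, and the hint about -dev packages assumes the usual cause: a library that wasn’t installed when ./configure ran.

```ruby
# Report which optional extensions this Ruby can load. A LoadError usually
# means the matching -dev package was missing when ./configure ran.
def check_extensions(libs)
  result = {}
  libs.each do |lib|
    begin
      require lib
      result[lib] = true
    rescue LoadError
      result[lib] = false
    end
  end
  result
end

check_extensions(%w(openssl zlib readline)).each do |lib, ok|
  puts "#{lib}: #{ok ? 'ok' : 'MISSING'}"
end
```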

Tagged ruby, readline, ssl, zlib, debian

Populating a table with n checkbox fields

Ruby posted 9 months ago by christian

Let’s say you have 10 checkboxes and you want to display 4 per line as shown here:

    x  x  x  x
    x  x  x  x
    x  x

This can be achieved with the following code:

    <table>
    <%
      cols = 4
      # Start a new row at every 4th checkbox: 0, 4, 8, ...
      0.step(checkboxes.size - 1, cols) do |i|
    %>
      <tr>
      <% checkboxes.slice(i, cols).each do |checkbox| %>
        <td><input type="checkbox" id="<%= checkbox.name %>" name="column" value="<%= checkbox.value %>"/> <label for="<%= checkbox.name %>"><%= checkbox.name %></label></td>
      <% end %>
      </tr>
    <% end %>
    </table>

If you’re using Rails, you can also use the built-in in_groups_of method described here:

    %w(1 2 3 4 5 6 7).in_groups_of(3) { |g| p g }
    # ["1", "2", "3"]
    # ["4", "5", "6"]
    # ["7", nil, nil]
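Outside Rails, plain Ruby gets you most of the way with Enumerable#each_slice (Ruby 1.8.7 and later); the only thing missing is padding the last row. This sketch is my own helper that mimics in_groups_of, not the Rails implementation, and it reproduces the 10-checkbox layout from the top of this post.

```ruby
# Plain-Ruby sketch mimicking Rails' in_groups_of (name and signature are
# my own): split items into rows of `size`, padding the last row with `fill`.
def in_groups_of(items, size, fill = nil)
  items.each_slice(size).map do |group|
    group + [fill] * (size - group.size)
  end
end

p in_groups_of(%w(1 2 3 4 5 6 7), 3)
# [["1", "2", "3"], ["4", "5", "6"], ["7", nil, nil]]

# 10 checkboxes, 4 per line; prints the same x grid shown above
in_groups_of(Array.new(10, 'x'), 4).each { |row| puts row.compact.join('  ') }
```

In the ERB template you could then loop over in_groups_of(checkboxes, 4), emitting one tr per row and skipping the nil padding.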

Tagged table, checkbox, populating

How to perform a file upload (multipart post) with Ruby

Ruby posted 9 months ago by christian

You have at least three options:

  1. The curb gem
  2. The multipart-post Net::HTTP extension
  3. Calling curl from Ruby with, for example, Open3.
       require 'open3'

       Open3.popen3('curl <and your parameters>') do |stdin, stdout, stderr|
         # do something
       end
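If you’d rather avoid extra gems and external processes, here is a stdlib-only sketch (Ruby 2.0+) that builds a multipart/form-data body by hand and POSTs it with Net::HTTP. build_multipart and upload are hypothetical helper names, and the "file" field name and any URL you pass in are placeholders; your server will expect its own field names.

```ruby
require 'net/http'
require 'securerandom'
require 'uri'

# Build a multipart/form-data body: plain text fields plus one file.
# Returns [body, content_type]; the body is a binary (ASCII-8BIT) String.
def build_multipart(fields, file_field, filename, file_data,
                    mime = 'application/octet-stream')
  boundary = "RubyFormBoundary#{SecureRandom.hex(12)}"
  parts = fields.map do |name, value|
    "--#{boundary}\r\n" \
    "Content-Disposition: form-data; name=\"#{name}\"\r\n\r\n" \
    "#{value}\r\n"
  end
  parts << "--#{boundary}\r\n" \
           "Content-Disposition: form-data; name=\"#{file_field}\"; " \
           "filename=\"#{filename}\"\r\n" \
           "Content-Type: #{mime}\r\n\r\n"
  parts << file_data
  parts << "\r\n--#{boundary}--\r\n"
  # Append raw bytes so binary file data never clashes with UTF-8 text
  body = ''.b
  parts.each { |part| body << part.b }
  [body, "multipart/form-data; boundary=#{boundary}"]
end

# Hypothetical helper: POST the file at `path` as the field "file".
def upload(url, path, fields = {})
  uri = URI.parse(url)
  body, content_type = build_multipart(fields, 'file', File.basename(path),
                                       File.binread(path))
  request = Net::HTTP::Post.new(uri.request_uri)
  request['Content-Type'] = content_type
  request.body = body
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Building the body yourself shows exactly what goes over the wire, which helps when a server rejects an upload; for anything more involved, the curb or multipart-post options handle the details for you.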
    
Tagged post, multipart, curl, ruby