How to fetch URLs in parallel with EventMachine and Ruby

Ruby posted 7 days ago by christian

Save time by doing things in parallel:

    require 'rubygems'
    require 'eventmachine'
    require 'open-uri'
    require 'pp'
    require 'benchmark'

    class Worker
      include EM::Deferrable

      # Fetch the page, then fire the callbacks (or the errbacks on failure).
      def run
        get_google
        set_deferred_status :succeeded
      rescue StandardError
        set_deferred_status :failed
      end
    end

    def get_google
      # sorry for spamming you
      # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
      URI.open('http://www.google.com/') do |f|
        #pp f.meta
      end
    end

    # Starts one fetch in its own thread and returns that thread.
    def asynchronous(i)
      worker = Worker.new
      # on success
      worker.callback do
        p "#{Thread.current} done #{i}!"
      end
      # on failure
      worker.errback do
        p "Unexpected error"
      end
      Thread.new do
        worker.run
      end
    end

    def synchronous(i)
      get_google
    end

    # on error
    EM.error_handler do |e|
      p "Unexpected error: #{e}"
    end

    EM.run do
      seconds = Benchmark.realtime do
        threads = Array.new(50) { |i| asynchronous i }
        # Wait for every fetch so the timing covers the requests themselves.
        threads.each(&:join)
      end
      p "With EventMachine: #{seconds} elapsed..."

      seconds = Benchmark.realtime do
        50.times do |i|
          synchronous i
        end
      end
      p "Without EventMachine: #{seconds} elapsed..."

      # Shut the reactor down once both measurements are done.
      EM.stop
    end

Output:

    With EventMachine: 9.05974316596985 elapsed...
    Without EventMachine: 19.1381118297577 elapsed...

Conclusion

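  • Running the blocking fetches concurrently roughly halved the wall-clock time in this run (about 9 s versus 19 s for 50 requests).
  • The EventMachine reactor is currently limited to one CPU core (native thread) per process.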
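
The effect described in the first bullet can be reproduced without network access. The sketch below is only an illustration: fake_fetch is a hypothetical stand-in that sleeps instead of calling a real server, and plain Ruby threads are used rather than EventMachine, so the numbers say nothing about Google or about the reactor itself.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking HTTP request (assumption: 0.1 s latency).
def fake_fetch
  sleep 0.1
end

# Five blocking calls, one after the other.
sequential = Benchmark.realtime do
  5.times { fake_fetch }
end

# The same five calls, each in its own thread, joined before the clock stops.
threaded = Benchmark.realtime do
  threads = Array.new(5) { Thread.new { fake_fetch } }
  threads.each(&:join)
end

puts "Sequential: #{sequential.round(2)} s"
puts "Threaded:   #{threaded.round(2)} s"
```

Because sleep, like most blocking I/O, lets other Ruby threads run while it waits, the threaded version should finish in roughly the time of a single call.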

Tagged eventmachine, ruby, asynchronous, job

How to use dual-purpose accessors in Ruby to create a DSL

Ruby posted about 1 month ago by christian

Instead of this:

    Sitemap('public/sitemap.xml') do
      self.stylesheet = 'public/sitemap.xls'
      self.ping = ['http://www.google.com', 'http://www.google.com']
    end

You could write this:

    Sitemap('public/sitemap.xml') do
      stylesheet 'public/sitemap.xls'
      ping ['http://www.google.com', 'http://www.google.com']
    end

Using dual-purpose accessors:

    class Sitemap
      # Called without an argument it reads the value; with one it sets it.
      def stylesheet(path = nil)
        return @path unless path
        @path = path
      end
      alias_method :stylesheet=, :stylesheet
      # ...
    end
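
To show how the block form and the accessors fit together, here is a minimal, self-contained sketch. It is not the original Sitemap implementation: the SitemapConfig class, its constructor, and the use of instance_eval to run the block are assumptions made for illustration.

```ruby
# Hypothetical class for illustration; not the original Sitemap code.
class SitemapConfig
  def initialize(path, &block)
    @path = path
    # Run the block with self set to this object, so bare calls such as
    # `stylesheet '...'` reach the accessors below.
    instance_eval(&block) if block
  end

  # Getter when called without an argument, setter when called with one.
  def stylesheet(value = nil)
    return @stylesheet if value.nil?
    @stylesheet = value
  end
  alias_method :stylesheet=, :stylesheet

  def ping(urls = nil)
    return @ping if urls.nil?
    @ping = urls
  end
  alias_method :ping=, :ping
end

config = SitemapConfig.new('public/sitemap.xml') do
  stylesheet 'public/sitemap.xls'
  ping ['http://www.google.com']
end

puts config.stylesheet
puts config.ping.inspect
```

One limitation of this pattern is that nil can no longer be passed to clear a value, since a nil argument is treated as a read.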

Tagged ruby, dsl

How to get ActiveRecord and Rails to print SQL to the production log

Ruby posted about 1 month ago by christian

Add to the end of config/environment.rb:

    ActiveRecord::Base.logger.level = Logger::DEBUG

Adding the same line to config/environments/production.rb might also work.
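
The setting works because ActiveRecord writes its SQL statements at the debug level, and a logger set to a higher level such as INFO drops them. The sketch below shows that filtering with Ruby's standard Logger only; the StringIO target and the sample SQL strings are assumptions for illustration, and no Rails code is involved.

```ruby
require 'logger'
require 'stringio'

# Log into memory so the result can be inspected (illustration only).
io = StringIO.new
logger = Logger.new(io)

# At INFO, debug messages are discarded.
logger.level = Logger::INFO
logger.debug('SELECT * FROM posts')

# At DEBUG, the same kind of message is written.
logger.level = Logger::DEBUG
logger.debug('SELECT * FROM users')

puts io.string
```

Only the second statement should appear in the printed log.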

Tagged sql, production, activerecord, rails, logging

How to customize Hirb output

Ruby posted 3 months ago by christian

Only print id, created_at and title for the FeedItem class:

    Hirb.disable
    Hirb.enable :output => {
      "FeedItem" => {
        :options => {
          :fields => %w{id created_at title}
        }
      }
    }

Tagged hirb, irb, console, rails

How to scrape an Amazon Listmania list with Hpricot and Ruby

Ruby posted 4 months ago by christian

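    require 'rubygems'
    require 'open-uri'
    require 'hpricot'

    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    html = URI.open('http://www.amazon.com/Nick-Hornby-and-Company/lm/1X1GGDBXARHZ6/ref=cm_lm_toplist_fullview_1')

    page = Hpricot(html)

    # Each list item carries its ASIN in a hidden input field.
    xpath = "td[@class='listItem']//input[@name='asin.1']"

    page.search(xpath).each do |book|
      puts book['value']
    end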

Tagged amazon, hpricot, scrape