
Using the WWW::Mechanize RubyGem to scrape login protected pages

Ruby posted 8 months ago by christian

This is an example of how to access a login-protected site with WWW::Mechanize. In this example, the login form has two fields named user and password. In other words, the HTML contains the following code:

    <input name="user" .../>
    <input name="password" .../>

Note that this example also shows how to enable WWW::Mechanize logging and how to capture the HTML response:

    require 'rubygems'
    require 'logger'
    require 'mechanize'

    # Log requests and responses to STDERR
    agent = WWW::Mechanize.new { |a| a.log = Logger.new(STDERR) }
    #agent.set_proxy('a-proxy', '8080')
    page = agent.get 'http://bobthebuilder.com'

    # Fill in the login form; the accessors match the input names
    form = page.forms.first
    form.user = 'bob'
    form.password = 'password'

    page = agent.submit form

    # Save the HTML response
    File.open("output.html", "w") { |file| file << page.body }

Use the search method to scrape the page content. In this example I extract all text inside span elements that are themselves inside a table element whose class attribute equals ‘list-of-links’:

    puts page.search("//table[@class='list-of-links']//span/text()")

The HTML looks like this (td, tr elements omitted for clarity):

    ...
    <table class="list-of-links">
    ...
    <span>The content</span>
    ...
    </table>
    ...
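
To try the XPath expression without a login-protected site, here is a minimal sketch that runs the same expression against a small sample document. It uses REXML instead of Mechanize's search method, and the sample markup (including the second table) is made up for illustration:

```ruby
require 'rexml/document'

# Made-up sample markup; only the first table has the class we are looking for.
html = <<~HTML
  <html>
    <body>
      <table class="list-of-links">
        <tr><td><span>The content</span></td></tr>
      </table>
      <table class="other">
        <tr><td><span>Not matched</span></td></tr>
      </table>
    </body>
  </html>
HTML

doc = REXML::Document.new(html)

# Same XPath expression as in the snippet above.
xpath = "//table[@class='list-of-links']//span/text()"
texts = REXML::XPath.match(doc, xpath).map { |node| node.value }

puts texts
```

Only the span inside the table with the matching class attribute is selected; the span in the other table is skipped.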

Tagged www, mechanize, scraping, scrape, login, ruby

How to add OpenID support to your Rails application with the open_id_authentication plugin

Ruby posted 8 months ago by christian

These instructions have been tested with Rails 2.0.2 and ruby-openid 2.0.4. The snippet is an adaptation of the instructions in Ryan Bates’ screencast on how to integrate OpenID with Rails.

Installing and configuring the restful_authentication plugin

Follow these instructions: How to install and use the restful_authentication Rails plugin.

Installing the ruby-openid gem

    gem install ruby-openid

Installing the open_id_authentication Rails plugin

    script/plugin source http://svn.techno-weenie.net/projects/plugins/
    script/plugin install open_id_authentication

Create the migration files

    rake open_id_authentication:db:create

Add the following to the self.up method in 002_add_open_id_authentication_tables.rb:

    add_column :users, :identity_url, :string

Add the following to the self.down method in the same file:

    remove_column :users, :identity_url

Configuring the routes

    map.open_id_complete 'session', :controller => "sessions", :action => "create", :requirements => { :method => :get }

Protect the identity_url field

Next, protect the identity_url field by adding the following to user.rb, account.rb or your custom user model:

    attr_accessible :login, :email, :password, :password_confirmation, :identity_url

Integrating OpenID with the login page

Add the following to sessions/new.html.erb:

    <label for="openid_url">OpenID URL</label><br />
    <%= text_field_tag "openid_url" %>

Make sure you’re showing flash messages in your layout, otherwise you won’t see the error messages:

    <html>
      <head></head>
      <body>
        <%= [:notice, :error].collect {|type| content_tag('div', flash[type], :id => type) if flash[type] } %>

        <%= yield %>
      </body>
    </html>
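
To see what the collect expression in the layout evaluates to, here is a plain Ruby sketch that runs outside Rails. The content_tag method below is a simplified, hypothetical stand-in for the Rails helper, and flash is an ordinary hash with only an error set:

```ruby
# Simplified stand-in for Rails' content_tag helper (assumption, for illustration only).
def content_tag(name, content, options = {})
  attrs = options.map { |key, value| %( #{key}="#{value}") }.join
  "<#{name}#{attrs}>#{content}</#{name}>"
end

# Plain hash standing in for the Rails flash; no :notice has been set.
flash = { :error => "Authentication failed." }

# Same expression as in the layout: yields nil for flash types that are not set.
tags = [:notice, :error].collect { |type| content_tag('div', flash[type], :id => type) if flash[type] }

# Drop the nil entries before joining the markup.
html = tags.compact.join
puts html
```

Each flash type that is set becomes a div whose id is the flash type, so the two kinds of message can be styled separately.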

Modifying the sessions controller

Copy and paste the following code into app/controllers/sessions_controller.rb:

    class SessionsController < ApplicationController
      # Hack to fix: No action responded to show
      def show
        create
      end

      def create
        if using_open_id?
          open_id_authentication(params[:openid_url])
        else
          password_authentication(params[:login], params[:password])
        end
      end

      def destroy
        self.current_user.forget_me if logged_in?
        cookies.delete :auth_token
        reset_session
        flash[:notice] = "You have been logged out."
        redirect_back_or_default('/')
      end

      protected

      def open_id_authentication(openid_url)
        authenticate_with_open_id(openid_url, :required => [:nickname, :email]) do |result, identity_url, registration|
          if result.successful?
            @user = User.find_or_initialize_by_identity_url(identity_url)
            if @user.new_record?
              @user.login = registration['nickname']
              @user.email = registration['email']
              @user.save(false)
            end
            self.current_user = @user
            successful_login
          else
            failed_login result.message
          end
        end
      end

      def password_authentication(login, password)
        self.current_user = User.authenticate(login, password)
        if logged_in?
          successful_login
        else
          failed_login
        end
      end

      def failed_login(message = "Authentication failed.")
        flash.now[:error] = message
        render :action => 'new'
      end

      def successful_login
        if params[:remember_me] == "1"
          self.current_user.remember_me
          cookies[:auth_token] = { :value => self.current_user.remember_token , :expires => self.current_user.remember_token_expires_at }
        end
        redirect_back_or_default('/')
        flash[:notice] = "Logged in successfully"
      end
    end

OpenID authentication from behind a proxy

First, set the HTTP_PROXY environment variable to the proxy URL:

    export HTTP_PROXY=http://proxy.aktagon.com:8080/

Then add the following to environment.rb:

    OpenID::fetcher_use_env_http_proxy
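
If authentication still fails from behind the proxy, it can help to check that the value of HTTP_PROXY is a well-formed URL with a host and a port. The following is a small sketch using Ruby's standard URI library; the proxy_settings helper is a hypothetical name and is not part of ruby-openid:

```ruby
require 'uri'

# Hypothetical helper: splits a proxy URL into host and port,
# and complains if the value does not look like an HTTP URL.
def proxy_settings(url)
  uri = URI.parse(url)
  unless uri.is_a?(URI::HTTP) && uri.host
    raise ArgumentError, "not an HTTP proxy URL: #{url.inspect}"
  end
  { :host => uri.host, :port => uri.port }
end

# Same URL as in the export command above.
settings = proxy_settings('http://proxy.aktagon.com:8080/')
puts "#{settings[:host]}:#{settings[:port]}"
```

In a real application you would pass ENV['HTTP_PROXY'] to the helper instead of a literal URL.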

Tagged openid, authentication, rails, ruby, plugin, restful_authentication

How to install and use the restful_authentication Rails plugin

Ruby posted 8 months ago by christian

This is an adaptation of Ryan Bates’ restful_authentication screencast. Following the screencast as-is with Rails 2.0.3 raises the following error:

    NameError (uninitialized constant SessionsController):
        /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:266:in `load_missing_constant'
        /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:453:in `const_missing'
        /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:465:in `const_missing'
        /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/inflector.rb:257:in `constantize'

Installing the restful_authentication plugin

    script/plugin source http://svn.techno-weenie.net/projects/plugins/
    script/plugin install restful_authentication

Generating the model and controller

    script/generate authenticated user sessions

Now run the migration:

    rake db:migrate

Configure routing

Open config/routes.rb and add the following routes:

    map.resources :users
    map.resource  :session

    map.signup '/signup', :controller => 'users', :action => 'new'
    map.login  '/login', :controller => 'sessions', :action => 'new'
    map.logout '/logout', :controller => 'sessions', :action => 'destroy'

Include restful_authentication in ApplicationController

First, remove these lines from the users and sessions controllers:

    # Be sure to include AuthenticationSystem in Application Controller instead
    include AuthenticatedSystem

Now include restful_authentication in the application controller:

    class ApplicationController < ActionController::Base
      include AuthenticatedSystem
      # ... rest of the controller ...
    end

Integrate restful_authentication with your views

First, let’s create a controller and view by executing the generate script:

    script/generate controller home index

Modify index.html.erb as follows:

    <h1>Welcome</h1>

    <% if logged_in? %>
      <p><strong>You are logged in as <%=h current_user.login %></strong></p>
      <p><%= link_to 'Logout', logout_path %></p>
    <% else %>
      <p><strong>You are currently not logged in.</strong></p>
      <p>
        <%= link_to 'Login', login_path %> or
        <%= link_to 'Sign Up', signup_path %>
      </p>
    <% end %>

Start Rails and access your application. If needed, add the following to config/routes.rb to make the home controller the default:

    map.root :controller => "home"

Login, sign up and logout should now work.

Tagged rails, ruby, authentication, restful_authentication

How to use god to monitor a pack of mongrels

Ruby posted 8 months ago by christian

God is a monitoring framework written in Ruby. It can be used to monitor, for example, mongrel processes.

Installing god

Install god with the following command:

    sudo gem install god

Configuring god

To configure god, first create a master configuration script by saving the following in /etc/god/god.rb:

    # load in all god configs
    God.load "/etc/god/conf/*.rb"

Now, save this configuration in /etc/god/conf/site.com.rb:

    #
    # Test this configuration file by executing:
    #   god -c /path_to_this_file -D
    #
    require 'yaml'

    #
    # Change these to match your project setup
    #
    APPLICATION  = "xxx.com"
    ROOT         = "/var/www/#{APPLICATION}" # deployment directory
    RAILS_ROOT   = ROOT + '/current'         # current release directory
    MONGREL_CONF = ROOT + '/shared/mongrel_cluster.conf' # mongrel_cluster.conf file

    # Read the mongrel configuration
    OPTIONS      = YAML.load_file(MONGREL_CONF)

    #
    # Returns the ports used by the cluster:
    #
    #   ports 9000, 3 => [9000, 9001, 9002]
    #
    def ports(port, servers)
      (port...(port + servers)).to_a
    end

    PORTS        = ports(OPTIONS['port'].to_i, OPTIONS['servers'].to_i)

    #
    # Returns path of mongrel pid or log file:
    #
    #   mongrel_path "/tmp/mongrel.pid", 9000 => "/tmp/mongrel.9000.pid"
    #
    def mongrel_path(file_path, port)
      file_ext  = File.extname(file_path)
      file_base = File.basename(file_path, file_ext)
      file_dir  = File.dirname(file_path)
      file      = [file_base, port].join(".") + file_ext

      File.join(file_dir, file)
    end

    #
    # Returns the mongrel_rails start, stop or restart command depending on command parameter
    #
    def mongrel_rails(command, port)
      raise "Unsupported command '#{command}'" unless ['start', 'stop', 'restart'].include?(command)

      argv = [ "mongrel_rails" ]
      argv << command
      argv << "-d" if command != 'stop'
      argv << "-e #{OPTIONS['environment']}" if OPTIONS['environment'] && command != 'stop'
      argv << "-a #{OPTIONS['address']}"  if OPTIONS['address'] && command != 'stop'
      argv << "-c #{OPTIONS['cwd']}" if OPTIONS['cwd']
      argv << "-f" if OPTIONS['force'] && command == 'stop' # --force is a switch and takes no value
      argv << "-o #{OPTIONS['timeout']}" if OPTIONS['timeout'] && command != 'stop'
      argv << "-t #{OPTIONS['throttle']}" if OPTIONS['throttle'] && command != 'stop'
      argv << "-m #{OPTIONS['mime_map']}" if OPTIONS['mime_map'] && command != 'stop'
      argv << "-r #{OPTIONS['docroot']}" if OPTIONS['docroot'] && command != 'stop'
      argv << "-n #{OPTIONS['num_procs']}" if OPTIONS['num_procs'] && command != 'stop'
      argv << "-B" if OPTIONS['debug'] && command != 'stop'
      argv << "-S #{OPTIONS['config_script']}" if OPTIONS['config_script'] && command != 'stop'
      argv << "--user #{OPTIONS['user']}" if OPTIONS['user'] && command != 'stop'
      argv << "--group #{OPTIONS['group']}" if OPTIONS['group'] && command != 'stop'
      argv << "--prefix #{OPTIONS['prefix']}" if OPTIONS['prefix'] && command != 'stop'
      argv << "-p #{port}" if command != 'stop'
      argv << '-P ' + mongrel_path(OPTIONS['pid_file'], port)
      argv << '-l ' + mongrel_path(OPTIONS['log_file'], port) if command != 'stop'

      argv.join(" ")
    end

    # One watch per mongrel port
    PORTS.each do |port|
      God.watch do |w|
        w.name          = "#{APPLICATION}-#{port}"
        w.group         = "mongrels"
        w.interval      = 30.seconds
        w.start         = mongrel_rails('start', port)
        w.stop          = mongrel_rails('stop', port)
        w.restart       = mongrel_rails('restart', port)
        w.start_grace   = 10.seconds
        w.restart_grace = 10.seconds
        w.pid_file      = File.join(RAILS_ROOT, "/tmp/pids/mongrel.#{port}.pid")

        w.behavior(:clean_pid_file)

        # Start the mongrel if it is not running
        w.start_if do |start|
          start.condition(:process_running) do |c|
            c.interval = 5.seconds
            c.running  = false
          end
        end

        # Restart the mongrel if it uses too much memory or CPU
        w.restart_if do |restart|
          restart.condition(:memory_usage) do |c|
            c.above = 150.megabytes
            c.times = [3, 5] # 3 out of 5 intervals
          end

          restart.condition(:cpu_usage) do |c|
            c.above = 50.percent
            c.times = 5
          end
        end

        # lifecycle
        w.lifecycle do |on|
          on.condition(:flapping) do |c|
            c.to_state     = [:start, :restart]
            c.times        = 5
            c.within       = 5.minute
            c.transition   = :unmonitored
            c.retry_in     = 10.minutes
            c.retry_times  = 5
            c.retry_within = 2.hours
          end
        end
      end
    end
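
To check what the helper methods compute before handing the file to god, here is a standalone sketch that can be run with plain Ruby. The YAML below is a hypothetical mongrel_cluster.conf; the values in your own file will differ:

```ruby
require 'yaml'

# Hypothetical mongrel_cluster.conf contents (assumption, for illustration only).
sample_conf = <<~CONF
  port: "8000"
  servers: 3
  environment: production
  pid_file: tmp/pids/mongrel.pid
  log_file: log/mongrel.log
CONF

options = YAML.load(sample_conf)

# One port per mongrel, starting at the configured port.
def ports(port, servers)
  (port...(port + servers)).to_a
end

# Inserts the port before the file extension, as in the god configuration.
def mongrel_path(file_path, port)
  file_ext  = File.extname(file_path)
  file_base = File.basename(file_path, file_ext)
  File.join(File.dirname(file_path), [file_base, port].join(".") + file_ext)
end

cluster_ports = ports(options['port'].to_i, options['servers'].to_i)
pid_files     = cluster_ports.map { |port| mongrel_path(options['pid_file'], port) }

p cluster_ports
puts pid_files
```

With this sample file, each of the three mongrels gets its own port and its own pid file. Make sure the pid file path passed to mongrel_rails resolves to the same file as w.pid_file in the watch, otherwise god cannot find the process.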

Add one such script to /etc/god/conf/ for each site you want to monitor.

Starting god

To start god, execute:

    god -c /etc/god/god.rb

For a list of available commands, run god with the help switch:

    $ god --help
      Usage:
        Starting:
          god [-c <config file>] [-p <port> | -b] [-P <file>] [-l <file>] [-D]

        Querying:
          god <command> <argument> [-p <port>]
          god <command> [-p <port>]
          god -v
          god -V (must be run as root to be accurate on Linux)

        Commands:
          start <task or group name>         start task or group
          restart <task or group name>       restart task or group
          stop <task or group name>          stop task or group
          monitor <task or group name>       monitor task or group
          unmonitor <task or group name>     unmonitor task or group
          remove <task or group name>        remove task or group from god
          load <file>                        load a config into a running god
          log <task name>                    show realtime log for given task
          status                             show status of each task
          quit                               stop god
          terminate                          stop god and all tasks
          check                              run self diagnostic

        Options:
        -c, --config-file CONFIG         Configuration file
        -p, --port PORT                  Communications port (default 17165)
        -b, --auto-bind                  Auto-bind to an unused port number
        -P, --pid FILE                   Where to write the PID file
        -l, --log FILE                   Where to write the log file
        -D, --no-daemonize               Don't daemonize
        -v, --version                    Print the version number and exit
        -V                               Print extended version and build information
            --log-level LEVEL            Log level [debug|info|warn|error|fatal]
            --no-syslog                  Disable output to syslog
            --attach PID                 Quit god when the attached process dies
            --no-events                  Disable the event system
            --bleakhouse                 Enable bleakhouse profiling

Surviving reboots

Save the following in /etc/init.d/god:

    #!/bin/bash
    #
    # God
    #

    RETVAL=0

    case "$1" in
        start)
          god -c /etc/god/god.rb -P /var/run/god.pid -l /var/log/god.log
          RETVAL=$?
          echo "God started"
        ;;
        stop)
          kill `cat /var/run/god.pid`
          RETVAL=$?
          echo "God stopped"
        ;;
        restart)
          kill `cat /var/run/god.pid`
          god -c /etc/god/god.rb -P /var/run/god.pid -l /var/log/god.log
          RETVAL=$?
          echo "God restarted"
        ;;
        status)
          # Show the status of each task
          god status
          RETVAL=$?
        ;;
        *)
          echo "Usage: god {start|stop|restart|status}"
          exit 1
        ;;
    esac

    exit $RETVAL

Make the file executable with chmod:

    chmod +x /etc/init.d/god

Tell Debian to run the script at startup:

    sudo /usr/sbin/update-rc.d -f god defaults

Tagged god, mongrel, monit, monitor, recipe, monitoring

How to install the stemmer4r gem on Mac OS X and Linux

Ruby posted 8 months ago by christian

The stemmer4r gem is fubar: installing it fails while building the native extension. Warning: this is a draft snippet…

    # gem install stemmer4r
    Bulk updating Gem source index for: http://gems.rubyforge.org
    Building native extensions.  This could take a while...
    ERROR:  While executing gem ... (Gem::Installer::ExtensionBuildError)
        ERROR: Failed to build gem native extension.

    ruby extconf.rb install stemmer4r

    Gem files will remain installed in /usr/lib/ruby/gems/1.8/gems/stemmer4r-0.6 for inspection.
    Results logged to /usr/lib/ruby/gems/1.8/gems/stemmer4r-0.6/ext/stemmer4r/gem_make.out


    1. Change path of Ruby executable

    cd /usr/lib/ruby/gems/1.8/gems/stemmer4r-0.6/ext/stemmer4r/
    vim extconf.rb

    #!/usr/bin/ruby -w

    to

    #ruby -w

    2. Compile libstemmer_c

    cd /usr/lib/ruby/gems/1.8/gems/stemmer4r-0.6/ext/stemmer4r/libstemmer_c/
    make

    3. Compile stemmer4r

    cd /usr/lib/ruby/gems/1.8/gems/stemmer4r-0.6/ext/stemmer4r/

    Change path:
    /usr/local/ruby/lib/ruby/1.8/i686-linux/
    To:
    /usr/lib/ruby/1.8/x86_64-linux/

    Or wherever you have it installed

    ruby extconf.rb


    4. Build stemmer4r gem

    gem build stemmer4r.gemspec

    gem install stemmer4r-0.6.gem


    Problems

    gcc -shared -rdynamic -Wl,-export-dynamic   -L"/usr/lib" -o stemmer4r.so stemmer4r.o libstemmer_c/libstemmer.o  -lruby1.8  -lpthread -ldl -lcrypt -lm   -lc
    /usr/bin/ld: libstemmer_c/libstemmer.o(libstemmer.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
    libstemmer_c/libstemmer.o: could not read symbols: Bad value
    collect2: ld returned 1 exit status
    make: *** [stemmer4r.so] Error 1


    Add CFLAGS to the Makefile in libstemmer_c, then run make again:

    root@aktagon:/usr/lib/ruby/gems/1.8/gems/stemmer4r-0.6/ext/stemmer4r/libstemmer_c# make
    include mkinc.mak
    CFLAGS   =  -fPIC
    libstemmer.o: $(snowball_sources:.c=.o)
            $(AR) -cru $@ $^

Tagged stemming, stemmer4r, install, osx, linux, gem