Scraping Yahoo! Finance with Ruby and Hpricot
This code extracts the numbers from the Fund operations table on the BLV fund’s Profile page at Yahoo! Finance.
    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    require 'pp' # needed for pp on older Rubies

    # Fetch and parse the profile page.
    # On Ruby 2.5 and later, use URI.open instead of open.
    page = Hpricot(open('http://finance.yahoo.com/q/pr?s=BLV'))

    fund_operations = []
    # Walk every data table, then collect the contents of each data cell.
    page.search("//table[@class='yfnc_datamodoutline1']").each do |table|
      table.search("//td[@class='yfnc_datamoddata1']").each do |data|
        fund_operations << data.inner_html
      end
    end

    pp fund_operations
The output from this script is:
    ["N/A", "N/A", "55%", "72", "85.05M", "1.71B"]
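To experiment with the XPath selection without hitting the live page, here is a minimal sketch that runs the same kind of query against an inline fragment. It swaps in REXML (bundled with standard Ruby installs) for Hpricot, and the sample markup, the `label` class, and the `extract_fund_operations` helper are assumptions made for illustration; only the two `yfnc_` class names come from the script above.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the Fund operations table.
SAMPLE_HTML = <<~HTML
  <table class="yfnc_datamodoutline1">
    <tr><td class="label">Turnover</td><td class="yfnc_datamoddata1">55%</td></tr>
    <tr><td class="label">Holdings</td><td class="yfnc_datamoddata1">72</td></tr>
    <tr><td class="label">Net assets</td><td class="yfnc_datamoddata1">1.71B</td></tr>
  </table>
HTML

# Returns the text of every data cell inside the outline table.
def extract_fund_operations(html)
  doc = REXML::Document.new(html)
  xpath = "//table[@class='yfnc_datamodoutline1']//td[@class='yfnc_datamoddata1']"
  REXML::XPath.match(doc, xpath).map(&:text)
end

p extract_fund_operations(SAMPLE_HTML)
```

REXML only accepts well-formed XML, so this approach would probably not cope with a real-world HTML page; it is meant for checking the XPath expression, not as a replacement scraper.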
Note that you could also use Scrubyt for this. The snippet “Scraping Google search results with Scrubyt and Ruby” explains how to use Scrubyt to scrape web pages.
Generate a 56-bit DES encrypted (htpasswd) password with Ruby
Run the following in an irb console to generate a 56-bit DES encrypted password:
    "password".crypt("salt")
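The password can be used in an Apache or Nginx htpasswd file to enable basic authentication. Note that traditional DES crypt uses only the first two characters of the salt and the first eight characters of the password.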
The generated password can also be used in other Unix password files.
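As a rough sketch of how this could be turned into a complete htpasswd entry, the following builds a `user:hash` line with a random two-character salt. The `htpasswd_line` helper and the `alice` user name are hypothetical, and the code assumes the platform's crypt(3) still supports traditional DES hashes.

```ruby
require 'securerandom'

# Characters permitted in a traditional DES crypt salt.
SALT_CHARS = [*'a'..'z', *'A'..'Z', *'0'..'9', '.', '/'].freeze

# Hypothetical helper: builds one "user:hash" line for an htpasswd file.
def htpasswd_line(user, password)
  # Pick two random salt characters; DES crypt ignores anything beyond that.
  salt = Array.new(2) { SALT_CHARS[SecureRandom.random_number(SALT_CHARS.size)] }.join
  "#{user}:#{password.crypt(salt)}"
end

puts htpasswd_line('alice', 'password')
```

The salt is stored as the first two characters of the hash, so a password can later be checked by hashing the candidate with `hash[0, 2]` and comparing the results.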
A simple image replacement technique for increased usability and SEO ranking
This is currently my favorite image replacement technique. I don’t remember where I found it… Using it can improve both your site’s usability and your search engine ranking, because screen readers and search engines can still read your h1 headlines.
First create the h1 containing the title or description of your page/site, for example:
    <h1 id="logo">Viagra, Botox, you name it</h1>
Then create the CSS rule for the page title:
    h1#logo {
      text-indent: -9000px;
      background: url(logo.gif);
      width: 200px; /* Width of image */
      height: 50px; /* Height of image */
    }
People using a modern browser that supports CSS will see your logo (the image), while search engines and people using less modern browsers will see the content of the h1 header tag.
Note that if you replace the text of a link, you should use the outline CSS property to remove the dotted border:
    .text-replacement {
      text-indent: -9000px;
    }

    .text-replacement a {
      outline: none;
    }
Implementing hanging bullets with CSS
According to Mark Boulton’s article Five simple steps to better typography – part 2, the text in bulleted lists should be left-aligned with the surrounding text; this is rarely the case on the web, but is easily achievable by using the following CSS style:
    ul {
      list-style-position: outside;
      margin-left: 0px;
    }
Reset CSS rules to render HTML identically in all browsers
These CSS rules remove most, if not all, browser-specific styles from common HTML elements. Your page will look almost identical in all browsers when using these CSS rules. Note that this is a combination of Tantek Celik’s undohtml.css and YUI’s reset.css.
    /** START BLATANT RIP FROM Tantek Celik's undohtml.css */

    /* link underlines tend to make hypertext less readable,
       because underlines obscure the shapes of the lower halves of words */
    :link,:visited { text-decoration:none }

    /** END BLATANT RIP FROM Tantek Celik's undohtml.css */

    /** START BLATANT RIP FROM YUI's reset.css */

    body,div,dl,dt,dd,ul,ol,li,h1,h2,h3,h4,h5,h6,pre,form,fieldset,input,textarea,p,blockquote,th,td {
      margin:0;
      padding:0;
    }
    table {
      border-collapse:collapse;
      border-spacing:0;
    }
    fieldset,img {
      border:0;
    }
    address,caption,cite,code,dfn,em,strong,th,var {
      font-style:normal;
      font-weight:normal;
    }
    ol,ul {
      list-style:none;
    }
    caption,th {
      text-align:left;
    }
    h1,h2,h3,h4,h5,h6 {
      font-size: 1em;
      font-weight:normal;
    }
    q:before,q:after {
      content:'';
    }
    abbr,acronym {
      border:0;
    }

    /** END BLATANT RIP FROM YUI's reset.css */