search snippets

Find a text pattern in jar files

Tagged jar, find, search, recursive, linux, unzip  Languages bash

Helpful when you need to find a class or package in some jar file recursively below the current directory. The match is case insensitive. The script does not yet check whether a matched entry is a file or a directory. It uses the unzip command because it performs better than jar.

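#!/bin/sh
# Pass the text pattern to search for as the first argument.
# Run find and read its output line by line, so paths with spaces survive.
find . -type f -name '*.jar' | while IFS= read -r f
do
        unzip -l "$f" | grep -i -- "$1" && echo "was found in $f"
done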
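
The description above notes that the script does not yet tell file entries from directory entries. One possible way to add that check, sketched in Ruby: directory entries in a zip listing end with a slash, so they can be dropped before matching. The helper name `matching_files` is made up for this example, and the commented usage assumes an `unzip` build that supports zipinfo mode (`unzip -Z1`, one entry name per line).

```ruby
# Hypothetical helper: keep only file entries (no trailing "/") whose
# name contains the pattern, compared case-insensitively.
def matching_files(entry_names, pattern)
  needle = pattern.downcase
  entry_names
    .reject { |name| name.end_with?("/") }            # skip directory entries
    .select { |name| name.downcase.include?(needle) } # case-insensitive match
end

# Usage sketch (assumes unzip is on the PATH and a pattern is given):
#   Dir.glob("**/*.jar").each do |jar|
#     names = IO.popen(["unzip", "-Z1", jar], &:readlines).map(&:chomp)
#     hits  = matching_files(names, ARGV[0])
#     puts "#{hits.join(', ')} was found in #{jar}" unless hits.empty?
#   end
```

Keeping the filter separate from the `unzip` call means it can be tried on a plain array of names without any jar files at hand.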

How to submit your sitemap to multiple search engines

Tagged seo, sitemap, google, search  Languages 

To submit your sitemap to search engines (at least Google, MSN and Yahoo support this feature), add this line to your robots.txt file:

Sitemap: http://aktagon.com/sitemap.xml

This allows the search engine to find your sitemap when it visits your site, which means you don't have to manually register it with each search engine.
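
If robots.txt is generated or updated by a script, the line can be added programmatically. A minimal sketch, assuming the file content is available as a string; the helper name `with_sitemap` is hypothetical and not part of any library.

```ruby
# Hypothetical helper: return robots.txt content that contains a Sitemap
# line for the given URL, without adding it a second time.
def with_sitemap(robots_txt, sitemap_url)
  line = "Sitemap: #{sitemap_url}"
  return robots_txt if robots_txt.each_line.any? { |l| l.strip == line }

  # Make sure the existing content ends with a newline before appending.
  needs_newline = !(robots_txt.empty? || robots_txt.end_with?("\n"))
  body = needs_newline ? robots_txt + "\n" : robots_txt
  body + line + "\n"
end
```

Calling `with_sitemap(File.read("public/robots.txt"), "http://aktagon.com/sitemap.xml")` and writing the result back would be one way to use it; the `public/robots.txt` path is an assumption about a typical Rails layout.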

How to install Hyper Estraier and the Ruby bindings on Mac OS X, including a mini example on how to use the P2P capabilities

Tagged hyper estraier, search, ruby, install  Languages bash

This is a slightly modified version of a Japanese author's documentation on how to install Hyper Estraier on Mac OS X.

First we need libiconv:

$ cd /usr/local/src
$ wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.11.tar.gz
$ tar zxvf libiconv-1.11.tar.gz
$ cd libiconv-1.11
$ ./configure
$ make
$ sudo make install

zlib:

$ cd /usr/local/src
$ wget http://www.zlib.net/zlib-1.2.3.tar.gz
$ tar zxvf zlib-1.2.3.tar.gz
$ cd zlib-1.2.3
$ ./configure
$ make
$ sudo make install

QDBM:

$ cd /usr/local/src
$ wget http://qdbm.sourceforge.net/qdbm-1.8.74.tar.gz
$ tar zxvf qdbm-1.8.74.tar.gz
$ cd qdbm-1.8.74
$ ./configure --enable-zlib
$ make mac
$ make check-mac
$ sudo make install-mac

Hyper Estraier:

$ cd /usr/local/src
$ wget http://hyperestraier.sourceforge.net/hyperestraier-1.4.9.tar.gz
$ tar zxvf hyperestraier-1.4.9.tar.gz
$ cd hyperestraier-1.4.9
$ ./configure
$ make mac
$ make check-mac
$ sudo make install-mac

Finally we'll install the pure Ruby bindings, which live in the rubypure directory of the Hyper Estraier source tree:

$ cd rubypure
$ ./configure
$ make
$ sudo make install

To verify that Hyper Estraier is installed and working, try one of the examples in the examples folder, or follow these instructions:

First create and start a P2P node:

estmaster init casket
estmaster start casket

Open http://localhost:1978/master_ui in your browser and create a node called dictionary.

Then run this code which adds a record to the index:

require "estraierpure"
include EstraierPure

node = Node::new
node.set_url("http://localhost:1978/node/dictionary")
node.set_auth("admin", "admin")

doc = Document::new
# @uri : the location of a document which any document should have.
doc.add_attr("@uri", "This is the URL, required?")
# @title : the title used as a headline in the search result.
doc.add_attr("@title", "This is the title, required?")
doc.add_text("Text goes here")

result = node.put_doc(doc)
unless result
  printf("error: %s\n", node.status)
end

Next we'll perform a query which returns the object we just added:

require "estraierpure"
include EstraierPure

# create and configure the node connection object
node = Node::new
node.set_url("http://localhost:1978/node/dictionary")

# create a search condition object
cond = Condition::new

# set the search phrase to the search condition object
cond.set_phrase("Text goes here")

# get the result of search
nres = node.search(cond, 0)
if nres
  # for each document in the result
  for i in 0...nres.doc_num
    # get a result document object
    rdoc = nres.get_doc(i)
    # display attributes
    value = rdoc.attr("@uri")
    printf("URI: %s\n", value) if value
    value = rdoc.attr("@title")
    printf("Title: %s\n", value) if value
    # display the snippet text
    printf("%s", rdoc.snippet)
  end
else
  STDERR.printf("error: %d\n", node.status)
end

The query language is documented here.

If you're indexing ActiveRecord objects use acts_as_searchable:

gem install acts_as_searchable

How to install and use the Sphinx search engine and acts_as_sphinx plugin on Debian Etch

Tagged sphinx, search, acts_as_sphinx, debian, etch, rails, install, libstemmer  Languages bash

Inspiration for this snippet was taken from this post on the Sphinx forum, plus this blog post.

Compiling Sphinx

First install the prerequisites:

sudo aptitude install libmysql++-dev libmysqlclient15-dev checkinstall

Next download sphinx, libstemmer and install everything and the fish:

cd /usr/local/src

wget http://sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar zxvf sphinx-0.9.9.tar.gz 

cd sphinx-0.9.9/

# Add stemming support for Swedish, Finnish and other fun languages.
wget http://snowball.tartarus.org/dist/libstemmer_c.tgz
tar zxvf libstemmer_c.tgz

./configure --with-libstemmer
make

sudo make install

Configure Sphinx

Create a sphinx.conf file in your Rails config directory, as described here, or use this template.

Install acts_as_sphinx plugin

./script/plugin install http://svn.datanoise.com/acts_as_sphinx

Add acts_as_sphinx to your model:

class Documents < ActiveRecord::Base
  acts_as_sphinx
end

Indexing content

rake sphinx:index

(in /var/www/xxx.com/releases/20080429144230)
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
indexing index 'xxx.com'...
collected 5077 docs, 0.6 MB
sorted 0.1 Mhits, 100.0% done
total 5077 docs, 632096 bytes
total 0.160 sec, 3950427.25 bytes/sec, 31729.86 docs/sec

Reindexing content

sphinx:index shouldn't be run while the searchd process is running, so use rake sphinx:rotate instead, which restarts the searchd process after indexing.

Starting the daemon

mkdir -m 775 /var/log/sphinx
rake sphinx:start

(in /var/www/xxx.com/releases/20080429144230)
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
Sphinx searchd server started.

Searching

Documents.find_with_sphinx 'why did I write this'

How to detect traffic from the most common search spiders with Ruby

Tagged spider, web crawler, bot, search, user agent, detect  Languages ruby

This snippet detects traffic from the following bots, which is enough for me: msnbot (MSN), Yahoo! Slurp (Yahoo) and Googlebot (Google).

The code:

# to_s guards against requests that send no User-Agent header
user_agent = request.user_agent.to_s.downcase
@bot = [ 'msnbot', 'yahoo! slurp', 'googlebot' ].detect { |bot| user_agent.include? bot }

When the Google bot visits your site, @bot will contain 'googlebot'. For traffic that matches none of the listed bots, @bot is nil.

If you need to detect more bots than these, then the user-agents.org site contains a list of various user agents for both bots and browsers.
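
The same check can be wrapped in a plain method so it can be used and tried outside a controller. A minimal sketch; the constant `BOT_SIGNATURES` and the method `detect_bot` are names I made up for this example.

```ruby
# Substrings from the snippet above, compared against a lowercased User-Agent.
BOT_SIGNATURES = ["msnbot", "yahoo! slurp", "googlebot"].freeze

# Hypothetical helper: returns the matching signature, or nil for
# non-bot traffic and for requests without a User-Agent header.
def detect_bot(user_agent)
  ua = user_agent.to_s.downcase
  BOT_SIGNATURES.detect { |bot| ua.include?(bot) }
end
```

In a Rails controller this would be called as `@bot = detect_bot(request.user_agent)`. Adding more bots then only means appending lowercase substrings to the list.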

How to optimize your MephistoBlog powered site's search engine ranking (SEO for MephistoBlog)

Tagged seo, mephistoblog, meta, google, search, keywords  Languages 

At Aktagon we use MephistoBlog as our CMS. I couldn't find any information on Google about how to optimize MephistoBlog for search engines, so I'm sharing my notes here.

This tip shows you how to make your pages more search engine friendly.

First, add the title tag, plus the meta description and keywords tags, to your layout's Liquid template, as shown here:

<meta name="description" content="{% if article %} {{ article.excerpt }}  {% else %} YOUR DEFAULT SITE DESCRIPTION {% endif %}" />
<meta name="keywords" content="{% if article %} {% for tag in article.tags %}{{ tag }}, {% endfor %} {% endif %} YOUR DEFAULT KEYWORDS" />
<title>{% if article %} {{ article.title }} &raquo; {{ site.title }} {% else %} {{ site.title }} &raquo; {{ site.subtitle }} {% endif %}</title>

Remember to replace the default description and keywords placeholders in the meta tags' content attributes.

Now, whenever you publish an article, simply add an excerpt and some tags to it. The excerpt is used as the meta description and the article's tags as the meta keywords. Both make Google a bit happier, but the description is by far the more important.

How to automatically ping search engines when your sitemap has changed

Tagged sitemap, ruby, ping, search, google  Languages ruby

I prefer letting cron update sitemaps in the background, and at the end of the script I ping search engines to let them know it's been updated:

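require "open-uri"

# Recreate sitemap goes here

# Let search engines know about the update
[ "http://www.google.com/webmasters/tools/ping?sitemap=http://xxx/sitemap.xml",
  "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://xxx/sitemap.xml",
  "http://submissions.ask.com/ping?sitemap=http://xxx/sitemap.xml",
  "http://webmaster.live.com/ping.aspx?siteMap=http://xxx/sitemap.xml" ].each do |url|
  begin
    # URI.open needs Ruby 2.5 or later; older versions used a bare open(url)
    URI.open(url) do |f|
      if f.status[0] == "200"
        puts "Sitemap successfully submitted to #{url}"
      else
        puts "Failed to submit sitemap to #{url}"
      end
    end
  rescue OpenURI::HTTPError, SocketError => e
    # open-uri raises on HTTP error responses instead of returning them
    puts "Failed to submit sitemap to #{url}: #{e.message}"
  end
end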
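
The sitemap URL is passed as a query parameter, so it is safer to percent-encode it, especially if it contains its own query string. A minimal sketch of building the ping URLs that way; the endpoints are the ones from the snippet above, while `PING_ENDPOINTS` and `ping_urls` are names invented for this example.

```ruby
require "uri"

# Ping endpoints copied from the snippet above, without the sitemap value.
PING_ENDPOINTS = [
  "http://www.google.com/webmasters/tools/ping?sitemap=",
  "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
  "http://submissions.ask.com/ping?sitemap=",
  "http://webmaster.live.com/ping.aspx?siteMap="
].freeze

# Hypothetical helper: build one ping URL per endpoint, percent-encoding
# the sitemap URL so it stays a single query parameter.
def ping_urls(sitemap_url)
  encoded = URI.encode_www_form_component(sitemap_url)
  PING_ENDPOINTS.map { |endpoint| endpoint + encoded }
end
```

The resulting array can be fed to the same loop that opens each URL and reports success or failure.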

More about sitemaps: http://en.wikipedia.org/wiki/Sitemaps

How to configure wildcard and fuzzy search for Sphinx and Thinking Sphinx

Tagged sphinx, search, thinking-sphinx, wildcard, fuzzy  Languages ruby

This how-to explains how to configure wildcard and fuzzy search for Sphinx and the Thinking Sphinx Rails plugin.

Configure wildcard and fuzzy search in your model

First set the enable_star and min_infix_len properties inside the define_index block:

class Post...
  define_index do
    ...

    set_property :enable_star => true
    set_property :min_infix_len => 1
  end
end

Optionally you can make the settings global by adding them to config/sphinx.yml:

production:
    enable_star: true
    min_infix_len: 1

Stop, configure, reindex and start Sphinx

For Sphinx to pick up the changes we need to stop, configure, reindex and start Sphinx. Thinking Sphinx has some rake tasks that allow you to do this:

# export the variable so the rake tasks below can see it
export RAILS_ENV=xxx
rake ts:stop
rake ts:conf
rake ts:in
rake ts:start

Verify Sphinx configuration

Now open the Sphinx configuration file in an editor:

$ vim config/production.sphinx.conf

Verify that you can see the correct settings:

...
index post_core
{
...
   min_infix_len = 1
   enable_star = true
}
...

Test

Fire up the console and run some queries:

Post.search('xxx', :star => true)
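
As far as I understand it, :star => true makes Thinking Sphinx wrap each search term in * wildcards before the query reaches Sphinx, which is why enable_star and infix indexing are needed. The sketch below only illustrates that idea; it is not Thinking Sphinx's actual implementation, and the method name `star` is made up.

```ruby
# Rough approximation only: surround every run of word characters with "*".
# Thinking Sphinx's own code handles more cases (operators, field prefixes, ...).
def star(query)
  query.gsub(/\w+/) { |word| "*#{word}*" }
end

puts star("hello world")
```

With min_infix_len set to 1, a term such as `*ell*` can then match any indexed word containing "ell".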

Create a search controller

Now all that's left is to create the search controller and view:

class SearchController...
  def index
    @query = params[:query]
    options = {
      :page => params[:page], :per_page => params[:per_page], :star => true,
      :field_weights => { :title => 20, :tags => 10, :body => 5 }
    }
    @posts = Post.search(@query, options)
  end
end

Note that to get relevant search results you need to assign different weights to fields.
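
The controller above passes params[:page] and params[:per_page] straight through to the search. One way to keep those values in a sane range is to build the options hash in a small helper. This is a sketch under my own assumptions: `search_options` is a hypothetical name, and the default of 20 and cap of 100 results per page are arbitrary.

```ruby
DEFAULT_PER_PAGE = 20   # arbitrary value chosen for this sketch
MAX_PER_PAGE     = 100  # arbitrary value chosen for this sketch

# Hypothetical helper: turn raw request params into the options hash used
# by the controller above, forcing page and per_page into a sane range.
def search_options(params)
  page     = [params[:page].to_i, 1].max        # nil or garbage becomes page 1
  per_page = params[:per_page].to_i
  per_page = DEFAULT_PER_PAGE if per_page < 1   # missing or invalid value
  per_page = [per_page, MAX_PER_PAGE].min       # cap very large requests

  {
    :page => page, :per_page => per_page, :star => true,
    :field_weights => { :title => 20, :tags => 10, :body => 5 }
  }
end
```

The controller action would then shrink to `@posts = Post.search(@query, search_options(params))`.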

And finally, here's the view code:

<% @posts.each do |post| %>
  Markup for each post goes here...
<% end %>

References

Thinking Sphinx advanced documentation
Sphinx Documentation: min_infix_len
Sphinx Documentation: min_prefix_len
Sphinx Documentation: enable_star