wildcard snippets

How to configure wildcard and fuzzy search for Sphinx and Thinking Sphinx

Tagged search, sphinx, thinking-sphinx, wildcard, fuzzy  Languages ruby

This how-to explains how to configure wildcard and fuzzy search for Sphinx and the Thinking Sphinx Rails plugin.

Configure wildcard and fuzzy search in your model

First set the enable_star and min_infix_len properties inside the define_index block:

class Post...
  define_index do
   ...

    set_property :enable_star => true
    set_property :min_infix_len => 1 
  end

Optionally you can make the settings global by adding them to config/sphinx.yml:

production:
    enable_star: true
    min_infix_len: 1

Stop, configure, reindex and start Sphinx

For Sphinx to pickup the changes we need to stop, configure, reindex and start Sphinx. Thinking Sphinx has some rake tasks that allow you to do this:

RAILS_ENV=xxx
rake ts:stop
rake ts:conf
rake ts:in
rake ts:start

Verify Sphinx configuration

Now open the Sphinx configuration file in an editor:

$ vim config/production.sphinx.conf

Verify that you can see the correct settings:

...
index post_core
{
...
   min_infix_len = 1
   enable_star = true
}
...

Test

Fire up the console and run some queries:

Post.search('xxx', :star => true)

Create a search controller

Now all that's left is to create the search controller and view:

class SearchController...
  def index
    @query = params[:query]
    options = {
            :page => params[:page], :per_page => params[:per_page], :star => true,
            :field_weights => { :title => 20, :tags => 10, :body => 5 }
    }
    @posts = Post.search(@query, options)
  end

Note that to get relevant search results you need to assign different weights to fields.

And finally, here's the view code:

<% @posts.each do |post| %>
Nude pics go here...
<% end %>

References

Thinking Sphinx advanced documentation Sphinx Documentation: min_infix_len Sphinx Documentation: min_prefix_len Sphinx Documentation: enable_star

ElasticSearch Wildcard and NGram Search With Tire

Tagged ngram, tire, wildcard, elasticsearch  Languages ruby

How to implement wildcard search with Tire and Elasticsearch:

settings analysis: {
    filter: {
      ngram_filter: {
        type: "nGram",
        min_gram: 1,
        max_gram: 15
      }
    },
    analyzer: {
      index_ngram_analyzer: {
        tokenizer: "standard",
        filter: ['standard', 'lowercase', "stop", "ngram_filter"],
        type: "custom"
      },
      search_ngram_analyzer: {
        tokenizer: "standard",
        filter: ['standard', 'lowercase', "stop"],
        type: "custom"
      }
    }
  }

  mapping do
    indexes :name,
      search_analyzer: 'search_ngram_analyzer',
      index_analyzer: 'index_ngram_analyzer', 
      #analyzer: 'index_ngram_analyzer', 
      boost: 100.0
      # …
  end

With curl, make sure the mapping is set up properly:

curl 'http://localhost:9200/activities/_mapping?pretty=true'
{
  "skulls" : {
    "skull" : {
      "_all" : {
        "auto_boost" : true
      },
      "properties" : {
        "name" : {
          "type" : "string",
          "boost" : 100.0,
          "analyzer" : "index_ngram_analyzer"
        }
      }
    }
  }
}

You now have wildcard search as long as you remember to specify the fields that you want to search, because by default the _all field is used for search:

# This searches the _all field
curl 'http://localhost:9200/activities/_search?q=simpsons&pretty=true'

# Yes, it really works
curl -XGET 'http://localhost:9200/activities/_search?pretty' -d ' 
{ 
   "query" : { 
      "query_string" : { 
         "query" : "simpsons", 
         "fields" : ["name"] 
      } 
   } 
}'