Elasticsearch snippets

NoSQL, Pagination and Search with Mongoid, Kaminari and Tire

Tagged mongoid, kaminari, tire, elasticsearch, pagination  Languages ruby

This example uses Mongoid, Kaminari and Tire (Elasticsearch):

require 'kaminari/models/mongoid_extension'

class Product
  include Mongoid::Document
  include Mongoid::Paranoia
  include Mongoid::Timestamps

  index :application_id, :unique => true

  # NOTE: best to include Tire after the Mongoid modules
  include Tire::Model::Search
  include Tire::Model::Callbacks

  include Kaminari::MongoidExtension::Criteria
  include Kaminari::MongoidExtension::Document
end

Now you can paginate and search all you want:

# Search and paginate
Product.tire.search :page => 1, :per_page => 100, :load => true do
    query             { string "Ho ho" }
    #sort              { by     :rating, 'desc' }
end
# Paginate
Product.page(1).per(100)

Gotchas

* Use Model.tire.search and Model.tire.index instead of Model.search, which conflicts with Mongoid's own index/search methods.
* Use :load => true to get back an array of the models you're searching for instead of Tire::Results::Item objects.
* Use :per_page and :page with Model.tire.search, not the page/per methods.
* Mongoid queries are not the same as ActiveRecord queries, see http://mongoid.org/docs/querying/criteria.html#where
* MongoDB URL: http://localhost:28017/
* Elasticsearch URL: http://localhost:9200/products/_mapping (see the curl example below)
* Boolean queries are difficult: https://gist.github.com/1263816
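
For example, to check what Tire created for the model above, you can inspect the index mapping with curl (a sketch, assuming Tire's default index name products and the default port 9200):

curl "http://localhost:9200/products/_mapping?pretty=true"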

How to use Elasticsearch with Python

Tagged elasticsearch, python, pyes  Languages python

This is a short example of how to use Elasticsearch from Python.

First, install pyes (documentation: https://pyes.readthedocs.org/):
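
# Install pyes
pip install pyes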

Then run this code:

# https://pyes.readthedocs.org/en/latest/references/pyes.es.html
# http://davedash.com/2011/02/25/bulk-load-elasticsearch-using-pyes/
from pyes import *

index_name = 'xxx'
type_name = 'car'

conn = ES('127.0.0.1:9200', timeout=3.5)

docs = [
    {"name": "good", "id": "1"},
    {"name": "bad", "id": "2"},
    {"name": "ugly", "id": "3"},
]

# Bulk index
for doc in docs:
    # index(doc, index, doc_type, id=None, parent=None, force_insert=False, op_type=None, bulk=False, version=None, querystring_args=None)
    conn.index(doc, index_name, type_name, id=doc['id'], bulk=True)

# Refresh makes the just-indexed documents searchable
print(conn.refresh())

# Search
def search(query):
    q = StringQuery(query, default_operator="AND")
    result = conn.search(query=q, indices=[index_name])
    for r in result:
        print(r)


search("good")

You can also use curl to verify that it works:

# Show index mapping
curl -vvv "http://127.0.0.1:9200/xxx/_mapping?pretty=1"

# Delete index
curl -XDELETE -vvv "http://127.0.0.1:9200/xxx"

# Search
curl -vvv "http://127.0.0.1:9200/xxx/_search?pretty=1"

Elasticsearch Wildcard and NGram Search with Tire

Tagged ngram, wildcard, elasticsearch, tire  Languages ruby

How to implement wildcard search with Tire and Elasticsearch:

settings analysis: {
    filter: {
      ngram_filter: {
        type: "nGram",
        min_gram: 1,
        max_gram: 15
      }
    },
    analyzer: {
      index_ngram_analyzer: {
        tokenizer: "standard",
        filter: ['standard', 'lowercase', "stop", "ngram_filter"],
        type: "custom"
      },
      search_ngram_analyzer: {
        tokenizer: "standard",
        filter: ['standard', 'lowercase', "stop"],
        type: "custom"
      }
    }
  }

  mapping do
    indexes :name,
      search_analyzer: 'search_ngram_analyzer',
      index_analyzer: 'index_ngram_analyzer', 
      #analyzer: 'index_ngram_analyzer', 
      boost: 100.0
      # …
  end

With curl, make sure the mapping is set up properly:

curl 'http://localhost:9200/activities/_mapping?pretty=true'
{
  "skulls" : {
    "skull" : {
      "_all" : {
        "auto_boost" : true
      },
      "properties" : {
        "name" : {
          "type" : "string",
          "boost" : 100.0,
          "analyzer" : "index_ngram_analyzer"
        }
      }
    }
  }
}

You now have wildcard search, as long as you remember to specify the fields you want to search; by default, queries go against the _all field:

# This searches the _all field
curl 'http://localhost:9200/activities/_search?q=simpsons&pretty=true'

# Yes, it really works
curl -XGET 'http://localhost:9200/activities/_search?pretty' -d ' 
{ 
   "query" : { 
      "query_string" : { 
         "query" : "simpsons", 
         "fields" : ["name"] 
      } 
   } 
}'
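
To see why this behaves like wildcard search, you can ask the analyze API what the index analyzer actually emits (a sketch, assuming the index and analyzer names above; on Elasticsearch versions of this era the analyzer and text can be passed as query parameters):

curl 'http://localhost:9200/activities/_analyze?analyzer=index_ngram_analyzer&text=simpsons&pretty=true'

The response lists every 1- to 15-character substring of "simpsons" as a separate token, which is why partial strings such as "simp" match at search time.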

Elasticsearch: How to delete log entries older than X days with Curator

Tagged curator, elasticsearch  Languages bash, cron

# Install curator
pip install elasticsearch-curator
# Download curator config file
curl -o curator.yml https://raw.githubusercontent.com/elastic/curator/master/examples/curator.yml

Next, download, read, and edit the action file: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/actionfile.html
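
For reference, here is a minimal sketch of what the action file (called remove-old-data.yml in the crontab entry below) could look like, assuming daily logstash-YYYY.MM.DD indices and a 30-day retention; adjust the prefix, timestring, and unit_count to match your setup:

cat > remove-old-data.yml <<'EOF'
actions:
  1:
    action: delete_indices
    description: Delete indices older than 30 days, based on the date in the index name
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
EOF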

# Run curator
curator --config curator.yml action_file.yml

Add this to crontab:

# Run curator at 00:01
01 00 * * * /usr/local/bin/curator --config /etc/elasticsearch/curator/curator.yml /etc/elasticsearch/curator/remove-old-data.yml >> /var/log/elasticsearch-curator.log

Tested with Elasticsearch 6.0 and curator version 5.4.1.

UFW + Docker = No firewall

Tagged docker, elasticsearch, gotcha, iptables, ufw, wtf  Languages 

TL;DR: Docker can and will override existing iptables rules and expose your services to the Internet.

This means you have to think twice when installing Docker on a machine that is only protected by an iptables-based firewall such as UFW. You might think you are protected by your firewall, but you very likely are not. This is probably one of the more common reasons why Elasticsearch servers, which are unprotected by default, are exposed to the internet.

For details, see: https://github.com/docker/for-linux/issues/690
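
A quick way to see the problem (a sketch; elasticsearch is just an example image here): even with a UFW rule denying a port, publishing that port with -p lets Docker insert its own iptables rules ahead of UFW's chains, so the service is still reachable from other hosts.

# UFW claims port 9200 is blocked...
ufw deny 9200
# ...but Docker's published port bypasses the UFW chains entirely
docker run -d -p 9200:9200 elasticsearch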

Solution 1: External firewall

One solution is to use a firewall provided by the hosting provider (DO, AWS, GCP, etc.).

Solution 2: Disable Docker’s iptables “feature”

Disable iptables handling in Docker by starting the Docker daemon with the following switch:

--iptables=false
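
The switch belongs to the Docker daemon, not to docker run. Two sketches of where to put it, assuming a standard systemd-based install and no existing daemon.json:

# One-off, when starting the daemon by hand:
dockerd --iptables=false

# Or permanently, via the daemon configuration file:
echo '{ "iptables": false }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

Note that with iptables handling disabled, Docker no longer sets up NAT for your containers, so outbound container traffic may need manual iptables rules.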

Solution 3: Listen on private IPs

This is perhaps the easiest solution to implement, and also the easiest to forget: expose your containers and services only on one of the following private IP address ranges (see the example below):

  • 10.0.0.0 to 10.255.255.255
  • 172.16.0.0 to 172.31.255.255
  • 192.168.0.0 to 192.168.255.255

Note that binding to 127.0.0.1 will not work with Docker Swarm.
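
For example, if the host has a private address such as 10.0.0.5 (hypothetical), bind the published port to it so the service is only reachable from the private network:

docker run -d -p 10.0.0.5:9200:9200 elasticsearch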