How to use ElasticSearch with Python

Python posted about 1 month ago by christian

This is a short example of how to use ElasticSearch with Python.

First install pyes (pyes documentation).

Then run this code:

    # https://pyes.readthedocs.org/en/latest/references/pyes.es.html
    # http://davedash.com/2011/02/25/bulk-load-elasticsearch-using-pyes/
    from pyes import *

    index_name = 'xxx'
    type_name = 'car'

    conn = ES('127.0.0.1:9200', timeout=3.5)

    docs = [
        {"name": "good", "id": '1'},
        {"name": "bad", "id": '2'},
        {"name": "ugly", "id": '3'}
    ]

    # Bulk index
    for doc in docs:
        # Signature: index(doc, index, doc_type, id=None, parent=None, force_insert=False,
        #                  op_type=None, bulk=False, version=None, querystring_args=None)
        conn.index(doc, index_name, type_name, id=doc['id'], bulk=True)

    print(conn.refresh())

    # Search
    def search(query):
        q = StringQuery(query, default_operator="AND")
        result = conn.search(query=q, indices=[index_name])
        for r in result:
            print(r)


    search("good")

You can also use curl to verify that it works:

    # Show index mapping
    curl -vvv "http://127.0.0.1:9200/xxx/_mapping?pretty=1"

    # Delete index
    curl -XDELETE -vvv "http://127.0.0.1:9200/xxx"

    # Search
    curl -vvv "http://127.0.0.1:9200/xxx/_search?pretty=1"
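For reference, here is a minimal sketch of the newline-delimited body that Elasticsearch's _bulk endpoint expects for the three documents above. It uses only the standard library and does not show what pyes sends internally. The helper name build_bulk_body is made up for this example, and the "_type" field assumes an older Elasticsearch version that still supports document types.

```python
import json


def build_bulk_body(docs, index_name, type_name):
    """Serialize docs into a newline-delimited _bulk request body.

    Hypothetical helper for illustration; "_type" assumes an older
    Elasticsearch version that still has document types.
    """
    lines = []
    for doc in docs:
        # Action line: says where the document on the next line should go.
        action = {"index": {"_index": index_name,
                            "_type": type_name,
                            "_id": doc["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"name": "good", "id": "1"},
    {"name": "bad", "id": "2"},
    {"name": "ugly", "id": "3"},
]

body = build_bulk_body(docs, "xxx", "car")
print(body)
```

The printed text could be saved to a file and posted to http://127.0.0.1:9200/_bulk with curl's --data-binary option to try it out.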

Tagged elasticsearch, python, pyes

Levenshtein distance for MySQL

SQL posted 2 months ago by christian

Levenshtein distance for MySQL:

    DELIMITER $$
    CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
      RETURNS INT
      DETERMINISTIC
      BEGIN
        DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
        DECLARE s1_char CHAR;
        -- max strlen=255
        DECLARE cv0, cv1 VARBINARY(256);
        SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
        IF s1 = s2 THEN
          RETURN 0;
        ELSEIF s1_len = 0 THEN
          RETURN s2_len;
        ELSEIF s2_len = 0 THEN
          RETURN s1_len;
        ELSE
          -- cv1 holds the previous row of the distance matrix, one byte per cell
          WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
          END WHILE;
          WHILE i <= s1_len DO
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
            WHILE j <= s2_len DO
              -- insertion
              SET c = c + 1;
              IF s1_char = SUBSTRING(s2, j, 1) THEN
                SET cost = 0;
              ELSE
                SET cost = 1;
              END IF;
              -- substitution
              SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
              IF c > c_temp THEN
                SET c = c_temp;
              END IF;
              -- deletion
              SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
              IF c > c_temp THEN
                SET c = c_temp;
              END IF;
              SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
            END WHILE;
            SET cv1 = cv0, i = i + 1;
          END WHILE;
        END IF;
        RETURN c;
      END$$


    CREATE FUNCTION levenshtein_ratio( s1 VARCHAR(255), s2 VARCHAR(255) )
      RETURNS INT
      DETERMINISTIC
      BEGIN
        DECLARE s1_len, s2_len, max_len INT;
        -- CHAR_LENGTH counts characters (LENGTH counts bytes), matching levenshtein()
        SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2);
        IF s1_len > s2_len THEN
          SET max_len = s1_len;
        ELSE
          SET max_len = s2_len;
        END IF;
        RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
      END$$

    DELIMITER ;

Also see this.

Now you can run these queries:

    SELECT levenshtein('butt', 'but');
    SELECT levenshtein_ratio('butt', 'but');
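To sanity-check the results, the same two-row dynamic programming idea can be sketched in Python. This is a reference sketch, not a port of the MySQL functions. It compares characters case-sensitively, whereas MySQL's behavior depends on the column collation, and returning 100 for two empty strings is an assumption made here to avoid dividing by zero.

```python
def levenshtein(s1, s2):
    """Edit distance between s1 and s2, keeping only two rows of the matrix."""
    if s1 == s2:
        return 0
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)

    # previous[j] = distance between the first i-1 chars of s1 and first j chars of s2
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                current[j - 1] + 1,      # insertion
                previous[j] + 1,         # deletion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(s1, s2):
    """Similarity as a percentage of the longer string's length."""
    max_len = max(len(s1), len(s2))
    if max_len == 0:
        return 100  # assumption: two empty strings count as identical
    # Add 0.5 and truncate to round half up for non-negative values.
    return int((1 - levenshtein(s1, s2) / max_len) * 100 + 0.5)


print(levenshtein("butt", "but"))
print(levenshtein_ratio("butt", "but"))
```

For 'butt' and 'but' this gives a distance of 1 and a ratio of 75, which is what the MySQL queries above should return as well.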

Tagged levenshtein, mysql

Slow IO performance with Vagrant and VirtualBox?

Plain Text posted 2 months ago by christian

To fix slow IO performance with Vagrant and VirtualBox, start by reading the documentation:
http://docs-v1.vagrantup.com/v1/docs/host_only_networking.html

It’s a long-known issue that VirtualBox shared folder performance degrades quickly as the number of files in the shared folder increases. As a project reaches 1000+ files, doing simple things like running unit tests or even just running an app server can be far slower than on a native filesystem (e.g. from 5 seconds to over 5 minutes).

If you’re seeing this sort of performance drop-off in your shared folders, NFS shared folders can offer a solution. Vagrant will orchestrate the configuration of the NFS server on the host and will mount the folder on the guest for you.

Example NFS configuration:

    Vagrant::Config.run do |config|
      config.vm.share_folder("v-root", "/vagrant", ".", :nfs => true)
    end

After this, run:

    vagrant reload

On OS X you can check that the folder is mounted properly with:

    showmount -e

Or check /etc/exports

On Linux use:

    sudo mount

Or check /etc/fstab
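To see whether the switch to NFS actually helps, a rough timing sketch like the one below can be run inside the guest before and after the change. It walks a directory and stats every file, on the assumption that metadata calls make up much of the slowdown. The function name time_tree_stat is made up for this example, and the demo runs against a temporary directory so it works anywhere.

```python
import os
import tempfile
import time


def time_tree_stat(root):
    """Walk root, stat every file, and return (file_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat reads metadata without following symlinks.
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start


# Demo on a throwaway directory with 100 empty files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(100):
        open(os.path.join(tmp, "f%d.txt" % i), "w").close()
    count, elapsed = time_tree_stat(tmp)
    print("%d files in %.4f s" % (count, elapsed))
```

In the guest, calling time_tree_stat("/vagrant") and comparing the result with a directory on the guest's own disk gives a first impression. The first run may be slower than later ones because of caching.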

Tagged virtualbox, vagrant

Fix disconnecting 3g connection using ssh.

Shell Script (Bash) posted 3 months ago by marko
In crowded areas my 3G connection tends to drop. I fixed it by connecting to an outside server with the ServerAliveInterval option shown below. ssh now keeps my 3G connection alive by polling the server at one-second intervals:

    while true; do ssh -o ServerAliveInterval=1 -l marko some.outside.server.tld; done

The while loop could also be replaced with autossh (http://www.harding.motd.ca/autossh/).
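The reconnect loop could also be sketched in Python, which makes it easy to add a pause between attempts. The names build_ssh_command and keep_connected are made up for this example, and the pause and max_restarts parameters are additions that the shell one-liner does not have.

```python
import subprocess
import time


def build_ssh_command(user, host, interval=1):
    """Return the ssh argument list used in the shell loop."""
    return ["ssh", "-o", "ServerAliveInterval=%d" % interval, "-l", user, host]


def keep_connected(command, run=subprocess.call, pause=1.0, max_restarts=None):
    """Run command again each time it exits.

    pause and max_restarts are assumptions of this sketch; with
    max_restarts=None it loops forever, like `while true`.
    """
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        run(command)       # blocks until the ssh session ends
        restarts += 1
        time.sleep(pause)  # brief pause before reconnecting
    return restarts


command = build_ssh_command("marko", "some.outside.server.tld")
print(" ".join(command))
```

Calling keep_connected(command) starts the loop. Nothing connects until that call is made.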
Tagged 3g, dropped connections, ssh tricks

How to use PhantomJS to take screenshots

JavaScript posted 4 months ago by christian

    page = new WebPage()
    if phantom.args.length < 2 or phantom.args.length > 3
      console.log "Usage: phantomjs screenshot.coffee URL filename"
      phantom.exit()
    else
      address = phantom.args[0]
      output = phantom.args[1]
      # userAgent takes only the header value, without a "User-Agent:" prefix
      ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11"
      page.settings.userAgent = ua
      page.customHeaders = "Accept-Language": "sv-SE,sv;q=0.8,en-US;q=0.6,en;q=0.4"
      page.viewportSize =
        width: 1024
        height: 760

      page.open address, (status) ->
        if status isnt "success"
          console.log "Unable to load the address!"
          phantom.exit()
        else
          # Wait 200 ms so the page can finish rendering before the capture
          window.setTimeout (->
            page.clipRect =
              top: 0
              left: 0
              width: 1024
              height: 760

            page.render output
            console.log "Exiting"
            phantom.exit()
          ), 200

Usage:

    phantomjs screenshot.coffee http://google.com google.png

Tagged phantomjs, screenshot