postgres snippets

How to Use the JSON and HStore Postgres Data Types With Rails 4

Tagged nosql, postgres, hstore, json, rails  Languages ruby

First enable the hstore Postgres extension. In this example we define both a JSON and an HStore column:

class Schema < ActiveRecord::Migration
  def change
    enable_extension "hstore"
    create_table :links do |t|
      t.hstore :data
      t.column :settings, :json
    end
  end
end

Next, we specify accessors for the data that we will be stored in the JSON and HStore columns:

class Link < ActiveRecord::Base
  # Use hstore for text-only data:
  store :data, :name, :url, :description
  # Use JSON to support string, number, object, array, true, false, null
  store :settings, :update_interval, :created_at, :updated_at

We can now use the defined ActiveRecord attributes to store and access JSON and HStore data:

Link.create! name: 'Google', url: 'http://google.com', description: 'Ad company', update_interval: 1.day, created_at: Time.now.utc

Querying the data is where you'll see the biggest differences. Two examples:

# hstore
Link.where("data -> 'name' = ?", 'Google')
# json
Link.where("CAST(settings->>'update_interval' as integer) = ?", 1.day.to_s)

Notes for Postgres 9.3: * HStore can store only text. Nested data is not supported. * JSON supports the following types: string, number, object, array, true, false, null. For example, date and time types are not supported. Nested data is supported.

Recursive Postgres Query That Finds Descendants of a Node in a Tree

Tagged descendants, tree, parent, cte, postgres, recursive, child  Languages sql

The recursive query:

WITH RECURSIVE tree AS (
  SELECT id, name, parent_id FROM nodes WHERE id = 25
  UNION ALL
  SELECT nodes.id, nodes.name, nodes.parent_id FROM nodes, tree WHERE nodes.parent_id = tree.id)

This is where we define a virtual table called tree:

WITH RECURSIVE tree

This SQL finds the root node and initializes the recursive function:

SELECT id, name, parent_id FROM nodes WHERE id = 25

The next part recursively fetches the descendants of the specified node:

SELECT nodes.id, nodes.name, nodes.parent_id FROM nodes, tree WHERE nodes.parent_id = tree.id)

This SQL query shows the descendants, and the parent which has an ID of 25:

SELECT * FROM tree;

In Ruby you could use this code to generate your SQL:

def tree_sql(opts={})
    table = opts.fetch(:table) # e.g. 'categories'
    cols = opts.fetch(:cols) # e.g. %w(id name)
    <<-SQL
      WITH RECURSIVE #{table}_tree AS (
        -- initialize
        SELECT
          #{cols.join(', ')}, 0 as n_depth
        FROM
          #{table}
        WHERE
          -- Use bind variables and ? here if you want
          id = #{parent_id}
        UNION ALL
          -- iterate
          SELECT
            #{cols.map { |col| "#{table}.#{col}" }.join(', ')}, n_depth +1
          FROM
            #{table}, #{table}_tree
          WHERE
            #{table}.parent_id = #{table}_tree.id)
    SQL
  end

In Rails you probably want the method to return an ActiveRecord::Relation object so you can chain e.g. pagination, order, and limit method calls. One way of doing that is shown here:

class User
  def descendants
      ids_query = User.select('users.id') # return only ids
         .joins(:memberships)
        .group('users.id') # avoid duplicates
        .where("memberships.organization_id in (#{tree_sql})", organization).to_sql
      User.where("id in (#{ids_query})")
  end
end

Details: * Use UNION instead of UNION ALL if you have duplicates, or if you're not worried about performance when Postgres has to scan for duplicates. * WITH Queries in Postgres * Recursive queries can loop indefinitely if e.g. self.parent = self. If this happens, use the ANY() function to check if path has been visited once already. Alternatively, use validations to make sure circular references don't happen, and hope for the best. * Use Arel's recursive feature to generate recursive CTE queries with Arel.

Recursive queries with connectby in Postgres

Tagged postgres, recursive, connectby  Languages sql

Create an organization hierarchy and display the organization from the top:

CREATE EXTENSION "tablefunc";
CREATE TABLE organizations(id text, parent_id text, pos int);

INSERT INTO organizations VALUES('row1',NULL, 0);
INSERT INTO organizations VALUES('row2','row1', 0);
INSERT INTO organizations VALUES('row3','row1', 0);
INSERT INTO organizations VALUES('row4','row2', 1);
INSERT INTO organizations VALUES('row5','row2', 0);
INSERT INTO organizations VALUES('row6','row4', 0);
INSERT INTO organizations VALUES('row7','row3', 0);
INSERT INTO organizations VALUES('row8','row6', 0);
INSERT INTO organizations VALUES('row9','row5', 0);

-- Fetch self and descendants for row1
SELECT * FROM connectby('organizations', 'id', 'parent_id', 'row1', 0, '~') AS t(id text, parent_id text, level int, branch text);

Output:

id  | parent_id | level |          branch
------+-----------+-------+--------------------------
 row1 |           |     0 | row1
 row2 | row1      |     1 | row1~row2
 row4 | row2      |     2 | row1~row2~row4
 row6 | row4      |     3 | row1~row2~row4~row6
 row8 | row6      |     4 | row1~row2~row4~row6~row8
 row5 | row2      |     2 | row1~row2~row5
 row9 | row5      |     3 | row1~row2~row5~row9
 row3 | row1      |     1 | row1~row3
 row7 | row3      |     2 | row1~row3~row7

Usually it's better to use recursive CTE queries:

WITH RECURSIVE organization_tree AS (
  SELECT 
    id, parent_id 
  FROM
    organizations
  WHERE
    id = 'row1'
UNION ALL
  SELECT 
    organizations.id, organizations.parent_id 
  FROM 
    organizations, organization_tree
  WHERE
    organizations.parent_id = organization_tree.id
) select * from organization_tree;

Selecting the "top n" or "n latest" rows by group in Postgres

Tagged postgres, top, latest, window, partition  Languages sql

This SQL query selects the top 10 transactions grouped per user:

  • partition the transactions by user_id
  • order the transactions in each partition by the column created_at
  • keep order of the rank, per partition (first row is 1, second is 2)
  • select only first row, i.e. the latest
WITH latest_customer_transactions AS (
  SELECT
    *, rank() OVER (PARTITION BY user_id ORDER BY created_at desc) AS rank
  FROM
    transactions
)
SELECT
  id, customer_id
FROM
  latest_customer_transactions
WHERE
  rank = 1;

Keep your fingers crossed that it works. For alternatives, see Shtack Overflow.

Also note the differences between RANK, ROW_NUMBER, and DENSE_RANK: http://blog.jooq.org/2014/08/12/the-difference-between-row_number-rank-and-dense_rank/

Calculating "percentage of total" for each row with Postgres

Tagged postgres, over, window  Languages sql

Calculating the "percentage of the total" for each row with Postgres can be done with a window function:

SELECT
  *,  (value / SUM(value) OVER ()) AS "% of total"
FROM
  transactions
WHERE
  quarter = '2015-03-31' and company_id = 1;

NOTE: We're using "OVER ()", which means the sum over all rows returned by the where-clause.

Diffs with Postgres' lead and window functions

Tagged lead, diff, window, postgres  Languages sql

Our requirement

Output the previous quarter's value for each company:

ID | NAME              | QUARTER     |  VALUE    | PREVIOUS_VALUE
1  | CORP              | 2015-03-31  |  44317.00 | 38755.00
1  | CORP              | 2014-12-31  |  38755.00 | 33617.00
1  | CORP              | 2014-09-30  |  33617.00 | 32406.00
1  | CORP              | 2014-06-30  |  32406.00 | 29909.00
1  | CORP              | 2014-03-31  |  29909.00 | 0

The solution

To calculate the change we use the window and lead functions:

WITH diffs AS (
  select
    *,
    coalesce(lead(value) over (partition by company_id order by quarter desc), 0) as previous_value
  from
    history
)
select * from diffs;

Note that the window is partitioned by company_id, and ordered by quarter.

Postgres substring search with trigrams

Tagged trigram, postgres, search, substring, unaccent, pgtrgm, pg_trgm  Languages sql

What we have is: * a table "companies" and a column "name" * Postgres and the pgtrgm extension * a company named "Berkshéiße"

Let's enable the trigram extension:

-- $ psql app_schema -U superuser
CREATE EXTENSION pg_trgm;

Now we can search for trigrams, including shèiß

-- $ psql app_schema -U user
SELECT * FROM companies WHERE name ~* 'shèiß';

The query returns nothing, so let's do this instead: * install the unaccent extension * create an "immutable unaccent" function * apply "unaccent" and "lower" to the query * apply "unaccent" and "lower" to the index

-- $ psql app_schema -U superuser
CREATE EXTENSION unaccent;
-- $ psql app_schema -U user
--DROP INDEX companies_name_search_idx;
CREATE OR REPLACE FUNCTION f_unaccent(text)
  RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)
$func$  LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;
CREATE INDEX companies_name_search_idx ON companies USING gin(f_unaccent(name) gin_trgm_ops);

Finally, the query returns what we're looking for:

-- Plain SQL
SELECT * FROM companies WHERE lower(f_unaccent(name)) LIKE ('%' || lower(f_unaccent('shèiß')) || '%');
-- With pg_trgrm syntax
SELECT * FROM companies WHERE lower(f_unaccent(name) ~* lower(f_unaccent('shéiße'));
-- Look, even this works
SELECT * FROM companies WHERE lower(f_unaccent(name) ~* lower(f_unaccent('shEiSe'));

If Postgres still doesn't use the index we created, it's probably because it's faster to scan the table than using the index.

Notes

* The WHERE-condition must match the index definition:

-- yes
lower(unaccent_text(name) ~* lower(unaccent_text('shéiße'))
-- no
name ~* lower(unaccent_text('shéiße'))
-- no
name ~* unaccent_text('shéiße')

References

* Erwin Brandstetter: Does PostgreSQL support “accent insensitive” collations? * Erwin Brandstetter: PostgreSQL accent + case insensitive search

How to take a backup of a Postgres database

Tagged backup, postgres  Languages bash

This is one way of doing it:

pg_dump -s yamaha_development > structure.sql
pg_dump --column-inserts --data-only yamaha_development  > data.sql

You can then restore it to a different database if needed:

psql -c 'drop database yamaha_test;'
psql -c 'create database yamaha_test;'

psql yamaha_test < structure.sql
psql yamaha_test < data.sql

All without foreign key constraint, etc, errors.

How to create random data with Postgres

Tagged postgres, random, data  Languages sql

This example creates random data from a predefined list of valid values:

insert into transactions(description, price, timestamp) values(
  ('{Bought cat,Bought dog,Bought house}'::text[])[ceil(random()*3)],
  ('{10,20}'::int[])[ceil(random()*2)],
  NOW() - '1 day'::INTERVAL * ROUND(RANDOM() * 100)
)