postgres snippets

How to Use the JSON and HStore Postgres Data Types With Rails 4

Tagged nosql, postgres, hstore, json, rails  Languages ruby

First enable the hstore Postgres extension. In this example we define both a JSON and an HStore column:

class Schema < ActiveRecord::Migration
  def change
    enable_extension "hstore"
    create_table :links do |t|
      t.hstore :data
      t.column :settings, :json
    end
  end
end

Next, we specify accessors for the data that we will be stored in the JSON and HStore columns:

class Link < ActiveRecord::Base
  # Use hstore for text-only data:
  store :data, :name, :url, :description
  # Use JSON to support string, number, object, array, true, false, null
  store :settings, :update_interval, :created_at, :updated_at

We can now use the defined ActiveRecord attributes to store and access JSON and HStore data:

Link.create! name: 'Google', url: 'http://google.com', description: 'Ad company', update_interval: 1.day, created_at: Time.now.utc

Querying the data is where you'll see the biggest differences. Two examples:

# hstore
Link.where("data -> 'name' = ?", 'Google')
# json
Link.where("CAST(settings->>'update_interval' as integer) = ?", 1.day.to_s)

Notes for Postgres 9.3: * HStore can store only text. Nested data is not supported. * JSON supports the following types: string, number, object, array, true, false, null. For example, date and time types are not supported. Nested data is supported.

Recursive Postgres Query That Finds Descendants of a Node in a Tree

Tagged descendants, tree, parent, cte, postgres, recursive, child  Languages sql, ruby

Voilà, the recursive query we are going disect in this snippet:

WITH RECURSIVE tree AS (
  SELECT id, name, parent_id FROM nodes WHERE id = 25
  UNION ALL
  SELECT nodes.id, nodes.name, nodes.parent_id FROM nodes, tree WHERE nodes.parent_id = tree.id)

Here we define a virtual table called tree:

WITH RECURSIVE tree

This SQL finds the root node and initializes the recursive function:

SELECT id, name, parent_id FROM nodes WHERE id = 25

The next part recursively fetches the descendants of the root node:

SELECT nodes.id, nodes.name, nodes.parent_id FROM nodes, tree WHERE nodes.parent_id = tree.id)

This SQL query fetches the root node and all its descendants from the tree:

SELECT * FROM tree;

ActiveRecord / Rails

This example shows how to use CTEs in ActiveRecord to find the parent and suborganizations in a conglomerate:

class Organization
  belongs_to :parent, class_name: 'Organization', required: false
  has_many :children, class_name: 'Organization', foreign_key: :parent_id
  scope :roots, -> { where(parent_id: nil) }
  
  def ancestors(id, columns=Organization.column_names)
    cols = columns.join(', ')
    child_id = ActiveRecord::Base.connection.quote(id)
    Organization.from <<~SQL
      (WITH RECURSIVE org_tree(#{cols}, path) AS (
        SELECT
          #{cols}, ARRAY[id]
        FROM
          organizations
        WHERE
          id = #{child_id}
      UNION ALL
        SELECT
          #{columns.map { |col| "organizations.#{col}" }.join(', ') }, path || organizations.id
        FROM
          org_tree
        JOIN
          organizations ON organizations.id = org_tree.parent_id
        WHERE NOT
          organizations.id = ANY(path) -- avoid infinite loops where e.g. parent = self
      ) SELECT #{cols} FROM org_tree WHERE id != #{child_id}) as organizations
    SQL
  end

  def descendants(id, columns=Organization.column_names)
    cols = columns.join(', ')
    parent_id = ActiveRecord::Base.connection.quote(id)
    Organization.from <<~SQL
      (WITH RECURSIVE org_tree(#{cols}, path) AS (
        SELECT
          #{cols}, ARRAY[id]
        FROM
          organizations
        WHERE
          id = #{parent_id}
      UNION ALL
        SELECT
          #{columns.map { |col| "organizations.#{col}" }.join(', ') }, path || organizations.id
        FROM
          org_tree
        JOIN
          organizations ON organizations.parent_id = org_tree.id
        WHERE NOT
          organizations.id = ANY(path) -- avoid infinite loops where e.g. parent = self
      ) SELECT #{cols} FROM org_tree WHERE id != #{parent_id}) as organizations
    SQL
  end
end
# Find parent companies
Organization.find(1).ancestors
# Find suborganizations
Organization.find(1).descendants

Details

Recursive queries with connectby in Postgres

Tagged postgres, recursive, connectby  Languages sql

Create an organization hierarchy and display the organization from the top:

CREATE EXTENSION "tablefunc";
CREATE TABLE organizations(id text, parent_id text, pos int);

INSERT INTO organizations VALUES('row1',NULL, 0);
INSERT INTO organizations VALUES('row2','row1', 0);
INSERT INTO organizations VALUES('row3','row1', 0);
INSERT INTO organizations VALUES('row4','row2', 1);
INSERT INTO organizations VALUES('row5','row2', 0);
INSERT INTO organizations VALUES('row6','row4', 0);
INSERT INTO organizations VALUES('row7','row3', 0);
INSERT INTO organizations VALUES('row8','row6', 0);
INSERT INTO organizations VALUES('row9','row5', 0);

-- Fetch self and descendants for row1
SELECT * FROM connectby('organizations', 'id', 'parent_id', 'row1', 0, '~') AS t(id text, parent_id text, level int, branch text);

Output:

id  | parent_id | level |          branch
------+-----------+-------+--------------------------
 row1 |           |     0 | row1
 row2 | row1      |     1 | row1~row2
 row4 | row2      |     2 | row1~row2~row4
 row6 | row4      |     3 | row1~row2~row4~row6
 row8 | row6      |     4 | row1~row2~row4~row6~row8
 row5 | row2      |     2 | row1~row2~row5
 row9 | row5      |     3 | row1~row2~row5~row9
 row3 | row1      |     1 | row1~row3
 row7 | row3      |     2 | row1~row3~row7

Usually it's better to use recursive CTE queries:

WITH RECURSIVE organization_tree AS (
  SELECT 
    id, parent_id 
  FROM
    organizations
  WHERE
    id = 'row1'
UNION ALL
  SELECT 
    organizations.id, organizations.parent_id 
  FROM 
    organizations, organization_tree
  WHERE
    organizations.parent_id = organization_tree.id
) select * from organization_tree;

Selecting the "top n" or "n latest" rows by group in Postgres

Tagged postgres, top, latest, window, partition  Languages sql

This SQL query selects the top 10 transactions for each user:

  • partition the transactions by user_id
  • order the transactions in each partition by the column created_at
  • select only first row, i.e. the latest

Note that if you simply want the first record per group it’s probably best to use DISTINCT ON as described here: https://www.periscopedata.com/blog/first-row-per-group-5x-faster

WITH latest_customer_transactions AS (
  SELECT
    *, row_number() OVER (PARTITION BY user_id ORDER BY created_at desc) AS rownum
  FROM
    transactions
)
SELECT
  id, customer_id
FROM
  latest_customer_transactions
WHERE
  rownum <= 10;

Note the differences between RANK, ROW_NUMBER, and DENSE_RANK: http://blog.jooq.org/2014/08/12/the-difference-between-row_number-rank-and-dense_rank/

Calculating "percentage of total" for each row with Postgres

Tagged postgres, over, window  Languages sql

Calculating the “percentage of the total” for each row with Postgres can be done with a window function:

SELECT
  *,  (value / SUM(value) OVER ()) AS "% of total"
FROM
  transactions
WHERE
  quarter = '2015-03-31' and company_id = 1;

We’re using “OVER ()”, which means the sum over all rows returned by the where clause.

Diffs with Postgres' lead and window functions

Tagged lead, diff, window, postgres  Languages sql

Our requirement

Output the previous quarter's value for each company:

ID | NAME              | QUARTER     |  VALUE    | PREVIOUS_VALUE
1  | CORP              | 2015-03-31  |  44317.00 | 38755.00
1  | CORP              | 2014-12-31  |  38755.00 | 33617.00
1  | CORP              | 2014-09-30  |  33617.00 | 32406.00
1  | CORP              | 2014-06-30  |  32406.00 | 29909.00
1  | CORP              | 2014-03-31  |  29909.00 | 0

The solution

To calculate the change we use the window and lead functions:

WITH diffs AS (
  select
    *,
    coalesce(lead(value) over (partition by company_id order by quarter desc), 0) as previous_value
  from
    history
)
select * from diffs;

Note that the window is partitioned by company_id, and ordered by quarter.

Postgres substring search with trigrams

Tagged trigram, postgres, search, substring, unaccent, pgtrgm, pg_trgm  Languages sql

What we have is: * a table "companies" and a column "name" * Postgres and the pgtrgm extension * a company named "Berkshéiße"

Let's enable the trigram extension:

-- $ psql app_schema -U superuser
CREATE EXTENSION pg_trgm;

Now we can search for trigrams, including shèiß

-- $ psql app_schema -U user
SELECT * FROM companies WHERE name ~* 'shèiß';

The query returns nothing, so let's do this instead: * install the unaccent extension * create an "immutable unaccent" function * apply "unaccent" and "lower" to the query * apply "unaccent" and "lower" to the index

-- $ psql app_schema -U superuser
CREATE EXTENSION unaccent;
-- $ psql app_schema -U user
--DROP INDEX companies_name_search_idx;
CREATE OR REPLACE FUNCTION f_unaccent(text)
  RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)
$func$  LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;
CREATE INDEX companies_name_search_idx ON companies USING gin(f_unaccent(name) gin_trgm_ops);

Finally, the query returns what we're looking for:

-- Plain SQL
SELECT * FROM companies WHERE lower(f_unaccent(name)) LIKE ('%' || lower(f_unaccent('shèiß')) || '%');
-- With pg_trgrm syntax
SELECT * FROM companies WHERE lower(f_unaccent(name) ~* lower(f_unaccent('shéiße'));
-- Look, even this works
SELECT * FROM companies WHERE lower(f_unaccent(name) ~* lower(f_unaccent('shEiSe'));

If Postgres still doesn't use the index we created, it's probably because it's faster to scan the table than using the index.

Notes

* The WHERE-condition must match the index definition:

-- yes
lower(unaccent_text(name) ~* lower(unaccent_text('shéiße'))
-- no
name ~* lower(unaccent_text('shéiße'))
-- no
name ~* unaccent_text('shéiße')

References

* Erwin Brandstetter: Does PostgreSQL support “accent insensitive” collations? * Erwin Brandstetter: PostgreSQL accent + case insensitive search

How to take a backup of a Postgres database

Tagged backup, postgres  Languages bash

This is one way of doing it:

pg_dump -s yamaha_development > structure.sql
pg_dump --column-inserts --data-only yamaha_development  > data.sql

You can then restore it to a different database if needed:

psql -c 'drop database yamaha_test;'
psql -c 'create database yamaha_test;'

psql yamaha_test < structure.sql
psql yamaha_test < data.sql

All without foreign key constraint, etc, errors.

How to create random data with Postgres

Tagged postgres, random, data  Languages sql

This example creates random data from a predefined list of valid values:

insert into transactions(description, price, timestamp) values(
  ('{Bought cat,Bought dog,Bought house}'::text[])[ceil(random()*3)],
  ('{10,20}'::int[])[ceil(random()*2)],
  NOW() - '1 day'::INTERVAL * ROUND(RANDOM() * 100)
)

This sets a column to a random UUID:

UPDATE customers SET name = uuid_in(md5(random()::text || clock_timestamp()::text)::cstring);