postgres snippets
How to Use the JSON and HStore Postgres Data Types With Rails 4
First enable the hstore Postgres extension. In this example we define both a JSON and an HStore column:
class Schema < ActiveRecord::Migration
def change
enable_extension "hstore"
create_table :links do |t|
t.hstore :data
t.column :settings, :json
end
end
end
Next, we specify accessors for the data that we will be stored in the JSON and HStore columns:
class Link < ActiveRecord::Base
# Use hstore for text-only data:
store :data, :name, :url, :description
# Use JSON to support string, number, object, array, true, false, null
store :settings, :update_interval, :created_at, :updated_at
We can now use the defined ActiveRecord attributes to store and access JSON and HStore data:
Link.create! name: 'Google', url: 'http://google.com', description: 'Ad company', update_interval: 1.day, created_at: Time.now.utc
Querying the data is where you'll see the biggest differences. Two examples:
# hstore
Link.where("data -> 'name' = ?", 'Google')
# json
Link.where("CAST(settings->>'update_interval' as integer) = ?", 1.day.to_s)
Notes for Postgres 9.3: * HStore can store only text. Nested data is not supported. * JSON supports the following types: string, number, object, array, true, false, null. For example, date and time types are not supported. Nested data is supported.
Recursive Postgres Query That Finds Descendants of a Node in a Tree
Voilà, the recursive query we are going disect in this snippet:
WITH RECURSIVE tree AS (
SELECT id, name, parent_id FROM nodes WHERE id = 25
UNION ALL
SELECT nodes.id, nodes.name, nodes.parent_id FROM nodes, tree WHERE nodes.parent_id = tree.id)
Here we define a virtual table called tree:
WITH RECURSIVE tree
This SQL finds the root node and initializes the recursive function:
SELECT id, name, parent_id FROM nodes WHERE id = 25
The next part recursively fetches the descendants of the root node:
SELECT nodes.id, nodes.name, nodes.parent_id FROM nodes, tree WHERE nodes.parent_id = tree.id)
This SQL query fetches the root node and all its descendants from the tree:
SELECT * FROM tree;
ActiveRecord / Rails
This example shows how to use CTEs in ActiveRecord to find the parent and suborganizations in a conglomerate:
class Organization
belongs_to :parent, class_name: 'Organization', required: false
has_many :children, class_name: 'Organization', foreign_key: :parent_id
scope :roots, -> { where(parent_id: nil) }
def ancestors(id, columns=Organization.column_names)
cols = columns.join(', ')
child_id = ActiveRecord::Base.connection.quote(id)
Organization.from <<~SQL
(WITH RECURSIVE org_tree(#{cols}, path) AS (
SELECT
#{cols}, ARRAY[id]
FROM
organizations
WHERE
id = #{child_id}
UNION ALL
SELECT
#{columns.map { |col| "organizations.#{col}" }.join(', ') }, path || organizations.id
FROM
org_tree
JOIN
organizations ON organizations.id = org_tree.parent_id
WHERE NOT
organizations.id = ANY(path) -- avoid infinite loops where e.g. parent = self
) SELECT #{cols} FROM org_tree WHERE id != #{child_id}) as organizations
SQL
end
def descendants(id, columns=Organization.column_names)
cols = columns.join(', ')
parent_id = ActiveRecord::Base.connection.quote(id)
Organization.from <<~SQL
(WITH RECURSIVE org_tree(#{cols}, path) AS (
SELECT
#{cols}, ARRAY[id]
FROM
organizations
WHERE
id = #{parent_id}
UNION ALL
SELECT
#{columns.map { |col| "organizations.#{col}" }.join(', ') }, path || organizations.id
FROM
org_tree
JOIN
organizations ON organizations.parent_id = org_tree.id
WHERE NOT
organizations.id = ANY(path) -- avoid infinite loops where e.g. parent = self
) SELECT #{cols} FROM org_tree WHERE id != #{parent_id}) as organizations
SQL
end
end
# Find parent companies
Organization.find(1).ancestors
# Find suborganizations
Organization.find(1).descendants
Details
- Use UNION instead of UNION ALL if you have duplicates, or if you’re not worried about performance when Postgres has to scan for duplicates.
- WITH Queries in Postgres
- Recursive queries can loop indefinitely if e.g. self.parent = self. If this happens, use the ANY() function to check if path has been visited once already. Alternatively, use validations to make sure circular references don’t happen, and hope for the best.
- You can use Arel’s recursive feature to generate recursive CTE queries with Arel.
- Rails CTE gem: https://github.com/christianhellsten/cte-rails
Recursive queries with connectby in Postgres
Create an organization hierarchy and display the organization from the top:
CREATE EXTENSION "tablefunc";
CREATE TABLE organizations(id text, parent_id text, pos int);
INSERT INTO organizations VALUES('row1',NULL, 0);
INSERT INTO organizations VALUES('row2','row1', 0);
INSERT INTO organizations VALUES('row3','row1', 0);
INSERT INTO organizations VALUES('row4','row2', 1);
INSERT INTO organizations VALUES('row5','row2', 0);
INSERT INTO organizations VALUES('row6','row4', 0);
INSERT INTO organizations VALUES('row7','row3', 0);
INSERT INTO organizations VALUES('row8','row6', 0);
INSERT INTO organizations VALUES('row9','row5', 0);
-- Fetch self and descendants for row1
SELECT * FROM connectby('organizations', 'id', 'parent_id', 'row1', 0, '~') AS t(id text, parent_id text, level int, branch text);
Output:
id | parent_id | level | branch
------+-----------+-------+--------------------------
row1 | | 0 | row1
row2 | row1 | 1 | row1~row2
row4 | row2 | 2 | row1~row2~row4
row6 | row4 | 3 | row1~row2~row4~row6
row8 | row6 | 4 | row1~row2~row4~row6~row8
row5 | row2 | 2 | row1~row2~row5
row9 | row5 | 3 | row1~row2~row5~row9
row3 | row1 | 1 | row1~row3
row7 | row3 | 2 | row1~row3~row7
Usually it's better to use recursive CTE queries:
WITH RECURSIVE organization_tree AS (
SELECT
id, parent_id
FROM
organizations
WHERE
id = 'row1'
UNION ALL
SELECT
organizations.id, organizations.parent_id
FROM
organizations, organization_tree
WHERE
organizations.parent_id = organization_tree.id
) select * from organization_tree;
Selecting the "top n" or "n latest" rows by group in Postgres
This SQL query selects the top 10 transactions for each user:
- partition the transactions by user_id
- order the transactions in each partition by the column created_at
- select only first row, i.e. the latest
WITH latest_customer_transactions AS (
SELECT
*, row_number() OVER (PARTITION BY user_id ORDER BY created_at desc) AS rownum
FROM
transactions
)
SELECT
id, user_id
FROM
latest_user_transactions
WHERE
rownum <= 10;
Note that if you simply want the first record per group it’s probably best to use DISTINCT ON as described here: https://www.periscopedata.com/blog/first-row-per-group-5x-faster
SELECT DISTINCT ON (user_id) *
FROM jobs
ORDER BY user_id,
priority DESC,
created_at
Note the differences between RANK, ROW_NUMBER, and DENSE_RANK: http://blog.jooq.org/2014/08/12/the-difference-between-row_number-rank-and-dense_rank/
Calculating "percentage of total" for each row with Postgres
Calculating the “percentage of the total” for each row with Postgres can be done with a window function:
SELECT
*, (value / SUM(value) OVER ()) AS "% of total"
FROM
transactions
WHERE
quarter = '2015-03-31' and company_id = 1;
We’re using “OVER ()”, which means the sum over all rows returned by the where clause.
Diffs with Postgres' lead and window functions
Our requirement
Output the previous quarter's value for each company:
ID | NAME | QUARTER | VALUE | PREVIOUS_VALUE
1 | CORP | 2015-03-31 | 44317.00 | 38755.00
1 | CORP | 2014-12-31 | 38755.00 | 33617.00
1 | CORP | 2014-09-30 | 33617.00 | 32406.00
1 | CORP | 2014-06-30 | 32406.00 | 29909.00
1 | CORP | 2014-03-31 | 29909.00 | 0
The solution
To calculate the change we use the window and lead functions:
WITH diffs AS (
select
*,
coalesce(lead(value) over (partition by company_id order by quarter desc), 0) as previous_value
from
history
)
select * from diffs;
Note that the window is partitioned by company_id, and ordered by quarter.
Postgres substring search with trigrams
What we have is: * a table "companies" and a column "name" * Postgres and the pgtrgm extension * a company named "Berkshéiße"
Let's enable the trigram extension:
-- $ psql app_schema -U superuser
CREATE EXTENSION pg_trgm;
Now we can search for trigrams, including shèiß
-- $ psql app_schema -U user
SELECT * FROM companies WHERE name ~* 'shèiß';
The query returns nothing, so let's do this instead: * install the unaccent extension * create an "immutable unaccent" function * apply "unaccent" and "lower" to the query * apply "unaccent" and "lower" to the index
-- $ psql app_schema -U superuser
CREATE EXTENSION unaccent;
-- $ psql app_schema -U user
--DROP INDEX companies_name_search_idx;
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)
$func$ LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;
CREATE INDEX companies_name_search_idx ON companies USING gin(f_unaccent(name) gin_trgm_ops);
Finally, the query returns what we're looking for:
-- Plain SQL
SELECT * FROM companies WHERE lower(f_unaccent(name)) LIKE ('%' || lower(f_unaccent('shèiß')) || '%');
-- With pg_trgrm syntax
SELECT * FROM companies WHERE lower(f_unaccent(name) ~* lower(f_unaccent('shéiße'));
-- Look, even this works
SELECT * FROM companies WHERE lower(f_unaccent(name) ~* lower(f_unaccent('shEiSe'));
If Postgres still doesn't use the index we created, it's probably because it's faster to scan the table than using the index.
Notes
* The WHERE-condition must match the index definition:
-- yes
lower(unaccent_text(name) ~* lower(unaccent_text('shéiße'))
-- no
name ~* lower(unaccent_text('shéiße'))
-- no
name ~* unaccent_text('shéiße')
References
* Erwin Brandstetter: Does PostgreSQL support “accent insensitive” collations? * Erwin Brandstetter: PostgreSQL accent + case insensitive search
How to take a backup of a Postgres database
This is one way of doing it:
pg_dump -s yamaha_development > structure.sql
pg_dump --column-inserts --data-only yamaha_development > data.sql
You can then restore it to a different database if needed:
psql -c 'drop database yamaha_test;'
psql -c 'create database yamaha_test;'
psql yamaha_test < structure.sql
psql yamaha_test < data.sql
All without foreign key constraint, etc, errors.
How to create random data with Postgres
This example creates random data from a predefined list of valid values:
insert into transactions(description, price, timestamp) values(
('{Bought cat,Bought dog,Bought house}'::text[])[ceil(random()*3)],
('{10,20}'::int[])[ceil(random()*2)],
NOW() - '1 day'::INTERVAL * ROUND(RANDOM() * 100)
)
This sets a column to a random UUID:
UPDATE customers SET name = uuid_in(md5(random()::text || clock_timestamp()::text)::cstring);