replication snippets

PostgreSQL 10 With Streaming Replication and PITR

Tagged pitr, postgresql, replication, standby, streaming  Languages bash, sql

Notes on how to configure streaming replication and PITR with PostgreSQL 10: https://www.postgresql.org/docs/current/static/warm-standby.html

Goal

               ┌──────────────────┐                
               │                  │                
               │ Web application  │                
             ┌─│                  │───┐            
             │ │                  │   │            
             │ └──────────────────┘   │            
          Reads                    Writes          
             │                        │            
             │                        │            
             │                        │            
             ▼                        ▼            
   ┌──────────────────┐     ┌──────────────────┐   
   │                  │     │                  │   
   │      Slave       │     │      Master      │   
┌─▶│                  │     │                  │──┐
│  │                  │     │                  │  │
│  └──────────────────┘     └──────────────────┘  │
│                                     │           │
└──────────────Streaming replication──┼───────────┘
                                      │            
                                      │            
                               WAL archiving       
                                      │            
              ┌──────────────────┐    │            
              │                  │    │            
              │  Network drive   │    │            
              │                  │◀───┘            
              │                  │                 
              └──────────────────┘                 
  • Master and slave (warm standby)

    • Master database backed up to slave server via streaming replication
    • WAL files copied from master to a network drive
  • Point-in-time-recovery (PITR) which is useful, for example, if someone deletes a table by mistake
  • Recovery possible even if both master and slave are lost
  • Daily and weekly backups available on a network drive
  • WAL files available on a network drive
  • Network drive backed up

Steps

  1. Install postgres 10 on master and slave
  2. Configure the master

    • wal_level=replica, archive_mode=on, archive_command, wal_keep_segments (optional with replication slots), etc in postgresql.conf
    • archive_command should copy WAL files to a shared network drive for additional redundancy, or the slave
    • create a replication slot for each slave, so that WAL files are stored long enough for slaves to receive them:
SELECT * FROM pg_create_physical_replication_slot('pg-slave-1');
  1. Configure the slave

    • hot_standby=on, etc in postgresql.conf (keep slave configuration as identical as possible to master)
    • primary_slot_name = ‘pg-slave-1’, standby_mode=on, restore_command, primaryconn_info, trigger_file in recovery.conf
    • restore_command should use the WAL files on the network drive that are copied there from the master
  2. Start primary server, stop the slave

Make sure, e.g., monit does not start the slave immediately again.

  1. Copy the master database to the slave with pg_basebackup

Make sure the slave’s data directory is empty:

psql -c '\db'
sudo /etc/init.d/postgresql stop
sudo rm -rf /var/lib/postgresql/10/main
sudo mkdir /var/lib/postgresql/10/main
sudo pg_basebackup -h 10.0.0.1 -D /var/lib/postgresql/10/main -P -U replicator -X stream -W
sudo chown -R postgres:postgres /var/lib/postgresql/10/main
sudo chmod -R 0700 /var/lib/postgresql/10/main
  1. Start slave
sudo service postgresql start
  1. Set up daily backups

Configure daily backups of PostgreSQL data to a network drive.

  1. Backup the backup

Configure daily and weekly backups of network drive.

  1. Check replication status

On master:

select pg_current_wal_lsn();

On slave:

select pg_last_wal_replay_lsn();

Both values should match.

Scenarios

  • Master server is killed

Promote slave to master with touch /tmp/promote-slave-to-master.trigger

  • Master server is killed, slave replication has fallen behind master

Restore slave from the WAL files located on the network drive. Or use a daily backup plus the WAL files if replication has fallen behind too much.

  • Master server and slave are killed

Restore the database from a daily backup and WAL files located on the network drive.

  • Master server, slave, and network drive are killed

Restore the database from a daily backup and the WAL files located on another network drive.

  • “drop database xxx” was run by mistake

Restore the database with PITR. For example, set recovery_target_time = ‘2017-06-06 06:06:06’ in recovery.conf.

  • Additional slaves are needed

Configure the new slave. Remember to create a new replication slot.

  • Slave is removed permanently

Delete the replication slot or the WAL archives will keep accumulating until the master’s disk is full.

Troubleshooting

  • Replication not working

Is recovery.conf in the data directory? Was it there before the server was started?

References

Fetch the status of Postgres' streaming replication with ActiveRecord and Rails

Tagged pg_current_wal_flush_lsn, pg_last_wal_receive_lsn, postgres, replication, streaming  Languages ruby

When using streaming replication you can use this code to fetch the number of bytes the slave is behind the master in applying changes to the WAL (write-ahead log):

class SlaveDatabase < ApplicationRecord
  self.abstract_class = true
  # NOTE: Add slave entry to config/database.yml
  establish_connection :slave

  #
  # Returns the number of bytes the slave is behind the master in applying WAL updates.
  #
  def self.replication_delta
    # Returns, for example, 12/547C3A58 from the master
    current_master_wal = ActiveRecord::Base.connection.select_value("SELECT pg_current_wal_flush_lsn();")
    # Returns, for example, -7504582 from the slave
    delta = connection.select_value("SELECT #{connection.quote(current_master_wal)} - pg_last_wal_receive_lsn()")
    raise "pg_last_wal_receive_lsn returned null, see documentation for details" unless delta
    Integer(delta).abs
  end
end

Tested with version Postgres 10. Note that in version 10, pg_last_xlog_receive_location has been renamed to pg_last_wal_receive_lsn, and pg_current_xlog_flush_location has been renamed to pg_current_wal_flush_lsn.