PostgreSQL 10 With Streaming Replication and PITR

Tagged pitr, postgresql, replication, standby, streaming  Languages bash, sql

Notes on how to configure streaming replication and PITR with PostgreSQL 10.

Goal

  • Master and slave (warm standby)

    • Master database backed up to slave server via streaming replication
    • WAL files copied from master to a network drive
  • Point-in-time-recovery (PITR), e.g., if someone deletes a table by mistake
  • Recovery possible even if both master and slave are lost
  • Daily and weekly backups available on a network drive
  • WAL files available on a network drive
  • Network drive backed up

Steps

  1. Install postgres 10 on master and slave
  2. Configure the master

    • wal_level=replica, archive_mode=on, archive_command, wal_keep_segments (optional with replication slots), etc in postgresql.conf
    • archive_command should copy WAL files to a shared network drive for additional redundancy, or the slave
    • create a replication slot for each slave, so that WAL files are stored long enough for slaves to receive them:
SELECT * FROM pg_create_physical_replication_slot('pg-slave-1');
  1. Configure the slave

    • hot_standby=on, etc in postgresql.conf (keep slave configuration as identical as possible to master)
    • primary_slot_name = ‘pg-slave-1’, standby_mode=on, restore_command, primaryconn_info, trigger_file in recovery.conf
    • restore_command should use the WAL files on the network drive that are copied there from the master
  2. Start primary server, stop the slave

Make sure, e.g., monit does not start the slave immediately again.

  1. Copy the master database to the slave with pg_basebackup

Make sure the slave’s data directory is empty:

psql -c '\db'
sudo /etc/init.d/postgresql stop
sudo rm -rf /var/lib/postgresql/10/main
sudo mkdir /var/lib/postgresql/10/main
sudo pg_basebackup -h 10.0.0.1 -D /var/lib/postgresql/10/main -P -U replicator -X stream -W
sudo chown -R postgres:postgres /var/lib/postgresql/10/main
sudo chmod -R 0700 /var/lib/postgresql/10/main
  1. Start slave
sudo service postgresql start
  1. Set up daily backups

Configure daily backups of PostgreSQL data to a network drive.

  1. Backup the backup

Configure daily and weekly backups of network drive.

  1. Check replication status

On master:

select pg_current_wal_lsn();

On slave:

select pg_last_wal_replay_lsn();

Both values should match.

Scenarios

  • Master server is killed

Promote slave to master with touch /tmp/promote-slave-to-master.trigger

  • Master server is killed, slave replication has fallen behind master

Restore slave from the WAL files located on the network drive. Or use a daily backup plus the WAL files if replication has fallen behind too much.

  • Master server and slave are killed

Restore the database from a daily backup and WAL files located on the network drive.

  • Master server, slave, and network drive are killed

Restore the database from a daily backup and the WAL files located on another network drive.

  • “drop database xxx” was run by mistake

Restore the database with PITR. For example, set recovery_target_time = ‘2017-06-06 06:06:06’ in recovery.conf.

  • Additional slaves are needed

Configure the new slave. Remember to create a new replication slot.

  • Slave is removed permanently

Delete the replication slot or the WAL archives will keep accumulating until the master’s disk is full.

Troubleshooting

  • Replication not working

Is recovery.conf in the data directory? Was it there before the server was started?

References