Notes on how to configure streaming replication and PITR with PostgreSQL 10.
Master and slave (warm standby)
- Master database replicated to the slave server via streaming replication
- WAL files copied from master to a network drive
- Point-in-time-recovery (PITR), e.g., if someone deletes a table by mistake
- Recovery possible even if both master and slave are lost
- Daily and weekly backups available on a network drive
- WAL files available on a network drive
- Network drive backed up
- Install PostgreSQL 10 on both the master and the slave
Configure the master
- wal_level=replica, archive_mode=on, archive_command, wal_keep_segments (optional when using replication slots), etc. in postgresql.conf
- archive_command should copy WAL files to a shared network drive for additional redundancy, or to the slave; see the sketch below
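A minimal postgresql.conf sketch for the master; /mnt/pg_archive is an assumed mount point for the network drive:
# postgresql.conf (master)
wal_level = replica
max_wal_senders = 10               # default in PostgreSQL 10; must be > 0
archive_mode = on
# Copy each completed WAL segment to the network drive; fail if the
# target file already exists so segments are never overwritten
archive_command = 'test ! -f /mnt/pg_archive/%f && cp %p /mnt/pg_archive/%f'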
- create a replication slot for each slave, so that WAL files are retained until each slave has received them (slot names may contain only lowercase letters, numbers, and underscores):
SELECT * FROM pg_create_physical_replication_slot('pg_slave_1');
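The pg_basebackup step below connects as a dedicated replication user (replicator). A minimal sketch of creating it and allowing it in the master's pg_hba.conf, assuming the slave's address is 10.0.0.2:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
# pg_hba.conf on the master; 10.0.0.2/32 is the assumed slave address
host    replication    replicator    10.0.0.2/32    md5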
Configure the slave
- hot_standby=on, etc. in postgresql.conf (keep the slave's configuration as identical to the master's as possible)
- primary_slot_name = 'pg_slave_1', standby_mode='on', restore_command, primary_conninfo, trigger_file in recovery.conf
- restore_command should use the WAL files that are copied to the network drive from the master; see the sketch below
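A minimal recovery.conf sketch for the slave; host, password, and paths are assumptions carried over from the examples in these notes:
# recovery.conf (slave)
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=secret'
primary_slot_name = 'pg_slave_1'
# Fall back to the archived WAL on the network drive if streaming falls behind
restore_command = 'cp /mnt/pg_archive/%f %p'
# Creating this file promotes the slave to master
trigger_file = '/tmp/promote-slave-to-master.trigger'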
- Start the primary server and stop the slave
Make sure that, e.g., monit does not immediately start the slave again.
- Copy the master database to the slave with pg_basebackup
Make sure the slave's data directory is empty:
psql -c '\db'    # list tablespaces; any outside the data directory need separate handling
sudo /etc/init.d/postgresql stop
sudo rm -rf /var/lib/postgresql/10/main
sudo mkdir /var/lib/postgresql/10/main
sudo pg_basebackup -h 10.0.0.1 -D /var/lib/postgresql/10/main -P -U replicator -X stream -W
sudo chown -R postgres:postgres /var/lib/postgresql/10/main
sudo chmod -R 0700 /var/lib/postgresql/10/main
- Start slave
sudo service postgresql start
- Set up daily backups
Configure daily backups of PostgreSQL data to a network drive.
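A sketch of one way to do this, assuming the network drive is mounted at /mnt/pg_backup; the schedule and paths are assumptions:
# /etc/cron.d/pg-daily-backup
# Take a compressed tar-format base backup every night at 02:00
0 2 * * * postgres pg_basebackup -D /mnt/pg_backup/$(date +\%F) -Ft -z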
- Back up the backup
Configure daily and weekly backups of the network drive.
- Check replication status
The current WAL location on the master and the replayed WAL location on the slave should match.
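For example (run the first queries on the master, the last one on the slave):
-- on the master
SELECT pg_current_wal_lsn();
SELECT * FROM pg_stat_replication;
-- on the slave
SELECT pg_last_wal_replay_lsn();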
- Master server is killed
Promote the slave to master by creating the trigger file configured in recovery.conf:
touch /tmp/promote-slave-to-master.trigger
- Master server is killed, slave replication has fallen behind master
Restore the slave from the WAL files on the network drive, or from a daily backup plus the WAL files if replication has fallen too far behind.
- Master server and slave are killed
Restore the database from a daily backup and WAL files located on the network drive.
- Master server, slave, and network drive are killed
Restore the database from a daily backup and the WAL files located on another network drive.
- "drop database xxx" was run by mistake
Restore the database with PITR. For example, set recovery_target_time = '2017-06-06 06:06:06' in recovery.conf, as in the sketch below.
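A sketch of the recovery.conf used for the restore, after unpacking a base backup taken before the mistake; the archive path and timestamp are assumptions:
# recovery.conf for PITR
restore_command = 'cp /mnt/pg_archive/%f %p'
# Stop replaying WAL just before the mistake
recovery_target_time = '2017-06-06 06:06:06'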
- Additional slaves are needed
Configure the new slave. Remember to create a new replication slot.
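For example, on the master (pg_slave_2 is a hypothetical name for the new slave's slot):
SELECT * FROM pg_create_physical_replication_slot('pg_slave_2');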
- Slave is removed permanently
Delete the replication slot, or the master will keep retaining WAL files for it until its disk is full.
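For example:
SELECT pg_drop_replication_slot('pg_slave_1');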
- Replication not working
Is recovery.conf in the data directory? Was it there before the server was started?
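A quick check on the slave; this returns t when the server is running as a standby:
SELECT pg_is_in_recovery();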