Adds write_lag, flush_lag and replay_lag columns to pg_stat_replication.
Implements a lag tracker module that reports lag times based on measurements of the time taken for recent WAL to be written, flushed and replayed, and for the sender to hear about it. These times represent the commit lag that was (or would have been) introduced by each synchronous commit level if the remote server were configured as a synchronous standby. For an asynchronous standby, the replay_lag column approximates the delay before recent transactions became visible to queries. If the standby server has entirely caught up with the sending server and there is no more WAL activity, the most recently measured lag times continue to be displayed for a short time and then show NULL.
Physical replication lag tracking is automatic. Logical replication tracking is possible but is the responsibility of the logical decoding plugin. Tracking is a private module operating within each walsender individually, with values reported to shared memory. The module is not used outside of walsender.
Design and code are good enough now to commit - kudos to the author. This is in many ways a difficult topic, with important and subtle behaviour, so it should be expected to generate discussion and multiple open items: Test now!
Author: Thomas Munro, following designs by Fujii Masao and Simon Riggs
Review: Simon Riggs, Ian Barwick and Craig Ringer
6912acc Replication lag tracking for walsenders
doc/src/sgml/monitoring.sgml | 69 +++++++
src/backend/access/transam/xlog.c | 14 ++
src/backend/catalog/system_views.sql | 3 +
src/backend/replication/walsender.c | 277 +++++++++++++++++++++++++++-
src/include/catalog/pg_proc.h | 2 +-
src/include/replication/logical.h | 2 +
src/include/replication/walsender_private.h | 5 +
src/test/regress/expected/rules.out | 5 +-
8 files changed, 370 insertions(+), 7 deletions(-)