Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmxid values

Enterprise / PostgreSQL - Tom Lane [sss.pgh.pa.us] - 21 July 2014 10:41 UTC

In commit a61daa14d56867e90dc011bbba52ef771cea6770, we fixed pg_upgrade so that it would install sane relminmxid and datminmxid values, but that does not cure the problem for installations that were already pg_upgraded to 9.3; they'll initially have "1" in those fields. This is not a big problem so long as 1 is "in the past" compared to the current nextMultiXact counter. But if an installation were more than halfway to the MXID wrap point at the time of upgrade, 1 would appear to be "in the future" and that would effectively disable tracking of oldest MXIDs in those tables/databases, until such time as the counter wrapped around.

While in itself this isn't worse than the situation pre-9.3, where we did not manage MXID wraparound risk at all, the consequences of premature truncation of pg_multixact are worse now; so we ought to make some effort to cope with this. We discussed advising users to fix the tracking values manually, but that seems both very tedious and very error-prone.

Instead, this patch adopts two amelioration rules. First, a relminmxid
value that is "in the future" is allowed to be overwritten with a full-table VACUUM's actual freeze cutoff, ignoring the normal rule that relminmxid should never go backwards. (This essentially assumes that we have enough defenses in place that wraparound can never occur anymore, and thus that a value "in the future" must be corrupt.) Second, if we see any "in the future" values then we refrain from truncating pg_clog and pg_multixact. This prevents loss of clog data until we have cleaned up all the broken tracking data. In the worst case that could result in considerable clog bloat, but in practice we expect that relfrozenxid-driven freezing will happen soon enough to fix the problem before clog bloat becomes intolerable. (Users could do manual VACUUM FREEZEs if not.)

Note that this mechanism cannot save us if there are already-wrapped or already-truncated-away MXIDs in the table; it's only capable of dealing with corrupt tracking values. But that's the situation we have with the pg_upgrade bug.

For consistency, apply the same rules to relfrozenxid/datfrozenxid. There are not known mechanisms for these to get messed up, but if they were, the same tactics seem appropriate for fixing them.

78db307 Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmxid values.
src/backend/commands/vacuum.c | 125 ++++++++++++++++++++++++++++++++---------
1 file changed, 99 insertions(+), 26 deletions(-)

Upstream: git.postgresql.org


  • Share