Generalize how VACUUM skips all-frozen pages

Enterprise / PostgreSQL - Peter Geoghegan [bowt.ie] - 3 April 2022 20:35 UTC

Non-aggressive VACUUMs were at a gratuitous disadvantage (relative to aggressive VACUUMs) around advancing relfrozenxid and relminmxid before now. The issue only came up when concurrent activity unset some heap page's visibility map bit right as VACUUM was considering if the page should get counted in frozenskipped_pages. The non-aggressive case would recheck the all-frozen bit at this point. The aggressive case reasoned that the page (a skippable page) must have at least been all-frozen in the recent past, so skipping it won't make relfrozenxid advancement unsafe (which is never okay for aggressive VACUUMs).

The recheck created a window for some other backend to confuse matters for VACUUM. If the page's VM bit turned out to be unset, VACUUM would conclude that the page was _never_ all-frozen. frozenskipped_pages was not incremented, and yet VACUUM couldn't back out of skipping at this late stage (it couldn't choose to scan the page instead). This made it unsafe to advance relfrozenxid later on.

Consistently avoid the issue by generalizing how we skip frozen pages during aggressive VACUUMs: take the same approach when skipping any skippable page range during aggressive and non-aggressive VACUUMs alike. The new approach makes ranges (not individual pages) the fundamental unit of skipping using the visibility map. frozenskipped_pages is replaced with a boolean flag that represents whether some skippable range with one or more all-visible pages was actually skipped.

It is safe for VACUUM to treat a page as all-frozen provided it at least had its all-frozen bit set after the OldestXmin cutoff was established.
VACUUM is only required to scan pages that might have XIDs < OldestXmin (unfrozen XIDs) to be able to safely advance relfrozenxid. Tuples concurrently inserted on "skipped" pages can be thought of as equivalent to tuples concurrently inserted on a block >= rel_pages.

It's possible that the issue this commit fixes hardly ever came up in practice. But we only had to be unlucky once to lose out on advancing relfrozenxid -- a single affected heap page was enough to throw VACUUM off. That seems like something to avoid on general principle. This is similar to an issue fixed by commit 44fa8488, which taught vacuumlazy.c to not give up on non-aggressive relfrozenxid advancement just because a cleanup lock wasn't immediately available on some heap page.

Skipping an all-visible range is now explicitly structured as a choice made by non-aggressive VACUUMs, by weighing known costs (scanning extra skippable pages to freeze their tuples early) against known benefits (advancing relfrozenxid early). This works in essentially the same way as it always has (don't skip ranges < SKIP_PAGES_THRESHOLD). We could do much better here in the future by considering other relevant factors.

Author: Peter Geoghegan

f3c15cbe50 Generalize how VACUUM skips all-frozen pages.
src/backend/access/heap/vacuumlazy.c | 309 +++++++++++++++++------------------
1 file changed, 146 insertions(+), 163 deletions(-)

Upstream: git.postgresql.org


  • Share