Avoid VACUUM reltuples distortion

Enterprise / PostgreSQL - Peter Geoghegan [bowt.ie] - 17 February 2022 01:15 UTC

Add a heuristic that avoids distortion in the pg_class.reltuples estimates used by VACUUM. Without the heuristic, successive manually run VACUUM commands (run against a table that is never modified after initial bulk loading) will scan the same page in each VACUUM operation. Eventually pg_class.reltuples may reach the point where one single heap page is accidentally considered highly representative of the entire table. This is likely to be completely wrong, since the last heap page typically has fewer tuples than average for the table.

It's not obvious that this was a problem prior to commit 44fa8488, which made vacuumlazy.c consistently scan the last heap page (even when it is all-visible in the visibility map). It seems possible that there were more subtle variants of the same problem that went unnoticed for quite some time, though. Commit 44fa8488 simplified certain aspects of when and how relation truncation was considered, but it did not introduce the "scan the last page" behavior. Essentially the same behavior was introduced much earlier, in commit e8429082. It was conditioned on whether or not truncation looked promising towards the end of the initial heap pass by VACUUM until recently, which was at least somewhat protective. That doesn't seem like something that we should be relying on, though.

Author: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com

74388a1ac3 Avoid VACUUM reltuples distortion.
src/backend/commands/vacuum.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

Upstream: git.postgresql.org

  • Share