Traditionally autovacuum has only ever invoked a worker based on the estimated number of dead tuples in a table and for anti-wraparound purposes. For the latter, with certain classes of tables such as insert-only tables, anti-wraparound vacuums could be the first vacuum that the table ever receives. This could often lead to autovacuum workers being busy for extended periods of time due to having to potentially freeze every page in the table. This could be particularly bad for very large tables. New clusters, or recently pg_restored clusters could suffer even more as many large tables may have the same relfrozenxid, which could result in large numbers of tables requiring an anti-wraparound vacuum all at once.
Here we aim to reduce the work required by anti-wraparound and aggressive
vacuums in general, by triggering autovacuum when the table has received enough INSERTs. This is controlled by adding two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor. These work exactly the same as the existing scale factor and threshold controls, only base themselves off the number of inserts since the last vacuum, rather than the number of dead tuples. New controls were added rather than reusing the existing controls, to allow these new vacuums to be tuned independently and perhaps even completely disabled altogether, which can be done by setting autovacuum_vacuum_insert_threshold to -1.
We make no attempt to skip index cleanup operations on these vacuums as they may trigger for an insert-mostly table which continually doesn't have enough dead tuples to trigger an autovacuum for the purpose of removing those dead tuples. If we were to skip cleaning the indexes in this case, then it is possible for the index(es) to become bloated over time.
There are additional benefits to triggering autovacuums based on inserts, as tables which never contain enough dead tuples to trigger an autovacuum are now more likely to receive a vacuum, which can mark more of the table as "allvisible" and encourage the query planner to make use of Index Only Scans.
Currently, we still obey vacuum_freeze_min_age when triggering these new autovacuums based on INSERTs. For large insert-only tables, it may be beneficial to lower the table's autovacuum_freeze_min_age so that tuples are eligible to be frozen sooner. Here we've opted not to zero that for these types of vacuums, since the table may just be insert-mostly and we may otherwise freeze tuples that are still destined to be updated or removed in the near future.
There was some debate to what exactly the new scale factor and threshold should default to. For now, these are set to 0.2 and 1000, respectively. There may be some motivation to adjust these before the release.
Author: Laurenz Albe, Darafei Praliaskouski
b07642dbcd Trigger autovacuum based on number of INSERTs
doc/src/sgml/config.sgml | 43 +++++++++++++++++++++++++++
doc/src/sgml/maintenance.sgml | 34 ++++++++++++++++-----
doc/src/sgml/monitoring.sgml | 5 ++++
doc/src/sgml/ref/create_table.sgml | 30 +++++++++++++++++++
src/backend/access/common/reloptions.c | 22 ++++++++++++++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/autovacuum.c | 31 ++++++++++++++++---
src/backend/postmaster/pgstat.c | 16 ++++++++++
src/backend/utils/adt/pgstatfuncs.c | 16 ++++++++++
src/backend/utils/misc/guc.c | 20 +++++++++++++
src/backend/utils/misc/postgresql.conf.sample | 5 ++++
src/bin/psql/tab-complete.c | 4 +++
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 5 ++++
src/include/pgstat.h | 1 +
src/include/postmaster/autovacuum.h | 2 ++
src/include/utils/rel.h | 2 ++
src/test/regress/expected/rules.out | 3 ++
18 files changed, 230 insertions(+), 12 deletions(-)