Add deduplication to nbtree

Enterprise / PostgreSQL - Peter Geoghegan [bowt.ie] - 26 February 2020 21:05 EST

Deduplication reduces the storage overhead of duplicates in indexes that use the standard nbtree index access method. The deduplication process is applied lazily, after the point where opportunistic deletion of LP_DEAD-marked index tuples occurs. Deduplication is only applied at the point where a leaf page split would otherwise be required. New posting list tuples are formed by merging together existing duplicate tuples. The physical representation of the items on an nbtree leaf page is made more space efficient by deduplication, but the logical contents of the page are not changed. Even unique indexes make use of deduplication as a way of controlling bloat from duplicates whose TIDs point to different versions of the same logical table row.

The lazy approach taken by nbtree has significant advantages over a GIN style eager approach. Most individual inserts of index tuples have exactly the same overhead as before. The extra overhead of deduplication is amortized across insertions, just like the overhead of page splits. The key space of indexes works in the same way as it has since commit dd299df8 (the commit that made heap TID a tiebreaker column).

Testing has shown that nbtree deduplication can generally make indexes with about 10 or 15 tuples for each distinct key value about 2.5X - 4X smaller, even with single column integer indexes (e.g., an index on a referencing column that accompanies a foreign key). The final size of single column nbtree indexes comes close to the final size of a similar contrib/btree_gin index, at least in cases where GIN's posting list compression isn't very effective. This can significantly improve transaction throughput, and significantly reduce the cost of vacuuming indexes.

A new index storage parameter (deduplicate_items) controls the use of deduplication. The default setting is 'on', so all new B-Tree indexes automatically use deduplication where possible. This decision will be reviewed at the end of the Postgres 13 beta period.

There is a regression of approximately 2% of transaction throughput with synthetic workloads that consist of append-only inserts into a table with several non-unique indexes, where all indexes have few or no repeated values. The underlying issue is that cycles are wasted on unsuccessful attempts at deduplicating items in non-unique indexes. There doesn't seem to be a way around it short of disabling deduplication entirely. Note that deduplication of items in unique indexes is fairly well targeted in general, which avoids the problem there (we can use a special heuristic to trigger deduplication passes in unique indexes, since we're specifically targeting "version bloat").

Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.

No bump in BTREE_VERSION, since the representation of posting list tuples works in a way that's backwards compatible with version 4 indexes (i.e. indexes built on PostgreSQL 12). However, users must still REINDEX a pg_upgrade'd index to use deduplication, regardless of the Postgres version they've upgraded from. This is the only way to set the new nbtree metapage flag indicating that deduplication is generally safe.

Author: Anastasia Lubennikova, Peter Geoghegan

0d861bbb70 Add deduplication to nbtree.
contrib/amcheck/verify_nbtree.c | 231 +++++++--
doc/src/sgml/btree.sgml | 201 +++++++-
doc/src/sgml/charset.sgml | 9 +-
doc/src/sgml/citext.sgml | 7 +-
doc/src/sgml/func.sgml | 9 +-
doc/src/sgml/ref/create_index.sgml | 44 +-
src/backend/access/common/reloptions.c | 10 +
src/backend/access/index/genam.c | 4 +
src/backend/access/nbtree/Makefile | 1 +
src/backend/access/nbtree/README | 133 ++++-
src/backend/access/nbtree/nbtdedup.c | 830 ++++++++++++++++++++++++++++++
src/backend/access/nbtree/nbtinsert.c | 388 ++++++++++++--
src/backend/access/nbtree/nbtpage.c | 246 ++++++++-
src/backend/access/nbtree/nbtree.c | 171 +++++-
src/backend/access/nbtree/nbtsearch.c | 272 +++++++++-
src/backend/access/nbtree/nbtsort.c | 193 ++++++-
src/backend/access/nbtree/nbtsplitloc.c | 39 +-
src/backend/access/nbtree/nbtutils.c | 196 +++++--
src/backend/access/nbtree/nbtxlog.c | 268 +++++++++-
src/backend/access/rmgrdesc/nbtdesc.c | 22 +-
src/backend/storage/page/bufpage.c | 11 +-
src/bin/psql/tab-complete.c | 4 +-
src/include/access/nbtree.h | 433 +++++++++++++---
src/include/access/nbtxlog.h | 117 ++++-
src/include/access/rmgrlist.h | 2 +-
src/include/access/xlog_internal.h | 2 +-
src/test/regress/expected/btree_index.out | 20 +-
src/test/regress/sql/btree_index.sql | 22 +-
28 files changed, 3553 insertions(+), 332 deletions(-)

Upstream: git.postgresql.org


  • Share