Disk-based Hash Aggregation

Enterprise / PostgreSQL - Jeff Davis [postgresql.org] - 18 March 2020 22:42 EDT

While performing hash aggregation, track memory usage when adding new groups to a hash table. If the memory usage exceeds work_mem, enter "spill mode".

In spill mode, new groups are not created in the hash table(s), but existing groups continue to be advanced if input tuples match. Tuples that would cause a new group to be created are instead spilled to a logical tape to be processed later.

Tuples are spilled in a partitioned fashion. Once all tuples from the outer plan have been processed (either by advancing a group or by spilling the tuple), finalize and emit the groups from the hash table. Then create new batches of work from the spilled partitions, select one of the saved batches, and process it (possibly spilling recursively).
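
The control flow described above can be illustrated with a small, self-contained Python sketch. This is not the code in nodeAgg.c; it is only a toy model under stated assumptions: a fixed group limit (`max_groups`) stands in for work_mem, in-memory lists stand in for logical tapes, the aggregate is a simple count, and the names `hash_agg_count` and `num_partitions` are hypothetical.

```python
from collections import deque


def hash_agg_count(rows, max_groups, num_partitions=4):
    """Count rows per key while holding at most max_groups groups at once.

    Assumptions (illustrative only): max_groups stands in for work_mem,
    and plain Python lists stand in for logical tapes.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")

    results = {}
    # Each batch is (tuples, level). The level selects a different slice of
    # the hash value, so a recursive spill re-partitions on fresh bits.
    batches = deque([(list(rows), 0)])

    while batches:
        batch, level = batches.popleft()
        table = {}
        partitions = [[] for _ in range(num_partitions)]

        for key in batch:
            if key in table:
                # Existing groups keep advancing, even in spill mode.
                table[key] += 1
            elif len(table) < max_groups:
                # Still under the limit: create a new group.
                table[key] = 1
            else:
                # Spill mode: the tuple would create a new group, so it is
                # written to a partition chosen by its hash instead.
                p = (hash(key) // num_partitions ** level) % num_partitions
                partitions[p].append(key)

        # Input exhausted: finalize and emit the groups in the hash table.
        # A spilled key never has a group in this table, so no overwrite.
        results.update(table)

        # Each non-empty partition becomes a new batch of work, which may
        # itself spill again at the next level.
        for part in partitions:
            if part:
                batches.append((part, level + 1))

    return results
```

For example, `hash_agg_count(["a", "b", "a", "c", "b", "a"], max_groups=2)` yields the same counts as an unbounded hash table would. Every tuple of a given key hashes to the same partition, so a spilled group is always complete by the time its batch is processed.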

Author: Jeff Davis

1f39bce021 Disk-based Hash Aggregation.
doc/src/sgml/config.sgml | 32 +
src/backend/commands/explain.c | 37 +
src/backend/executor/nodeAgg.c | 1092 ++++++++++++++++++++++++-
src/backend/optimizer/path/costsize.c | 70 +-
src/backend/optimizer/plan/planner.c | 19 +-
src/backend/optimizer/prep/prepunion.c | 2 +-
src/backend/optimizer/util/pathnode.c | 14 +-
src/backend/utils/misc/guc.c | 20 +
src/include/executor/nodeAgg.h | 8 +
src/include/nodes/execnodes.h | 22 +-
src/include/optimizer/cost.h | 4 +-
src/test/regress/expected/aggregates.out | 184 +++++
src/test/regress/expected/groupingsets.out | 122 +++
src/test/regress/expected/select_distinct.out | 62 ++
src/test/regress/expected/sysviews.out | 4 +-
src/test/regress/sql/aggregates.sql | 131 +++
src/test/regress/sql/groupingsets.sql | 103 +++
src/test/regress/sql/select_distinct.sql | 62 ++
18 files changed, 1950 insertions(+), 38 deletions(-)

Upstream: git.postgresql.org

