Increase number of hash join buckets for underestimate

PostgreSQL - Kevin Grittner [postgresql.org] - 13 October 2014 10:16 UTC

If we expect batching at the very beginning, we size nbuckets for "full work_mem": we see how many tuples we can fit into work_mem without exceeding the NTUP_PER_BUCKET threshold.
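
As a rough illustration of that sizing step, here is a minimal sketch, not the actual nodeHash.c code; work_mem_bytes, tuple_width, next_pow2 and buckets_for_full_work_mem are made-up names, and the NTUP_PER_BUCKET value is only illustrative. The bucket count is derived from how many tuples fit in the memory budget, divided by the per-bucket threshold and rounded up to a power of two:

    /*
     * Sketch only: when batching is expected up front, size the bucket
     * array for a full work_mem's worth of tuples.  Assumes tuple_width > 0.
     */
    #include <stddef.h>

    #define NTUP_PER_BUCKET 1       /* illustrative tuples-per-bucket threshold */

    /* Round up to a power of two so bucketno can be computed with a mask. */
    static size_t
    next_pow2(size_t n)
    {
        size_t p = 1;

        while (p < n)
            p <<= 1;
        return p;
    }

    /* Buckets to allocate if we intend to fill work_mem completely. */
    static size_t
    buckets_for_full_work_mem(size_t work_mem_bytes, size_t tuple_width)
    {
        size_t max_tuples = work_mem_bytes / tuple_width;  /* tuples that fit */
        size_t nbuckets = max_tuples / NTUP_PER_BUCKET;    /* keep load factor */

        return next_pow2(nbuckets > 0 ? nbuckets : 1);
    }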

If we expect to be fine without batching, we start with the 'right' nbuckets and track the optimal nbuckets as we go, without actually resizing the hash table. Once we hit work_mem (taking the optimal nbuckets value into account), we stop growing the target and keep that value.
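
A minimal sketch of that tracking follows, with hypothetical struct and field names (only nbuckets_optimal and NTUP_PER_BUCKET correspond to identifiers mentioned above): each insertion bumps a tuple counter and doubles the target bucket count while the projected memory use, including the would-be bucket array, still fits in work_mem; after that the target is frozen:

    /*
     * Sketch only: track the bucket count we would like to have
     * (nbuckets_optimal) as tuples arrive, without touching the live
     * bucket array.
     */
    #include <stdbool.h>
    #include <stddef.h>

    #define NTUP_PER_BUCKET 1       /* illustrative tuples-per-bucket threshold */

    typedef struct HashTableSketch
    {
        size_t nbuckets;            /* buckets actually allocated */
        size_t nbuckets_optimal;    /* buckets we would prefer to have */
        size_t ntuples;             /* tuples inserted so far */
        size_t space_used;          /* bytes used by stored tuples */
        size_t work_mem_bytes;      /* memory budget */
        bool   growth_frozen;       /* true once work_mem has been reached */
    } HashTableSketch;

    static void
    note_tuple_inserted(HashTableSketch *ht, size_t tuple_bytes)
    {
        ht->ntuples++;
        ht->space_used += tuple_bytes;

        if (ht->growth_frozen)
            return;                 /* already hit work_mem: keep the value */

        /* Double the target while the load factor is exceeded and the
         * projected memory (tuples plus one pointer per bucket) still fits. */
        while (ht->ntuples > ht->nbuckets_optimal * NTUP_PER_BUCKET)
        {
            size_t doubled = ht->nbuckets_optimal * 2;

            if (ht->space_used + doubled * sizeof(void *) > ht->work_mem_bytes)
            {
                ht->growth_frozen = true;
                break;
            }
            ht->nbuckets_optimal = doubled;
        }
    }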

At the end of the first batch, we check whether (nbuckets != nbuckets_optimal) and resize the hash table if needed. We then keep this value for all subsequent batches, which is safe because it assumes full work_mem and it keeps the batchno evaluation trivial. So the resize happens at most once.
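
The one-time resize could look roughly like the following hedged sketch (invented types, not the executor's HashJoinTable code): if the tracked optimum differs from the allocated bucket count, a larger power-of-two bucket array is built once and every stored tuple is re-linked into its new chain using hashvalue & (nbuckets - 1):

    /*
     * Sketch only: rebuild the bucket array once at the end of the first
     * batch, then re-chain every stored tuple.
     */
    #include <stdlib.h>

    typedef struct TupleSketch
    {
        struct TupleSketch *next;       /* chain link within a bucket */
        unsigned int hashvalue;         /* precomputed hash of the join key */
    } TupleSketch;

    typedef struct BucketTableSketch
    {
        TupleSketch **buckets;
        size_t nbuckets;                /* current power-of-two bucket count */
        size_t nbuckets_optimal;        /* target tracked during the first batch */
    } BucketTableSketch;

    static void
    resize_if_needed(BucketTableSketch *ht)
    {
        TupleSketch **newbuckets;
        size_t i;

        if (ht->nbuckets == ht->nbuckets_optimal)
            return;                     /* estimate was good enough */

        newbuckets = calloc(ht->nbuckets_optimal, sizeof(TupleSketch *));
        if (newbuckets == NULL)
            return;                     /* sketch only: keep the old table */

        /* Walk every old chain and move its tuples to the new array. */
        for (i = 0; i < ht->nbuckets; i++)
        {
            TupleSketch *tuple = ht->buckets[i];

            while (tuple != NULL)
            {
                TupleSketch *next = tuple->next;
                size_t bucketno = tuple->hashvalue & (ht->nbuckets_optimal - 1);

                tuple->next = newbuckets[bucketno];
                newbuckets[bucketno] = tuple;
                tuple = next;
            }
        }

        free(ht->buckets);
        ht->buckets = newbuckets;
        ht->nbuckets = ht->nbuckets_optimal;
    }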

There could be cases where performance would improve by exceeding the NTUP_PER_BUCKET threshold in order to keep everything in one batch rather than spilling to a second batch, but attempts to construct such a case have so far been unsuccessful; that issue may be addressed with a follow-on patch after further investigation.

Tomas Vondra, with minor format and comment cleanup by me.
Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner.

30d7ae3 Increase number of hash join buckets for underestimate.
src/backend/commands/explain.c | 11 ++--
src/backend/executor/nodeHash.c | 131 ++++++++++++++++++++++++++++++++++++++-
src/include/executor/hashjoin.h | 5 ++
3 files changed, 141 insertions(+), 6 deletions(-)

Upstream: git.postgresql.org

