Clean up some ad-hoc code for sorting and de-duplicating Lists

Enterprise / PostgreSQL - Tom Lane [sss.pgh.pa.us] - 16 July 2019 16:04 EDT

heap.c and relcache.c contained nearly identical copies of logic to insert OIDs into an OID list while preserving the list's OID ordering (and rejecting duplicates, in one case but not the other).

The comments argue that this is faster than qsort for small numbers of OIDs, which is at best unproven, and seems even less likely to be true now that lappend_cell_oid has to move data around. In any case it's ugly and hard-to-follow code, and if we do have a lot of OIDs to consider, it's O(N^2).

Hence, replace with simply lappend'ing OIDs to a List, then list_sort the completed List, then remove adjacent duplicates if necessary. This is demonstrably O(N log N) and it's much simpler for the callers. It's possible that this would be somewhat inefficient if there were a very large number of duplicates, but that seems unlikely in the existing usage.

This adds list_deduplicate_oid and list_oid_cmp infrastructure to list.c. I didn't bother with equivalent functionality for integer or pointer Lists, but such could always be added later if we find a use for it.

Discussion: https://postgr.es/m/26193.1563228600@sss.pgh.pa.us

2f5b8eb5a2 Clean up some ad-hoc code for sorting and de-duplicating Lists.
src/backend/catalog/heap.c | 49 +++++---------------------------------
src/backend/nodes/list.c | 44 ++++++++++++++++++++++++++++++++++
src/backend/utils/cache/relcache.c | 46 +++++++----------------------------
src/include/nodes/pg_list.h | 4 ++++
4 files changed, 63 insertions(+), 80 deletions(-)

Upstream: git.postgresql.org


  • Share