Commit b663a41363 introduced bulk inserts for FDW, but the handling of tuple slots turned out to be problematic for two reasons. Firstly, the slots were re-created for each individual batch. Secondly, all slots referenced the same tuple descriptor. With reasonably small batches this is not an issue, but with large batches it triggers O(N^2) behavior in the resource owner code.
These two issues work against each other: reducing the number of times a slot has to be created and dropped requires larger batches, but the larger the batch, the more expensive the resource owner overhead gets. For practical batch sizes (100 - 1000) this is not a big problem, as the benefits (latency savings) greatly exceed the resource owner costs. But for extremely large batches it might be much worse, possibly even losing to non-batching mode.
Fixed by initializing tuple slots only once (and reusing them across batches) and by using a new tuple descriptor copy for each slot.
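
To illustrate why a per-slot descriptor copy helps, here is a minimal toy model in C. It is not PostgreSQL's resource owner code: ToyOwner, toy_remember and count_probes are hypothetical names, and the sketch simply assumes that references are tracked in an open-addressing table keyed by the descriptor's identity. Under that assumption, N references to one shared descriptor all collide and cost N(N+1)/2 probes in total, while N distinct copies cost one probe each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 4096            /* toy capacity, must be a power of two */

/* Hypothetical stand-in for a resource owner's reference table. */
typedef struct ToyOwner
{
    uintptr_t items[TABLE_SIZE];   /* 0 means "empty bucket" */
    size_t    nprobes;             /* buckets inspected so far */
} ToyOwner;

/* Record one reference to the descriptor identified by desc_id (non-zero). */
void
toy_remember(ToyOwner *owner, uintptr_t desc_id)
{
    size_t i = (size_t) (desc_id & (TABLE_SIZE - 1));

    owner->nprobes++;
    while (owner->items[i] != 0)
    {
        /* linear probing: step to the next bucket and count the visit */
        i = (i + 1) & (TABLE_SIZE - 1);
        owner->nprobes++;
    }
    owner->items[i] = desc_id;
}

/*
 * Register one descriptor reference per slot and return the total number
 * of probes. With share_descriptor != 0 every slot references descriptor 1;
 * otherwise slot k references its own copy, identified by k + 1.
 */
size_t
count_probes(int nslots, int share_descriptor)
{
    ToyOwner owner = {{0}, 0};
    int      k;

    assert(nslots > 0 && nslots < TABLE_SIZE);

    for (k = 0; k < nslots; k++)
    {
        uintptr_t desc_id = share_descriptor ? 1 : (uintptr_t) (k + 1);

        toy_remember(&owner, desc_id);
    }
    return owner.nprobes;
}
```

In this model, count_probes(1000, 1) grows quadratically with the batch size while count_probes(1000, 0) grows linearly, which is the trade-off the fix avoids. The model covers only registering references; releasing them is left out.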
b676ac443b Optimize creation of slots for FDW bulk inserts
src/backend/executor/nodeModifyTable.c | 52 +++++++++++++++++++++++-----------
src/include/nodes/execnodes.h | 1 +
2 files changed, 37 insertions(+), 16 deletions(-)