Add support for in-order addition reduction using SVE FADDA

Programming / Compilers / GCC - rsandifo [138bc75d-0d04-0410-961f-82ee72b054a4] - 13 January 2018 18:01 EST

This patch adds support for in-order floating-point addition reductions, which are suitable even in strict IEEE mode.

Previously vect_is_simple_reduction would reject any cases that forbid reassociation. The idea is instead to tentatively accept them as "FOLD_LEFT_REDUCTIONs" and only fail later if there is no support for them. Although this patch only handles the particular case of plus and minus on floating-point types, there's no reason in principle why we couldn't handle other cases.
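
To illustrate why such reductions forbid reassociation, here is a minimal sketch that is not taken from the patch. The function names and the fixed count of four lanes are assumptions made for the example. The first function adds in source order, as strict IEEE semantics require. The second mimics a conventional vectorized reduction that keeps per-lane partial sums, which can round differently.

```c
#include <stddef.h>

/* In-order reduction: every addition happens in source order, so the
   result is the one strict IEEE semantics require.  */
float
sum_in_order (const float *a, size_t n)
{
  float res = 0.0f;
  for (size_t i = 0; i < n; ++i)
    res += a[i];
  return res;
}

/* Reassociated reduction, modelled on a hypothetical 4-lane vector loop:
   each lane keeps its own partial sum and the lanes are combined at the
   end.  This changes the order of the additions and is only valid when
   reassociation is allowed.  */
float
sum_reassociated (const float *a, size_t n)
{
  float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    for (size_t j = 0; j < 4; ++j)
      lane[j] += a[i + j];
  float res = (lane[0] + lane[1]) + (lane[2] + lane[3]);
  /* Scalar tail for the leftover elements.  */
  for (; i < n; ++i)
    res += a[i];
  return res;
}
```

With the input { 1e8f, 1.0f, -1e8f, 1.0f } and single-precision evaluation, the in-order version keeps the final 1.0f, whereas the reassociated version loses both small terms to rounding against the large ones.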

The reductions use a new fold_left_plus_optab if available, otherwise they fall back to elementwise additions or subtractions.
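
As a rough scalar model of what a fold-left reduction computes (a sketch under assumptions, not the vectorizer's actual output): the helper below stands in for one fold-left step over a vector, adding each element to the scalar accumulator strictly in order. The names fold_left_plus_model and sum_fold_left and the fixed length VL are hypothetical; with SVE the vector length is not a compile-time constant.

```c
#include <stddef.h>

#define VL 4 /* Hypothetical fixed vector length, for illustration only.  */

/* Model of one fold-left step: start from the scalar accumulator and add
   the VL vector elements one after another, lowest index first.  No
   partial sums are formed, so nothing is reassociated.  */
static float
fold_left_plus_model (float acc, const float *vec)
{
  for (size_t j = 0; j < VL; ++j)
    acc += vec[j];
  return acc;
}

/* A reduction loop processed one vector at a time.  Because each step
   folds the vector into the running scalar in order, the result matches
   the original scalar loop exactly.  */
float
sum_fold_left (const float *a, size_t n)
{
  float res = 0.0f;
  size_t i = 0;
  for (; i + VL <= n; i += VL)
    res = fold_left_plus_model (res, &a[i]);
  /* Scalar tail for the leftover elements.  */
  for (; i < n; ++i)
    res += a[i];
  return res;
}
```

The elementwise fallback mentioned above corresponds to the inner loop of the model: one scalar addition per vector element, in order.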

The vect_force_simple_reduction change makes it easier for parloops to read the type of reduction.

2018-01-13 Richard Sandiford, Alan Hayward, David Sherwood

gcc/
- optabs.def (fold_left_plus_optab): New optab.
- doc/md.texi (fold_left_plus_@var{m}): Document.
- internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
- internal-fn.c (fold_left_direct): Define. (expand_fold_left_optab_fn): Likewise. (direct_fold_left_optab_supported_p): Likewise.
- fold-const-call.c (fold_const_fold_left): New function. (fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
- tree-parloops.c (valid_reduction_p): New function. (gather_scalar_reductions): Use it.
- tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type. (vect_finish_replace_stmt): Declare.
- tree-vect-loop.c (fold_left_reduction_fn): New function. (needs_fold_left_reduction_p): New function, split out from... (vect_is_simple_reduction): ...here. Accept reductions that forbid reassociation, but give them type FOLD_LEFT_REDUCTION. (vect_force_simple_reduction): Also store the reduction type in the assignment's STMT_VINFO_REDUC_TYPE. (vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION. (merge_with_identity): New function. (vect_expand_fold_left): Likewise. (vectorize_fold_left_reduction): Likewise. (vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the scalar phi in place for it. Check for target support and reject cases that would reassociate the operation. Defer the transform phase to vectorize_fold_left_reduction.
- config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
- config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander. (*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.

gcc/testsuite/
- gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and check for a message about using in-order reductions.
- gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and check for a message about using in-order reductions.
- gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be vectorized and check for a message about using in-order reductions. Expect targets with variable-length vectors to fall back to the fixed-length minimum.
- gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and check for a message about using in-order reductions.
- gcc.dg/vect/vect-reduc-in-order-1.c: New test.
- gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
- gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
- gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
- gcc.target/aarch64/sve/reduc_strict_1.c: New test.
- gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
- gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
- gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
- gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
- gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
- gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if vect_fold_left_plus.

d77809a490a Add support for in-order addition reduction using SVE FADDA
gcc/ChangeLog | 34 ++
gcc/config/aarch64/aarch64-sve.md | 39 ++
gcc/config/aarch64/aarch64.md | 1 +
gcc/doc/md.texi | 8 +
gcc/fold-const-call.c | 25 ++
gcc/internal-fn.c | 5 +
gcc/internal-fn.def | 4 +
gcc/optabs.def | 1 +
gcc/testsuite/ChangeLog | 27 ++
gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c | 4 +-
gcc/testsuite/gcc.dg/vect/pr79920.c | 5 +-
gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c | 7 +-
gcc/testsuite/gcc.dg/vect/vect-reduc-6.c | 6 +-
gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c | 42 +++
gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-2.c | 44 +++
gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-3.c | 42 +++
gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-4.c | 45 +++
.../gcc.target/aarch64/sve/reduc_strict_1.c | 28 ++
.../gcc.target/aarch64/sve/reduc_strict_1_run.c | 29 ++
.../gcc.target/aarch64/sve/reduc_strict_2.c | 28 ++
.../gcc.target/aarch64/sve/reduc_strict_2_run.c | 31 ++
.../gcc.target/aarch64/sve/reduc_strict_3.c | 131 +++++++
gcc/testsuite/gcc.target/aarch64/sve/slp_13.c | 28 +-
gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 2 +-
gcc/tree-parloops.c | 18 +-
gcc/tree-vect-loop.c | 394 ++++++++++++++++++---
gcc/tree-vectorizer.h | 11 +-
27 files changed, 960 insertions(+), 79 deletions(-)

Upstream: gcc.gnu.org

