SLP reductions with variable-length vectors

13 January 2018

Two things stopped us using SLP reductions with variable-length vectors:

(1) We didn't have a way of constructing the initial vector. This patch does it by creating a vector full of the neutral (identity) value and then using a shift-and-insert function to insert any non-identity inputs into the low-numbered elements. (The non-identity values are needed for double reductions.) Alternatively, for unchained MIN/MAX reductions that have no neutral value, we instead use the same duplicate-and-interleave approach as for SLP constant and external definitions (added by a previous patch).

(2) The epilogue for constant-length vectors would extract the vector elements associated with each SLP statement and do scalar arithmetic on these individual elements. For variable-length vectors, the patch instead creates a reduction vector for each SLP statement, replacing the elements for other SLP statements with the identity value. It then uses a hardware reduction instruction on each vector.
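
The two points above can be illustrated with a small scalar model in C. This is only a sketch under stated assumptions, not the code the patch adds: the helper names (shl_insert, build_initial_vector, slp_epilogue_add) are hypothetical, the operation is assumed to be an integer add with neutral value 0, lane i is assumed to belong to SLP statement i % group_size, and the explicit loops stand in for what would be single vector instructions on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Model of one shift-and-insert step: move every element one lane away
   from element 0, then write SCALAR into element 0.  */
static void shl_insert(int *vec, size_t nelts, int scalar)
{
    assert(nelts > 0);
    for (size_t i = nelts - 1; i > 0; --i)
        vec[i] = vec[i - 1];
    vec[0] = scalar;
}

/* Point (1): start from a vector full of NEUTRAL, then insert the
   GROUP_SIZE initial values, last one first, so that init[k] ends up
   in lane k and all higher lanes keep the neutral value.  */
static void build_initial_vector(int *vec, size_t nelts, int neutral,
                                 const int *init, size_t group_size)
{
    assert(group_size > 0 && group_size <= nelts);
    for (size_t i = 0; i < nelts; ++i)
        vec[i] = neutral;
    for (size_t k = group_size; k > 0; --k)
        shl_insert(vec, nelts, init[k - 1]);
}

/* Point (2): for each SLP statement k, build a vector that keeps only
   the lanes belonging to k (lane % group_size == k in this model) and
   holds NEUTRAL everywhere else, then reduce that whole vector.
   SCRATCH must have room for NELTS elements, RESULTS for GROUP_SIZE.  */
static void slp_epilogue_add(const int *acc, size_t nelts, size_t group_size,
                             int neutral, int *scratch, int *results)
{
    assert(group_size > 0 && nelts % group_size == 0);
    for (size_t k = 0; k < group_size; ++k) {
        /* Stub out the lanes that belong to other SLP statements.  */
        for (size_t i = 0; i < nelts; ++i)
            scratch[i] = (i % group_size == k) ? acc[i] : neutral;
        /* Stand-in for a single hardware add reduction.  */
        int sum = 0;
        for (size_t i = 0; i < nelts; ++i)
            sum += scratch[i];
        results[k] = sum;
    }
}
```

In this model nelts is an ordinary run-time value, which mirrors the fact that the number of lanes is not known at compile time. Neither helper indexes an individual lane by a compile-time constant, which is the property the variable-length case needs.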

2018-01-13 Richard Sandiford, Alan Hayward, David Sherwood

- doc/md.texi (vec_shl_insert_@var{m}): New optab.
- internal-fn.def (VEC_SHL_INSERT): New internal function.
- optabs.def (vec_shl_insert_optab): New optab.
- tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
  (duplicate_and_interleave): Likewise.
- tree-vect-loop.c: Include internal-fn.h.
  (neutral_op_for_slp_reduction): New function, split out from get_initial_defs_for_reduction.
  (get_initial_def_for_reduction): Handle option 2 for variable-length vectors by loading the neutral value into a vector and then shifting the initial value into element 0.
  (get_initial_defs_for_reduction): Replace the code argument with the neutral value calculated by neutral_op_for_slp_reduction. Use gimple_build_vector for constant-length vectors. Use IFN_VEC_SHL_INSERT for variable-length vectors if all but the first group_size elements have a neutral value. Use duplicate_and_interleave otherwise.
  (vect_create_epilog_for_reduction): Take a neutral_op parameter. Update call to get_initial_defs_for_reduction. Handle SLP reductions for variable-length vectors by creating one vector result for each scalar result, with the elements associated with other scalar results stubbed out with the neutral value.
  (vectorizable_reduction): Call neutral_op_for_slp_reduction. Require IFN_VEC_SHL_INSERT for double reductions on variable-length vectors, or SLP reductions that have a neutral value. Require can_duplicate_and_interleave_p support for variable-length unchained SLP reductions if there is no neutral value, such as for MIN/MAX reductions. Also require the number of vector elements to be a multiple of the number of SLP statements when doing variable-length unchained SLP reductions. Update call to vect_create_epilog_for_reduction.
- tree-vect-slp.c (can_duplicate_and_interleave_p): Make public and remove initial values.
  (duplicate_and_interleave): Make public.
- config/aarch64/ (UNSPEC_INSR): New unspec.
- config/aarch64/ (vec_shl_insert_): New insn.

- gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
- gcc.dg/vect/pr67790.c: Likewise.
- gcc.dg/vect/slp-reduc-1.c: Likewise.
- gcc.dg/vect/slp-reduc-2.c: Likewise.
- gcc.dg/vect/slp-reduc-3.c: Likewise.
- gcc.dg/vect/slp-reduc-5.c: Likewise.
- New test.
- Likewise.
- Likewise.
- Likewise.
- Likewise.
- Likewise.

gcc/ChangeLog | 41 +++
gcc/config/aarch64/ | 13 +
gcc/config/aarch64/ | 1 +
gcc/doc/md.texi | 8 +
gcc/internal-fn.def | 3 +
gcc/optabs.def | 1 +
gcc/testsuite/ChangeLog | 17 ++
gcc/testsuite/gcc.dg/vect/pr37027.c | 2 +-
gcc/testsuite/gcc.dg/vect/pr67790.c | 2 +-
gcc/testsuite/gcc.dg/vect/slp-reduc-1.c | 2 +-
gcc/testsuite/gcc.dg/vect/slp-reduc-2.c | 2 +-
gcc/testsuite/gcc.dg/vect/slp-reduc-3.c | 5 +-
gcc/testsuite/gcc.dg/vect/slp-reduc-5.c | 2 +-
gcc/testsuite/ | 58 ++++
gcc/testsuite/ | 35 +++
gcc/testsuite/ | 47 ++++
gcc/testsuite/ | 37 +++
gcc/testsuite/ | 66 +++++
gcc/testsuite/ | 39 +++
gcc/tree-vect-loop.c | 322 ++++++++++++++++++-----
gcc/tree-vect-slp.c | 10 +-
gcc/tree-vectorizer.h | 5 +
22 files changed, 637 insertions(+), 81 deletions(-)


