Following on from the previous patch for strided accesses, this patch allows gather loads to be used for grouped accesses when we would otherwise have to fall back to VMAT_ELEMENTWISE. However, as the comment in the patch says, this is restricted to single-element groups for now:
??? Although the code can handle all group sizes correctly, it probably isn't a win to use separate strided accesses based on nearby locations. Or, even if it's a win over scalar code, it might not be a win over vectorizing at a lower VF, if that allows us to use contiguous accesses.
Single-element groups are an important special case though, and this means that code is less sensitive to GCC's classification of single accesses with constant steps as "grouped" and ones with variable steps as "strided".
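To illustrate the distinction between the two classifications, here is a minimal sketch of the kind of loops involved. The function names are hypothetical and this is not the content of the new strided_load_*.c tests. The first loop reads one element out of every five with a constant step, which is the single-element "grouped" case; the second uses a run-time step, which is the "strided" case. Whether either loop actually becomes a gather load depends on the target and options (for example SVE being enabled), which is an assumption here rather than something the code checks.

```c
#include <stddef.h>

/* Constant step: only src[0], src[5], src[10], ... are read, so the
   access forms a group of size 5 containing a single used element.  */
void
constant_step (int *restrict dest, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[i * 5];
}

/* Variable step: the distance between accesses is only known at run
   time, so the access is treated as strided rather than grouped.  */
void
variable_step (int *restrict dest, const int *restrict src,
               size_t stride, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[i * stride];
}
```

Both loops have the same shape from the programmer's point of view, which is why it is useful for the vectorizer to handle them in a similar way.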
2018-01-13 Richard Sandiford Alan Hayward David Sherwood
- tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
- tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
- tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New function.
  (vect_use_strided_gather_scatters_p): Take a masked_p argument. Use vect_truncate_gather_scatter_offset if we can't treat the operation as a normal gather load or scatter store.
  (get_group_load_store_type): Take the gather_scatter_info as argument. Try using a gather load or scatter store for single-element groups.
  (get_load_store_type): Update calls to get_group_load_store_type and vect_use_strided_gather_scatters_p.
- gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used for double_reduc1.
- gcc.target/aarch64/sve/strided_load_4.c: New test.
- gcc.target/aarch64/sve/strided_load_5.c: Likewise.
- gcc.target/aarch64/sve/strided_load_6.c: Likewise.
- gcc.target/aarch64/sve/strided_load_7.c: Likewise.
1d2c127d7cc Allow gather loads to be used for grouped accesses
gcc/ChangeLog | 17 +++
gcc/testsuite/ChangeLog | 11 ++
.../gcc.target/aarch64/sve/reduc_strict_3.c | 9 +-
.../gcc.target/aarch64/sve/strided_load_4.c | 33 ++++++
.../gcc.target/aarch64/sve/strided_load_5.c | 34 ++++++
.../gcc.target/aarch64/sve/strided_load_6.c | 7 ++
.../gcc.target/aarch64/sve/strided_load_7.c | 34 ++++++
gcc/tree-vect-data-refs.c | 2 +-
gcc/tree-vect-stmts.c | 127 ++++++++++++++++++++-
gcc/tree-vectorizer.h | 2 +
10 files changed, 263 insertions(+), 13 deletions(-)