The fallback way of handling a repeated 128-bit constant vector for SVE is to force the 128 bits to the constant pool and use LD1RQ to load it. Previously the code always used the byte variant of LD1RQ (LD1RQB), with a preceding BSWAP for big-endian targets. However, that BSWAP doesn't handle all cases correctly.
The simplest fix seemed to be to use the LD1RQ appropriate for the element size.
This helps to fix some of the sve/slp_*.c tests for aarch64_be, although a later patch is needed as well.
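As a rough illustration of "the LD1RQ appropriate for the element size", the sketch below maps an element size in bytes to the matching mnemonic. The helper name ld1rq_mnemonic is hypothetical and is not taken from the patch. In GCC itself the choice is presumably made by the machine-description pattern for the vector mode, not by a C switch like this one.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of GCC):
   return the SVE LD1RQ mnemonic that loads and replicates a 128-bit
   quadword using elements of ELT_BYTES bytes, or NULL if there is
   no such variant.  */
const char *
ld1rq_mnemonic (unsigned int elt_bytes)
{
  switch (elt_bytes)
    {
    case 1:
      return "ld1rqb";	/* byte elements */
    case 2:
      return "ld1rqh";	/* halfword elements */
    case 4:
      return "ld1rqw";	/* word elements */
    case 8:
      return "ld1rqd";	/* doubleword elements */
    default:
      return NULL;	/* no LD1RQ form for this element size */
    }
}
```

For example, a vector of 64-bit elements would use ld1rq_mnemonic (8), which matches the LD1RQD expectation in slp_2.c below.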
2018-02-01 Richard Sandiford
- config/aarch64/aarch64-sve.md (sve_ld1rq): Replace with... (*sve_ld1rq
- config/aarch64/aarch64.c (aarch64_expand_sve_widened_duplicate): Remove BSWAP handling for big-endian targets and use the form of LD1RQ appropriate for the mode.
- gcc.target/aarch64/sve/slp_2.c: Expect LD1RQD rather than LD1RQB.
- gcc.target/aarch64/sve/slp_3.c: Expect LD1RQW rather than LD1RQB.
- gcc.target/aarch64/sve/slp_4.c: Expect LD1RQH rather than LD1RQB.
4a5920b6083 [AArch64] Use all SVE LD1RQ variants
gcc/ChangeLog | 9 +++++++++
gcc/config/aarch64/aarch64-sve.md | 10 +++++-----
gcc/config/aarch64/aarch64.c | 19 +++++++------------
gcc/testsuite/ChangeLog | 6 ++++++
gcc/testsuite/gcc.target/aarch64/sve/slp_2.c | 2 +-
gcc/testsuite/gcc.target/aarch64/sve/slp_3.c | 2 +-
gcc/testsuite/gcc.target/aarch64/sve/slp_4.c | 2 +-
7 files changed, 30 insertions(+), 20 deletions(-)