Support for aliasing with variable strides

Programming / Compilers / GCC - rsandifo [138bc75d-0d04-0410-961f-82ee72b054a4] - 13 January 2018 18:02 EST

This patch adds runtime alias checks for loops with variable strides, so that we can vectorise them even without a restrict qualifier. There are several parts to doing this:

1) For accesses like:

x[i * n] += 1;

we need to check whether n (and thus the DR_STEP) is nonzero.
vect_analyze_data_ref_dependence records values that need to be checked in this way, then prune_runtime_alias_test_list records a bounds check on DR_STEP being outside the range [0, 0].

2) For accesses like:

x[i * n] = x[i * n + 1] + 1;

we simply need to test whether abs (n) >= 2. prune_runtime_alias_test_list looks for cases like this and tries to guess whether it is better to use this kind of check or a check for non-overlapping ranges. (We could do an OR of the two conditions at runtime, but that isn't implemented yet.)

3) Checks for overlapping ranges need to cope with variable strides. At present the "length" of each segment in a range check is represented as an offset from the base that lies outside the touched range, in the same direction as DR_STEP. The length can therefore be negative and is sometimes conservative.

With variable steps it's easier to reaon about if we split this into two:

seg_len: distance travelled from the first iteration of interest to the last, e.g. DR_STEP * (VF - 1)

access_size: the number of bytes accessed in each iteration

with access_size always being a positive constant and seg_len possibly being variable. We can then combine alias checks for two accesses that are a constant number of bytes apart by adjusting the access size to account for the gap. This leaves the segment length unchanged, which allows the check to be combined with further accesses.

When seg_len is positive, the runtime alias check has the form:

base_a >= base_b + seg_len_b + access_size_b || base_b >= base_a + seg_len_a + access_size_a

In many accesses the base will be aligned to the access size, which allows us to skip the addition:

base_a > base_b + seg_len_b || base_b > base_a + seg_len_a

A similar saving is possible with "negative" lengths.

The patch therefore tracks the alignment in addition to seg_len and access_size.

2018-01-13 Richard Sandiford Alan Hayward David Sherwood

gcc/
- tree-vectorizer.h (vec_lower_bound): New structure. (_loop_vec_info): Add check_nonzero and lower_bounds. (LOOP_VINFO_CHECK_NONZERO): New macro. (LOOP_VINFO_LOWER_BOUNDS): Likewise. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
- tree-data-ref.h (dr_with_seg_len): Add access_size and align fields. Make seg_len the distance travelled, not including the access size. (dr_direction_indicator): Declare. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise.
- tree-data-ref.c: Include stringpool.h, tree-vrp.h and tree-ssanames.h. (runtime_alias_check_p): Allow runtime alias checks with
variable strides. (operator ==): Compare access_size and align. (prune_runtime_alias_test_list): Rework for new distinction between the access_size and seg_len. (create_intersect_range_checks_index): Likewise. Cope with polynomial segment lengths. (get_segment_min_max): New function. (create_intersect_range_checks): Use it. (dr_step_indicator): New function. (dr_direction_indicator): Likewise. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise.
- tree-loop-distribution.c (data_ref_segment_size): Return DR_STEP * (niters - 1). (compute_alias_check_pairs): Update call to the dr_with_seg_len constructor.
- tree-vect-data-refs.c (vect_check_nonzero_value): New function. (vect_preserves_scalar_order_p): New function, split out from... (vect_analyze_data_ref_dependence): ...here. Check for zero steps. (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1). (vect_vfa_access_size): New function. (vect_vfa_align): Likewise. (vect_compile_time_alias): Take access_size_a and access_b arguments. (dump_lower_bound): New function. (vect_check_lower_bound): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Ignore cross-iteration depencies if the vectorization factor is 1. Convert the checks for nonzero steps into checks on the bounds of DR_STEP. Try using a bunds check for variable steps if the minimum required step is relatively small. Update calls to the dr_with_seg_len constructor and to vect_compile_time_alias.
- tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New function. (vect_loop_versioning): Call it.
- tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS when retrying. (vect_estimate_min_profitable_iters): Account for any bounds checks.

gcc/testsuite/
- gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather than SLP vectorization.
- gcc.dg/vect/vect-alias-check-10.c: New test.
- gcc.dg/vect/vect-alias-check-11.c: Likewise.
- gcc.dg/vect/vect-alias-check-12.c: Likewise.
- gcc.dg/vect/vect-alias-check-8.c: Likewise.
- gcc.dg/vect/vect-alias-check-9.c: Likewise.
- gcc.target/aarch64/sve/strided_load_8.c: Likewise.
- gcc.target/aarch64/sve/var_stride_1.c: Likewise.
- gcc.target/aarch64/sve/var_stride_1.h: Likewise.
- gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_2.c: Likewise.
- gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_3.c: Likewise.
- gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_4.c: Likewise.
- gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_5.c: Likewise.
- gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_6.c: Likewise.
- gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_7.c: Likewise.
- gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
- gcc.target/aarch64/sve/var_stride_8.c: Likewise.
- gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
- gfortran.dg/vect/vect-alias-check-1.F90: Likewise.

e85b4a5e910 Support for aliasing with variable strides
gcc/ChangeLog | 58 +++
gcc/testsuite/ChangeLog | 31 ++
gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 8 +-
gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c | 69 +++
gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c | 99 ++++
gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c | 99 ++++
gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c | 62 +++
gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c | 55 +++
.../gcc.target/aarch64/sve/strided_load_8.c | 15 +
.../gcc.target/aarch64/sve/var_stride_1.c | 27 ++
.../gcc.target/aarch64/sve/var_stride_1.h | 61 +++
.../gcc.target/aarch64/sve/var_stride_1_run.c | 14 +
.../gcc.target/aarch64/sve/var_stride_2.c | 25 +
.../gcc.target/aarch64/sve/var_stride_2_run.c | 18 +
.../gcc.target/aarch64/sve/var_stride_3.c | 27 ++
.../gcc.target/aarch64/sve/var_stride_3_run.c | 14 +
.../gcc.target/aarch64/sve/var_stride_4.c | 25 +
.../gcc.target/aarch64/sve/var_stride_4_run.c | 18 +
.../gcc.target/aarch64/sve/var_stride_5.c | 27 ++
.../gcc.target/aarch64/sve/var_stride_5_run.c | 14 +
.../gcc.target/aarch64/sve/var_stride_6.c | 25 +
.../gcc.target/aarch64/sve/var_stride_6_run.c | 18 +
.../gcc.target/aarch64/sve/var_stride_7.c | 26 ++
.../gcc.target/aarch64/sve/var_stride_7_run.c | 14 +
.../gcc.target/aarch64/sve/var_stride_8.c | 26 ++
.../gcc.target/aarch64/sve/var_stride_8_run.c | 14 +
.../gfortran.dg/vect/vect-alias-check-1.F90 | 102 +++++
gcc/tree-data-ref.c | 501 +++++++++++++--------
gcc/tree-data-ref.h | 17 +-
gcc/tree-loop-distribution.c | 27 +-
gcc/tree-vect-data-refs.c | 365 +++++++++++++--
gcc/tree-vect-loop-manip.c | 26 ++
gcc/tree-vect-loop.c | 13 +
gcc/tree-vectorizer.h | 25 +-
34 files changed, 1720 insertions(+), 245 deletions(-)

Upstream: gcc.gnu.org


  • Share