Optimise sqrt reciprocal multiplications

Programming / Compilers / GCC - ktkachov [138bc75d-0d04-0410-961f-82ee72b054a4] - 5 September 2018 13:39 EDT

This patch aims to optimise sequences involving uses of 1.0 / sqrt (a) under -freciprocal-math and -funsafe-math-optimizations. In particular consider:

  x = 1.0 / sqrt (a);
  r1 = x * x;  // same as 1.0 / a
  r2 = a * x;  // same as sqrt (a)

If x, r1 and r2 are all used further on in the code, this can be transformed into:

  tmp1 = 1.0 / a
  tmp2 = sqrt (a)
  tmp3 = tmp1 * tmp2
  x = tmp3
  r1 = tmp1
  r2 = tmp2

A bit convoluted, but this saves us one multiplication and, more importantly, the sqrt and division are now independent. This also allows optimisation of a subset of these expressions. For example:

  x = 1.0 / sqrt (a)
  r1 = x * x

can be transformed to r1 = 1.0 / a, eliminating the sqrt if x is not used anywhere else. And similarly:

  x = 1.0 / sqrt (a)
  r1 = a * x

can be transformed to r1 = sqrt (a), eliminating the division.

For the testcase:

  double res, res2, tmp;
  void foo (double a, double b)
  {
    tmp = 1.0 / __builtin_sqrt (a);
    res = tmp * tmp;
    res2 = a * tmp;
  }

We now generate for aarch64 with -Ofast:

  foo:
          fmov    d2, 1.0e+0
          adrp    x2, res2
          fsqrt   d1, d0
          adrp    x1, res
          fdiv    d0, d2, d0
          adrp    x0, tmp
          str     d1, [x2, #:lo12:res2]
          fmul    d1, d1, d0
          str     d0, [x1, #:lo12:res]
          str     d1, [x0, #:lo12:tmp]
          ret

where before it generated:

  foo:
          fsqrt   d2, d0
          fmov    d1, 1.0e+0
          adrp    x1, res2
          adrp    x2, tmp
          adrp    x0, res
          fdiv    d1, d1, d2
          fmul    d0, d1, d0
          fmul    d2, d1, d1
          str     d1, [x2, #:lo12:tmp]
          str     d0, [x1, #:lo12:res2]
          str     d2, [x0, #:lo12:res]
          ret

As you can see, the new sequence has one fewer multiply and the fsqrt and fdiv are independent.

- tree-ssa-math-opts.c (is_mult_by): New function.
  (is_square_of): Use the above.
  (optimize_recip_sqrt): New function.
  (pass_cse_reciprocals::execute): Use the above.

- gcc.dg/recip_sqrt_mult_1.c: New test.
- gcc.dg/recip_sqrt_mult_2.c: Likewise.
- gcc.dg/recip_sqrt_mult_3.c: Likewise.
- gcc.dg/recip_sqrt_mult_4.c: Likewise.
- gcc.dg/recip_sqrt_mult_5.c: Likewise.
- g++.dg/recip_sqrt_mult_1.C: Likewise.
- g++.dg/recip_sqrt_mult_2.C: Likewise.

3cb2785efe2 Optimise sqrt reciprocal multiplications
gcc/ChangeLog | 7 ++
gcc/testsuite/ChangeLog | 10 ++
gcc/testsuite/g++.dg/recip_sqrt_mult_1.C | 49 ++++++++
gcc/testsuite/g++.dg/recip_sqrt_mult_2.C | 49 ++++++++
gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c | 15 +++
gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c | 11 ++
gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c | 11 ++
gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c | 21 ++++
gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c | 20 +++
gcc/tree-ssa-math-opts.c | 206 ++++++++++++++++++++++++++++++-
10 files changed, 395 insertions(+), 4 deletions(-)

Upstream: gcc.gnu.org

