x86-64: Add sincosf with vector FMA

System Internals / glibc - H.J. Lu [gmail.com] - 8 January 2018 16:04 EST

Since the x86-64 assembly version of sincosf is highly optimized with
vector instructions, there isn't much room for improvement. However,
s_sincosf.c, written in C with vector math and intrinsics, can be
optimized by GCC with FMA.
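
As a rough illustration of the kind of C code GCC can turn into fused
multiply-add instructions, here is a minimal sketch. The function name
poly2 and its arguments are hypothetical and do not come from
s_sincosf-fma.c. With a flag such as -mfma, and with GCC's
floating-point contraction left enabled, each multiply-add pair below
may be emitted as a single FMA instruction. Without such a flag the
same source compiles to separate multiply and add instructions.

```c
/* Hypothetical helper, not taken from the patch: evaluate
   c0 + c1*x + c2*x*x with Horner's scheme.  Each step has the
   shape a*b + c, which is the pattern a compiler can fuse into
   one FMA instruction when the target supports it.  */
static float
poly2 (float x, float c0, float c1, float c2)
{
  float t = c2 * x + c1;   /* first multiply-add */
  return t * x + c0;       /* second multiply-add */
}
```

The source stays the same for the FMA and non-FMA builds. Only the
compiler flags for that one file change, which fits the ChangeLog's
addition of a per-file CFLAGS entry.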

On Skylake, bench-sincosf reports the following performance improvement:

        Assembly    FMA        improvement
max     104.042     101.008     3%
min     9.426       8.586      10%
mean    20.6209     18.2238    13%
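
The percentages appear to be the assembly figure relative to the FMA
figure, minus one. That is an inference from the numbers, not
something the commit message states. A small sketch of that
arithmetic, with a hypothetical helper name:

```c
/* Hypothetical helper: relative improvement in percent, assuming
   the report computes (old - new) / new.  This formula is inferred
   from the reported numbers, not stated in the commit message.  */
static double
improvement_pct (double assembly, double fma)
{
  return (assembly - fma) / fma * 100.0;
}
```

Under this assumption the three rows come out at about 3.0, 9.8 and
13.2 percent, which round to the reported 3%, 10% and 13%.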

- sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_sincosf-sse2 and s_sincosf-fma.
  (CFLAGS-s_sincosf-fma.c): New.
- sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
- sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
- sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
- sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if __sincosf is defined.
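
The file list above adds an SSE2 and an FMA variant plus a selector
file. The idea of picking a variant at run time can be sketched in
ordinary C as follows. This is not glibc's internal mechanism: all
names below are hypothetical, and keying the choice on FMA support
alone is an assumption. The CPU check uses the GCC/Clang builtins
__builtin_cpu_init and __builtin_cpu_supports.

```c
/* Hypothetical tags standing in for the two implementations.  */
enum variant { VARIANT_SSE2 = 2, VARIANT_FMA = 3 };

static int variant_sse2 (void) { return VARIANT_SSE2; }
static int variant_fma (void) { return VARIANT_FMA; }

typedef int (*variant_fn) (void);

/* Pure selection logic: prefer the FMA variant when the CPU has
   FMA, otherwise fall back to the SSE2 baseline.  */
static variant_fn
select_variant (int has_fma)
{
  return has_fma ? variant_fma : variant_sse2;
}

/* Query the running CPU.  On non-x86-64 targets or compilers
   without the builtins, report no FMA so the baseline is used.  */
static variant_fn
select_for_this_cpu (void)
{
#if defined (__x86_64__) && defined (__GNUC__)
  __builtin_cpu_init ();
  return select_variant (__builtin_cpu_supports ("fma") != 0);
#else
  return select_variant (0);
#endif
}
```

Keeping the decision in a small function that takes the feature bit
as a parameter makes the policy testable without depending on the
machine the test runs on.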

c70e4e9c9e x86-64: Add sincosf with vector FMA
ChangeLog | 11 ++
NEWS | 4 +-
sysdeps/x86_64/fpu/multiarch/Makefile | 5 +-
sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c | 240 ++++++++++++++++++++++++++
sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S | 2 +
sysdeps/x86_64/fpu/multiarch/s_sincosf.c | 28 +++
sysdeps/x86_64/fpu/s_sincosf.S | 2 +
7 files changed, 288 insertions(+), 4 deletions(-)

Upstream: sourceware.org

