GBE: Optimize write_image instruction for simd8 mode

Graphics / Beignet - Zhigang Gong [intel.com] - 24 February 2014 00:54 UTC

In simd8 mode, we can put u, v, w, x, r, g, b, a into a selection vector directly, so no extra mov is needed to assign those values again.

Let's look at an example. The following code, which performs a simple image copy, is generated without this patch:

  (26 ) (+f0) mov(8) g113<1>F g114<8,8,1>D { align1 WE_normal 1Q };
  (28 ) (+f0) send(8) g108<1>UD g112<8,8,1>F sampler (3, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
  (30 ) mov(8) g99<1>UD 0x0UD { align1 WE_all 1Q };
  (32 ) mov(1) g99.7<1>UD 0xffffUD { align1 WE_all };
  (34 ) mov(8) g103<1>UD 0x0UD { align1 WE_all 1Q };
  (36 ) (+f0) mov(8) g100<1>UD g117<8,8,1>UD { align1 WE_normal 1Q };
  (38 ) (+f0) mov(8) g101<1>UD g114<8,8,1>UD { align1 WE_normal 1Q };
  (40 ) (+f0) mov(8) g104<1>UD g108<8,8,1>UD { align1 WE_normal 1Q };
  (42 ) (+f0) mov(8) g105<1>UD g109<8,8,1>UD { align1 WE_normal 1Q };
  (44 ) (+f0) mov(8) g106<1>UD g110<8,8,1>UD { align1 WE_normal 1Q };
  (46 ) (+f0) mov(8) g107<1>UD g111<8,8,1>UD { align1 WE_normal 1Q };
  (48 ) (+f0) send(8) null g99<8,8,1>UD renderunsupported target 5 mlen 9 rlen 0 { align1 WE_normal 1Q };
  (50 ) (+f0) mov(8) g1<1>UW 0x1UW { align1 WE_normal 1Q };
L1:
  (52 ) mov(8) g112<1>UD g0<8,8,1>UD { align1 WE_all 1Q };
  (54 ) send(8) null g112<8,8,1>UD thread_spawnerunsupported target 7 mlen 1 rlen 0 { align1 WE_normal 1Q EOT };

With this patch, the same code is optimized to:

  (26 ) (+f0) mov(8) g106<1>F g111<8,8,1>D { align1 WE_normal 1Q };
  (28 ) (+f0) send(8) g114<1>UD g105<8,8,1>F sampler (3, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
  (30 ) mov(8) g109<1>UD 0x0UD { align1 WE_all 1Q };
  (32 ) mov(1) g109.7<1>UD 0xffffUD { align1 WE_all };
  (34 ) mov(8) g113<1>UD 0x0UD { align1 WE_all 1Q };
  (36 ) (+f0) send(8) null g109<8,8,1>UD renderunsupported target 5 mlen 9 rlen 0 { align1 WE_normal 1Q };
  (38 ) (+f0) mov(8) g1<1>UW 0x1UW { align1 WE_normal 1Q };
L1:
  (40 ) mov(8) g112<1>UD g0<8,8,1>UD { align1 WE_all 1Q };
  (42 ) send(8) null g112<8,8,1>UD thread_spawnerunsupported target 7 mlen 1 rlen 0 { align1 WE_normal 1Q EOT };

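This patch can save about 8 instructions per write_image. In the example above, the six mov instructions at (36) through (46) that copied the coordinates and color values into the message payload are gone.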
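
The effect can be illustrated with a small stand-alone C++ sketch. It is a toy model and not code from the patch: ToyInsn, lowerWithCopies, lowerDirect and countMovs are hypothetical names, the header setup is reduced to a single mov, and the payload is assumed to have no gaps. It only shows why allocating the operands contiguously, as a selection vector does, removes the per-operand copies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one emitted instruction; not a Beignet type.
struct ToyInsn {
  std::string opcode;  // "mov" or "send"
  int dst;             // destination register number, -1 for null
  int src;             // source register number, -1 for an immediate
};

// Copying strategy: the payload occupies fresh registers starting at `base`,
// so every operand is moved into its payload slot before the send.
std::vector<ToyInsn> lowerWithCopies(int base, const std::vector<int>& operands) {
  std::vector<ToyInsn> code;
  code.push_back({"mov", base, -1});  // simplified header setup
  for (std::size_t i = 0; i < operands.size(); ++i)
    code.push_back({"mov", base + 1 + static_cast<int>(i), operands[i]});
  code.push_back({"send", -1, base});
  return code;
}

// Direct strategy: the operands are assumed to already sit in the payload
// slots (the register allocator placed them contiguously), so only the
// header setup and the send remain.
std::vector<ToyInsn> lowerDirect(int base, const std::vector<int>& operands) {
  for (std::size_t i = 0; i < operands.size(); ++i)
    assert(operands[i] == base + 1 + static_cast<int>(i));  // contiguity precondition
  std::vector<ToyInsn> code;
  code.push_back({"mov", base, -1});  // simplified header setup
  code.push_back({"send", -1, base});
  return code;
}

// Counts the mov instructions in a lowered sequence.
std::size_t countMovs(const std::vector<ToyInsn>& code) {
  std::size_t n = 0;
  for (const ToyInsn& insn : code)
    if (insn.opcode == "mov") ++n;
  return n;
}
```

With six scattered operands, lowerWithCopies emits one mov per operand on top of the header setup, while lowerDirect emits only the header setup once the same six operands are placed in consecutive registers after the header.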

35f39cc GBE: Optimize write_image instruction for simd8 mode.
backend/src/backend/gen_context.cpp | 58 +-----------------
backend/src/backend/gen_insn_selection.cpp | 88 ++++++++++++++++++++++------
backend/src/ocl_stdlib.tmpl.h | 12 ++--
3 files changed, 77 insertions(+), 81 deletions(-)

Upstream: cgit.freedesktop.org

