This isn't as complete as I would like (can't merge interpolation because of the implicit r5 dependency, doesn't work with control flow), but this was cheap and easy.
Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16)
total instructions in shared programs: 99810 -> 99059 (-0.75%)
instructions in affected programs: 10705 -> 9954 (-7.02%)
4690a93 vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.
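
For illustration only, here is a rough sketch of this kind of coalescing. It is not the code from this commit. It assumes the pass retargets the destination of the ALU op that produced a temporary so that the op writes the texture coordinate register directly, and then drops the MOV. The IR and every name in it (toy_inst, toy_reg, toy_coalesce_tex_writes, the TOY_* enums) are hypothetical, not the real QIR structures. Like the change described above, the sketch handles straight-line code only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR; these are NOT the real vc4 QIR types. */
enum toy_op { TOY_NOP, TOY_MOV, TOY_FADD, TOY_FMUL };

/* Temporaries, plus write-only special destinations standing in for
 * the tex_s/tex_t/tex_r/tex_b coordinate writes. */
enum toy_file {
    TOY_FILE_NULL, TOY_FILE_TEMP,
    TOY_FILE_TEX_S, TOY_FILE_TEX_T, TOY_FILE_TEX_R, TOY_FILE_TEX_B
};

struct toy_reg { enum toy_file file; int index; };

struct toy_inst {
    enum toy_op op;
    struct toy_reg dst;
    struct toy_reg src[2];   /* unused sources have file TOY_FILE_NULL */
};

static bool is_tex_dst(enum toy_file f)
{
    return f >= TOY_FILE_TEX_S && f <= TOY_FILE_TEX_B;
}

/* Count reads of temp `t` across all live (non-NOP) instructions. */
static int count_uses(const struct toy_inst *prog, size_t n, int t)
{
    int uses = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == TOY_NOP)
            continue;
        for (int s = 0; s < 2; s++) {
            if (prog[i].src[s].file == TOY_FILE_TEMP &&
                prog[i].src[s].index == t)
                uses++;
        }
    }
    return uses;
}

/* Count writes of temp `t` across all live (non-NOP) instructions. */
static int count_defs(const struct toy_inst *prog, size_t n, int t)
{
    int defs = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op != TOY_NOP &&
            prog[i].dst.file == TOY_FILE_TEMP && prog[i].dst.index == t)
            defs++;
    }
    return defs;
}

/* For each "MOV tex_x, temp" whose temp has exactly one def and one use,
 * make the defining op write tex_x directly and turn the MOV into a NOP.
 * Straight-line code only. Returns the number of MOVs removed. */
int toy_coalesce_tex_writes(struct toy_inst *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    int removed = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_inst *mov = &prog[i];
        if (mov->op != TOY_MOV || !is_tex_dst(mov->dst.file) ||
            mov->src[0].file != TOY_FILE_TEMP)
            continue;

        int t = mov->src[0].index;
        if (count_uses(prog, n, t) != 1 || count_defs(prog, n, t) != 1)
            continue;

        /* Walk backwards to the instruction that defines the temp. */
        for (size_t j = i; j-- > 0;) {
            struct toy_inst *def = &prog[j];
            if (def->op == TOY_NOP)
                continue;
            if (def->dst.file == TOY_FILE_TEMP && def->dst.index == t) {
                def->dst = mov->dst;
                mov->op = TOY_NOP;
                removed++;
                break;
            }
            /* Conservative: never hoist past another tex write, since
             * that would reorder the coordinate writes. */
            if (is_tex_dst(def->dst.file))
                break;
        }
    }
    return removed;
}
```

In this toy model, an FMUL into a temp followed by a MOV of that temp to tex_s collapses into a single FMUL that writes tex_s. A temp that is also read elsewhere is left alone.
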
 .../drivers/vc4/vc4_opt_coalesce_ff_writes.c       | 36 +++++++++++++-------
 src/gallium/drivers/vc4/vc4_qir.c                  | 11 ++++++
 src/gallium/drivers/vc4/vc4_qir.h                  |  1 +
 .../vc4/vc4_qir_emit_uniform_stream_resets.c       | 18 ++--------
 4 files changed, 37 insertions(+), 29 deletions(-)
Upstream: cgit.freedesktop.org