SFU operations have a latency of 2 cycles, so if the result of an SFU instruction is used in the cycle immediately following it, the GPU stalls for an extra cycle until the result is available.
This adds the number of stalls to the shader-db debug output, along with the sum of instructions + stalls, so that scheduling optimizations that avoid generating SFU stalls can be evaluated.
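As a rough illustration of the counting rule described above (not the code in this patch), the sketch below walks a list of already scheduled instructions and counts one stall whenever an instruction reads the result of the SFU instruction placed right before it. The struct toy_instr type and the toy_* helpers are hypothetical simplifications and do not match the real struct v3d_qpu_instr or the scheduler's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record (an assumption for this
 * sketch, not the real struct v3d_qpu_instr). */
struct toy_instr {
        bool is_sfu; /* instruction is executed by the SFU */
        int dst;     /* register written, or -1 if none */
        int src[2];  /* registers read, -1 for unused slots */
};

/* Returns true if inst reads the given register. */
static bool
toy_reads(const struct toy_instr *inst, int reg)
{
        if (reg < 0)
                return false;
        return inst->src[0] == reg || inst->src[1] == reg;
}

/* Counts one stall for every instruction that consumes the result of
 * the SFU instruction scheduled immediately before it. */
static unsigned
toy_count_sfu_stalls(const struct toy_instr *insts, size_t count)
{
        assert(insts != NULL || count == 0);

        unsigned stalls = 0;
        for (size_t i = 1; i < count; i++) {
                const struct toy_instr *prev = &insts[i - 1];
                if (prev->is_sfu && toy_reads(&insts[i], prev->dst))
                        stalls++;
        }
        return stalls;
}
```

Under this model, the figure reported alongside the stall count would be the instruction count plus the value returned by toy_count_sfu_stalls().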
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
c341ab7ffba v3d: add shader-db stat to count SFU stalls
src/broadcom/compiler/qpu_schedule.c | 45 ++++++++++++++++++++++++++++++++++++
src/broadcom/compiler/v3d_compiler.h | 1 +
src/broadcom/compiler/vir.c | 7 ++++--
src/broadcom/qpu/qpu_instr.c | 34 +++++++++++++++++----------
src/broadcom/qpu/qpu_instr.h | 1 +
5 files changed, 74 insertions(+), 14 deletions(-)