Chips with 16-bank LDS (i.e. Kabini/Stoney) need a slightly different path. That path is completely untested because I don't have these chips, but according to vkpipeline-db the generated assembly looks fine.
Note that 16-bit I/O is currently only exposed on GFX9+ for both compiler backends.
1647e098e94 aco: implement 16-bit interp
src/amd/compiler/aco_instruction_selection.cpp | 38 +++++++++++++++++++++++---
1 file changed, 34 insertions(+), 4 deletions(-)