- All real hardware seen so far supports MSI-X, but VMs emulating NVMe have been found not to.
- Fix numerous assertions that were being hit because the non-MSI-X case did not install the sc->cputovect[i] mapping.
Install a fake cputovect mapping instead. This mapping exists primarily to allow multiple submission queues (per-cpu when possible). Completion queues are limited further to reduce loop-check overhead.
- For the non-MSI-X case, limit the number of completion queues to 4, since there is little point in having more when there is only one interrupt vector.
We use 4 to allow the chipset side to run optimally even though that many are not necessarily useful on the cpu side. To be fair, in cases where the cpu-side driver polls for completions, having multiple completion queues CAN help even with only one interrupt, because each completion queue is separately locked.
- Properly set the interrupt masking registers in the non-MSI-X case (probably not needed). Note that these registers are explicitly not supposed to be accessed by the host when MSI-X is used.
- Fix a bug where the maximum number of queues possible was one too high. This limit is *never* reached anyway, but fix the code just in case.
- Fix a bug where we assumed that the number of queues returned by the NVME_FID_NUMQUEUES command would always be <= the number of queues requested. In fact, this is not the case for at least one chipset or for some VM emulations. Limit the returned values to no more than the requested values.
- Set the queue->nqe field last when creating a completion queue. This prevents interrupts which poll multiple completion queues from attempting to poll a completion queue that has not finished getting set up. This case always occurs when pin-based interrupts are used and sometimes occurs when MSI-X vectors are used, depending on the topology.
- NOTES ON DISABLING MSI-X. Not all chipsets implement pin-based interrupts properly for NVMe. The BPX NVMe card, for example, appears to just leave the pin interrupt in a stuck state (the chipset docs say the level interrupt is cleared once all doorbell heads are synchronized for the completion queues, but this does not happen). So NVMe users should not explicitly disable MSI-X when it is nominally supported, except for testing.
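
The queue-count handling described in the bullets above can be sketched roughly as follows. This is only an illustration and not the driver's code: every name here (sketch_queue_plan, sketch_plan_queues, SKETCH_MAX_CPUS, SKETCH_NOMSIX_MAXCQ) is hypothetical, the round-robin cpu-to-submission-queue table merely stands in for the idea behind the fake cputovect mapping, and the only value taken from the notes above is the limit of 4 completion queues without MSI-X.

```c
/* Hypothetical sketch; none of these names come from the nvme driver. */
#define SKETCH_MAX_CPUS      64 /* assumed table size for the example */
#define SKETCH_NOMSIX_MAXCQ   4 /* completion queue limit without MSI-X */

struct sketch_queue_plan {
	unsigned nsubq;                        /* submission queues to create */
	unsigned ncomq;                        /* completion queues to create */
	unsigned cpu_to_subq[SKETCH_MAX_CPUS]; /* stand-in cpu mapping */
};

static unsigned
sketch_min(unsigned a, unsigned b)
{
	return (a < b) ? a : b;
}

/*
 * req_* are the counts the host asked for, got_* are the counts the
 * controller reported back.  The reported counts may exceed the request,
 * so they are clamped before anything is sized from them.
 */
static void
sketch_plan_queues(struct sketch_queue_plan *plan, unsigned ncpus,
    unsigned req_subq, unsigned req_comq,
    unsigned got_subq, unsigned got_comq, int have_msix)
{
	unsigned i;

	if (ncpus > SKETCH_MAX_CPUS)
		ncpus = SKETCH_MAX_CPUS;

	/* Never trust the controller to return <= what was requested. */
	plan->nsubq = sketch_min(got_subq, req_subq);
	plan->ncomq = sketch_min(got_comq, req_comq);

	/* One interrupt vector: extra completion queues only add polling. */
	if (!have_msix)
		plan->ncomq = sketch_min(plan->ncomq, SKETCH_NOMSIX_MAXCQ);

	/* Spread cpus across submission queues, per-cpu when possible. */
	for (i = 0; i < ncpus; ++i)
		plan->cpu_to_subq[i] = plan->nsubq ? (i % plan->nsubq) : 0;
}
```

With 8 cpus, 8 queues requested, 16 reported by the controller, and no MSI-X, this sketch yields 8 submission queues, 4 completion queues, and a one-to-one cpu mapping.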
2d74683 nvme - Fix interrupt pin support when MSI-X is unavailable.
sys/dev/disk/nvme/nvme.c | 43 ++++++++++++++++++++++++++++++++++++--
sys/dev/disk/nvme/nvme_admin.c | 46 ++++++++++++++++++++++++++++++-----------
sys/dev/disk/nvme/nvme_attach.c | 23 +++++++++++++++++++++
3 files changed, 98 insertions(+), 14 deletions(-)