ATH11K Archive mirror
From: Alex Williamson <alex.williamson@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Baochen Qiang <quic_bqiang@quicinc.com>,
	Kalle Valo <kvalo@kernel.org>,
	James Prestwood <prestwoj@gmail.com>,
	linux-wireless@vger.kernel.org, ath11k@lists.infradead.org,
	iommu@lists.linux.dev
Subject: Re: ath11k and vfio-pci support
Date: Tue, 16 Jan 2024 11:28:45 -0700
Message-ID: <20240116112845.55ebfcf7.alex.williamson@redhat.com>
In-Reply-To: <57d20bd812ccf8d1a5815ad41b5dcea3925d4fe1.camel@infradead.org>

On Tue, 16 Jan 2024 11:41:19 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Tue, 2024-01-16 at 18:08 +0800, Baochen Qiang wrote:
> > 
> > 
> > On 1/16/2024 1:46 AM, Alex Williamson wrote:  
> > > On Sun, 14 Jan 2024 16:36:02 +0200
> > > Kalle Valo <kvalo@kernel.org> wrote:
> > >   
> > > > Baochen Qiang <quic_bqiang@quicinc.com> writes:
> > > >   
> > > > > > > Strange that it still fails. Are you now seeing this error on your
> > > > > > > host or in your Qemu? Or both?
> > > > > > > Could you share your test steps? And if you can share please be as
> > > > > > > detailed as possible since I'm not familiar with passing WLAN
> > > > > > > hardware to a VM using vfio-pci.  
> > > > > > 
> > > > > > Just in Qemu, the hardware works fine on my host machine.
> > > > > > I basically follow this guide to set it up; it's written in the
> > > > > > context of GPUs/libvirt but the host setup is exactly the same. By
> > > > > > no means do you need to read it all; once you set the vfio-pci.ids
> > > > > > and see your unclaimed adapter you can stop:
> > > > > > https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF
> > > > > > In short you should be able to set the following host kernel options
> > > > > > and reboot (assuming your motherboard/hardware is compatible):
> > > > > > intel_iommu=on iommu=pt vfio-pci.ids=17cb:1103
> > > > > > Obviously change the device/vendor IDs to whatever ath11k hw you
> > > > > > have. Once the host is rebooted you should see your wlan adapter as
> > > > > > UNCLAIMED, showing the driver in use as vfio-pci. If not, it's likely
> > > > > > your motherboard just isn't compatible, the device has to be in its
> > > > > > own IOMMU group (you could try switching PCI ports if this is the
> > > > > > case).
> > > > > > I then build a "kvm_guest.config" kernel with the driver/firmware
> > > > > > for ath11k and boot into that with the following Qemu options:
> > > > > > -enable-kvm -device vfio-pci,host=<PCI address>
> > > > > > If it seems easier you could also utilize IWD's test-runner, which
> > > > > > handles launching the Qemu kernel automatically, detects any
> > > > > > vfio devices and passes them through, and mounts some useful host
> > > > > > folders into the VM. It's actually a very good general-purpose tool
> > > > > > for kernel testing, not just for IWD:
> > > > > > https://git.kernel.org/pub/scm/network/wireless/iwd.git/tree/doc/test-runner.txt
> > > > > > Once set up you can just run test-runner with a few flags and you'll
> > > > > > boot into a shell:
> > > > > > ./tools/test-runner -k <kernel-image> --hw --start /bin/bash
> > > > > > Please reach out if you have questions, thanks for looking into
> > > > > > this.  
> > > > > 
> > > > > Thanks for these details. I reproduced this issue by following your guide.
> > > > > 
> > > > > Seems the root cause is that the MSI vector assigned to WCN6855 in
> > > > > qemu differs from the one on the host. In my case the MSI vector in qemu
> > > > > is [Address: fee00000  Data: 0020] while in host it is [Address:
> > > > > fee00578 Data: 0000]. So in qemu ath11k configures MSI vector
> > > > > [Address: fee00000 Data: 0020] to WCN6855 hardware/firmware, and
> > > > > firmware uses that vector to fire interrupts to host/qemu. However
> > > > > host IOMMU doesn't know that vector because the real vector is
> > > > > [Address: fee00578  Data: 0000], as a result host blocks that
> > > > > interrupt and reports an error, see below log:
> > > > > 
> > > > > [ 1414.206069] DMAR: DRHD: handling fault status reg 2
> > > > > [ 1414.206081] DMAR: [INTR-REMAP] Request device [02:00.0] fault index 0x0 [fault reason 0x25] Blocked a compatibility format interrupt request
> > > > > [ 1414.210334] DMAR: DRHD: handling fault status reg 2
> > > > > [ 1414.210342] DMAR: [INTR-REMAP] Request device [02:00.0] fault index 0x0 [fault reason 0x25] Blocked a compatibility format interrupt request
> > > > > [ 1414.212496] DMAR: DRHD: handling fault status reg 2
> > > > > [ 1414.212503] DMAR: [INTR-REMAP] Request device [02:00.0] fault index 0x0 [fault reason 0x25] Blocked a compatibility format interrupt request
> > > > > [ 1414.214600] DMAR: DRHD: handling fault status reg 2
> > > > > 
> > > > > While I don't think there is a way for qemu/ath11k to get the real MSI
> > > > > vector from the host, I will try to read the vfio code to check further.
> > > > > Before that, to unblock you, a possible hack is to hard-code the MSI
> > > > > vector in qemu to the same value as on the host, provided that the
> > > > > vector doesn't change.  
> > > > 
> > > > Baochen, awesome that you were able to debug this further. Now at least
> > > > we know what the problem is.  
> > > 
> > > It's an interesting problem, I don't think we've seen another device
> > > where the driver reads the MSI register in order to program another
> > > hardware entity to match the MSI address and data configuration.
> > > 
> > > When assigning a device, the host and guest use entirely separate
> > > address spaces for MSI interrupts.  When the guest enables MSI, the
> > > operation is trapped by the VMM and triggers an ioctl to the host to
> > > perform an equivalent configuration.  Generally the physical device
> > > will interrupt within the host where it may be directly attached to KVM
> > > to signal the interrupt, trigger through the VMM, or where
> > > virtualization hardware supports it, the interrupt can directly trigger
> > > the vCPU.   From the VM perspective, the guest address/data pair is used
> > > to signal the interrupt, which is why it makes sense to virtualize the
> > > MSI registers.  
> >
> > Hi Alex, could you elaborate a bit more? Why is MSI virtualization
> > necessary from the VM perspective?  
> 
> An MSI is just a write to physical memory space. You can even use it
> like that; configure the device to just write 4 bytes to some address
> in a struct in memory to show that it needs attention, and you then
> poll that memory.
> 
> But mostly we don't (ab)use it like that, of course. We tell the device
> to write to a special range of the physical address space where the
> interrupt controller lives — the range from 0xfee00000 to 0xfeefffff.
> The low 20 bits of the address, and the 32 bits of data written to that
> address, tell the interrupt controller which CPU to interrupt, and
> which vector to raise on the CPU (as well as some other details and
> weird interrupt modes which are theoretically encodable).
> 
> So in your example, the guest writes [Address: fee00000  Data: 0020]
> which means it wants vector 0x20 on CPU#0 (well, the CPU with APICID
> 0). But that's what the *guest* wants. If we just blindly programmed
> that into the hardware, the hardware would deliver vector 0x20 to the
> host's CPU0... which would be very confused by it.
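
A rough sketch in Python of the decoding being described here, covering both the directly-encoded "Compatibility Format" and the VT-d "Remappable Format" that comes up below. This is my reading of the Intel SDM and VT-d formats; the function name and dict layout are mine, and details like redirection hint, destination mode and trigger mode are ignored.

```python
# A rough decoder for the x86 MSI address/data encoding described above
# (my reading of the Intel SDM compatibility format and the VT-d
# remappable format; redirection hint, destination mode, trigger mode
# and friends are ignored for clarity).

def decode_msi(addr: int, data: int) -> dict:
    assert (addr >> 20) == 0xFEE, "MSI writes target the 0xFEExxxxx window"
    if (addr >> 4) & 1:
        # Remappable format: the message carries an index into the
        # IOMMU's Interrupt Redirection Table, not a CPU/vector.
        index = ((addr >> 5) & 0x7FFF) | (((addr >> 2) & 1) << 15)
        if (addr >> 3) & 1:                  # SubHandle Valid
            index += data & 0xFFFF
        return {"format": "remappable", "irte_index": index}
    # Compatibility format: CPU and vector are encoded directly.
    return {"format": "compatibility",
            "dest_apicid": (addr >> 12) & 0xFF,
            "vector": data & 0xFF}

# The guest pair from this thread: vector 0x20 on APIC ID 0.
print(decode_msi(0xFEE00000, 0x0020))
# The host pair decodes as a Remappable Format message, which is why the
# Compatibility Format interrupt coming from the firmware gets blocked.
print(decode_msi(0xFEE00578, 0x0000))
```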
> 
> The host has a driver for that device, probably the VFIO driver. The
> host registers its own interrupt handlers for the real hardware,
> decides which *host* CPU (and vector) should be notified when something
> happens. And when that happens, the VFIO driver will raise an event on
> an eventfd, which will notify QEMU to inject the appropriate interrupt
> into the guest.
> 
> So... when the guest enables the MSI, that's trapped by QEMU which
> remembers which *guest* CPU/vector the interrupt should go to. QEMU
> tells VFIO to enable the corresponding interrupt, and what gets
> programmed into the actual hardware is up to the *host* operating
> system; nothing to do with the guest's information at all.
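
For reference, the "QEMU tells VFIO" step is the VFIO_DEVICE_SET_IRQS ioctl. A minimal Python sketch of packing its argument follows (struct layout and constants taken from include/uapi/linux/vfio.h; the eventfd value is a placeholder and no ioctl is actually issued here). The point is that nothing in this request carries an MSI address/data pair at all.

```python
import struct

# Constants from include/uapi/linux/vfio.h
VFIO_IRQ_SET_DATA_EVENTFD   = 1 << 2
VFIO_IRQ_SET_ACTION_TRIGGER = 1 << 5
VFIO_PCI_MSI_IRQ_INDEX      = 1

def vfio_irq_set_eventfd(eventfds):
    """Pack a struct vfio_irq_set asking VFIO to signal one eventfd per
    MSI vector when the physical device interrupts.  Note there is no
    address/data anywhere in here: what gets programmed into the real
    hardware is chosen by the host kernel alone."""
    count = len(eventfds)
    # struct vfio_irq_set { __u32 argsz, flags, index, start, count; __u8 data[]; }
    argsz = 5 * 4 + 4 * count            # header plus one __s32 fd per vector
    return struct.pack(f"<5I{count}i", argsz,
                       VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
                       VFIO_PCI_MSI_IRQ_INDEX, 0, count, *eventfds)

buf = vfio_irq_set_eventfd([5])          # fd 5 is a placeholder eventfd
# A real caller would do: fcntl.ioctl(device_fd, VFIO_DEVICE_SET_IRQS, buf)
```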
> 
> Then when the actual hardware raises the interrupt, the VFIO interrupt
> handler runs in the host, signals an event on the eventfd, and QEMU
> receives that and injects the event into the appropriate guest vCPU.
> 
> (In practice QEMU doesn't do it these days; there's actually a shortcut
> which improves latency by allowing the kernel to deliver the event to
> the guest directly, connecting the eventfd directly to the KVM irq
> routing table.)
> 
> 
> Interrupt remapping is probably not important here, but I'll explain it
> briefly anyway. With interrupt remapping, the IOMMU handles the
> 'memory' write from the device, just as it handles all other memory
> transactions. One of the reasons for interrupt remapping is that the
> original definitions of the bits in the MSI (the low 20 bits of the
> address and the 32 bits of what's written) only had 8 bits for the
> target CPU APICID. And we have bigger systems than that now.
> 
> So by using one of the spare bits in the MSI message, we can indicate
> that this isn't just a directly-encoded cpu/vector in "Compatibility
> Format", but is a "Remappable Format" interrupt. Instead of the
> cpu/vector it just contains an index in to the IOMMU's Interrupt
> Redirection Table. Which *does* have a full 32-bits for the target APIC
> ID. That's why x2apic support (which gives us support for >254 CPUs)
> depends on interrupt remapping. 
> 
> The other thing that the IOMMU can do in modern systems is *posted*
> interrupts. Where the entry in the IOMMU's IRT doesn't just specify the
> host's CPU/vector, but actually specifies a *vCPU* to deliver the
> interrupt to. 
> 
> All of which is mostly irrelevant as it's just another bypass
> optimisation to improve latency. The key here is that what the guest
> writes to its emulated MSI table and what the host writes to the real
> hardware are not at all related.
> 
> If we had had this posted interrupt support from the beginning, perhaps
> we could have had a much simpler model — we just let the guest write
> its intended (v)CPU#/vector *directly* to the MSI table in the device,
> and let the IOMMU fix it up by having a table pointing to the
> appropriate set of vCPUs. But that isn't how it happened. The model we
> have is that the VMM has to *emulate* the config space and handle the
> interrupts as described above.
> 
> This means that whenever a device has a non-standard way of configuring
> MSIs, the VMM has to understand and intercept that. I believe we've
> even seen some Atheros devices with the MSI target in some weird MMIO
> registers instead of the standard location, so we've had to hack QEMU
> to handle those too?
> 
> > And, maybe a stupid question: is it possible for VM/KVM or vfio to
> > virtualize only writes to the MSI register but leave reads
> > un-virtualized? I am asking because that way ath11k might get a
> > chance to run in a VM after reading the real vector.  
> 
> That might confuse a number of operating systems. Especially if they
> mask/unmask by reading the register, flipping the mask bit and writing
> back again.
> 
> How exactly is the content of this register then given back to the
> firmware? Is that communication snoopable by the VMM?
> 
> 
> > > 
> > > Off hand I don't have a good solution for this, the hardware is
> > > essentially imposing a unique requirement for MSI programming that the
> > > driver needs visibility of the physical MSI address and data.
> > >   
> 
> Strictly, the driver doesn't need visibility of the actual values used
> by the hardware. Another way of looking at it would be to say that
> the driver programs the MSI through this non-standard method, it just
> needs the VMM to trap and handle that, just as the VMM does for the
> standard MSI table. 
> 
> Which is what I thought we'd already seen on some Atheros devices.
> 
> > >   It's
> > > conceivable that device specific code could either make the physical
> > > address/data pair visible to the VM or trap the firmware programming to
> > > inject the correct physical values.  Is there somewhere other than the
> > > standard MSI capability in config space that the driver could learn the
> > > physical values, ie. somewhere that isn't virtualized?  Thanks,  
> >
> > I don't think we have such capability in configuration space.  
> 
> Configuration space is a complete fiction though; it's all emulated. We
> can do anything we like. Or we can have a PV hypercall which will
> report it. I don't know that we'd *want* to, but all things are
> possible.

RTL8169 has a back door to the MSI-X vector table, maybe that's the one
you're thinking of.  Alternate methods for the driver to access config
space are common on GPUs, presumably because they require extensive
vBIOS support, and IO port and MMIO windows through which pre-boot code
can interact with config space are faster and easier than standard
config accesses.  Much of the work of assigning a GPU to a VM is to
wrap those alternate methods in virtualization to keep the driver
working within the guest address space.

The fictitious config space was my thought too: an ath11k vfio-pci
variant driver could insert a vendor-defined capability into config
space to expose the physical MSI address/data.  The driver would know
by the presence of the capability that it's running in a VM and to
prefer that mechanism to retrieve MSI address and data.
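
As a sketch of what the guest driver side might look like, it would just walk the standard capability list for the vendor-specific capability (cap ID 0x09). The payload layout here (id, next, len, pad, then a u64 address and u32 data) is entirely made up for illustration; only the capability-list walk itself is standard PCI.

```python
import struct

PCI_CAP_ID_VNDR = 0x09        # vendor-specific capability ID
PCI_CAPABILITY_LIST = 0x34    # config offset holding the first cap pointer

def find_msi_backdoor(cfg: bytes):
    """Walk the standard PCI capability list looking for the hypothetical
    vendor-defined capability.  Assumed (invented) layout: id, next, len,
    pad, then the physical MSI address (u64) and data (u32)."""
    ptr = cfg[PCI_CAPABILITY_LIST]
    while ptr:
        cap_id, nxt = cfg[ptr], cfg[ptr + 1]
        if cap_id == PCI_CAP_ID_VNDR:
            addr, data = struct.unpack_from("<QI", cfg, ptr + 4)
            return addr, data
        ptr = nxt
    return None

# Toy 256-byte config space with one such capability at offset 0x40,
# carrying the host's physical pair from this thread.
cfg = bytearray(256)
cfg[PCI_CAPABILITY_LIST] = 0x40
struct.pack_into("<BBBBQI", cfg, 0x40, PCI_CAP_ID_VNDR, 0, 16, 0,
                 0xFEE00578, 0x0000)
print(find_msi_backdoor(bytes(cfg)))
```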

Alternatively as also suggested here, if programming of the firmware
with the MSI address/data is something that a hypervisor could trap,
then we might be able to make it transparent to the guest.  For example
if it were programmed via MMIO, the guest address/data values could be
auto-magically replaced with physical values.  Since QEMU doesn't know
the physical values, this would also likely be through a device
specific extension to vfio-pci through a variant driver, or maybe some
combination of variant driver and QEMU if we need to make trapping
conditional in order to avoid a performance penalty.

This is essentially device specific interrupt programming, which either
needs to be virtualized (performed by the VMM) or paravirtualized
(performed in cooperation with the guest).  This is also something to
keep in mind relative to the initial source of this issue, ie. testing
device drivers and hardware under device assignment.  There can be
subtle differences.  Thanks,

Alex



Thread overview: 32+ messages
     [not found] <adcb785e-4dc7-4c4a-b341-d53b72e13467@gmail.com>
2024-01-10  9:00 ` ath11k and vfio-pci support Kalle Valo
2024-01-10 13:04   ` James Prestwood
2024-01-10 13:49     ` Kalle Valo
2024-01-10 14:55       ` James Prestwood
2024-01-11  3:51         ` Baochen Qiang
2024-01-11  8:16           ` Kalle Valo
2024-01-11 12:48             ` James Prestwood
2024-01-11 13:11               ` Kalle Valo
2024-01-11 13:38                 ` James Prestwood
2024-01-12  2:04                   ` Baochen Qiang
2024-01-12 12:47                     ` James Prestwood
2024-01-14 12:37                       ` Baochen Qiang
2024-01-14 14:36                         ` Kalle Valo
2024-01-15 17:46                           ` Alex Williamson
2024-01-16 10:08                             ` Baochen Qiang
2024-01-16 10:41                               ` David Woodhouse
2024-01-16 15:29                                 ` Jason Gunthorpe
2024-01-16 18:28                                 ` Alex Williamson [this message]
2024-01-16 21:10                                   ` Jeff Johnson
2024-01-17  5:47                                 ` Baochen Qiang
2024-03-21 19:14                                 ` Johannes Berg
2024-01-16 13:05                         ` James Prestwood
2024-01-17  5:26                           ` Baochen Qiang
2024-01-17 13:20                             ` James Prestwood
2024-01-17 13:43                               ` Kalle Valo
2024-01-17 14:25                                 ` James Prestwood
2024-01-18  2:09                               ` Baochen Qiang
2024-01-19 17:52                                 ` James Prestwood
2024-01-19 17:57                                   ` Kalle Valo
2024-01-19 18:07                                     ` James Prestwood
2024-01-26 18:20                                     ` James Prestwood
2024-01-27  4:31                                       ` Baochen Qiang
