Xen-Devel Archive mirror
 help / color / mirror / Atom feed
From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: qemu-devel@nongnu.org
Cc: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Anthony Perard" <anthony.perard@citrix.com>,
	"Paul Durrant" <paul@xen.org>,
	xen-devel@lists.xenproject.org (open list:X86 Xen CPUs)
Subject: [PATCH v3 3/3] Do not access /dev/mem in MSI-X PCI passthrough on Xen
Date: Mon,  6 May 2024 02:33:22 +0200	[thread overview]
Message-ID: <ebeb8c419feedad9fe0e9f39d3ed3a9ff0f4c49b.1714955598.git-series.marmarek@invisiblethingslab.com> (raw)
In-Reply-To: <cover.f5d45e3c2fb87552abfaf80982b0b724fca2134c.1714955598.git-series.marmarek@invisiblethingslab.com>

The /dev/mem is used for two purposes:
 - reading PCI_MSIX_ENTRY_CTRL_MASKBIT
 - reading Pending Bit Array (PBA)

The first one was originally done because when Xen did not send all
vector ctrl writes to the device model, so QEMU might have outdated old
register value. If Xen is new enough, this has been changed, so QEMU can
now use its cached value of the register instead. Detect the "new
enough" based on XENFEAT_dm_msix_all_writes bit in XENVER_get_features.

The Pending Bit Array (PBA) handling is for the case where it lives on
the same page as the MSI-X table itself. Xen has been extended to handle
this case too (as well as other registers that may live on those pages),
so QEMU handling is not necessary anymore.

Additionally, reading from /dev/mem is trapped and emulated by Xen, so
QEMU doesn't see real values anyway. And if it did, this method is prone
to race conditions. Removing /dev/mem access is useful to work within
stubdomain (avoids emulated reads and potential races), and necessary
when dom0 kernel runs in lockdown mode (where /dev/mem is unavailable at
all).

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
Changes in v2:
- Make change conditional on new Xen version (tested via
  XENFEAT_dm_msix_all_writes)
- add few comments
---
 hw/xen/xen_pt_msi.c | 94 ++++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 35 deletions(-)

diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
index 09cca4e..836cc9c 100644
--- a/hw/xen/xen_pt_msi.c
+++ b/hw/xen/xen_pt_msi.c
@@ -460,15 +460,23 @@ static void pci_msix_write(void *opaque, hwaddr addr,
         entry->updated = true;
     } else if (msix->enabled && entry->updated &&
                !(val & PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
-        const volatile uint32_t *vec_ctrl;
-
         /*
-         * If Xen intercepts the mask bit access, entry->vec_ctrl may not be
-         * up-to-date. Read from hardware directly.
+         * Reading mask bit from hardware directly is needed on older Xen only.
          */
-        vec_ctrl = s->msix->phys_iomem_base + entry_nr * PCI_MSIX_ENTRY_SIZE
-            + PCI_MSIX_ENTRY_VECTOR_CTRL;
-        xen_pt_msix_update_one(s, entry_nr, *vec_ctrl);
+        if (s->msix->phys_iomem_base) {
+            /* Memory mapped registers */
+            const volatile uint32_t *vec_ctrl;
+
+            /*
+             * If Xen intercepts the mask bit access, entry->vec_ctrl may not be
+             * up-to-date. Read from hardware directly.
+             */
+            vec_ctrl = s->msix->phys_iomem_base + entry_nr * PCI_MSIX_ENTRY_SIZE
+                + PCI_MSIX_ENTRY_VECTOR_CTRL;
+            xen_pt_msix_update_one(s, entry_nr, *vec_ctrl);
+        } else {
+            xen_pt_msix_update_one(s, entry_nr, entry->latch(VECTOR_CTRL));
+        }
     }
 
     set_entry_value(entry, offset, val);
@@ -493,7 +501,12 @@ static uint64_t pci_msix_read(void *opaque, hwaddr addr,
         return get_entry_value(&msix->msix_entry[entry_nr], offset);
     } else {
         /* Pending Bit Array (PBA) */
-        return *(uint32_t *)(msix->phys_iomem_base + addr);
+        if (s->msix->phys_iomem_base) {
+            return *(uint32_t *)(msix->phys_iomem_base + addr);
+        }
+        XEN_PT_LOG(&s->dev, "reading PBA, addr 0x%lx, offset 0x%lx\n",
+                   addr, addr - msix->total_entries * PCI_MSIX_ENTRY_SIZE);
+        return 0xFFFFFFFF;
     }
 }
 
@@ -528,8 +541,8 @@ int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base)
     uint32_t table_off = 0;
     int i, total_entries, bar_index;
     XenHostPCIDevice *hd = &s->real_device;
+    xen_feature_info_t xc_version_info = { 0 };
     PCIDevice *d = &s->dev;
-    int fd = -1;
     XenPTMSIX *msix = NULL;
     int rc = 0;
 
@@ -543,6 +556,10 @@ int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base)
         return -1;
     }
 
+    if (xc_version(xen_xc, XENVER_get_features, &xc_version_info) < 0) {
+        return -1;
+    }
+
     rc = xen_host_pci_get_word(hd, base + PCI_MSIX_FLAGS, &control);
     if (rc) {
         XEN_PT_ERR(d, "Failed to read PCI_MSIX_FLAGS field\n");
@@ -576,33 +593,40 @@ int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base)
     msix->table_base = s->real_device.io_regions[bar_index].base_addr;
     XEN_PT_LOG(d, "get MSI-X table BAR base 0x%"PRIx64"\n", msix->table_base);
 
-    fd = open("/dev/mem", O_RDWR);
-    if (fd == -1) {
-        rc = -errno;
-        XEN_PT_ERR(d, "Can't open /dev/mem: %s\n", strerror(errno));
-        goto error_out;
-    }
-    XEN_PT_LOG(d, "table_off = 0x%x, total_entries = %d\n",
-               table_off, total_entries);
-    msix->table_offset_adjust = table_off & 0x0fff;
-    msix->phys_iomem_base =
-        mmap(NULL,
-             total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
-             PROT_READ,
-             MAP_SHARED | MAP_LOCKED,
-             fd,
-             msix->table_base + table_off - msix->table_offset_adjust);
-    close(fd);
-    if (msix->phys_iomem_base == MAP_FAILED) {
-        rc = -errno;
-        XEN_PT_ERR(d, "Can't map physical MSI-X table: %s\n", strerror(errno));
-        goto error_out;
-    }
-    msix->phys_iomem_base = (char *)msix->phys_iomem_base
-        + msix->table_offset_adjust;
+    /* Accessing /dev/mem is needed only on older Xen. */
+    if (!(xc_version_info.submap & (1U << XENFEAT_dm_msix_all_writes))) {
+        int fd = -1;
+
+        fd = open("/dev/mem", O_RDWR);
+        if (fd == -1) {
+            rc = -errno;
+            XEN_PT_ERR(d, "Can't open /dev/mem: %s\n", strerror(errno));
+            goto error_out;
+        }
+        XEN_PT_LOG(d, "table_off = 0x%x, total_entries = %d\n",
+                   table_off, total_entries);
+        msix->table_offset_adjust = table_off & 0x0fff;
+        msix->phys_iomem_base =
+            mmap(NULL,
+                 total_entries * PCI_MSIX_ENTRY_SIZE
+                 + msix->table_offset_adjust,
+                 PROT_READ,
+                 MAP_SHARED | MAP_LOCKED,
+                 fd,
+                 msix->table_base + table_off - msix->table_offset_adjust);
+        close(fd);
+        if (msix->phys_iomem_base == MAP_FAILED) {
+            rc = -errno;
+            XEN_PT_ERR(d, "Can't map physical MSI-X table: %s\n",
+                       strerror(errno));
+            goto error_out;
+        }
+        msix->phys_iomem_base = (char *)msix->phys_iomem_base
+            + msix->table_offset_adjust;
 
-    XEN_PT_LOG(d, "mapping physical MSI-X table to %p\n",
-               msix->phys_iomem_base);
+        XEN_PT_LOG(d, "mapping physical MSI-X table to %p\n",
+                   msix->phys_iomem_base);
+    }
 
     memory_region_add_subregion_overlap(&s->bar[bar_index], table_off,
                                         &msix->mmio,
-- 
git-series 0.9.1


      parent reply	other threads:[~2024-05-06  0:34 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.f5d45e3c2fb87552abfaf80982b0b724fca2134c.1714955598.git-series.marmarek@invisiblethingslab.com>
2024-05-06  0:33 ` [PATCH v3 1/3] hw/xen/xen_pt: Save back data only for declared registers Marek Marczykowski-Górecki
2024-05-06  0:33 ` [PATCH v3 2/3] Update Xen's features.h header Marek Marczykowski-Górecki
2024-05-06  0:33 ` Marek Marczykowski-Górecki [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ebeb8c419feedad9fe0e9f39d3ed3a9ff0f4c49b.1714955598.git-series.marmarek@invisiblethingslab.com \
    --to=marmarek@invisiblethingslab.com \
    --cc=anthony.perard@citrix.com \
    --cc=paul@xen.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).