* [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix
@ 2015-06-11 15:42 Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io() Paul Durrant
                   ` (17 more replies)
  0 siblings, 18 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.

The code has become very convoluted and, at least by inspection, certain
emulations will malfunction.

The series is broken down into 17 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
on the emulation22 branch) as follows:

0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
0011-x86-hvm-remove-hvm_io_state-enumeration.patch
0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch
0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch

v2:
 - Removed bogus assertion from patch 16
 - Re-worked patch 17 after basic testing of back-port onto XenServer


* [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-17 13:31   ` Jan Beulich
  2015-06-11 15:42 ` [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops Paul Durrant
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

Currently hvmemul_do_io() handles paging for I/O to/from a guest address
inline. This forces every exit point to execute:

if ( ram_page )
    put_page(ram_page);

This patch introduces the wrapper functions hvmemul_do_io_addr() and
hvmemul_do_io_buffer(). The latter is used for I/O to/from a Xen buffer,
so the complexity of paging can be confined to the former, making the
common hvmemul_do_io() function less convoluted.

This patch also tightens up some types and introduces pio/mmio wrappers
for the above functions with comments to document their semantics.
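
To illustrate the shape of the split (a condensed sketch of the diff
below, not the full patch): hvmemul_do_io_addr() brackets the common
function with the page acquire/release that used to be inline, so the
buffer variant never touches guest paging at all:

int hvmemul_do_io_addr(
    bool_t is_mmio, paddr_t addr, unsigned long *reps,
    unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
{
    struct page_info *ram_page;
    int rc;

    /* The paged-out/shared checks now live here, in one place. */
    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
    if ( rc != X86EMUL_OKAY )
        return rc;

    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df,
                       1 /* data_is_addr */, (uint64_t)ram_gpa);

    hvmemul_release_page(ram_page);

    return rc;
}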

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |  284 +++++++++++++++++++++++++------------
 xen/arch/x86/hvm/io.c             |    2 +-
 xen/include/asm-x86/hvm/emulate.h |   18 ++-
 3 files changed, 206 insertions(+), 98 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index ac9c9d6..c8da114 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,41 +51,23 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
 }
 
 static int hvmemul_do_io(
-    int is_mmio, paddr_t addr, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data)
+    int is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+    uint8_t dir, bool_t df, bool_t data_is_addr, uint64_t data)
 {
     struct vcpu *curr = current;
-    struct hvm_vcpu_io *vio;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
         .dir = dir,
         .df = df,
-        .data = ram_gpa,
-        .data_is_ptr = (p_data == NULL),
+        .data = data,
+        .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
     };
-    unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
-    p2m_type_t p2mt;
-    struct page_info *ram_page;
+    void *p_data = (void *)data;
     int rc;
 
-    /* Check for paged out page */
-    ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
-    if ( p2m_is_paging(p2mt) )
-    {
-        if ( ram_page )
-            put_page(ram_page);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_RETRY;
-    }
-
     /*
      * Weird-sized accesses have undefined behaviour: we discard writes
      * and read all-ones.
@@ -93,23 +75,10 @@ static int hvmemul_do_io(
     if ( unlikely((size > sizeof(long)) || (size & (size - 1))) )
     {
         gdprintk(XENLOG_WARNING, "bad mmio size %d\n", size);
-        ASSERT(p_data != NULL); /* cannot happen with a REP prefix */
-        if ( dir == IOREQ_READ )
-            memset(p_data, ~0, size);
-        if ( ram_page )
-            put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( !p.data_is_ptr && (dir == IOREQ_WRITE) )
-    {
-        memcpy(&p.data, p_data, size);
-        p_data = NULL;
-    }
-
-    vio = &curr->arch.hvm_vcpu.hvm_io;
-
-    if ( is_mmio && !p.data_is_ptr )
+    if ( is_mmio && !data_is_addr )
     {
         /* Part of a multi-cycle read or write? */
         if ( dir == IOREQ_WRITE )
@@ -117,11 +86,7 @@ static int hvmemul_do_io(
             paddr_t pa = vio->mmio_large_write_pa;
             unsigned int bytes = vio->mmio_large_write_bytes;
             if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-            {
-                if ( ram_page )
-                    put_page(ram_page);
                 return X86EMUL_OKAY;
-            }
         }
         else
         {
@@ -131,8 +96,6 @@ static int hvmemul_do_io(
             {
                 memcpy(p_data, &vio->mmio_large_read[addr - pa],
                        size);
-                if ( ram_page )
-                    put_page(ram_page);
                 return X86EMUL_OKAY;
             }
         }
@@ -144,40 +107,28 @@ static int hvmemul_do_io(
         break;
     case HVMIO_completed:
         vio->io_state = HVMIO_none;
-        if ( p_data == NULL )
-        {
-            if ( ram_page )
-                put_page(ram_page);
+        if ( data_is_addr || dir == IOREQ_WRITE )
             return X86EMUL_UNHANDLEABLE;
-        }
         goto finish_access;
     case HVMIO_dispatched:
         /* May have to wait for previous cycle of a multi-write to complete. */
-        if ( is_mmio && !p.data_is_ptr && (dir == IOREQ_WRITE) &&
+        if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
              (addr == (vio->mmio_large_write_pa +
                        vio->mmio_large_write_bytes)) )
-        {
-            if ( ram_page )
-                put_page(ram_page);
             return X86EMUL_RETRY;
-        }
         /* fallthrough */
     default:
-        if ( ram_page )
-            put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
     }
 
     if ( hvm_io_pending(curr) )
     {
         gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
-        if ( ram_page )
-            put_page(ram_page);
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state =
-        (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
+    vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
+        HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
 
     /*
@@ -190,7 +141,12 @@ static int hvmemul_do_io(
     p.count = *reps;
 
     if ( dir == IOREQ_WRITE )
+    {
+        if ( !data_is_addr )
+            memcpy(&p.data, p_data, size);
+
         hvmtrace_io_assist(is_mmio, &p);
+    }
 
     if ( is_mmio )
     {
@@ -235,7 +191,7 @@ static int hvmemul_do_io(
             rc = X86EMUL_RETRY;
             if ( !hvm_send_assist_req(s, &p) )
                 vio->io_state = HVMIO_none;
-            else if ( p_data == NULL )
+            else if ( data_is_addr || dir == IOREQ_WRITE )
                 rc = X86EMUL_OKAY;
         }
         break;
@@ -245,20 +201,18 @@ static int hvmemul_do_io(
     }
 
     if ( rc != X86EMUL_OKAY )
-    {
-        if ( ram_page )
-            put_page(ram_page);
         return rc;
-    }
 
  finish_access:
     if ( dir == IOREQ_READ )
+    {
         hvmtrace_io_assist(is_mmio, &p);
 
-    if ( p_data != NULL )
-        memcpy(p_data, &vio->io_data, size);
+        if ( !data_is_addr )
+            memcpy(p_data, &vio->io_data, size);
+    }
 
-    if ( is_mmio && !p.data_is_ptr )
+    if ( is_mmio && !data_is_addr )
     {
         /* Part of a multi-cycle read or write? */
         if ( dir == IOREQ_WRITE )
@@ -285,23 +239,159 @@ static int hvmemul_do_io(
         }
     }
 
-    if ( ram_page )
-        put_page(ram_page);
     return X86EMUL_OKAY;
 }
 
-int hvmemul_do_pio(
-    unsigned long port, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data)
+int hvmemul_do_io_buffer(
+    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+    uint8_t dir, bool_t df, uint8_t *buffer)
 {
-    return hvmemul_do_io(0, port, reps, size, ram_gpa, dir, df, p_data);
+    int rc;
+
+    BUG_ON(buffer == NULL);
+
+    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
+                       (uint64_t)buffer);
+    if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
+        memset(buffer, 0xff, size);
+
+    return rc;
 }
 
-static int hvmemul_do_mmio(
-    paddr_t gpa, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data)
+static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    p2m_type_t p2mt;
+
+    *page = get_page_from_gfn(d, gmfn, &p2mt, P2M_UNSHARE);
+
+    if ( *page == NULL )
+        return X86EMUL_UNHANDLEABLE;
+
+    if ( p2m_is_paging(p2mt) )
+    {
+        put_page(*page);
+        p2m_mem_paging_populate(d, gmfn);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_shared(p2mt) )
+    {
+        put_page(*page);
+        return X86EMUL_RETRY;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static void hvmemul_release_page(struct page_info *page)
+{
+    put_page(page);
+}
+
+int hvmemul_do_io_addr(
+    bool_t is_mmio, paddr_t addr, unsigned long *reps,
+    unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
+{
+    struct page_info *ram_page;
+    int rc;
+
+    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+                       (uint64_t)ram_gpa);
+
+    hvmemul_release_page(ram_page);
+
+    return rc;
+}
+
+/*
+ * Perform I/O between <port> and <buffer>. <dir> indicates the
+ * direction: IOREQ_READ means a read from <port> to <buffer> and
+ * IOREQ_WRITE means a write from <buffer> to <port>. Each access has
+ * width <size> and up to *<reps> accesses will be performed. If
+ * X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ *
+ * NOTE: If *<reps> is greater than 1, each access will use the
+ *       <buffer> pointer; there is no implicit iteration over a
+ *       block of memory starting at <buffer>.
+ */
+int hvmemul_do_pio_buffer(uint16_t port,
+                          unsigned long *reps,
+                          unsigned int size,
+                          uint8_t dir,
+                          uint8_t *buffer)
+{
+    return hvmemul_do_io_buffer(0, port, reps, size, dir, 0, buffer);
+}
+
+/*
+ * Perform I/O between <port> and guest RAM starting at <ram_addr>.
+ * <dir> indicates the direction: IOREQ_READ means a read from <port> to
+ * RAM and IOREQ_WRITE means a write from RAM to <port>. Each access has
+ * width <size> and up to *<reps> accesses will be performed. If
+ * X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ * Each access will be done to/from successive RAM addresses, increasing
+ * if <df> is 0 or decreasing if <df> is 1.
+ */
+static int hvmemul_do_pio_addr(uint16_t port,
+                               unsigned long *reps,
+                               unsigned int size,
+                               uint8_t dir,
+                               bool_t df,
+                               paddr_t ram_addr)
+{
+    return hvmemul_do_io_addr(0, port, reps, size, dir, df, ram_addr);
+}
+
+/*
+ * Perform I/O between MMIO space starting at <mmio_gpa> and <buffer>.
+ * <dir> indicates the direction: IOREQ_READ means a read from MMIO to
+ * <buffer> and IOREQ_WRITE means a write from <buffer> to MMIO. Each
+ * access has width <size> and up to *<reps> accesses will be performed.
+ * If X86EMUL_OKAY is returned then <reps> will be updated with the number
+ * of accesses actually performed.
+ * Each access will be done to/from successive MMIO addresses, increasing
+ * if <df> is 0 or decreasing if <df> is 1.
+ *
+ * NOTE: If *<reps> is greater than 1, each access will use the
+ *       <buffer> pointer; there is no implicit iteration over a
+ *       block of memory starting at <buffer>.
+ */
+static int hvmemul_do_mmio_buffer(paddr_t mmio_gpa,
+                                  unsigned long *reps,
+                                  unsigned int size,
+                                  uint8_t dir,
+                                  bool_t df,
+                                  uint8_t *buffer)
+{
+    return hvmemul_do_io_buffer(1, mmio_gpa, reps, size, dir, df, buffer);
+}
+
+/*
+ * Perform I/O between MMIO space starting at <mmio_gpa> and guest RAM
+ * starting at <ram_gpa>. <dir> indicates the direction: IOREQ_READ
+ * means a read from MMIO to RAM and IOREQ_WRITE means a write from RAM
+ * to MMIO. Each access has width <size> and up to *<reps> accesses will
+ * be performed. If X86EMUL_OKAY is returned then <reps> will be updated
+ * with the number of accesses actually performed.
+ * Each access will be done to/from successive RAM *and* MMIO addresses,
+ * increasing if <df> is 0 or decreasing if <df> is 1.
+ */
+static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
+                                unsigned long *reps,
+                                unsigned int size,
+                                uint8_t dir,
+                                bool_t df,
+                                paddr_t ram_gpa)
 {
-    return hvmemul_do_io(1, gpa, reps, size, ram_gpa, dir, df, p_data);
+    return hvmemul_do_io_addr(1, mmio_gpa, reps, size, dir, df, ram_gpa);
 }
 
 /*
@@ -503,7 +593,8 @@ static int __hvmemul_read(
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         while ( (off + chunk) <= PAGE_SIZE )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_READ, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
             addr += chunk;
@@ -537,7 +628,8 @@ static int __hvmemul_read(
                                     hvmemul_ctxt);
         while ( rc == X86EMUL_OKAY )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_READ, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 break;
             addr += chunk;
@@ -645,7 +737,8 @@ static int hvmemul_write(
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         while ( (off + chunk) <= PAGE_SIZE )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_WRITE, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
             addr += chunk;
@@ -675,7 +768,8 @@ static int hvmemul_write(
                                     hvmemul_ctxt);
         while ( rc == X86EMUL_OKAY )
         {
-            rc = hvmemul_do_mmio(gpa, &reps, chunk, 0, IOREQ_WRITE, 0, p_data);
+            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
+                                        p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 break;
             addr += chunk;
@@ -849,8 +943,8 @@ static int hvmemul_rep_ins(
     if ( p2mt == p2m_mmio_direct || p2mt == p2m_mmio_dm )
         return X86EMUL_UNHANDLEABLE;
 
-    return hvmemul_do_pio(src_port, reps, bytes_per_rep, gpa, IOREQ_READ,
-                          !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
+    return hvmemul_do_pio_addr(src_port, reps, bytes_per_rep, IOREQ_READ,
+                               !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
 static int hvmemul_rep_outs(
@@ -887,8 +981,8 @@ static int hvmemul_rep_outs(
     if ( p2mt == p2m_mmio_direct || p2mt == p2m_mmio_dm )
         return X86EMUL_UNHANDLEABLE;
 
-    return hvmemul_do_pio(dst_port, reps, bytes_per_rep, gpa, IOREQ_WRITE,
-                          !!(ctxt->regs->eflags & X86_EFLAGS_DF), NULL);
+    return hvmemul_do_pio_addr(dst_port, reps, bytes_per_rep, IOREQ_WRITE,
+                               !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
 static int hvmemul_rep_movs(
@@ -944,12 +1038,12 @@ static int hvmemul_rep_movs(
         return X86EMUL_UNHANDLEABLE;
 
     if ( sp2mt == p2m_mmio_dm )
-        return hvmemul_do_mmio(
-            sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
+        return hvmemul_do_mmio_addr(
+            sgpa, reps, bytes_per_rep, IOREQ_READ, df, dgpa);
 
     if ( dp2mt == p2m_mmio_dm )
-        return hvmemul_do_mmio(
-            dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
+        return hvmemul_do_mmio_addr(
+            dgpa, reps, bytes_per_rep, IOREQ_WRITE, df, sgpa);
 
     /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
     bytes = *reps * bytes_per_rep;
@@ -1102,8 +1196,8 @@ static int hvmemul_rep_stos(
         return X86EMUL_UNHANDLEABLE;
 
     case p2m_mmio_dm:
-        return hvmemul_do_mmio(gpa, reps, bytes_per_rep, 0, IOREQ_WRITE, df,
-                               p_data);
+        return hvmemul_do_mmio_buffer(gpa, reps, bytes_per_rep, IOREQ_WRITE,
+                                      df, p_data);
     }
 }
 
@@ -1142,7 +1236,8 @@ static int hvmemul_read_io(
 {
     unsigned long reps = 1;
     *val = 0;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
+    return hvmemul_do_pio_buffer(port, &reps, bytes, IOREQ_READ,
+                                 (uint8_t *)val);
 }
 
 static int hvmemul_write_io(
@@ -1152,7 +1247,8 @@ static int hvmemul_write_io(
     struct x86_emulate_ctxt *ctxt)
 {
     unsigned long reps = 1;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
+    return hvmemul_do_pio_buffer(port, &reps, bytes, IOREQ_WRITE,
+                                 (uint8_t *)&val);
 }
 
 static int hvmemul_read_cr(
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 68fb890..0d50339 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -140,7 +140,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
     if ( dir == IOREQ_WRITE )
         data = guest_cpu_user_regs()->eax;
 
-    rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
+    rc = hvmemul_do_pio_buffer(port, &reps, size, dir, (uint8_t *)&data);
 
     switch ( rc )
     {
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index b3971c8..edf8948 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -50,11 +50,23 @@ struct segment_register *hvmemul_get_seg_reg(
     enum x86_segment seg,
     struct hvm_emulate_ctxt *hvmemul_ctxt);
 
-int hvmemul_do_pio(
-    unsigned long port, unsigned long *reps, int size,
-    paddr_t ram_gpa, int dir, int df, void *p_data);
+int hvmemul_do_pio_buffer(uint16_t port,
+                          unsigned long *reps,
+                          unsigned int size,
+                          uint8_t dir,
+                          uint8_t *buffer);
 
 void hvm_dump_emulation_state(const char *prefix,
                               struct hvm_emulate_ctxt *hvmemul_ctxt);
 
 #endif /* __ASM_X86_HVM_EMULATE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io() Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-17 12:43   ` Jan Beulich
  2015-06-11 15:42 ` [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The struct just contains three methods and no data, so the name
hvm_mmio_ops more accurately reflects its content. A subsequent patch
introduces a new structure that better warrants the name
hvm_mmio_handler, so doing the rename now, in a purely cosmetic patch,
avoids conflating functional and cosmetic changes.
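
Concretely (taken from the hpet hunk in the diff below), instances such
as:

const struct hvm_mmio_handler hpet_mmio_handler = {
    .check_handler = hpet_range,
    .read_handler  = hpet_read,
    .write_handler = hpet_write
};

become:

const struct hvm_mmio_ops hpet_mmio_ops = {
    .check = hpet_range,
    .read  = hpet_read,
    .write = hpet_write
};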

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/tests/vhpet/emul.h                  |    8 +++---
 tools/tests/vhpet/main.c                  |    6 ++---
 xen/arch/x86/hvm/hpet.c                   |    8 +++---
 xen/arch/x86/hvm/intercept.c              |   42 ++++++++++++++---------------
 xen/arch/x86/hvm/vioapic.c                |    8 +++---
 xen/arch/x86/hvm/vlapic.c                 |    8 +++---
 xen/arch/x86/hvm/vmsi.c                   |    8 +++---
 xen/drivers/passthrough/amd/iommu_guest.c |    8 +++---
 xen/include/asm-x86/hvm/io.h              |   18 ++++++-------
 9 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/tools/tests/vhpet/emul.h b/tools/tests/vhpet/emul.h
index 09e4611..383acff 100644
--- a/tools/tests/vhpet/emul.h
+++ b/tools/tests/vhpet/emul.h
@@ -237,11 +237,11 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
 typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
 
 
-struct hvm_mmio_handler
+struct hvm_mmio_ops
 {
-    hvm_mmio_check_t check_handler;
-    hvm_mmio_read_t read_handler;
-    hvm_mmio_write_t write_handler;
+    hvm_mmio_check_t check;
+    hvm_mmio_read_t  read;
+    hvm_mmio_write_t write;
 };
 
 /* Marshalling and unmarshalling uses a buffer with size and cursor. */
diff --git a/tools/tests/vhpet/main.c b/tools/tests/vhpet/main.c
index fbd7510..6fe65ea 100644
--- a/tools/tests/vhpet/main.c
+++ b/tools/tests/vhpet/main.c
@@ -70,7 +70,7 @@ static int skip_error_on_load;
 
 static char *global_thousep;
 
-extern const struct hvm_mmio_handler hpet_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;
 
 struct domain dom1;
 struct vcpu vcpu0;
@@ -297,13 +297,13 @@ void udelay(int w)
 unsigned int hpet_readl(unsigned long a)
 {
     unsigned long ret = 0;
-    hpet_mmio_handler.read_handler(current, a, 4, &ret);
+    hpet_mmio_ops.read(current, a, 4, &ret);
     return ret;
 }
 
 void hpet_writel(unsigned long d, unsigned long a)
 {
-    hpet_mmio_handler.write_handler(current, a, 4, d);
+    hpet_mmio_ops.write(current, a, 4, d);
     return;
 }
 
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index d898169..9585ca8 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,10 +504,10 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
              (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
 }
 
-const struct hvm_mmio_handler hpet_mmio_handler = {
-    .check_handler = hpet_range,
-    .read_handler  = hpet_read,
-    .write_handler = hpet_write
+const struct hvm_mmio_ops hpet_mmio_ops = {
+    .check = hpet_range,
+    .read  = hpet_read,
+    .write = hpet_write
 };
 
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index d52a48c..dc39b1b 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,20 +32,20 @@
 #include <xen/event.h>
 #include <xen/iommu.h>
 
-static const struct hvm_mmio_handler *const
+static const struct hvm_mmio_ops *const
 hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
 {
-    &hpet_mmio_handler,
-    &vlapic_mmio_handler,
-    &vioapic_mmio_handler,
-    &msixtbl_mmio_handler,
-    &iommu_mmio_handler
+    &hpet_mmio_ops,
+    &vlapic_mmio_ops,
+    &vioapic_mmio_ops,
+    &msixtbl_mmio_ops,
+    &iommu_mmio_ops
 };
 
 static int hvm_mmio_access(struct vcpu *v,
                            ioreq_t *p,
-                           hvm_mmio_read_t read_handler,
-                           hvm_mmio_write_t write_handler)
+                           hvm_mmio_read_t read,
+                           hvm_mmio_write_t write)
 {
     struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     unsigned long data;
@@ -64,11 +64,11 @@ static int hvm_mmio_access(struct vcpu *v,
                 vio->mmio_retrying = 0;
             }
             else
-                rc = read_handler(v, p->addr, p->size, &data);
+                rc = read(v, p->addr, p->size, &data);
             p->data = data;
         }
         else /* p->dir == IOREQ_WRITE */
-            rc = write_handler(v, p->addr, p->size, p->data);
+            rc = write(v, p->addr, p->size, p->data);
         return rc;
     }
 
@@ -86,7 +86,7 @@ static int hvm_mmio_access(struct vcpu *v,
             }
             else
             {
-                rc = read_handler(v, p->addr + step * i, p->size, &data);
+                rc = read(v, p->addr + step * i, p->size, &data);
                 if ( rc != X86EMUL_OKAY )
                     break;
             }
@@ -145,7 +145,7 @@ static int hvm_mmio_access(struct vcpu *v,
             }
             if ( rc != X86EMUL_OKAY )
                 break;
-            rc = write_handler(v, p->addr + step * i, p->size, data);
+            rc = write(v, p->addr + step * i, p->size, data);
             if ( rc != X86EMUL_OKAY )
                 break;
         }
@@ -169,7 +169,7 @@ bool_t hvm_mmio_internal(paddr_t gpa)
     unsigned int i;
 
     for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
-        if ( hvm_mmio_handlers[i]->check_handler(curr, gpa) )
+        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
             return 1;
 
     return 0;
@@ -182,21 +182,21 @@ int hvm_mmio_intercept(ioreq_t *p)
 
     for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
     {
-        hvm_mmio_check_t check_handler =
-            hvm_mmio_handlers[i]->check_handler;
+        hvm_mmio_check_t check =
+            hvm_mmio_handlers[i]->check;
 
-        if ( check_handler(v, p->addr) )
+        if ( check(v, p->addr) )
         {
             if ( unlikely(p->count > 1) &&
-                 !check_handler(v, unlikely(p->df)
-                                   ? p->addr - (p->count - 1L) * p->size
-                                   : p->addr + (p->count - 1L) * p->size) )
+                 !check(v, unlikely(p->df)
+                        ? p->addr - (p->count - 1L) * p->size
+                        : p->addr + (p->count - 1L) * p->size) )
                 p->count = 1;
 
             return hvm_mmio_access(
                 v, p,
-                hvm_mmio_handlers[i]->read_handler,
-                hvm_mmio_handlers[i]->write_handler);
+                hvm_mmio_handlers[i]->read,
+                hvm_mmio_handlers[i]->write);
         }
     }
 
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 1e48110..e4ab336 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -250,10 +250,10 @@ static int vioapic_range(struct vcpu *v, unsigned long addr)
              (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
 }
 
-const struct hvm_mmio_handler vioapic_mmio_handler = {
-    .check_handler = vioapic_range,
-    .read_handler = vioapic_read,
-    .write_handler = vioapic_write
+const struct hvm_mmio_ops vioapic_mmio_ops = {
+    .check = vioapic_range,
+    .read = vioapic_read,
+    .write = vioapic_write
 };
 
 static void ioapic_inj_irq(
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 92b0fa8..56171d6 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -997,10 +997,10 @@ static int vlapic_range(struct vcpu *v, unsigned long addr)
            (offset < PAGE_SIZE);
 }
 
-const struct hvm_mmio_handler vlapic_mmio_handler = {
-    .check_handler = vlapic_range,
-    .read_handler = vlapic_read,
-    .write_handler = vlapic_write
+const struct hvm_mmio_ops vlapic_mmio_ops = {
+    .check = vlapic_range,
+    .read = vlapic_read,
+    .write = vlapic_write
 };
 
 static void set_x2apic_id(struct vlapic *vlapic)
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 4968d39..bee5d03 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -394,10 +394,10 @@ static int msixtbl_range(struct vcpu *v, unsigned long addr)
     return !!virt;
 }
 
-const struct hvm_mmio_handler msixtbl_mmio_handler = {
-    .check_handler = msixtbl_range,
-    .read_handler = msixtbl_read,
-    .write_handler = msixtbl_write
+const struct hvm_mmio_ops msixtbl_mmio_ops = {
+    .check = msixtbl_range,
+    .read = msixtbl_read,
+    .write = msixtbl_write
 };
 
 static void add_msixtbl_entry(struct domain *d,
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 98e7b38..7b0c102 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -919,8 +919,8 @@ static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
            addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
 }
 
-const struct hvm_mmio_handler iommu_mmio_handler = {
-    .check_handler = guest_iommu_mmio_range,
-    .read_handler = guest_iommu_mmio_read,
-    .write_handler = guest_iommu_mmio_write
+const struct hvm_mmio_ops iommu_mmio_ops = {
+    .check = guest_iommu_mmio_range,
+    .read = guest_iommu_mmio_read,
+    .write = guest_iommu_mmio_write
 };
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 886a9d6..f2aaec5 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -59,17 +59,17 @@ struct hvm_io_handler {
     struct  io_handler hdl_list[MAX_IO_HANDLER];
 };
 
-struct hvm_mmio_handler {
-    hvm_mmio_check_t check_handler;
-    hvm_mmio_read_t read_handler;
-    hvm_mmio_write_t write_handler;
+struct hvm_mmio_ops {
+    hvm_mmio_check_t check;
+    hvm_mmio_read_t  read;
+    hvm_mmio_write_t write;
 };
 
-extern const struct hvm_mmio_handler hpet_mmio_handler;
-extern const struct hvm_mmio_handler vlapic_mmio_handler;
-extern const struct hvm_mmio_handler vioapic_mmio_handler;
-extern const struct hvm_mmio_handler msixtbl_mmio_handler;
-extern const struct hvm_mmio_handler iommu_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;
+extern const struct hvm_mmio_ops vlapic_mmio_ops;
+extern const struct hvm_mmio_ops vioapic_mmio_ops;
+extern const struct hvm_mmio_ops msixtbl_mmio_ops;
+extern const struct hvm_mmio_ops iommu_mmio_ops;
 
 #define HVM_MMIO_HANDLER_NR 5
 
-- 
1.7.10.4


* [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io() Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-17 14:22   ` Jan Beulich
  2015-06-11 15:42 ` [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept Paul Durrant
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The implementations of the mmio and portio intercepts are unnecessarily
different, which leads to much code duplication. This patch unifies much
of the intercept handling, leaving distinct handlers only for stdvga
mmio and dpci portio; subsequent patches will unify those too.
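
In outline (condensed from the intercept.c hunk below, with elisions
marked), both I/O types now share one lookup and one dispatch path:

int hvm_io_intercept(ioreq_t *p)
{
    struct vcpu *v = current;
    struct hvm_io_handler *handler;

    /* stdvga mmio and dpci portio keep their own intercepts for now. */
    ...

    /* One handler list; each entry's ops decide accept/read/write. */
    handler = hvm_find_io_handler(v, p);
    if ( handler == NULL )
        return X86EMUL_UNHANDLEABLE;

    return process_io_intercept(v, p, handler);
}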

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c                |   11 +-
 xen/arch/x86/hvm/hpet.c                   |    2 +
 xen/arch/x86/hvm/hvm.c                    |   42 ++-
 xen/arch/x86/hvm/intercept.c              |  581 ++++++++++++++++-------------
 xen/arch/x86/hvm/stdvga.c                 |   30 +-
 xen/arch/x86/hvm/vioapic.c                |    2 +
 xen/arch/x86/hvm/vlapic.c                 |    3 +
 xen/arch/x86/hvm/vmsi.c                   |    5 +
 xen/drivers/passthrough/amd/iommu_guest.c |   30 +-
 xen/include/asm-x86/hvm/domain.h          |    8 +-
 xen/include/asm-x86/hvm/io.h              |  132 +++----
 11 files changed, 475 insertions(+), 371 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c8da114..7da2d68 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -148,16 +148,7 @@ static int hvmemul_do_io(
         hvmtrace_io_assist(is_mmio, &p);
     }
 
-    if ( is_mmio )
-    {
-        rc = hvm_mmio_intercept(&p);
-        if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(&p);
-    }
-    else
-    {
-        rc = hvm_portio_intercept(&p);
-    }
+    rc = hvm_io_intercept(&p);
 
     switch ( rc )
     {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 9585ca8..24b5ec9 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -659,6 +659,8 @@ void hpet_init(struct domain *d)
         h->hpet.comparator64[i] = ~0ULL;
         h->pt[i].source = PTSRC_isa;
     }
+
+    register_mmio_handler(d, &hpet_mmio_ops);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2736802..1943099 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1451,18 +1451,30 @@ int hvm_domain_initialise(struct domain *d)
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
     spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
 
+    INIT_LIST_HEAD(&d->arch.hvm_domain.io_handler.list);
+    spin_lock_init(&d->arch.hvm_domain.io_handler.lock);
+
     hvm_init_cacheattr_region_list(d);
 
     rc = paging_enable(d, PG_refcounts|PG_translate|PG_external);
     if ( rc != 0 )
         goto fail0;
 
-    d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
-    d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
     rc = -ENOMEM;
-    if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
+
+    d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
+    if ( !d->arch.hvm_domain.params )
         goto fail1;
-    d->arch.hvm_domain.io_handler->num_slot = 0;
+
+    d->arch.hvm_domain.mmio_handler = xzalloc_array(struct hvm_mmio_handler,
+                                                    NR_MMIO_HANDLERS);
+    if ( !d->arch.hvm_domain.mmio_handler )
+        goto fail2;
+
+    d->arch.hvm_domain.portio_handler = xzalloc_array(struct hvm_portio_handler,
+                                                      NR_PORTIO_HANDLERS);
+    if ( !d->arch.hvm_domain.portio_handler )
+        goto fail3;
 
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
@@ -1471,7 +1483,7 @@ int hvm_domain_initialise(struct domain *d)
         if ( d->arch.hvm_domain.io_bitmap == NULL )
         {
             rc = -ENOMEM;
-            goto fail1;
+            goto fail4;
         }
         memset(d->arch.hvm_domain.io_bitmap, ~0, HVM_IOBITMAP_SIZE);
     }
@@ -1493,30 +1505,37 @@ int hvm_domain_initialise(struct domain *d)
 
     rc = vioapic_init(d);
     if ( rc != 0 )
-        goto fail1;
+        goto fail5;
 
     stdvga_init(d);
 
     rtc_init(d);
 
+    msixtbl_init(d);
+
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail6;
 
     return 0;
 
- fail2:
+ fail6:
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
- fail1:
+ fail5:
     if ( is_hardware_domain(d) )
         xfree(d->arch.hvm_domain.io_bitmap);
-    xfree(d->arch.hvm_domain.io_handler);
+ fail4:
+    xfree(d->arch.hvm_domain.portio_handler);
+ fail3:
+    xfree(d->arch.hvm_domain.mmio_handler);
+ fail2:
     xfree(d->arch.hvm_domain.params);
+ fail1:
  fail0:
     hvm_destroy_cacheattr_region_list(d);
     return rc;
@@ -1545,7 +1564,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
 
 void hvm_domain_destroy(struct domain *d)
 {
-    xfree(d->arch.hvm_domain.io_handler);
+    xfree(d->arch.hvm_domain.portio_handler);
+    xfree(d->arch.hvm_domain.mmio_handler);
     xfree(d->arch.hvm_domain.params);
 
     hvm_destroy_cacheattr_region_list(d);
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index dc39b1b..08aa4fd 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,45 +32,134 @@
 #include <xen/event.h>
 #include <xen/iommu.h>
 
-static const struct hvm_mmio_ops *const
-hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
+static bool_t hvm_mmio_accept(struct hvm_io_handler *io_handler,
+                              struct vcpu *v,
+                              uint64_t addr,
+                              uint32_t size)
 {
-    &hpet_mmio_ops,
-    &vlapic_mmio_ops,
-    &vioapic_mmio_ops,
-    &msixtbl_mmio_ops,
-    &iommu_mmio_ops
+    struct hvm_mmio_handler *mmio_handler;
+    const struct hvm_mmio_ops *ops;
+
+    mmio_handler = container_of(io_handler,
+                                struct hvm_mmio_handler,
+                                io_handler);
+    ops = mmio_handler->ops;
+
+    BUG_ON(ops == NULL);
+
+    return ops->check(v, addr);
+}
+
+static int hvm_mmio_read(struct hvm_io_handler *io_handler,
+                         struct vcpu *v,
+                         uint64_t addr,
+                         uint32_t size,
+                         uint64_t *data)
+{
+    struct hvm_mmio_handler *mmio_handler;
+    const struct hvm_mmio_ops *ops;
+
+    mmio_handler = container_of(io_handler,
+                                struct hvm_mmio_handler,
+                                io_handler);
+    ops = mmio_handler->ops;
+
+    BUG_ON(ops == NULL);
+
+    return ops->read(v, addr, size, data);
+}
+
+static int hvm_mmio_write(struct hvm_io_handler *io_handler,
+                          struct vcpu *v,
+                          uint64_t addr,
+                          uint32_t size,
+                          uint64_t data)
+{
+    struct hvm_mmio_handler *mmio_handler;
+    const struct hvm_mmio_ops *ops;
+
+    mmio_handler = container_of(io_handler,
+                                struct hvm_mmio_handler,
+                                io_handler);
+    ops = mmio_handler->ops;
+
+    BUG_ON(ops == NULL);
+
+    return ops->write(v, addr, size, data);
+}
+
+static const struct hvm_io_ops mmio_ops = {
+    .accept = hvm_mmio_accept,
+    .read = hvm_mmio_read,
+    .write = hvm_mmio_write
+};
+
+static bool_t hvm_portio_accept(struct hvm_io_handler *io_handler,
+                                struct vcpu *v,
+                                uint64_t addr,
+                                uint32_t size)
+{
+    struct hvm_portio_handler *portio_handler;
+
+    portio_handler = container_of(io_handler,
+                                  struct hvm_portio_handler,
+                                  io_handler);
+
+    return (addr >= portio_handler->start) &&
+           ((addr + size) <= portio_handler->end);
+}
+
+static int hvm_portio_read(struct hvm_io_handler *io_handler,
+                           struct vcpu *v,
+                           uint64_t addr,
+                           uint32_t size,
+                           uint64_t *data)
+{
+    struct hvm_portio_handler *portio_handler;
+    uint32_t val;
+    int rc;
+
+    portio_handler = container_of(io_handler,
+                                  struct hvm_portio_handler,
+                                  io_handler);
+
+    rc = portio_handler->action(IOREQ_READ, addr, size, &val);
+    if ( rc == X86EMUL_OKAY )
+        *data = val;
+
+    return rc;
+}
+
+static int hvm_portio_write(struct hvm_io_handler *io_handler,
+                            struct vcpu *v,
+                            uint64_t addr,
+                            uint32_t size,
+                            uint64_t data)
+{
+    struct hvm_portio_handler *portio_handler;
+    uint32_t val = data;
+
+    portio_handler = container_of(io_handler,
+                                  struct hvm_portio_handler,
+                                  io_handler);
+
+    return portio_handler->action(IOREQ_WRITE, addr, size, &val);
+}
+
+static const struct hvm_io_ops portio_ops = {
+    .accept = hvm_portio_accept,
+    .read = hvm_portio_read,
+    .write = hvm_portio_write
 };
 
-static int hvm_mmio_access(struct vcpu *v,
-                           ioreq_t *p,
-                           hvm_mmio_read_t read,
-                           hvm_mmio_write_t write)
+static int process_io_intercept(struct vcpu *v, ioreq_t *p,
+                                struct hvm_io_handler *handler)
 {
     struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
-    unsigned long data;
+    const struct hvm_io_ops *ops = handler->ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-
-    if ( !p->data_is_ptr )
-    {
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-                rc = read(v, p->addr, p->size, &data);
-            p->data = data;
-        }
-        else /* p->dir == IOREQ_WRITE */
-            rc = write(v, p->addr, p->size, p->data);
-        return rc;
-    }
+    uint64_t data;
+    uint64_t addr;
 
     if ( p->dir == IOREQ_READ )
     {
@@ -86,31 +175,40 @@ static int hvm_mmio_access(struct vcpu *v,
             }
             else
             {
-                rc = read(v, p->addr + step * i, p->size, &data);
+                addr = (p->type == IOREQ_TYPE_COPY) ?
+                    p->addr + step * i :
+                    p->addr;
+                rc = ops->read(handler, v, addr, p->size, &data);
                 if ( rc != X86EMUL_OKAY )
                     break;
             }
-            switch ( hvm_copy_to_guest_phys(p->data + step * i,
-                                            &data, p->size) )
+
+            if ( p->data_is_ptr )
             {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                /* Drop the write as real hardware would. */
-                continue;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                switch ( hvm_copy_to_guest_phys(p->data + step * i,
+                                                &data, p->size) )
+                {
+                case HVMCOPY_okay:
+                    break;
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
+                    rc = X86EMUL_RETRY;
+                    break;
+                case HVMCOPY_bad_gfn_to_mfn:
+                    /* Drop the write as real hardware would. */
+                    continue;
+                case HVMCOPY_bad_gva_to_gfn:
+                    ASSERT(0);
+                    /* fall through */
+                default:
+                    rc = X86EMUL_UNHANDLEABLE;
+                    break;
+                }
+                if ( rc != X86EMUL_OKAY )
+                    break;
             }
-            if ( rc != X86EMUL_OKAY)
-                break;
+            else
+                p->data = data;
         }
 
         if ( rc == X86EMUL_RETRY )
@@ -120,32 +218,41 @@ static int hvm_mmio_access(struct vcpu *v,
             memcpy(vio->mmio_large_read, &data, p->size);
         }
     }
-    else
+    else /* p->dir == IOREQ_WRITE */
     {
         for ( i = 0; i < p->count; i++ )
         {
-            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
-                                              p->size) )
+            if ( p->data_is_ptr )
             {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                data = ~0;
-                break;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
+                                                  p->size) )
+                {
+                case HVMCOPY_okay:
+                    break;
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
+                    rc = X86EMUL_RETRY;
+                    break;
+                case HVMCOPY_bad_gfn_to_mfn:
+                    data = ~0;
+                    break;
+                case HVMCOPY_bad_gva_to_gfn:
+                    ASSERT(0);
+                    /* fall through */
+                default:
+                    rc = X86EMUL_UNHANDLEABLE;
+                    break;
+                }
+                if ( rc != X86EMUL_OKAY )
+                    break;
             }
-            if ( rc != X86EMUL_OKAY )
-                break;
-            rc = write(v, p->addr + step * i, p->size, data);
+            else
+                data = p->data;
+
+            addr = (p->type == IOREQ_TYPE_COPY) ?
+                p->addr + step * i :
+                p->addr;
+            rc = ops->write(handler, v, addr, p->size, data);
             if ( rc != X86EMUL_OKAY )
                 break;
         }
@@ -163,240 +270,182 @@ static int hvm_mmio_access(struct vcpu *v,
     return rc;
 }
 
-bool_t hvm_mmio_internal(paddr_t gpa)
-{
-    struct vcpu *curr = current;
-    unsigned int i;
-
-    for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
-        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
-            return 1;
+static DEFINE_RCU_READ_LOCK(intercept_rcu_lock);
 
-    return 0;
-}
-
-int hvm_mmio_intercept(ioreq_t *p)
+struct hvm_io_handler *hvm_find_io_handler(struct vcpu *v,
+                                           ioreq_t *p)
 {
-    struct vcpu *v = current;
-    int i;
+    struct domain *d = v->domain;
+    struct hvm_io_handler *handler;
 
-    for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
-    {
-        hvm_mmio_check_t check =
-            hvm_mmio_handlers[i]->check;
+    rcu_read_lock(&intercept_rcu_lock);
 
-        if ( check(v, p->addr) )
-        {
-            if ( unlikely(p->count > 1) &&
-                 !check(v, unlikely(p->df)
-                        ? p->addr - (p->count - 1L) * p->size
-                        : p->addr + (p->count - 1L) * p->size) )
-                p->count = 1;
-
-            return hvm_mmio_access(
-                v, p,
-                hvm_mmio_handlers[i]->read,
-                hvm_mmio_handlers[i]->write);
-        }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int process_portio_intercept(portio_action_t action, ioreq_t *p)
-{
-    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
-    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-    uint32_t data;
-
-    if ( !p->data_is_ptr )
+    list_for_each_entry( handler,
+                         &d->arch.hvm_domain.io_handler.list,
+                         list_entry )
     {
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-                rc = action(IOREQ_READ, p->addr, p->size, &data);
-            p->data = data;
-        }
-        else
-        {
-            data = p->data;
-            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
-        }
-        return rc;
-    }
+        const struct hvm_io_ops *ops = handler->ops;
+        uint64_t start, end;
 
-    if ( p->dir == IOREQ_READ )
-    {
-        for ( i = 0; i < p->count; i++ )
+        if ( handler->type != p->type )
+            continue;
+
+        switch ( handler->type )
         {
-            if ( vio->mmio_retrying )
+        case IOREQ_TYPE_PIO:
+            start = p->addr;
+            end = p->addr + p->size;
+            break;
+        case IOREQ_TYPE_COPY:
+            if ( p->df )
             {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
+                start = (p->addr - (p->count - 1) * p->size);
+                end = p->addr + p->size;
             }
             else
             {
-                rc = action(IOREQ_READ, p->addr, p->size, &data);
-                if ( rc != X86EMUL_OKAY )
-                    break;
-            }
-            switch ( hvm_copy_to_guest_phys(p->data + step * i,
-                                            &data, p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                /* Drop the write as real hardware would. */
-                continue;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                start = p->addr;
+                end = p->addr + p->count * p->size;
             }
-            if ( rc != X86EMUL_OKAY)
-                break;
+            break;
+        default:
+            BUG();
         }
 
-        if ( rc == X86EMUL_RETRY )
-        {
-            vio->mmio_retry = 1;
-            vio->mmio_large_read_bytes = p->size;
-            memcpy(vio->mmio_large_read, &data, p->size);
-        }
+        if ( ops->accept(handler, v, start, end - start) )
+            goto done;
     }
-    else /* p->dir == IOREQ_WRITE */
-    {
-        for ( i = 0; i < p->count; i++ )
-        {
-            data = 0;
-            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
-                                              p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                data = ~0;
-                break;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY )
-                break;
-            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
-            if ( rc != X86EMUL_OKAY )
-                break;
-        }
 
-        if ( rc == X86EMUL_RETRY )
-            vio->mmio_retry = 1;
-    }
+    handler = NULL;
 
-    if ( i != 0 )
-    {
-        p->count = i;
-        rc = X86EMUL_OKAY;
-    }
+done:
+    rcu_read_unlock(&intercept_rcu_lock);
 
-    return rc;
+    return handler;
 }
 
-/*
- * Check if the request is handled inside xen
- * return value: 0 --not handled; 1 --handled
- */
-int hvm_io_intercept(ioreq_t *p, int type)
+int hvm_io_intercept(ioreq_t *p)
 {
     struct vcpu *v = current;
-    struct hvm_io_handler *handler = v->domain->arch.hvm_domain.io_handler;
-    int i;
-    unsigned long addr, size;
+    struct hvm_io_handler *handler;
 
-    if ( type == HVM_PORTIO )
+    if ( p->type == IOREQ_TYPE_PIO )
     {
         int rc = dpci_ioport_intercept(p);
         if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
             return rc;
     }
+    else if ( p->type == IOREQ_TYPE_COPY )
+    {
+        int rc = stdvga_intercept_mmio(p);
+        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
+            return rc;
+    }
 
-    for ( i = 0; i < handler->num_slot; i++ )
+    handler = hvm_find_io_handler(v, p);
+
+    if ( handler == NULL )
+        return X86EMUL_UNHANDLEABLE;
+
+    return process_io_intercept(v, p, handler);
+}
+
+void hvm_register_io_handler(struct domain *d,
+                             struct hvm_io_handler *handler,
+                             bool_t head)
+{
+    spin_lock(&d->arch.hvm_domain.io_handler.lock);
+    if ( head )
+        list_add_rcu(&handler->list_entry,
+                     &d->arch.hvm_domain.io_handler.list);
+    else
+        list_add_tail_rcu(&handler->list_entry,
+                          &d->arch.hvm_domain.io_handler.list);
+    spin_unlock(&d->arch.hvm_domain.io_handler.lock);
+}
+
+void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
+{
+    struct hvm_mmio_handler *mmio_handler;
+    unsigned int i;
+
+    for ( i = 0; i < NR_MMIO_HANDLERS; i++ )
     {
-        if ( type != handler->hdl_list[i].type )
-            continue;
-        addr = handler->hdl_list[i].addr;
-        size = handler->hdl_list[i].size;
-        if ( (p->addr >= addr) &&
-             ((p->addr + p->size) <= (addr + size)) )
-        {
-            if ( type == HVM_PORTIO )
-                return process_portio_intercept(
-                    handler->hdl_list[i].action.portio, p);
+        mmio_handler = &d->arch.hvm_domain.mmio_handler[i];
 
-            if ( unlikely(p->count > 1) &&
-                 (unlikely(p->df)
-                  ? p->addr - (p->count - 1L) * p->size < addr
-                  : p->addr + p->count * 1L * p->size - 1 >= addr + size) )
-                p->count = 1;
+        if ( mmio_handler->io_handler.ops == NULL )
+            break;
+    }
 
-            return handler->hdl_list[i].action.mmio(p);
-        }
+    BUG_ON(i == NR_MMIO_HANDLERS);
+
+    mmio_handler->io_handler.type = IOREQ_TYPE_COPY;
+    mmio_handler->io_handler.ops = &mmio_ops;
+
+    hvm_register_io_handler(d, &mmio_handler->io_handler, 0);
+
+    mmio_handler->ops = ops;
+}
+
+void register_portio_handler(struct domain *d, uint64_t addr,
+                             uint32_t size, portio_action_t action)
+{
+    struct hvm_portio_handler *portio_handler;
+    unsigned int i;
+
+    for ( i = 0; i < NR_PORTIO_HANDLERS; i++ )
+    {
+        portio_handler = &d->arch.hvm_domain.portio_handler[i];
+
+        if ( portio_handler->io_handler.ops == NULL )
+            break;
     }
 
-    return X86EMUL_UNHANDLEABLE;
+    BUG_ON(i == NR_PORTIO_HANDLERS);
+
+    portio_handler->io_handler.type = IOREQ_TYPE_PIO;
+    portio_handler->io_handler.ops = &portio_ops;
+
+    hvm_register_io_handler(d, &portio_handler->io_handler, 0);
+
+    portio_handler->start = addr;
+    portio_handler->end = addr + size;
+    portio_handler->action = action;
 }
 
-void register_io_handler(
-    struct domain *d, unsigned long addr, unsigned long size,
-    void *action, int type)
+void relocate_portio_handler(struct domain *d, uint64_t old_addr,
+                             uint64_t new_addr, uint32_t size)
 {
-    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
-    int num = handler->num_slot;
+    struct hvm_portio_handler *portio_handler;
+    unsigned int i;
+
+    for ( i = 0; i < NR_PORTIO_HANDLERS; i++ )
+    {
+        portio_handler = &d->arch.hvm_domain.portio_handler[i];
+
+        if ( portio_handler->io_handler.ops == NULL )
+            break;
+
+        if ( portio_handler->start == old_addr &&
+             portio_handler->end == old_addr + size )
+            goto found;
+    }
 
-    BUG_ON(num >= MAX_IO_HANDLER);
+    BUG();
 
-    handler->hdl_list[num].addr = addr;
-    handler->hdl_list[num].size = size;
-    handler->hdl_list[num].action.ptr = action;
-    handler->hdl_list[num].type = type;
-    handler->num_slot++;
+ found:
+    portio_handler->start = new_addr;
+    portio_handler->end = new_addr + size;
 }
 
-void relocate_io_handler(
-    struct domain *d, unsigned long old_addr, unsigned long new_addr,
-    unsigned long size, int type)
+bool_t hvm_mmio_internal(paddr_t gpa)
 {
-    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
-    int i;
-
-    for ( i = 0; i < handler->num_slot; i++ )
-        if ( (handler->hdl_list[i].addr == old_addr) &&
-             (handler->hdl_list[i].size == size) &&
-             (handler->hdl_list[i].type == type) )
-            handler->hdl_list[i].addr = new_addr;
+    ioreq_t p = {
+        .type = IOREQ_TYPE_COPY,
+        .addr = gpa
+    };
+
+    return (hvm_find_io_handler(current, &p) != NULL);
 }
 
 /*
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index 13d1029..34cdd90 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -547,12 +547,27 @@ static int mmio_move(struct hvm_hw_stdvga *s, ioreq_t *p)
     return 1;
 }
 
-static int stdvga_intercept_mmio(ioreq_t *p)
+int stdvga_intercept_mmio(ioreq_t *p)
 {
     struct domain *d = current->domain;
     struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
+    uint64_t start, end;
     int buf = 0, rc;
 
+    if ( p->df )
+    {
+        start = (p->addr - (p->count - 1) * p->size);
+        end = p->addr + p->size;
+    }
+    else
+    {
+        start = p->addr;
+        end = p->addr + p->count * p->size;
+    }
+
+    if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+        return X86EMUL_UNHANDLEABLE;
+
     if ( p->size > 8 )
     {
         gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
@@ -619,9 +634,6 @@ void stdvga_init(struct domain *d)
         register_portio_handler(d, 0x3c4, 2, stdvga_intercept_pio);
         /* Graphics registers. */
         register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
-        /* MMIO. */
-        register_buffered_io_handler(
-            d, VGA_MEM_BASE, VGA_MEM_SIZE, stdvga_intercept_mmio);
     }
 }
 
@@ -638,3 +650,13 @@ void stdvga_deinit(struct domain *d)
         s->vram_page[i] = NULL;
     }
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index e4ab336..df76019 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -456,6 +456,8 @@ int vioapic_init(struct domain *d)
     d->arch.hvm_domain.vioapic->domain = d;
     vioapic_reset(d);
 
+    register_mmio_handler(d, &vioapic_mmio_ops);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 56171d6..6916e14 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1463,6 +1463,9 @@ int vlapic_init(struct vcpu *v)
                  vlapic_init_sipi_action,
                  (unsigned long)v);
 
+    if ( v->vcpu_id == 0 )
+        register_mmio_handler(v->domain, &vlapic_mmio_ops);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index bee5d03..5245a61 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -531,6 +531,11 @@ found:
     spin_unlock_irq(&irq_desc->lock);
 }
 
+void msixtbl_init(struct domain *d)
+{
+    register_mmio_handler(d, &msixtbl_mmio_ops);
+}
+
 void msixtbl_pt_cleanup(struct domain *d)
 {
     struct msixtbl_entry *entry, *temp;
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 7b0c102..1ac85bd 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -868,6 +868,20 @@ static void guest_iommu_reg_init(struct guest_iommu *iommu)
     iommu->reg_ext_feature.hi = upper;
 }
 
+static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
+{
+    struct guest_iommu *iommu = vcpu_iommu(v);
+
+    return iommu && addr >= iommu->mmio_base &&
+           addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
+}
+
+const struct hvm_mmio_ops iommu_mmio_ops = {
+    .check = guest_iommu_mmio_range,
+    .read = guest_iommu_mmio_read,
+    .write = guest_iommu_mmio_write
+};
+
 /* Domain specific initialization */
 int guest_iommu_init(struct domain* d)
 {
@@ -894,6 +908,8 @@ int guest_iommu_init(struct domain* d)
 
     spin_lock_init(&iommu->lock);
 
+    register_mmio_handler(d, &iommu_mmio_ops);
+
     return 0;
 }
 
@@ -910,17 +926,3 @@ void guest_iommu_destroy(struct domain *d)
 
     domain_hvm_iommu(d)->arch.g_iommu = NULL;
 }
-
-static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
-{
-    struct guest_iommu *iommu = vcpu_iommu(v);
-
-    return iommu && addr >= iommu->mmio_base &&
-           addr < iommu->mmio_base + IOMMU_MMIO_SIZE;
-}
-
-const struct hvm_mmio_ops iommu_mmio_ops = {
-    .check = guest_iommu_mmio_range,
-    .read = guest_iommu_mmio_read,
-    .write = guest_iommu_mmio_write
-};
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index bdab45d..d9c0cfe 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -92,7 +92,13 @@ struct hvm_domain {
 
     struct pl_time         pl_time;
 
-    struct hvm_io_handler *io_handler;
+    struct hvm_mmio_handler   *mmio_handler;
+    struct hvm_portio_handler *portio_handler;
+
+    struct {
+        spinlock_t       lock;
+        struct list_head list;
+    } io_handler;
 
     /* Lock protects access to irq, vpic and vioapic. */
     spinlock_t             irq_lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index f2aaec5..752b6e9 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,10 +25,34 @@
 #include <public/hvm/ioreq.h>
 #include <public/event_channel.h>
 
-#define MAX_IO_HANDLER             16
+struct hvm_io_handler;
+
+typedef int (*hvm_io_read_t)(struct hvm_io_handler *handler,
+                             struct vcpu *v,
+                             uint64_t addr,
+                             uint32_t size,
+                             uint64_t *val);
+typedef int (*hvm_io_write_t)(struct hvm_io_handler *handler,
+                              struct vcpu *v,
+                              uint64_t addr,
+                              uint32_t size,
+                              uint64_t val);
+typedef bool_t (*hvm_io_accept_t)(struct hvm_io_handler *handler,
+                                  struct vcpu *v,
+                                  uint64_t addr,
+                                  uint32_t size);
+
+struct hvm_io_ops {
+    hvm_io_accept_t accept;
+    hvm_io_read_t   read;
+    hvm_io_write_t  write;
+};
 
-#define HVM_PORTIO                  0
-#define HVM_BUFFERED_IO             2
+struct hvm_io_handler {
+    struct list_head        list_entry;
+    const struct hvm_io_ops *ops;
+    uint8_t                 type;
+};
 
 typedef int (*hvm_mmio_read_t)(struct vcpu *v,
                                unsigned long addr,
@@ -40,81 +64,47 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
                                 unsigned long val);
 typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
 
-typedef int (*portio_action_t)(
-    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
-typedef int (*mmio_action_t)(ioreq_t *);
-struct io_handler {
-    int                 type;
-    unsigned long       addr;
-    unsigned long       size;
-    union {
-        portio_action_t portio;
-        mmio_action_t   mmio;
-        void           *ptr;
-    } action;
-};
-
-struct hvm_io_handler {
-    int     num_slot;
-    struct  io_handler hdl_list[MAX_IO_HANDLER];
-};
-
 struct hvm_mmio_ops {
     hvm_mmio_check_t check;
     hvm_mmio_read_t  read;
     hvm_mmio_write_t write;
 };
 
-extern const struct hvm_mmio_ops hpet_mmio_ops;
-extern const struct hvm_mmio_ops vlapic_mmio_ops;
-extern const struct hvm_mmio_ops vioapic_mmio_ops;
-extern const struct hvm_mmio_ops msixtbl_mmio_ops;
-extern const struct hvm_mmio_ops iommu_mmio_ops;
+#define NR_MMIO_HANDLERS 5
 
-#define HVM_MMIO_HANDLER_NR 5
+struct hvm_mmio_handler {
+    const struct hvm_mmio_ops *ops;
+    struct hvm_io_handler     io_handler;
+};
 
-int hvm_io_intercept(ioreq_t *p, int type);
-void register_io_handler(
-    struct domain *d, unsigned long addr, unsigned long size,
-    void *action, int type);
-void relocate_io_handler(
-    struct domain *d, unsigned long old_addr, unsigned long new_addr,
-    unsigned long size, int type);
+typedef int (*portio_action_t)(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+
+#define NR_PORTIO_HANDLERS 16
+
+struct hvm_portio_handler {
+    uint64_t              start, end;
+    portio_action_t       action;
+    struct hvm_io_handler io_handler;
+};
 
-static inline int hvm_portio_intercept(ioreq_t *p)
-{
-    return hvm_io_intercept(p, HVM_PORTIO);
-}
+void hvm_register_io_handler(struct domain *d,
+                             struct hvm_io_handler *handler,
+                             bool_t head);
+struct hvm_io_handler *hvm_find_io_handler(struct vcpu *v,
+                                           ioreq_t *p);
 
-static inline int hvm_buffered_io_intercept(ioreq_t *p)
-{
-    return hvm_io_intercept(p, HVM_BUFFERED_IO);
-}
+int hvm_io_intercept(ioreq_t *p);
 
+void register_mmio_handler(struct domain *d,
+                           const struct hvm_mmio_ops *ops);
+void register_portio_handler(struct domain *d, uint64_t addr,
+                             uint32_t size, portio_action_t action);
+void relocate_portio_handler(struct domain *d, uint64_t old_addr,
+                             uint64_t new_addr, uint32_t size);
 bool_t hvm_mmio_internal(paddr_t gpa);
-int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(ioreq_t *p);
 
-static inline void register_portio_handler(
-    struct domain *d, unsigned long addr,
-    unsigned long size, portio_action_t action)
-{
-    register_io_handler(d, addr, size, action, HVM_PORTIO);
-}
-
-static inline void relocate_portio_handler(
-    struct domain *d, unsigned long old_addr, unsigned long new_addr,
-    unsigned long size)
-{
-    relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
-}
-
-static inline void register_buffered_io_handler(
-    struct domain *d, unsigned long addr,
-    unsigned long size, mmio_action_t action)
-{
-    register_io_handler(d, addr, size, action, HVM_BUFFERED_IO);
-}
+int hvm_buffered_io_send(ioreq_t *p);
 
 void send_timeoffset_req(unsigned long timeoff);
 void send_invalidate_req(void);
@@ -127,6 +117,7 @@ void hvm_io_assist(ioreq_t *p);
 void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
                   const union vioapic_redir_entry *ent);
 void msix_write_completion(struct vcpu *);
+void msixtbl_init(struct domain *d);
 
 struct hvm_hw_stdvga {
     uint8_t sr_index;
@@ -141,8 +132,19 @@ struct hvm_hw_stdvga {
 };
 
 void stdvga_init(struct domain *d);
+int stdvga_intercept_mmio(ioreq_t *p);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
 #endif /* __ASM_X86_HVM_IO_H__ */
 
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (2 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-17 14:36   ` Jan Beulich
  2015-06-11 15:42 ` [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept Paul Durrant
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

This patch re-works the dpci portio intercepts so that they can be unified
with standard portio handling, thereby removing a substantial amount of
code duplication.
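
As a hedged illustration of the resulting split (the g2m_ioport values
below are invented for the example): the accept op matches the guest port
against a g2m_ioport range and caches the entry in the vcpu's hvm_io
state, so the read/write ops only need to re-derive the machine port:

    /* Sketch only -- simplified copy of the relevant g2m_ioport fields. */
    struct g2m_ioport { uint32_t gport, np, mport; };

    static uint32_t translate(uint32_t gport)
    {
        /* accept() matched gport against [0x300, 0x308) and cached this. */
        struct g2m_ioport g2m = { .gport = 0x300, .np = 8, .mport = 0xc300 };

        /* read()/write() then compute the machine port: 0x304 -> 0xc304. */
        return (gport - g2m.gport) + g2m.mport;
    }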

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c           |    2 +
 xen/arch/x86/hvm/intercept.c     |    8 +-
 xen/arch/x86/hvm/io.c            |  232 +++++++++++++-------------------------
 xen/include/asm-x86/hvm/domain.h |    1 +
 xen/include/asm-x86/hvm/io.h     |    3 +
 xen/include/asm-x86/hvm/vcpu.h   |    2 +
 xen/include/xen/iommu.h          |    1 -
 7 files changed, 89 insertions(+), 160 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1943099..dbba6e3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1516,6 +1516,8 @@ int hvm_domain_initialise(struct domain *d)
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
+    register_dpci_portio_handler(d);
+
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
         goto fail6;
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 08aa4fd..b5e7879 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -329,13 +329,7 @@ int hvm_io_intercept(ioreq_t *p)
     struct vcpu *v = current;
     struct hvm_io_handler *handler;
 
-    if ( p->type == IOREQ_TYPE_PIO )
-    {
-        int rc = dpci_ioport_intercept(p);
-        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
-            return rc;
-    }
-    else if ( p->type == IOREQ_TYPE_COPY )
+    if ( p->type == IOREQ_TYPE_COPY )
     {
         int rc = stdvga_intercept_mmio(p);
         if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 0d50339..87625ed 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -208,185 +208,113 @@ void hvm_io_assist(ioreq_t *p)
     }
 }
 
-static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
+static bool_t dpci_portio_accept(struct hvm_io_handler *io_handler,
+                                 struct vcpu *v,
+                                 uint64_t addr,
+                                 uint32_t size)
 {
-    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
-    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-    uint32_t data = 0;
-
-    for ( i = 0; i < p->count; i++ )
-    {
-        if ( vio->mmio_retrying )
-        {
-            if ( vio->mmio_large_read_bytes != p->size )
-                return X86EMUL_UNHANDLEABLE;
-            memcpy(&data, vio->mmio_large_read, p->size);
-            vio->mmio_large_read_bytes = 0;
-            vio->mmio_retrying = 0;
-        }
-        else switch ( p->size )
-        {
-        case 1:
-            data = inb(mport);
-            break;
-        case 2:
-            data = inw(mport);
-            break;
-        case 4:
-            data = inl(mport);
-            break;
-        default:
-            BUG();
-        }
+    struct hvm_iommu *hd = domain_hvm_iommu(v->domain);
+    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
+    struct g2m_ioport *g2m_ioport;
+    uint32_t start, end;
+    uint32_t gport = addr, mport;
 
-        if ( p->data_is_ptr )
-        {
-            switch ( hvm_copy_to_guest_phys(p->data + step * i,
-                                            &data, p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                /* Drop the write as real hardware would. */
-                continue;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY)
-                break;
-        }
-        else
-            p->data = data;
-    }
 
-    if ( rc == X86EMUL_RETRY )
+    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
     {
-        vio->mmio_retry = 1;
-        vio->mmio_large_read_bytes = p->size;
-        memcpy(vio->mmio_large_read, &data, p->size);
+        start = g2m_ioport->gport;
+        end = start + g2m_ioport->np;
+        if ( (gport >= start) && (gport < end) )
+            goto found;
     }
 
-    if ( i != 0 )
+    return 0;
+
+ found:
+    mport = (gport - start) + g2m_ioport->mport;
+
+    if ( !ioports_access_permitted(current->domain, mport,
+                                   mport + size - 1) )
     {
-        p->count = i;
-        rc = X86EMUL_OKAY;
+        gdprintk(XENLOG_ERR, "Error: access to gport=%#x denied!\n",
+                 (uint32_t)addr);
+        return 0;
     }
 
-    return rc;
+    vio->g2m_ioport = g2m_ioport;
+    return 1;
 }
 
-static int dpci_ioport_write(uint32_t mport, ioreq_t *p)
+static int dpci_portio_read(struct hvm_io_handler *io_handler,
+                            struct vcpu *v,
+                            uint64_t addr,
+                            uint32_t size,
+                            uint64_t *data)
 {
-    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-    uint32_t data;
-
-    for ( i = 0; i < p->count; i++ )
-    {
-        data = p->data;
-        if ( p->data_is_ptr )
-        {
-            switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
-                                              p->size) )
-            {
-            case HVMCOPY_okay:
-                break;
-            case HVMCOPY_gfn_paged_out:
-            case HVMCOPY_gfn_shared:
-                rc = X86EMUL_RETRY;
-                break;
-            case HVMCOPY_bad_gfn_to_mfn:
-                data = ~0;
-                break;
-            case HVMCOPY_bad_gva_to_gfn:
-                ASSERT(0);
-                /* fall through */
-            default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
-            }
-            if ( rc != X86EMUL_OKAY)
-                break;
-        }
-
-        switch ( p->size )
-        {
-        case 1:
-            outb(data, mport);
-            break;
-        case 2:
-            outw(data, mport);
-            break;
-        case 4:
-            outl(data, mport);
-            break;
-        default:
-            BUG();
-        }
-    }
+    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
+    struct g2m_ioport *g2m_ioport = vio->g2m_ioport;
+    uint32_t mport = (addr - g2m_ioport->gport) + g2m_ioport->mport;
 
-    if ( rc == X86EMUL_RETRY )
-        current->arch.hvm_vcpu.hvm_io.mmio_retry = 1;
-
-    if ( i != 0 )
+    switch ( size )
     {
-        p->count = i;
-        rc = X86EMUL_OKAY;
+    case 1:
+        *data = inb(mport);
+        break;
+    case 2:
+        *data = inw(mport);
+        break;
+    case 4:
+        *data = inl(mport);
+        break;
+    default:
+        BUG();
     }
 
-    return rc;
+    return X86EMUL_OKAY;
 }
 
-int dpci_ioport_intercept(ioreq_t *p)
+static int dpci_portio_write(struct hvm_io_handler *io_handler,
+                             struct vcpu *v,
+                             uint64_t addr,
+                             uint32_t size,
+                             uint64_t data)
 {
-    struct domain *d = current->domain;
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    struct g2m_ioport *g2m_ioport;
-    unsigned int mport, gport = p->addr;
-    unsigned int s = 0, e = 0;
-    int rc;
+    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
+    struct g2m_ioport *g2m_ioport = vio->g2m_ioport;
+    uint32_t mport = (addr - g2m_ioport->gport) + g2m_ioport->mport;
 
-    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
+    switch ( size )
     {
-        s = g2m_ioport->gport;
-        e = s + g2m_ioport->np;
-        if ( (gport >= s) && (gport < e) )
-            goto found;
+    case 1:
+        outb(data, mport);
+        break;
+    case 2:
+        outw(data, mport);
+        break;
+    case 4:
+        outl(data, mport);
+        break;
+    default:
+        BUG();
     }
 
-    return X86EMUL_UNHANDLEABLE;
+    return X86EMUL_OKAY;
+}
 
- found:
-    mport = (gport - s) + g2m_ioport->mport;
+static const struct hvm_io_ops dpci_portio_ops = {
+    .accept = dpci_portio_accept,
+    .read = dpci_portio_read,
+    .write = dpci_portio_write
+};
 
-    if ( !ioports_access_permitted(d, mport, mport + p->size - 1) ) 
-    {
-        gdprintk(XENLOG_ERR, "Error: access to gport=%#x denied!\n",
-                 (uint32_t)p->addr);
-        return X86EMUL_UNHANDLEABLE;
-    }
+void register_dpci_portio_handler(struct domain *d)
+{
+    struct hvm_io_handler *handler = &d->arch.hvm_domain.dpci_handler;
 
-    switch ( p->dir )
-    {
-    case IOREQ_READ:
-        rc = dpci_ioport_read(mport, p);
-        break;
-    case IOREQ_WRITE:
-        rc = dpci_ioport_write(mport, p);
-        break;
-    default:
-        gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p->dir);
-        rc = X86EMUL_UNHANDLEABLE;
-    }
+    handler->type = IOREQ_TYPE_PIO;
+    handler->ops = &dpci_portio_ops;
 
-    return rc;
+    hvm_register_io_handler(d, handler, 1);
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d9c0cfe..5d9c9f5 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -94,6 +94,7 @@ struct hvm_domain {
 
     struct hvm_mmio_handler   *mmio_handler;
     struct hvm_portio_handler *portio_handler;
+    struct hvm_io_handler     dpci_handler;
 
     struct {
         spinlock_t       lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 752b6e9..7b8db78 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -136,6 +136,9 @@ int stdvga_intercept_mmio(ioreq_t *p);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+void register_dpci_portio_handler(struct domain *d);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 3d8f4dc..dd08416 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -77,6 +77,8 @@ struct hvm_vcpu_io {
     bool_t mmio_retry, mmio_retrying;
 
     unsigned long msix_unmask_address;
+
+    struct g2m_ioport *g2m_ioport;
 };
 
 #define VMCX_EADDR    (~0ULL)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..1d00696 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -93,7 +93,6 @@ void pt_pci_init(void);
 
 struct pirq;
 int hvm_do_IRQ_dpci(struct domain *, struct pirq *);
-int dpci_ioport_intercept(ioreq_t *p);
 int pt_irq_create_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
 int pt_irq_destroy_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (3 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-17 15:50   ` Jan Beulich
  2015-06-11 15:42 ` [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry Paul Durrant
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

It's clear from the following check in hvmemul_rep_movs:

    if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
         (sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
        return X86EMUL_UNHANDLEABLE;

that mmio <-> mmio copy is not handled. This means the code in the
stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
hvm_copy_to/from_guest_phys() fails is never going to be executed.

This patch therefore adds a check in hvmemul_do_io_addr() to make sure
mmio <-> mmio is disallowed and then registers standard mmio intercept ops
in stdvga_init().

With this patch, all mmio and portio handled within Xen now go through
process_io_intercept().
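
For clarity, a sketch of the control flow this patch relies on (the code
is the check quoted above; the comments are my inference, not part of the
patch):

    /* In hvmemul_rep_movs(), before any intercept runs: */
    if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
         (sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
        return X86EMUL_UNHANDLEABLE;
    /*
     * Hence by the time the stdvga handler runs, the other side of the
     * copy is RAM, hvm_copy_to/from_guest_phys() cannot land on an
     * emulated-mmio gfn, and the old mmio <-> mmio fallback is dead code.
     */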

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c   |    9 +++
 xen/arch/x86/hvm/intercept.c |    7 --
 xen/arch/x86/hvm/stdvga.c    |  170 ++++++++----------------------------------
 xen/include/asm-x86/hvm/io.h |    1 -
 4 files changed, 42 insertions(+), 145 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 7da2d68..4af70b0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -273,6 +273,15 @@ static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
         return X86EMUL_RETRY;
     }
 
+    /* This code should not be reached if the gmfn is not RAM */
+    if ( p2m_is_mmio(p2mt) )
+    {
+        domain_crash(d);
+
+        put_page(*page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
     return X86EMUL_OKAY;
 }
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index b5e7879..b8bcd0b 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -329,13 +329,6 @@ int hvm_io_intercept(ioreq_t *p)
     struct vcpu *v = current;
     struct hvm_io_handler *handler;
 
-    if ( p->type == IOREQ_TYPE_COPY )
-    {
-        int rc = stdvga_intercept_mmio(p);
-        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
-            return rc;
-    }
-
     handler = hvm_find_io_handler(v, p);
 
     if ( handler == NULL )
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index 34cdd90..4999def 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -275,7 +275,8 @@ static uint8_t stdvga_mem_readb(uint64_t addr)
     return ret;
 }
 
-static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
+static int stdvga_mem_read(struct vcpu *v, unsigned long addr,
+                           unsigned long size, unsigned long *p_data)
 {
     uint64_t data = 0;
 
@@ -313,7 +314,9 @@ static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
         break;
     }
 
-    return data;
+    *p_data = data;
+
+    return X86EMUL_OKAY;
 }
 
 static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
@@ -424,8 +427,17 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
     }
 }
 
-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(struct vcpu *v, unsigned long addr,
+                            unsigned long size, unsigned long data)
 {
+    ioreq_t p = { .type = IOREQ_TYPE_COPY,
+                  .addr = addr,
+                  .size = size,
+                  .count = 1,
+                  .dir = IOREQ_WRITE,
+                  .data = data,
+    };
+
     /* Intercept mmio write */
     switch ( size )
     {
@@ -460,153 +472,35 @@ static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
         gdprintk(XENLOG_WARNING, "invalid io size: %"PRId64"\n", size);
         break;
     }
-}
-
-static uint32_t read_data;
-
-static int mmio_move(struct hvm_hw_stdvga *s, ioreq_t *p)
-{
-    int i;
-    uint64_t addr = p->addr;
-    p2m_type_t p2mt;
-    struct domain *d = current->domain;
-    int step = p->df ? -p->size : p->size;
 
-    if ( p->data_is_ptr )
-    {
-        uint64_t data = p->data, tmp;
-
-        if ( p->dir == IOREQ_READ )
-        {
-            for ( i = 0; i < p->count; i++ ) 
-            {
-                tmp = stdvga_mem_read(addr, p->size);
-                if ( hvm_copy_to_guest_phys(data, &tmp, p->size) !=
-                     HVMCOPY_okay )
-                {
-                    struct page_info *dp = get_page_from_gfn(
-                            d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
-                    /*
-                     * The only case we handle is vga_mem <-> vga_mem.
-                     * Anything else disables caching and leaves it to qemu-dm.
-                     */
-                    if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
-                         ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-                    {
-                        if ( dp )
-                            put_page(dp);
-                        return 0;
-                    }
-                    ASSERT(!dp);
-                    stdvga_mem_write(data, tmp, p->size);
-                }
-                data += step;
-                addr += step;
-            }
-        }
-        else
-        {
-            for ( i = 0; i < p->count; i++ )
-            {
-                if ( hvm_copy_from_guest_phys(&tmp, data, p->size) !=
-                     HVMCOPY_okay )
-                {
-                    struct page_info *dp = get_page_from_gfn(
-                        d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC);
-                    if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
-                         ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-                    {
-                        if ( dp )
-                            put_page(dp);
-                        return 0;
-                    }
-                    ASSERT(!dp);
-                    tmp = stdvga_mem_read(data, p->size);
-                }
-                stdvga_mem_write(addr, tmp, p->size);
-                data += step;
-                addr += step;
-            }
-        }
-    }
-    else if ( p->dir == IOREQ_WRITE )
-    {
-        for ( i = 0; i < p->count; i++ )
-        {
-            stdvga_mem_write(addr, p->data, p->size);
-            addr += step;
-        }
-    }
-    else
-    {
-        ASSERT(p->count == 1);
-        p->data = stdvga_mem_read(addr, p->size);
-    }
+    if (!hvm_buffered_io_send(&p))
+        return X86EMUL_UNHANDLEABLE;
 
-    read_data = p->data;
-    return 1;
+    return X86EMUL_OKAY;
 }
 
-int stdvga_intercept_mmio(ioreq_t *p)
+static int stdvga_mem_check(struct vcpu *v, unsigned long addr)
 {
-    struct domain *d = current->domain;
-    struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
-    uint64_t start, end;
-    int buf = 0, rc;
-
-    if ( p->df )
-    {
-        start = (p->addr - (p->count - 1) * p->size);
-        end = p->addr + p->size;
-    }
-    else
-    {
-        start = p->addr;
-        end = p->addr + p->count * p->size;
-    }
-
-    if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-        return X86EMUL_UNHANDLEABLE;
-
-    if ( p->size > 8 )
-    {
-        gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
-        return X86EMUL_UNHANDLEABLE;
-    }
+    struct hvm_hw_stdvga *s = &v->domain->arch.hvm_domain.stdvga;
+    int rc;
 
     spin_lock(&s->lock);
 
-    if ( s->stdvga && s->cache )
-    {
-        switch ( p->type )
-        {
-        case IOREQ_TYPE_COPY:
-            buf = mmio_move(s, p);
-            if ( !buf )
-                s->cache = 0;
-            break;
-        default:
-            gdprintk(XENLOG_WARNING, "unsupported mmio request type:%d "
-                     "addr:0x%04x data:0x%04x size:%d count:%d state:%d "
-                     "isptr:%d dir:%d df:%d\n",
-                     p->type, (int)p->addr, (int)p->data, (int)p->size,
-                     (int)p->count, p->state,
-                     p->data_is_ptr, p->dir, p->df);
-            s->cache = 0;
-        }
-    }
-    else
-    {
-        buf = (p->dir == IOREQ_WRITE);
-    }
-
-    rc = (buf && hvm_buffered_io_send(p));
+    rc = s->stdvga && s->cache &&
+        (addr >= VGA_MEM_BASE) &&
+        (addr < (VGA_MEM_BASE + VGA_MEM_SIZE));
 
     spin_unlock(&s->lock);
 
-    return rc ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+    return rc;
 }
 
+const struct hvm_mmio_ops stdvga_mem_ops = {
+    .check = stdvga_mem_check,
+    .read = stdvga_mem_read,
+    .write = stdvga_mem_write
+};
+
 void stdvga_init(struct domain *d)
 {
     struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
@@ -634,6 +528,8 @@ void stdvga_init(struct domain *d)
         register_portio_handler(d, 0x3c4, 2, stdvga_intercept_pio);
         /* Graphics registers. */
         register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
+        /* VGA memory */
+        register_mmio_handler(d, &stdvga_mem_ops);
     }
 }
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 7b8db78..50965f1 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -132,7 +132,6 @@ struct hvm_hw_stdvga {
 };
 
 void stdvga_init(struct domain *d);
-int stdvga_intercept_mmio(ioreq_t *p);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry...
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (4 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-17 10:54   ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 07/17] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io() Paul Durrant
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

...and error handling"

NOTE: A straight reversion was not possible because of subsequent changes
      in the code, so this is at least partially a manual reversion.

By limiting hvmemul_do_io_addr() to reps falling within the pages on which
a reference has already been taken, we can guarantee that calls to
hvm_copy_to/from_guest_phys() will not hit the HVMCOPY_gfn_paged_out
or HVMCOPY_gfn_shared cases. Thus we can remove the retry logic from
the intercept code and simplify it significantly.

Normally hvmemul_do_io_addr() will only reference a single page at a time.
It will, however, take an extra page reference for I/O spanning a page
boundary.

It is still important to know, upon returning from x86_emulate(), whether
the number of reps was reduced, so the mmio_retry flag is retained for
that purpose.
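
As a minimal sketch of the clamping described above (forward direction
only, i.e. df == 0, and 4k pages assumed; this mirrors the loop added to
hvmemul_do_io_addr() below but is illustrative, not part of the patch):

    #include <stdint.h>

    static unsigned long reps_on_page(uint64_t ram_gpa, unsigned int size,
                                      unsigned long reps)
    {
        unsigned long count;

        for ( count = 0; count < reps; count++ )
        {
            uint64_t end = ram_gpa + (count + 1) * size - 1;

            if ( (end >> 12) != (ram_gpa >> 12) )
                break; /* the next rep would cross into the second page */
        }

        return count; /* 0 => even a single rep spans two pages */
    }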

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c     |   83 ++++++++++++++++++++++++++++------------
 xen/arch/x86/hvm/intercept.c   |   52 +++++--------------------
 xen/include/asm-x86/hvm/vcpu.h |    2 +-
 3 files changed, 69 insertions(+), 68 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 4af70b0..38b8956 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -60,6 +60,7 @@ static int hvmemul_do_io(
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
+        .count = *reps,
         .dir = dir,
         .df = df,
         .data = data,
@@ -131,15 +132,6 @@ static int hvmemul_do_io(
         HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
 
-    /*
-     * When retrying a repeated string instruction, force exit to guest after
-     * completion of the retried iteration to allow handling of interrupts.
-     */
-    if ( vio->mmio_retrying )
-        *reps = 1;
-
-    p.count = *reps;
-
     if ( dir == IOREQ_WRITE )
     {
         if ( !data_is_addr )
@@ -153,17 +145,9 @@ static int hvmemul_do_io(
     switch ( rc )
     {
     case X86EMUL_OKAY:
-    case X86EMUL_RETRY:
-        *reps = p.count;
         p.state = STATE_IORESP_READY;
-        if ( !vio->mmio_retry )
-        {
-            hvm_io_assist(&p);
-            vio->io_state = HVMIO_none;
-        }
-        else
-            /* Defer hvm_io_assist() invocation to hvm_do_resume(). */
-            vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+        hvm_io_assist(&p);
+        vio->io_state = HVMIO_none;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -294,17 +278,67 @@ int hvmemul_do_io_addr(
     bool_t is_mmio, paddr_t addr, unsigned long *reps, 
     unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
 {
-    struct page_info *ram_page;
+    struct vcpu *v = current;
+    unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
+    struct page_info *ram_page[2];
+    int nr_pages = 0;
+    unsigned long count;
     int rc;
 
-    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+
+    rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
     if ( rc != X86EMUL_OKAY )
-        return rc;
+        goto out;
+
+    nr_pages++;
+
+    /* Determine how many reps will fit within this page */
+    for ( count = 0; count < *reps; count++ )
+    {
+        paddr_t start, end;
+
+        if ( df )
+        {
+            start = ram_gpa - count * size;
+            end = ram_gpa + size - 1;
+        }
+        else
+        {
+            start = ram_gpa;
+            end = ram_gpa + (count + 1) * size - 1;
+        }
+
+        if ( paddr_to_pfn(start) != ram_gmfn ||
+             paddr_to_pfn(end) != ram_gmfn )
+            break;
+    }
+
+    if ( count == 0 )
+    {
+        /*
+         * This access must span two pages, so grab a reference to
+         * the next page and do a single rep.
+         */
+        rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+                                  &ram_page[nr_pages]);
+        if ( rc != X86EMUL_OKAY )
+            goto out;
 
-    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+        nr_pages++;
+        count = 1;
+    }
+
+    rc = hvmemul_do_io(is_mmio, addr, &count, size, dir, df, 1,
                        (uint64_t)ram_gpa);
+    if ( rc == X86EMUL_OKAY )
+    {
+        v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
+        *reps = count;
+    }
 
-    hvmemul_release_page(ram_page);
+ out:
+    while ( --nr_pages >= 0 )
+        hvmemul_release_page(ram_page[nr_pages]);
 
     return rc;
 }
@@ -1554,7 +1588,6 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     }
 
     hvmemul_ctxt->exn_pending = 0;
-    vio->mmio_retrying = vio->mmio_retry;
     vio->mmio_retry = 0;
 
     if ( cpu_has_vmx )
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index b8bcd0b..c0c9254 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -155,7 +155,6 @@ static const struct hvm_io_ops portio_ops = {
 static int process_io_intercept(struct vcpu *v, ioreq_t *p,
                                 struct hvm_io_handler *handler)
 {
-    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     const struct hvm_io_ops *ops = handler->ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
     uint64_t data;
@@ -165,23 +164,12 @@ static int process_io_intercept(struct vcpu *v, ioreq_t *p,
     {
         for ( i = 0; i < p->count; i++ )
         {
-            if ( vio->mmio_retrying )
-            {
-                if ( vio->mmio_large_read_bytes != p->size )
-                    return X86EMUL_UNHANDLEABLE;
-                memcpy(&data, vio->mmio_large_read, p->size);
-                vio->mmio_large_read_bytes = 0;
-                vio->mmio_retrying = 0;
-            }
-            else
-            {
-                addr = (p->type == IOREQ_TYPE_COPY) ?
-                    p->addr + step * i :
-                    p->addr;
-                rc = ops->read(handler, v, addr, p->size, &data);
-                if ( rc != X86EMUL_OKAY )
-                    break;
-            }
+            addr = (p->type == IOREQ_TYPE_COPY) ?
+                p->addr + step * i :
+                p->addr;
+            rc = ops->read(handler, v, addr, p->size, &data);
+            if ( rc != X86EMUL_OKAY )
+                break;
 
             if ( p->data_is_ptr )
             {
@@ -190,14 +178,12 @@ static int process_io_intercept(struct vcpu *v, ioreq_t *p,
                 {
                 case HVMCOPY_okay:
                     break;
-                case HVMCOPY_gfn_paged_out:
-                case HVMCOPY_gfn_shared:
-                    rc = X86EMUL_RETRY;
-                    break;
                 case HVMCOPY_bad_gfn_to_mfn:
                     /* Drop the write as real hardware would. */
                     continue;
                 case HVMCOPY_bad_gva_to_gfn:
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
                     ASSERT(0);
                     /* fall through */
                 default:
@@ -210,13 +196,6 @@ static int process_io_intercept(struct vcpu *v, ioreq_t *p,
             else
                 p->data = data;
         }
-
-        if ( rc == X86EMUL_RETRY )
-        {
-            vio->mmio_retry = 1;
-            vio->mmio_large_read_bytes = p->size;
-            memcpy(vio->mmio_large_read, &data, p->size);
-        }
     }
     else /* p->dir == IOREQ_WRITE */
     {
@@ -229,14 +208,12 @@ static int process_io_intercept(struct vcpu *v, ioreq_t *p,
                 {
                 case HVMCOPY_okay:
                     break;
-                case HVMCOPY_gfn_paged_out:
-                case HVMCOPY_gfn_shared:
-                    rc = X86EMUL_RETRY;
-                    break;
                 case HVMCOPY_bad_gfn_to_mfn:
                     data = ~0;
                     break;
                 case HVMCOPY_bad_gva_to_gfn:
+                case HVMCOPY_gfn_paged_out:
+                case HVMCOPY_gfn_shared:
                     ASSERT(0);
                     /* fall through */
                 default:
@@ -256,15 +233,6 @@ static int process_io_intercept(struct vcpu *v, ioreq_t *p,
             if ( rc != X86EMUL_OKAY )
                 break;
         }
-
-        if ( rc == X86EMUL_RETRY )
-            vio->mmio_retry = 1;
-    }
-
-    if ( i != 0 )
-    {
-        p->count = i;
-        rc = X86EMUL_OKAY;
     }
 
     return rc;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index dd08416..97d78bd 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -74,7 +74,7 @@ struct hvm_vcpu_io {
      * For string instruction emulation we need to be able to signal a
      * necessary retry through other than function return codes.
      */
-    bool_t mmio_retry, mmio_retrying;
+    bool_t mmio_retry;
 
     unsigned long msix_unmask_address;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 07/17] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io()
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (5 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 08/17] x86/hvm: split I/O completion handling from state model Paul Durrant
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

By removing the calls to hvm_io_assist() from hvmemul_do_io() (where the
call is replaced by a single assignment) and from hvm_complete_assist_req()
(which is itself replaced by a call to process_io_intercept() with a
suitable set of ops), hvm_io_assist() can be moved into hvm.c and made
static (and hence becomes a candidate for inlining).

This patch also fixes the I/O state test at the end of hvm_io_assist()
to check the correct value. Since the ioreq server patch series was
integrated, the current ioreq state is no longer an indicator of in-flight
I/O state, since an I/O scheduled by re-emulation may be targeted at a
different ioreq server.
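
To make the null-handler semantics concrete, a hedged example of what a
guest now observes when no device model backs an access (port 0x310 is
made up; null_read()/null_write() are as in the hunk below):

    /* A 4-byte read from an unbacked port completes with all-ones... */
    uint64_t data;
    null_read(&null_handler, current, 0x310, 4, &data);  /* data == ~0ul */

    /* ...and a write to an unbacked port is silently discarded. */
    null_write(&null_handler, current, 0x310, 4, 0x1234);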

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c   |   36 +++++++++++++++++---
 xen/arch/x86/hvm/hvm.c       |   74 ++++++++++++++++++++++++------------------
 xen/arch/x86/hvm/intercept.c |    4 +--
 xen/arch/x86/hvm/io.c        |   39 ----------------------
 xen/include/asm-x86/hvm/io.h |    3 +-
 5 files changed, 79 insertions(+), 77 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 38b8956..4b3581a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -50,6 +50,34 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
     trace_var(event, 0/*!cycles*/, size, buffer);
 }
 
+static int null_read(struct hvm_io_handler *io_handler,
+                     struct vcpu *v,
+                     uint64_t addr,
+                     uint32_t size,
+                     uint64_t *data)
+{
+    *data = ~0ul;
+    return X86EMUL_OKAY;
+}
+
+static int null_write(struct hvm_io_handler *io_handler,
+                      struct vcpu *v,
+                      uint64_t addr,
+                      uint32_t size,
+                      uint64_t data)
+{
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops null_ops = {
+    .read = null_read,
+    .write = null_write
+};
+
+static struct hvm_io_handler null_handler = {
+    .ops = &null_ops
+};
+
 static int hvmemul_do_io(
     int is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uint64_t data)
@@ -145,8 +173,7 @@ static int hvmemul_do_io(
     switch ( rc )
     {
     case X86EMUL_OKAY:
-        p.state = STATE_IORESP_READY;
-        hvm_io_assist(&p);
+        vio->io_data = p.data;
         vio->io_state = HVMIO_none;
         break;
     case X86EMUL_UNHANDLEABLE:
@@ -157,8 +184,9 @@ static int hvmemul_do_io(
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            hvm_complete_assist_req(&p);
-            rc = X86EMUL_OKAY;
+            rc = process_io_intercept(curr, &p, &null_handler);
+            if ( rc == X86EMUL_OKAY )
+                vio->io_data = p.data;
             vio->io_state = HVMIO_none;
         }
         else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index dbba6e3..799f0e7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -411,6 +411,49 @@ bool_t hvm_io_pending(struct vcpu *v)
     return 0;
 }
 
+static void hvm_io_assist(ioreq_t *p)
+{
+    struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+    enum hvm_io_state io_state;
+
+    p->state = STATE_IOREQ_NONE;
+
+    io_state = vio->io_state;
+    vio->io_state = HVMIO_none;
+
+    switch ( io_state )
+    {
+    case HVMIO_awaiting_completion:
+        vio->io_state = HVMIO_completed;
+        vio->io_data = p->data;
+        break;
+    case HVMIO_handle_mmio_awaiting_completion:
+        vio->io_state = HVMIO_completed;
+        vio->io_data = p->data;
+        (void)handle_mmio();
+        break;
+    case HVMIO_handle_pio_awaiting_completion:
+        if ( vio->io_size == 4 ) /* Needs zero extension. */
+            guest_cpu_user_regs()->rax = (uint32_t)p->data;
+        else
+            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
+        break;
+    default:
+        break;
+    }
+
+    /*
+     * Re-emulation may have scheduled another I/O so io_state set
+     * at the top of the function may have changed.
+     */
+    if ( vio->io_state == HVMIO_none )
+    {
+        msix_write_completion(curr);
+        vcpu_end_shutdown_deferral(curr);
+    }
+}
+
 static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
 {
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
@@ -2667,37 +2710,6 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
     return 0;
 }
 
-void hvm_complete_assist_req(ioreq_t *p)
-{
-    switch ( p->type )
-    {
-    case IOREQ_TYPE_PCI_CONFIG:
-        ASSERT_UNREACHABLE();
-        break;
-    case IOREQ_TYPE_COPY:
-    case IOREQ_TYPE_PIO:
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( !p->data_is_ptr )
-                p->data = ~0ul;
-            else
-            {
-                int i, step = p->df ? -p->size : p->size;
-                uint32_t data = ~0;
-
-                for ( i = 0; i < p->count; i++ )
-                    hvm_copy_to_guest_phys(p->data + step * i, &data,
-                                           p->size);
-            }
-        }
-        /* FALLTHRU */
-    default:
-        p->state = STATE_IORESP_READY;
-        hvm_io_assist(p);
-        break;
-    }
-}
-
 void hvm_broadcast_assist_req(ioreq_t *p)
 {
     struct domain *d = current->domain;
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index c0c9254..0537d39 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -152,8 +152,8 @@ static const struct hvm_io_ops portio_ops = {
     .write = hvm_portio_write
 };
 
-static int process_io_intercept(struct vcpu *v, ioreq_t *p,
-                                struct hvm_io_handler *handler)
+int process_io_intercept(struct vcpu *v, ioreq_t *p,
+                         struct hvm_io_handler *handler)
 {
     const struct hvm_io_ops *ops = handler->ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 87625ed..ee0ed82 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -169,45 +169,6 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
     return 1;
 }
 
-void hvm_io_assist(ioreq_t *p)
-{
-    struct vcpu *curr = current;
-    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    enum hvm_io_state io_state;
-
-    p->state = STATE_IOREQ_NONE;
-
-    io_state = vio->io_state;
-    vio->io_state = HVMIO_none;
-
-    switch ( io_state )
-    {
-    case HVMIO_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        break;
-    case HVMIO_handle_mmio_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        (void)handle_mmio();
-        break;
-    case HVMIO_handle_pio_awaiting_completion:
-        if ( vio->io_size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)p->data;
-        else
-            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-        break;
-    default:
-        break;
-    }
-
-    if ( p->state == STATE_IOREQ_NONE )
-    {
-        msix_write_completion(curr);
-        vcpu_end_shutdown_deferral(curr);
-    }
-}
-
 static bool_t dpci_portio_accept(struct hvm_io_handler *io_handler,
                                  struct vcpu *v,
                                  uint64_t addr,
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 50965f1..7aa1f80 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -94,6 +94,8 @@ void hvm_register_io_handler(struct domain *d,
 struct hvm_io_handler *hvm_find_io_handler(struct vcpu *v,
                                            ioreq_t *p);
 
+int process_io_intercept(struct vcpu *v, ioreq_t *p,
+                         struct hvm_io_handler *handler);
 int hvm_io_intercept(ioreq_t *p);
 
 void register_mmio_handler(struct domain *d,
@@ -113,7 +115,6 @@ int handle_mmio_with_translation(unsigned long gva, unsigned long gpfn,
                                  struct npfec);
 int handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_io_assist(ioreq_t *p);
 void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
                   const union vioapic_redir_entry *ent);
 void msix_write_completion(struct vcpu *);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 08/17] x86/hvm: split I/O completion handling from state model
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (6 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 07/17] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io() Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 09/17] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io() Paul Durrant
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The state of in-flight I/O and how its completion will be handled are
logically separate, and conflating the two makes the code unnecessarily
confusing.
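
As a brief illustration (field names as in the hunks below): what was
previously a single enumerator such as
HVMIO_handle_mmio_awaiting_completion is now expressed as a pair, e.g.:

    /* In-flight state: is an I/O outstanding? */
    vio->io_state      = HVMIO_awaiting_completion;
    /* Completion action: what to do once it comes back. */
    vio->io_completion = HVMIO_mmio_completion;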

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c         |   42 +++++++++++++++++++++++++---------------
 xen/arch/x86/hvm/io.c          |    4 ++--
 xen/include/asm-x86/hvm/vcpu.h |   15 +++++++++-----
 3 files changed, 38 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 799f0e7..916459a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -415,31 +415,41 @@ static void hvm_io_assist(ioreq_t *p)
 {
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    enum hvm_io_state io_state;
 
     p->state = STATE_IOREQ_NONE;
 
-    io_state = vio->io_state;
-    vio->io_state = HVMIO_none;
-
-    switch ( io_state )
+    switch ( vio->io_state )
     {
     case HVMIO_awaiting_completion:
+    {
+        enum hvm_io_completion completion = vio->io_completion;
+
         vio->io_state = HVMIO_completed;
         vio->io_data = p->data;
+        vio->io_completion = HVMIO_no_completion;
+
+        switch ( completion )
+        {
+        case HVMIO_mmio_completion:
+            (void)handle_mmio();
+            break;
+
+        case HVMIO_pio_completion:
+            if ( vio->io_size == 4 ) /* Needs zero extension. */
+                guest_cpu_user_regs()->rax = (uint32_t)p->data;
+            else
+                memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
+
+            vio->io_state = HVMIO_none;
+            break;
+        default:
+            break;
+        }
+
         break;
-    case HVMIO_handle_mmio_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        (void)handle_mmio();
-        break;
-    case HVMIO_handle_pio_awaiting_completion:
-        if ( vio->io_size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)p->data;
-        else
-            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-        break;
+    }
     default:
+        vio->io_state = HVMIO_none;
         break;
     }
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index ee0ed82..5527988 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -93,7 +93,7 @@ int handle_mmio(void)
     if ( rc != X86EMUL_RETRY )
         vio->io_state = HVMIO_none;
     if ( vio->io_state == HVMIO_awaiting_completion )
-        vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+        vio->io_completion = HVMIO_mmio_completion;
     else
         vio->mmio_access = (struct npfec){};
 
@@ -158,7 +158,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
             return 0;
         /* Completion in hvm_io_assist() with no re-emulation required. */
         ASSERT(dir == IOREQ_READ);
-        vio->io_state = HVMIO_handle_pio_awaiting_completion;
+        vio->io_completion = HVMIO_pio_completion;
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 97d78bd..87f9c8f 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -34,11 +34,15 @@ enum hvm_io_state {
     HVMIO_none = 0,
     HVMIO_dispatched,
     HVMIO_awaiting_completion,
-    HVMIO_handle_mmio_awaiting_completion,
-    HVMIO_handle_pio_awaiting_completion,
     HVMIO_completed
 };
 
+enum hvm_io_completion {
+    HVMIO_no_completion = 0,
+    HVMIO_mmio_completion,
+    HVMIO_pio_completion
+};
+
 struct hvm_vcpu_asid {
     uint64_t generation;
     uint32_t asid;
@@ -46,9 +50,10 @@ struct hvm_vcpu_asid {
 
 struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
-    enum hvm_io_state   io_state;
-    unsigned long       io_data;
-    int                 io_size;
+    enum hvm_io_state      io_state;
+    unsigned long          io_data;
+    int                    io_size;
+    enum hvm_io_completion io_completion;
 
     /*
      * HVM emulation:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 09/17] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io()
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (7 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 08/17] x86/hvm: split I/O completion handling from state model Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 10/17] x86/hvm: remove HVMIO_dispatched I/O state Paul Durrant
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The check is done at the wrong point (it is irrelevant when the I/O is
handled within the hypervisor) and its functionality can be covered by
returning X86EMUL_UNHANDLEABLE from hvm_send_assist_req() instead.

This patch also removes the domain_crash() call from
hvm_send_assist_req(). Returning X86EMUL_UNHANDLEABLE allows the
higher layers of emulation to decide what to do instead.
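
With this change the caller's handling reduces to a single check of the
return value. A sketch of the resulting contract (return values as used
in the hunks below):

    rc = hvm_send_assist_req(s, &p);
    switch ( rc )
    {
    case X86EMUL_RETRY:        /* request handed to the device model */
        break;
    case X86EMUL_OKAY:         /* shutdown deferral: I/O implicitly binned */
    case X86EMUL_UNHANDLEABLE: /* bad ioreq state or no matching vcpu */
        vio->io_state = HVMIO_none;
        break;
    }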

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c    |   10 ++--------
 xen/arch/x86/hvm/hvm.c        |   16 ++++++----------
 xen/include/asm-x86/hvm/hvm.h |    2 +-
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 4b3581a..b9f948d 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -150,12 +150,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( hvm_io_pending(curr) )
-    {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
-        return X86EMUL_UNHANDLEABLE;
-    }
-
     vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
         HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
@@ -191,8 +185,8 @@ static int hvmemul_do_io(
         }
         else
         {
-            rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(s, &p) )
+            rc = hvm_send_assist_req(s, &p);
+            if ( rc != X86EMUL_RETRY )
                 vio->io_state = HVMIO_none;
             else if ( data_is_addr || dir == IOREQ_WRITE )
                 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 916459a..a74410d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2663,7 +2663,7 @@ int hvm_buffered_io_send(ioreq_t *p)
     return 1;
 }
 
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
 {
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
@@ -2671,7 +2671,7 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
 
     ASSERT(s);
     if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return 0; /* implicitly bins the i/o operation */
+        return X86EMUL_OKAY;
 
     list_for_each_entry ( sv,
                           &s->ioreq_vcpu_list,
@@ -2686,14 +2686,14 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
             {
                 gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
                         p->state);
-                goto crash;
+                break;
             }
 
             if ( unlikely(p->vp_eport != port) )
             {
                 gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
                         p->vp_eport);
-                goto crash;
+                break;
             }
 
             proto_p->state = STATE_IOREQ_NONE;
@@ -2709,15 +2709,11 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
              */
             p->state = STATE_IOREQ_READY;
             notify_via_xen_event_channel(d, port);
-            break;
+            return X86EMUL_RETRY;
         }
     }
 
-    return 1;
-
- crash:
-    domain_crash(d);
-    return 0;
+    return X86EMUL_UNHANDLEABLE;
 }
 
 void hvm_broadcast_assist_req(ioreq_t *p)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 77eeac5..57f9605 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -230,7 +230,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip);
 
 struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
                                                  ioreq_t *p);
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
 void hvm_broadcast_assist_req(ioreq_t *p);
 void hvm_complete_assist_req(ioreq_t *p);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 10/17] x86/hvm: remove HVMIO_dispatched I/O state
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (8 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 09/17] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io() Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 11/17] x86/hvm: remove hvm_io_state enumeration Paul Durrant
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

By removing the HVMIO_dispatched state and making all pending emulations
(i.e. all those not handled by the hypervisor) use HVMIO_awaiting_completion,
various code-paths can be simplified.
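
For example, the state selection in hvmemul_do_io() collapses from a
direction-dependent choice into a single assignment (a sketch of the
first hunk below):

    /* Before: writes and buffered I/O were merely 'dispatched'. */
    vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
        HVMIO_dispatched : HVMIO_awaiting_completion;

    /* After: every pending emulation awaits completion. */
    vio->io_state = HVMIO_awaiting_completion;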

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c     |   15 ++-------------
 xen/arch/x86/hvm/hvm.c         |   40 +++++++++++++++++++---------------------
 xen/arch/x86/hvm/io.c          |   20 +++++++++-----------
 xen/include/asm-x86/hvm/vcpu.h |    2 +-
 4 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index b9f948d..0340653 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -136,23 +136,14 @@ static int hvmemul_do_io(
         break;
     case HVMIO_completed:
         vio->io_state = HVMIO_none;
-        if ( data_is_addr || dir == IOREQ_WRITE )
-            return X86EMUL_UNHANDLEABLE;
         goto finish_access;
-    case HVMIO_dispatched:
-        /* May have to wait for previous cycle of a multi-write to complete. */
-        if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
-             (addr == (vio->mmio_large_write_pa +
-                       vio->mmio_large_write_bytes)) )
-            return X86EMUL_RETRY;
-        /* fallthrough */
     default:
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
-        HVMIO_dispatched : HVMIO_awaiting_completion;
+    vio->io_state = HVMIO_awaiting_completion;
     vio->io_size = size;
+    vio->io_dir = dir;
 
     if ( dir == IOREQ_WRITE )
     {
@@ -188,8 +179,6 @@ static int hvmemul_do_io(
             rc = hvm_send_assist_req(s, &p);
             if ( rc != X86EMUL_RETRY )
                 vio->io_state = HVMIO_none;
-            else if ( data_is_addr || dir == IOREQ_WRITE )
-                rc = X86EMUL_OKAY;
         }
         break;
     }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a74410d..8f814ba 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -415,41 +415,39 @@ static void hvm_io_assist(ioreq_t *p)
 {
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+    enum hvm_io_completion completion = vio->io_completion;
 
     p->state = STATE_IOREQ_NONE;
 
-    switch ( vio->io_state )
-    {
-    case HVMIO_awaiting_completion:
-    {
-        enum hvm_io_completion completion = vio->io_completion;
+    /*
+     * This function should never be called unless an I/O emulation
+     * is awaiting completion.
+     */
+    if ( vio->io_state != HVMIO_awaiting_completion )
+        domain_crash(curr->domain);
 
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        vio->io_completion = HVMIO_no_completion;
+    vio->io_state = HVMIO_completed;
+    vio->io_data = p->data;
+    vio->io_completion = HVMIO_no_completion;
 
-        switch ( completion )
-        {
-        case HVMIO_mmio_completion:
-            (void)handle_mmio();
-            break;
+    switch ( completion )
+    {
+    case HVMIO_mmio_completion:
+        (void)handle_mmio();
+        break;
 
-        case HVMIO_pio_completion:
+    case HVMIO_pio_completion:
+        if ( vio->io_dir == IOREQ_READ )
+        {
             if ( vio->io_size == 4 ) /* Needs zero extension. */
                 guest_cpu_user_regs()->rax = (uint32_t)p->data;
             else
                 memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-
-            vio->io_state = HVMIO_none;
-            break;
-        default:
-            break;
         }
 
+        vio->io_state = HVMIO_none;
         break;
-    }
     default:
-        vio->io_state = HVMIO_none;
         break;
     }
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 5527988..21516d5 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -90,19 +90,21 @@ int handle_mmio(void)
 
     rc = hvm_emulate_one(&ctxt);
 
-    if ( rc != X86EMUL_RETRY )
-        vio->io_state = HVMIO_none;
-    if ( vio->io_state == HVMIO_awaiting_completion )
-        vio->io_completion = HVMIO_mmio_completion;
-    else
-        vio->mmio_access = (struct npfec){};
-
     switch ( rc )
     {
+    case X86EMUL_OKAY:
+        vio->mmio_access = (struct npfec){};
+        break;
+    case X86EMUL_RETRY:
+        vio->io_completion = HVMIO_mmio_completion;
+        break;
     case X86EMUL_UNHANDLEABLE:
+        vio->mmio_access = (struct npfec){};
         hvm_dump_emulation_state(XENLOG_G_WARNING "MMIO", &ctxt);
         return 0;
     case X86EMUL_EXCEPTION:
+        vio->io_state = HVMIO_none;
+        vio->mmio_access = (struct npfec){};
         if ( ctxt.exn_pending )
             hvm_inject_trap(&ctxt.trap);
         break;
@@ -154,10 +156,6 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
         }
         break;
     case X86EMUL_RETRY:
-        if ( vio->io_state != HVMIO_awaiting_completion )
-            return 0;
-        /* Completion in hvm_io_assist() with no re-emulation required. */
-        ASSERT(dir == IOREQ_READ);
         vio->io_completion = HVMIO_pio_completion;
         break;
     default:
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 87f9c8f..f3c58a2 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -32,7 +32,6 @@
 
 enum hvm_io_state {
     HVMIO_none = 0,
-    HVMIO_dispatched,
     HVMIO_awaiting_completion,
     HVMIO_completed
 };
@@ -53,6 +52,7 @@ struct hvm_vcpu_io {
     enum hvm_io_state      io_state;
     unsigned long          io_data;
     int                    io_size;
+    int                    io_dir;
     enum hvm_io_completion io_completion;
 
     /*
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 11/17] x86/hvm: remove hvm_io_state enumeration
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (9 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 10/17] x86/hvm: remove HVMIO_dispatched I/O state Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 12/17] x86/hvm: use ioreq_t to track in-flight state Paul Durrant
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

Emulation request status is already covered by the STATE_IOREQ_XXX values,
so just use those. The mapping is:

HVMIO_none                -> STATE_IOREQ_NONE
HVMIO_awaiting_completion -> STATE_IOREQ_READY
HVMIO_completed           -> STATE_IORESP_READY
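
So, for example, a test for a completed emulation becomes (illustrative;
taken from the realmode hunk below):

    if ( vio->io_state == STATE_IORESP_READY ) /* was HVMIO_completed */
        realmode_emulate_one(&hvmemul_ctxt);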

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c       |   14 +++++++-------
 xen/arch/x86/hvm/hvm.c           |    8 ++++----
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/arch/x86/hvm/svm/nestedsvm.c |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c  |    6 +++---
 xen/include/asm-x86/hvm/vcpu.h   |    8 +-------
 6 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 0340653..1c34288 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -132,16 +132,16 @@ static int hvmemul_do_io(
 
     switch ( vio->io_state )
     {
-    case HVMIO_none:
+    case STATE_IOREQ_NONE:
         break;
-    case HVMIO_completed:
-        vio->io_state = HVMIO_none;
+    case STATE_IORESP_READY:
+        vio->io_state = STATE_IOREQ_NONE;
         goto finish_access;
     default:
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = HVMIO_awaiting_completion;
+    vio->io_state = STATE_IOREQ_READY;
     vio->io_size = size;
     vio->io_dir = dir;
 
@@ -159,7 +159,7 @@ static int hvmemul_do_io(
     {
     case X86EMUL_OKAY:
         vio->io_data = p.data;
-        vio->io_state = HVMIO_none;
+        vio->io_state = STATE_IOREQ_NONE;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -172,13 +172,13 @@ static int hvmemul_do_io(
             rc = process_io_intercept(curr, &p, &null_handler);
             if ( rc == X86EMUL_OKAY )
                 vio->io_data = p.data;
-            vio->io_state = HVMIO_none;
+            vio->io_state = STATE_IOREQ_NONE;
         }
         else
         {
             rc = hvm_send_assist_req(s, &p);
             if ( rc != X86EMUL_RETRY )
-                vio->io_state = HVMIO_none;
+                vio->io_state = STATE_IOREQ_NONE;
         }
         break;
     }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8f814ba..85944f6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -423,10 +423,10 @@ static void hvm_io_assist(ioreq_t *p)
      * This function should never be called unless an I/O emulation
      * is awaiting completion.
      */
-    if ( vio->io_state != HVMIO_awaiting_completion )
+    if ( vio->io_state != STATE_IOREQ_READY )
         domain_crash(curr->domain);
 
-    vio->io_state = HVMIO_completed;
+    vio->io_state = STATE_IORESP_READY;
     vio->io_data = p->data;
     vio->io_completion = HVMIO_no_completion;
 
@@ -445,7 +445,7 @@ static void hvm_io_assist(ioreq_t *p)
                 memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
         }
 
-        vio->io_state = HVMIO_none;
+        vio->io_state = STATE_IOREQ_NONE;
         break;
     default:
         break;
@@ -455,7 +455,7 @@ static void hvm_io_assist(ioreq_t *p)
      * Re-emulation may have scheduled another I/O so io_state set
      * at the top of the function may have changed.
      */
-    if ( vio->io_state == HVMIO_none )
+    if ( vio->io_state == STATE_IOREQ_NONE )
     {
         msix_write_completion(curr);
         vcpu_end_shutdown_deferral(curr);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 21516d5..b09b369 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -103,7 +103,7 @@ int handle_mmio(void)
         hvm_dump_emulation_state(XENLOG_G_WARNING "MMIO", &ctxt);
         return 0;
     case X86EMUL_EXCEPTION:
-        vio->io_state = HVMIO_none;
+        vio->io_state = STATE_IOREQ_NONE;
         vio->mmio_access = (struct npfec){};
         if ( ctxt.exn_pending )
             hvm_inject_trap(&ctxt.trap);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index be5797a..8b165c6 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1231,7 +1231,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
          * Delay the injection because this would result in delivering
          * an interrupt *within* the execution of an instruction.
          */
-        if ( v->arch.hvm_vcpu.hvm_io.io_state != HVMIO_none )
+        if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
             return hvm_intblk_shadow;
 
         if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index fe8b4a0..8c2da9a 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -177,7 +177,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
     hvm_emulate_prepare(&hvmemul_ctxt, regs);
 
-    if ( vio->io_state == HVMIO_completed )
+    if ( vio->io_state == STATE_IORESP_READY )
         realmode_emulate_one(&hvmemul_ctxt);
 
     /* Only deliver interrupts into emulated real mode. */
@@ -191,7 +191,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     curr->arch.hvm_vmx.vmx_emulate = 1;
     while ( curr->arch.hvm_vmx.vmx_emulate &&
             !softirq_pending(smp_processor_id()) &&
-            (vio->io_state == HVMIO_none) )
+            (vio->io_state == STATE_IOREQ_NONE) )
     {
         /*
          * Check for pending interrupts only every 16 instructions, because
@@ -216,7 +216,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     }
 
     /* Need to emulate next time if we've started an IO operation */
-    if ( vio->io_state != HVMIO_none )
+    if ( vio->io_state != STATE_IOREQ_NONE )
         curr->arch.hvm_vmx.vmx_emulate = 1;
 
     if ( !curr->arch.hvm_vmx.vmx_emulate && !curr->arch.hvm_vmx.vmx_realmode )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index f3c58a2..1c2ec27 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -30,12 +30,6 @@
 #include <asm/hvm/svm/nestedsvm.h>
 #include <asm/mtrr.h>
 
-enum hvm_io_state {
-    HVMIO_none = 0,
-    HVMIO_awaiting_completion,
-    HVMIO_completed
-};
-
 enum hvm_io_completion {
     HVMIO_no_completion = 0,
     HVMIO_mmio_completion,
@@ -49,7 +43,7 @@ struct hvm_vcpu_asid {
 
 struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
-    enum hvm_io_state      io_state;
+    uint8_t                io_state;
     unsigned long          io_data;
     int                    io_size;
     int                    io_dir;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 12/17] x86/hvm: use ioreq_t to track in-flight state
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (10 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 11/17] x86/hvm: remove hvm_io_state enumeration Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 13/17] x86/hvm: only acquire RAM pages for emulation when we need to Paul Durrant
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

Use an ioreq_t rather than open coded state, size, dir and data fields
in struct hvm_vcpu_io. This also allows PIO completion to be handled
similarly to MMIO completion by re-issuing the handle_pio() call.
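
One benefit is that, on completion, a re-issued request can be checked
field-by-field against the ioreq_t saved when the I/O was started. A
sketch of the idea (the full check is in the first hunk below):

    /* The re-issued request must match the in-flight one. */
    if ( (p.addr != vio->io_req.addr) ||
         (p.size != vio->io_req.size) ||
         (p.dir  != vio->io_req.dir) )
        domain_crash(curr->domain);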

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c       |  149 +++++++++++++++++++++-----------------
 xen/arch/x86/hvm/hvm.c           |   19 ++---
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/arch/x86/hvm/svm/nestedsvm.c |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c  |    6 +-
 xen/include/asm-x86/hvm/vcpu.h   |    5 +-
 6 files changed, 93 insertions(+), 90 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 1c34288..8dd02af 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -93,6 +93,7 @@ static int hvmemul_do_io(
         .df = df,
         .data = data,
         .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
+        .state = STATE_IOREQ_READY,
     };
     void *p_data = (void *)data;
     int rc;
@@ -130,21 +131,79 @@ static int hvmemul_do_io(
         }
     }
 
-    switch ( vio->io_state )
+    switch ( vio->io_req.state )
     {
     case STATE_IOREQ_NONE:
+        vio->io_req = p;
         break;
     case STATE_IORESP_READY:
-        vio->io_state = STATE_IOREQ_NONE;
-        goto finish_access;
+        p = vio->io_req;
+
+        if ( (p.type != (is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO)) ||
+             (p.addr != addr) ||
+             (p.size != size) ||
+             (p.count != *reps) ||
+             (p.dir != dir) ||
+             (p.df != df) ||
+             (p.data_is_ptr != data_is_addr) )
+        {
+            gdprintk(XENLOG_WARNING, "mismatched request\n");
+            domain_crash(curr->domain);
+
+            vio->io_req.state = STATE_IOREQ_NONE;
+            return X86EMUL_UNHANDLEABLE;
+        }
+
+ resp_ready:
+        vio->io_req.state = STATE_IOREQ_NONE;
+
+        if ( dir == IOREQ_READ )
+        {
+            hvmtrace_io_assist(is_mmio, &p);
+
+            if ( !data_is_addr )
+                memcpy(p_data, &p.data, size);
+        }
+
+        if ( is_mmio && !data_is_addr )
+        {
+            /* Part of a multi-cycle read or write? */
+            if ( dir == IOREQ_WRITE )
+            {
+                paddr_t pa = vio->mmio_large_write_pa;
+                unsigned int bytes = vio->mmio_large_write_bytes;
+                if ( bytes == 0 )
+                    pa = vio->mmio_large_write_pa = addr;
+                if ( addr == (pa + bytes) )
+                    vio->mmio_large_write_bytes += size;
+            }
+            else
+            {
+                paddr_t pa = vio->mmio_large_read_pa;
+                unsigned int bytes = vio->mmio_large_read_bytes;
+                if ( bytes == 0 )
+                    pa = vio->mmio_large_read_pa = addr;
+                if ( (addr == (pa + bytes)) &&
+                     ((bytes + size) <= sizeof(vio->mmio_large_read)) )
+                {
+                    memcpy(&vio->mmio_large_read[addr - pa], p_data,
+                           size);
+                    vio->mmio_large_read_bytes += size;
+                }
+            }
+        }
+
+        return X86EMUL_OKAY;
     default:
+        /*
+         * This function should never be called unless
+         * vio->io_req.state matches the above cases.
+         */
+        gdprintk(XENLOG_WARNING, "bad emulation state\n");
+        domain_crash(curr->domain);
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = STATE_IOREQ_READY;
-    vio->io_size = size;
-    vio->io_dir = dir;
-
     if ( dir == IOREQ_WRITE )
     {
         if ( !data_is_addr )
@@ -155,77 +214,31 @@ static int hvmemul_do_io(
 
     rc = hvm_io_intercept(&p);
 
-    switch ( rc )
-    {
-    case X86EMUL_OKAY:
-        vio->io_data = p.data;
-        vio->io_state = STATE_IOREQ_NONE;
-        break;
-    case X86EMUL_UNHANDLEABLE:
+    if ( rc == X86EMUL_UNHANDLEABLE )
     {
         struct hvm_ioreq_server *s =
             hvm_select_ioreq_server(curr->domain, &p);
 
         /* If there is no suitable backing DM, just ignore accesses */
-        if ( !s )
-        {
-            rc = process_io_intercept(curr, &p, &null_handler);
-            if ( rc == X86EMUL_OKAY )
-                vio->io_data = p.data;
-            vio->io_state = STATE_IOREQ_NONE;
-        }
-        else
-        {
-            rc = hvm_send_assist_req(s, &p);
-            if ( rc != X86EMUL_RETRY )
-                vio->io_state = STATE_IOREQ_NONE;
-        }
-        break;
-    }
-    default:
-        BUG();
+        rc = !s ?
+            process_io_intercept(curr, &p, &null_handler) :
+            hvm_send_assist_req(s, &p);
     }
 
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
- finish_access:
-    if ( dir == IOREQ_READ )
-    {
-        hvmtrace_io_assist(is_mmio, &p);
-
-        if ( !data_is_addr )
-            memcpy(p_data, &vio->io_data, size);
-    }
-
-    if ( is_mmio && !data_is_addr )
+    switch ( rc )
     {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_write_pa = addr;
-            if ( addr == (pa + bytes) )
-                vio->mmio_large_write_bytes += size;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_read_pa = addr;
-            if ( (addr == (pa + bytes)) &&
-                 ((bytes + size) <= sizeof(vio->mmio_large_read)) )
-            {
-                memcpy(&vio->mmio_large_read[bytes], p_data, size);
-                vio->mmio_large_read_bytes += size;
-            }
-        }
+    case X86EMUL_OKAY:
+        goto resp_ready;
+    case X86EMUL_UNHANDLEABLE:
+        vio->io_req.state = STATE_IOREQ_NONE;
+        break;
+    case X86EMUL_RETRY:
+        break;
+    default:
+        BUG();
     }
 
-    return X86EMUL_OKAY;
+    return rc;
 }
 
 int hvmemul_do_io_buffer(
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 85944f6..48711ab 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -423,11 +423,11 @@ static void hvm_io_assist(ioreq_t *p)
      * This function should never be called unless an I/O emulation
      * is awaiting completion.
      */
-    if ( vio->io_state != STATE_IOREQ_READY )
+    if ( vio->io_req.state != STATE_IOREQ_READY )
         domain_crash(curr->domain);
 
-    vio->io_state = STATE_IORESP_READY;
-    vio->io_data = p->data;
+    vio->io_req.state = STATE_IORESP_READY;
+    vio->io_req.data = p->data;
     vio->io_completion = HVMIO_no_completion;
 
     switch ( completion )
@@ -437,15 +437,8 @@ static void hvm_io_assist(ioreq_t *p)
         break;
 
     case HVMIO_pio_completion:
-        if ( vio->io_dir == IOREQ_READ )
-        {
-            if ( vio->io_size == 4 ) /* Needs zero extension. */
-                guest_cpu_user_regs()->rax = (uint32_t)p->data;
-            else
-                memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-        }
-
-        vio->io_state = STATE_IOREQ_NONE;
+        (void)handle_pio(vio->io_req.addr, vio->io_req.size,
+                         vio->io_req.dir);
         break;
     default:
         break;
@@ -455,7 +448,7 @@ static void hvm_io_assist(ioreq_t *p)
      * Re-emulation may have scheduled another I/O so io_state set
      * at the top of the function may have changed.
      */
-    if ( vio->io_state == STATE_IOREQ_NONE )
+    if ( vio->io_req.state == STATE_IOREQ_NONE )
     {
         msix_write_completion(curr);
         vcpu_end_shutdown_deferral(curr);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index b09b369..e31164e 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -103,7 +103,7 @@ int handle_mmio(void)
         hvm_dump_emulation_state(XENLOG_G_WARNING "MMIO", &ctxt);
         return 0;
     case X86EMUL_EXCEPTION:
-        vio->io_state = STATE_IOREQ_NONE;
+        vio->io_req.state = STATE_IOREQ_NONE;
         vio->mmio_access = (struct npfec){};
         if ( ctxt.exn_pending )
             hvm_inject_trap(&ctxt.trap);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 8b165c6..78667a2 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1231,7 +1231,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
          * Delay the injection because this would result in delivering
          * an interrupt *within* the execution of an instruction.
          */
-        if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
+        if ( v->arch.hvm_vcpu.hvm_io.io_req.state != STATE_IOREQ_NONE )
             return hvm_intblk_shadow;
 
         if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 8c2da9a..69c0297 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -177,7 +177,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
     hvm_emulate_prepare(&hvmemul_ctxt, regs);
 
-    if ( vio->io_state == STATE_IORESP_READY )
+    if ( vio->io_req.state == STATE_IORESP_READY )
         realmode_emulate_one(&hvmemul_ctxt);
 
     /* Only deliver interrupts into emulated real mode. */
@@ -191,7 +191,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     curr->arch.hvm_vmx.vmx_emulate = 1;
     while ( curr->arch.hvm_vmx.vmx_emulate &&
             !softirq_pending(smp_processor_id()) &&
-            (vio->io_state == STATE_IOREQ_NONE) )
+            (vio->io_req.state == STATE_IOREQ_NONE) )
     {
         /*
          * Check for pending interrupts only every 16 instructions, because
@@ -216,7 +216,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     }
 
     /* Need to emulate next time if we've started an IO operation */
-    if ( vio->io_state != STATE_IOREQ_NONE )
+    if ( vio->io_req.state != STATE_IOREQ_NONE )
         curr->arch.hvm_vmx.vmx_emulate = 1;
 
     if ( !curr->arch.hvm_vmx.vmx_emulate && !curr->arch.hvm_vmx.vmx_realmode )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 1c2ec27..46e89e4 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -43,10 +43,7 @@ struct hvm_vcpu_asid {
 
 struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
-    uint8_t                io_state;
-    unsigned long          io_data;
-    int                    io_size;
-    int                    io_dir;
+    ioreq_t                io_req;
     enum hvm_io_completion io_completion;
 
     /*
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 13/17] x86/hvm: only acquire RAM pages for emulation when we need to
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (11 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 12/17] x86/hvm: use ioreq_t to track in-flight state Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 14/17] x86/hvm: remove extraneous parameter from hvmtrace_io_assist() Paul Durrant
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

If hvmemul_do_io_addr() is called to complete a previously issued
emulation, then there is no need to acquire the RAM pages again. There
is also no need to re-calculate the value of *reps, provided
hvmemul_do_io() updates it when returning X86EMUL_OKAY.
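
In outline, hvmemul_do_io_addr() then branches on the in-flight state
(a sketch of the structure introduced below):

    switch ( vio->io_req.state )
    {
    case STATE_IOREQ_NONE:   /* new I/O: acquire pages, compute rep count */
        /* ... page acquisition and rep counting as before ... */
        break;
    case STATE_IORESP_READY: /* completing: pages already referenced */
        break;
    default:
        return X86EMUL_UNHANDLEABLE;
    }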

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |   83 ++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 8dd02af..016bc79 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -142,7 +142,6 @@ static int hvmemul_do_io(
         if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||
              (p.addr != addr) ||
              (p.size != size) ||
-             (p.count != *reps) ||
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
@@ -193,6 +192,7 @@ static int hvmemul_do_io(
             }
         }
 
+        *reps = p.count;
         return X86EMUL_OKAY;
     default:
         /*
@@ -303,60 +303,69 @@ int hvmemul_do_io_addr(
     unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
 {
     struct vcpu *v = current;
+    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
     struct page_info *ram_page[2];
     int nr_pages = 0;
     unsigned long count;
     int rc;
 
+    switch ( vio->io_req.state )
+    {
+    case STATE_IOREQ_NONE:
+        rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
+        if ( rc != X86EMUL_OKAY )
+            goto out;
 
-    rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
-    if ( rc != X86EMUL_OKAY )
-        goto out;
+        nr_pages++;
 
-    nr_pages++;
+        /* Determine how many reps will fit within this page */
+        for ( count = 0; count < *reps; count++ )
+        {
+            paddr_t start, end;
 
-    /* Detemine how many reps will fit within this page */
-    for ( count = 0; count < *reps; count++ )
-    {
-        paddr_t start, end;
+            if ( df )
+            {
+                start = ram_gpa - count * size;
+                end = ram_gpa + size - 1;
+            }
+            else
+            {
+                start = ram_gpa;
+                end = ram_gpa + (count + 1) * size - 1;
+            }
 
-        if ( df )
-        {
-            start = ram_gpa - count * size;
-            end = ram_gpa + size - 1;
+            if ( paddr_to_pfn(start) != ram_gmfn ||
+                 paddr_to_pfn(end) != ram_gmfn )
+                break;
         }
-        else
+
+        if ( count == 0 )
         {
-            start = ram_gpa;
-            end = ram_gpa + (count + 1) * size - 1;
+            /*
+             * This access must span two pages, so grab a reference
+             * to the next page and do a single rep.
+             */
+            rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+                                      &ram_page[nr_pages]);
+            if ( rc != X86EMUL_OKAY )
+                goto out;
+
+            nr_pages++;
+            count = 1;
         }
-
-        if ( paddr_to_pfn(start) != ram_gmfn ||
-             paddr_to_pfn(end) != ram_gmfn )
-            break;
-    }
-
-    if ( count == 0 )
-    {
-        /*
-         * This access must span two pages, so grab a reference to
-         * the next page and do a single rep.
-         */
-        rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
-                                  &ram_page[nr_pages]);
-        if ( rc != X86EMUL_OKAY )
-            goto out;
-
-        nr_pages++;
-        count = 1;
-    }
+        break;
+    case STATE_IORESP_READY:
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
 
     rc = hvmemul_do_io(is_mmio, addr, &count, size, dir, df, 1,
                        (uint64_t)ram_gpa);
     if ( rc == X86EMUL_OKAY )
     {
-        v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
+        vio->mmio_retry = (count < *reps);
         *reps = count;
     }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 14/17] x86/hvm: remove extraneous parameter from hvmtrace_io_assist()
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (12 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 13/17] x86/hvm: only acquire RAM pages for emulation when we need to Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:42 ` [PATCH v2 15/17] x86/hvm: make sure translated MMIO reads or writes fall within a page Paul Durrant
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The is_mmio parameter can be inferred from the ioreq type.
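
That is, the flag can be derived locally (one-liner from the hunk below):

    bool_t is_mmio = (p->type == IOREQ_TYPE_COPY);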

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 016bc79..1eab7bd 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -23,8 +23,9 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/svm/svm.h>
 
-static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
+static void hvmtrace_io_assist(ioreq_t *p)
 {
+    bool_t is_mmio = (p->type == IOREQ_TYPE_COPY);
     unsigned int size, event;
     unsigned char buffer[12];
 
@@ -158,7 +159,7 @@ static int hvmemul_do_io(
 
         if ( dir == IOREQ_READ )
         {
-            hvmtrace_io_assist(is_mmio, &p);
+            hvmtrace_io_assist(&p);
 
             if ( !data_is_addr )
                 memcpy(p_data, &p.data, size);
@@ -209,7 +210,7 @@ static int hvmemul_do_io(
         if ( !data_is_addr )
             memcpy(&p.data, p_data, size);
 
-        hvmtrace_io_assist(is_mmio, &p);
+        hvmtrace_io_assist(&p);
     }
 
     rc = hvm_io_intercept(&p);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 15/17] x86/hvm: make sure translated MMIO reads or writes fall within a page
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (13 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 14/17] x86/hvm: remove extraneous parameter from hvmtrace_io_assist() Paul Durrant
@ 2015-06-11 15:42 ` Paul Durrant
  2015-06-11 15:43 ` [PATCH v2 16/17] x86/hvm: remove multiple open coded 'chunking' loops Paul Durrant
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

...otherwise they will simply carry on to the next page using a normal
linear-to-phys translation.
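
The shape of the fix, in sketch form: once the chunking loop has consumed
everything that fits within the page, any remainder now fails instead of
being translated onward:

    while ( (off + chunk) <= PAGE_SIZE )
    {
        /* ... emulate this chunk, returning on error or when done ... */
    }

    /* Residual bytes would cross the page boundary: fail the access. */
    return X86EMUL_UNHANDLEABLE;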

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 1eab7bd..0d748e7 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -665,7 +665,6 @@ static int __hvmemul_read(
                                         p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
-            addr += chunk;
             off += chunk;
             gpa += chunk;
             p_data += chunk;
@@ -673,6 +672,8 @@ static int __hvmemul_read(
             if ( bytes < chunk )
                 chunk = bytes;
         }
+
+        return X86EMUL_UNHANDLEABLE;
     }
 
     if ( (seg != x86_seg_none) &&
@@ -809,7 +810,6 @@ static int hvmemul_write(
                                         p_data);
             if ( rc != X86EMUL_OKAY || bytes == chunk )
                 return rc;
-            addr += chunk;
             off += chunk;
             gpa += chunk;
             p_data += chunk;
@@ -817,6 +817,8 @@ static int hvmemul_write(
             if ( bytes < chunk )
                 chunk = bytes;
         }
+
+        return X86EMUL_UNHANDLEABLE;
     }
 
     if ( (seg != x86_seg_none) &&
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 16/17] x86/hvm: remove multiple open coded 'chunking' loops
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (14 preceding siblings ...)
  2015-06-11 15:42 ` [PATCH v2 15/17] x86/hvm: make sure translated MMIO reads or writes fall within a page Paul Durrant
@ 2015-06-11 15:43 ` Paul Durrant
  2015-06-11 15:43 ` [PATCH v2 17/17] x86/hvm: track large memory mapped accesses by buffer offset Paul Durrant
  2015-06-12 10:44 ` [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Fabio Fantoni
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

...in hvmemul_read/write()

Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
to reduce code duplication.
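
The common helper also picks the chunk size in one place, rather than
each open coded loop doing so. A sketch of the selection logic (as in
hvmemul_phys_mmio_access() below):

    /* Largest power of 2 not exceeding size, capped at sizeof(long). */
    chunk = 1 << (fls(size) - 1);
    if ( chunk > sizeof(long) )
        chunk = sizeof(long);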

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |  244 +++++++++++++++++++++++++-------------------
 1 file changed, 140 insertions(+), 104 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 0d748e7..467f2da 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -619,6 +619,127 @@ static int hvmemul_virtual_to_linear(
     return X86EMUL_EXCEPTION;
 }
 
+static int hvmemul_phys_mmio_access(paddr_t gpa,
+                                    unsigned int size,
+                                    uint8_t dir,
+                                    uint8_t *buffer,
+                                    unsigned int *off)
+{
+    unsigned long one_rep = 1;
+    unsigned int chunk;
+    int rc = 0;
+
+    /* Accesses must fall within a page */
+    if ( (gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE )
+        return X86EMUL_UNHANDLEABLE;
+
+    /*
+     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
+     * accesses larger than sizeof(long), so choose the highest power
+     * of 2 not exceeding sizeof(long) as the 'chunk' size.
+     */
+    chunk = 1 << (fls(size) - 1);
+    if ( chunk > sizeof (long) )
+        chunk = sizeof (long);
+
+    while ( size != 0 )
+    {
+        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+                                    &buffer[*off]);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        /* Advance to the next chunk */
+        gpa += chunk;
+        *off += chunk;
+        size -= chunk;
+
+        /*
+         * If the chunk now exceeds the remaining size, choose the next
+         * lowest power of 2 that will fit.
+         */
+        while ( chunk > size )
+            chunk >>= 1;
+    }
+
+    return rc;
+}
+
+static int hvmemul_phys_mmio_read(paddr_t gpa,
+                                  unsigned int size,
+                                  uint8_t *buffer,
+                                  unsigned int *off)
+{
+    return hvmemul_phys_mmio_access(gpa, size, IOREQ_READ, buffer,
+                                    off);
+}
+
+static int hvmemul_phys_mmio_write(paddr_t gpa,
+                                   unsigned int size,
+                                   uint8_t *buffer,
+                                   unsigned int *off)
+{
+    return hvmemul_phys_mmio_access(gpa, size, IOREQ_WRITE, buffer,
+                                    off);
+}
+
+static int hvmemul_linear_mmio_access(unsigned long mmio_addr,
+                                      unsigned int size,
+                                      uint8_t dir,
+                                      uint8_t *buffer,
+                                      uint32_t pfec,
+                                      struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    unsigned long page_off = mmio_addr & (PAGE_SIZE - 1);
+    unsigned int chunk, buffer_off = 0;
+    paddr_t gpa;
+    unsigned long one_rep = 1;
+    int rc;
+
+    chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
+    rc = hvmemul_linear_to_phys(mmio_addr, &gpa, chunk,
+                                &one_rep, pfec, hvmemul_ctxt);
+    while ( rc == X86EMUL_OKAY )
+    {
+        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer,
+                                      &buffer_off);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        mmio_addr += chunk;
+        size -= chunk;
+
+        if ( size == 0 )
+            break;
+
+        chunk = min_t(unsigned int, size, PAGE_SIZE);
+        rc = hvmemul_linear_to_phys(mmio_addr, &gpa, chunk,
+                                    &one_rep, pfec, hvmemul_ctxt);
+    }
+
+    return rc;
+}
+
+static int hvmemul_linear_mmio_read(unsigned long mmio_addr,
+                                    unsigned int size,
+                                    uint8_t *buffer,
+                                    uint32_t pfec,
+                                    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return hvmemul_linear_mmio_access(mmio_addr, size, IOREQ_READ, buffer,
+                                      pfec, hvmemul_ctxt);
+}
+
+static int hvmemul_linear_mmio_write(unsigned long mmio_addr,
+                                     unsigned int size,
+                                     uint8_t *buffer,
+                                     uint32_t pfec,
+                                     struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return hvmemul_linear_mmio_access(mmio_addr, size, IOREQ_WRITE, buffer,
+                                      pfec, hvmemul_ctxt);
+}
+
 static int __hvmemul_read(
     enum x86_segment seg,
     unsigned long offset,
@@ -628,52 +749,28 @@ static int __hvmemul_read(
     struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
     struct vcpu *curr = current;
-    unsigned long addr, reps = 1;
-    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
+    unsigned long addr, one_rep = 1;
     uint32_t pfec = PFEC_page_present;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     paddr_t gpa;
     int rc;
 
     rc = hvmemul_virtual_to_linear(
-        seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
+        seg, offset, bytes, &one_rep, access_type, hvmemul_ctxt, &addr);
     if ( rc != X86EMUL_OKAY )
         return rc;
-    off = addr & (PAGE_SIZE - 1);
-    /*
-     * We only need to handle sizes actual instruction operands can have. All
-     * such sizes are either powers of 2 or the sum of two powers of 2. Thus
-     * picking as initial chunk size the largest power of 2 not greater than
-     * the total size will always result in only power-of-2 size requests
-     * issued to hvmemul_do_mmio() (hvmemul_do_io() rejects non-powers-of-2).
-     */
-    while ( chunk & (chunk - 1) )
-        chunk &= chunk - 1;
-    if ( off + bytes > PAGE_SIZE )
-        while ( off & (chunk - 1) )
-            chunk >>= 1;
 
     if ( ((access_type != hvm_access_insn_fetch
            ? vio->mmio_access.read_access
            : vio->mmio_access.insn_fetch)) &&
          (vio->mmio_gva == (addr & PAGE_MASK)) )
     {
-        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
-        while ( (off + chunk) <= PAGE_SIZE )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                return rc;
-            off += chunk;
-            gpa += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-        }
+        unsigned int off = 0;
 
-        return X86EMUL_UNHANDLEABLE;
+        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |
+               (addr & (PAGE_SIZE - 1)));
+
+        return hvmemul_phys_mmio_read(gpa, bytes, p_data, &off);
     }
 
     if ( (seg != x86_seg_none) &&
@@ -693,30 +790,9 @@ static int __hvmemul_read(
     case HVMCOPY_bad_gfn_to_mfn:
         if ( access_type == hvm_access_insn_fetch )
             return X86EMUL_UNHANDLEABLE;
-        rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                    hvmemul_ctxt);
-        while ( rc == X86EMUL_OKAY )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                break;
-            addr += chunk;
-            off += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-            if ( off < PAGE_SIZE )
-                gpa += chunk;
-            else
-            {
-                rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                            hvmemul_ctxt);
-                off = 0;
-            }
-        }
-        return rc;
+
+        return hvmemul_linear_mmio_read(addr, bytes, p_data,
+                                        pfec, hvmemul_ctxt);
     case HVMCOPY_gfn_paged_out:
     case HVMCOPY_gfn_shared:
         return X86EMUL_RETRY;
@@ -781,44 +857,26 @@ static int hvmemul_write(
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
     struct vcpu *curr = current;
-    unsigned long addr, reps = 1;
-    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
+    unsigned long addr, one_rep = 1;
     uint32_t pfec = PFEC_page_present | PFEC_write_access;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     paddr_t gpa;
     int rc;
 
     rc = hvmemul_virtual_to_linear(
-        seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
+        seg, offset, bytes, &one_rep, hvm_access_write, hvmemul_ctxt, &addr);
     if ( rc != X86EMUL_OKAY )
         return rc;
-    off = addr & (PAGE_SIZE - 1);
-    /* See the respective comment in __hvmemul_read(). */
-    while ( chunk & (chunk - 1) )
-        chunk &= chunk - 1;
-    if ( off + bytes > PAGE_SIZE )
-        while ( off & (chunk - 1) )
-            chunk >>= 1;
 
     if ( vio->mmio_access.write_access &&
          (vio->mmio_gva == (addr & PAGE_MASK)) )
     {
-        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
-        while ( (off + chunk) <= PAGE_SIZE )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                return rc;
-            off += chunk;
-            gpa += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-        }
+        unsigned int off = 0;
 
-        return X86EMUL_UNHANDLEABLE;
+        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |
+               (addr & (PAGE_SIZE - 1)));
+
+        return hvmemul_phys_mmio_write(gpa, bytes, p_data, &off);
     }
 
     if ( (seg != x86_seg_none) &&
@@ -834,30 +892,8 @@ static int hvmemul_write(
     case HVMCOPY_bad_gva_to_gfn:
         return X86EMUL_EXCEPTION;
     case HVMCOPY_bad_gfn_to_mfn:
-        rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                    hvmemul_ctxt);
-        while ( rc == X86EMUL_OKAY )
-        {
-            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_WRITE, 0,
-                                        p_data);
-            if ( rc != X86EMUL_OKAY || bytes == chunk )
-                break;
-            addr += chunk;
-            off += chunk;
-            p_data += chunk;
-            bytes -= chunk;
-            if ( bytes < chunk )
-                chunk = bytes;
-            if ( off < PAGE_SIZE )
-                gpa += chunk;
-            else
-            {
-                rc = hvmemul_linear_to_phys(addr, &gpa, chunk, &reps, pfec,
-                                            hvmemul_ctxt);
-                off = 0;
-            }
-        }
-        return rc;
+        return hvmemul_linear_mmio_write(addr, bytes, p_data,
+                                         pfec, hvmemul_ctxt);
     case HVMCOPY_gfn_paged_out:
     case HVMCOPY_gfn_shared:
         return X86EMUL_RETRY;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 17/17] x86/hvm: track large memory mapped accesses by buffer offset
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (15 preceding siblings ...)
  2015-06-11 15:43 ` [PATCH v2 16/17] x86/hvm: remove multiple open coded 'chunking' loops Paul Durrant
@ 2015-06-11 15:43 ` Paul Durrant
  2015-06-12 10:44 ` [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Fabio Fantoni
  17 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-11 15:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Keir Fraser, Jan Beulich

The code in hvmemul_do_io() that tracks large reads or writes, to avoid
re-issue of component I/O, is defeated by accesses across a page boundary
because it uses the physical address. The code is also only relevant to
memory mapped I/O to or from a buffer.

This patch re-factors the code, moving it into hvmemul_phys_mmio_access()
where it is relevant, and tracks using the buffer offset rather than the
address.
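
Tracking by buffer offset means a repeat of an already-issued chunk is
recognised even when its physical address differs, e.g. after crossing
into the next page. A sketch of the cache check (see the final hunk):

    /* Chunk already issued on a previous pass? Replay from the cache. */
    if ( (*off + chunk) <= vio->mmio_cache[dir].size )
    {
        if ( dir == IOREQ_READ )
            memcpy(&buffer[*off],
                   &vio->mmio_cache[IOREQ_READ].buffer[*off], chunk);
    }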

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/emulate.c     |   86 ++++++++++++++--------------------------
 xen/include/asm-x86/hvm/vcpu.h |   16 ++++----
 2 files changed, 39 insertions(+), 63 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 467f2da..867498f 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -109,29 +109,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-                return X86EMUL_OKAY;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-            {
-                memcpy(p_data, &vio->mmio_large_read[addr - pa],
-                       size);
-                return X86EMUL_OKAY;
-            }
-        }
-    }
-
     switch ( vio->io_req.state )
     {
     case STATE_IOREQ_NONE:
@@ -165,34 +142,6 @@ static int hvmemul_do_io(
                 memcpy(p_data, &p.data, size);
         }
 
-        if ( is_mmio && !data_is_addr )
-        {
-            /* Part of a multi-cycle read or write? */
-            if ( dir == IOREQ_WRITE )
-            {
-                paddr_t pa = vio->mmio_large_write_pa;
-                unsigned int bytes = vio->mmio_large_write_bytes;
-                if ( bytes == 0 )
-                    pa = vio->mmio_large_write_pa = addr;
-                if ( addr == (pa + bytes) )
-                    vio->mmio_large_write_bytes += size;
-            }
-            else
-            {
-                paddr_t pa = vio->mmio_large_read_pa;
-                unsigned int bytes = vio->mmio_large_read_bytes;
-                if ( bytes == 0 )
-                    pa = vio->mmio_large_read_pa = addr;
-                if ( (addr == (pa + bytes)) &&
-                     ((bytes + size) <= sizeof(vio->mmio_large_read)) )
-                {
-                    memcpy(&vio->mmio_large_read[addr - pa], p_data,
-                           size);
-                    vio->mmio_large_read_bytes += size;
-                }
-            }
-        }
-
         *reps = p.count;
         return X86EMUL_OKAY;
     default:
@@ -625,6 +574,7 @@ static int hvmemul_phys_mmio_access(paddr_t gpa,
                                     uint8_t *buffer,
                                     unsigned int *off)
 {
+    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
     unsigned long one_rep = 1;
     unsigned int chunk;
     int rc = 0;
@@ -644,10 +594,34 @@ static int hvmemul_phys_mmio_access(paddr_t gpa,
 
     while ( size != 0 )
     {
-        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
-                                    &buffer[*off]);
-        if ( rc != X86EMUL_OKAY )
-            break;
+        /* Have we already done this chunk? */
+        if ( (*off + chunk) <= vio->mmio_cache[dir].size )
+        {
+            ASSERT(*off + chunk <= vio->mmio_cache[dir].size);
+
+            if ( dir == IOREQ_READ )
+                memcpy(&buffer[*off],
+                       &vio->mmio_cache[IOREQ_READ].buffer[*off],
+                       chunk);
+            else
+                ASSERT(memcmp(&buffer[*off],
+                              &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
+                              chunk) == 0);
+        }
+        else
+        {
+            ASSERT(*off == vio->mmio_cache[dir].size);
+
+            rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+                                        &buffer[*off]);
+            if ( rc != X86EMUL_OKAY )
+                break;
+
+            /* Note that we have now done this chunk */
+            memcpy(&vio->mmio_cache[dir].buffer[*off],
+                   &buffer[*off], chunk);
+            vio->mmio_cache[dir].size += chunk;
+        }
 
         /* Advance to the next chunk */
         gpa += chunk;
@@ -1675,7 +1649,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
         rc = X86EMUL_RETRY;
     if ( rc != X86EMUL_RETRY )
     {
-        vio->mmio_large_read_bytes = vio->mmio_large_write_bytes = 0;
+        memset(&vio->mmio_cache, 0, sizeof(vio->mmio_cache));
         vio->mmio_insn_bytes = 0;
     }
     else
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 46e89e4..261b649 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -56,13 +56,15 @@ struct hvm_vcpu_io {
     unsigned long       mmio_gva;
     unsigned long       mmio_gpfn;
 
-    /* We may read up to m256 as a number of device-model transactions. */
-    paddr_t mmio_large_read_pa;
-    uint8_t mmio_large_read[32];
-    unsigned int mmio_large_read_bytes;
-    /* We may write up to m256 as a number of device-model transactions. */
-    unsigned int mmio_large_write_bytes;
-    paddr_t mmio_large_write_pa;
+    /*
+     * We may read or write up to m256 as a number of device-model
+     * transactions.
+     */
+    struct {
+        unsigned long size;
+        uint8_t buffer[32];
+    } mmio_cache[2]; /* Indexed by ioreq type */
+
     /* For retries we shouldn't re-fetch the instruction. */
     unsigned int mmio_insn_bytes;
     unsigned char mmio_insn[16];
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix
  2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
                   ` (16 preceding siblings ...)
  2015-06-11 15:43 ` [PATCH v2 17/17] x86/hvm: track large memory mapped accesses by buffer offset Paul Durrant
@ 2015-06-12 10:44 ` Fabio Fantoni
  2015-06-12 11:45   ` Paul Durrant
  17 siblings, 1 reply; 38+ messages in thread
From: Fabio Fantoni @ 2015-06-12 10:44 UTC (permalink / raw)
  To: Paul Durrant, xen-devel

Il 11/06/2015 17:42, Paul Durrant ha scritto:
> This patch series re-works much of the code involved in emulation of port
> and memory mapped I/O for HVM guests.
>
> The code has become very convoluted and, at least by inspection, certain
> emulations will apparently malfunction.
>
> The series is broken down into 17 patches (which are also available in
> my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
> on the emulation22 branch) as follows:
>
> 0001-x86-hvm-simplify-hvmemul_do_io.patch
> 0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
> 0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
> 0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
> 0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
> 0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
> 0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
> 0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
> 0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
> 0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
> 0011-x86-hvm-remove-hvm_io_state-enumeration.patch
> 0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
> 0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch
> 0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
> 0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
> 0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
> 0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch
>
> v2:
>   - Removed bogus assertion from patch 16
>   - Re-worked patch 17 after basic testing of back-port onto XenServer

Thanks, I confirm that this new version solves the critical regression
that caused the dom0 insta-reboot.

I tested them on Windows and Linux HVM domUs and had some strange results...
On a Windows 7 64-bit SP1 domU (with the new winpv drivers) there seems
to be a small performance regression (or probably "only" increased
latency), but I found no warnings/errors or anything else useful in the
logs. I'm not sure it is a real regression from your patches; if needed
I'll retry without them.
On a Linux domU (Fedora 21), qxl, which had the SSE2 instruction
problem, is still not working. As in the last two years, qemu does not
always crash with gdb showing an SSE2 instruction problem, but xorg
crashes with the EQ overflowing and/or the domU remains stuck with xorg
and the qemu process both at 100% cpu.
I posted the backtrace some time ago, and a qemu/spice developer said it
was a consequence of another problem, if I remember correctly.
How can I be sure that SSE2 or similar instructions (on videoram) are
now working correctly?

If you need more information/tests, tell me and I'll post them.

Thanks for any reply, and sorry for my bad English.

>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix
  2015-06-12 10:44 ` [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Fabio Fantoni
@ 2015-06-12 11:45   ` Paul Durrant
  0 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-12 11:45 UTC (permalink / raw)
  To: Fabio Fantoni, xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Fabio Fantoni [mailto:fabio.fantoni@m2r.biz]
> Sent: 12 June 2015 11:44
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup
> and fix
> 
> Il 11/06/2015 17:42, Paul Durrant ha scritto:
> > This patch series re-works much of the code involved in emulation of port
> > and memory mapped I/O for HVM guests.
> >
> > The code has become very convoluted and, at least by inspection, certain
> > emulations will apparently malfunction.
> >
> > The series is broken down into 17 patches (which are also available in
> > my xenbits repo:
> http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
> > on the emulation22 branch) as follows:
> >
> > 0001-x86-hvm-simplify-hvmemul_do_io.patch
> > 0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
> > 0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
> > 0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
> > 0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
> > 0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
> > 0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
> > 0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
> > 0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
> > 0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
> > 0011-x86-hvm-remove-hvm_io_state-enumeration.patch
> > 0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
> > 0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch
> > 0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
> > 0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
> > 0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
> > 0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch
> >
> > v2:
> >   - Removed bogus assertion from patch 16
> >   - Re-worked patch 17 after basic testing of back-port onto XenServer
> 
> Thanks, I confirm that this new version solves the critical regression
> that caused the dom0 insta-reboot.
> 
> I tested them on Windows and Linux HVM domUs and had some strange results...
> On a Windows 7 64-bit SP1 domU (with the new winpv drivers) there seems
> to be a small performance regression (or probably "only" increased
> latency), but I found no warnings/errors or anything else useful in the
> logs. I'm not sure it is a real regression from your patches; if needed
> I'll retry without them.
> On a Linux domU (Fedora 21), qxl, which had the SSE2 instruction
> problem, is still not working. As in the last two years, qemu does not
> always crash with gdb showing an SSE2 instruction problem, but xorg
> crashes with the EQ overflowing and/or the domU remains stuck with xorg
> and the qemu process both at 100% cpu.
> I posted the backtrace some time ago, and a qemu/spice developer said it
> was a consequence of another problem, if I remember correctly.
> How can I be sure that SSE2 or similar instructions (on videoram) are
> now working correctly?

Once I have debugged through XenServer automated testing I plan to add a targeted SSE test somewhere, possibly as a smoke test in hvmloader.
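
Something of this shape, perhaps (untested sketch - the harness, the
function name and the MMIO mapping are illustrative only, and memcmp is
assumed to be hvmloader's local helper):

    /* Store 16 bytes to an emulated MMIO range with SSE, read them
     * back, and check that emulation reproduced them faithfully. */
    static int sse_mmio_smoke_test(void *mmio)
    {
        char src[16] __attribute__((aligned(16))) = "0123456789abcdef";
        char dst[16] __attribute__((aligned(16)));

        asm volatile ( "movdqa %1, %%xmm0 ; movdqu %%xmm0, %0"
                       : "=m" (*(char (*)[16])mmio)
                       : "m" (src) : "xmm0" );
        asm volatile ( "movdqu %1, %%xmm0 ; movdqa %%xmm0, %0"
                       : "=m" (dst)
                       : "m" (*(const char (*)[16])mmio) : "xmm0" );

        return memcmp(src, dst, 16) == 0;
    }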

  Paul

> 
> If you need more information/tests, tell me and I'll post them.
> 
> Thanks for any reply, and sorry for my bad English.
> 
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry...
  2015-06-11 15:42 ` [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry Paul Durrant
@ 2015-06-17 10:54   ` Paul Durrant
  0 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 10:54 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xenproject.org
  Cc: Andrew Cooper, Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: Paul Durrant [mailto:paul.durrant@citrix.com]
> Sent: 11 June 2015 16:43
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant; Keir (Xen.org); Jan Beulich; Andrew Cooper
> Subject: [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O
> emulation retry...
> 
> ...and error handling"
> 
> NOTE: A straight reversion was not possible because of subsequent changes
>       in the code, so this is at least partially a manual reversion.
> 
> By limiting hvmemul_do_io_addr() to reps falling within the pages on which
> a reference has already been taken, we can guarantee that calls to
> hvm_copy_to/from_guest_phys() will not hit the HVMCOPY_gfn_paged_out
> or HVMCOPY_gfn_shared cases. Thus we can remove the retry logic from
> the intercept code and simplify it significantly.
> 
> Normally hvmemul_do_io_addr() will only reference a single page at a time.
> It will, however, take an extra page reference for I/O spanning a page
> boundary.
> 
> It is still important to know, upon returning from x86_emulate(), whether
> the number of reps was reduced so the mmio_retry flag is retained for that
> purpose.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Unfortunately, during further testing with XenServer, I found an issue with this patch and also subsequent patches in the series. I will therefore be posting a v3 of the series.

  Paul

> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/hvm/emulate.c     |   83 ++++++++++++++++++++++++++++--
> ----------
>  xen/arch/x86/hvm/intercept.c   |   52 +++++--------------------
>  xen/include/asm-x86/hvm/vcpu.h |    2 +-
>  3 files changed, 69 insertions(+), 68 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 4af70b0..38b8956 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -60,6 +60,7 @@ static int hvmemul_do_io(
>          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
>          .addr = addr,
>          .size = size,
> +        .count = *reps,
>          .dir = dir,
>          .df = df,
>          .data = data,
> @@ -131,15 +132,6 @@ static int hvmemul_do_io(
>          HVMIO_dispatched : HVMIO_awaiting_completion;
>      vio->io_size = size;
> 
> -    /*
> -     * When retrying a repeated string instruction, force exit to guest after
> -     * completion of the retried iteration to allow handling of interrupts.
> -     */
> -    if ( vio->mmio_retrying )
> -        *reps = 1;
> -
> -    p.count = *reps;
> -
>      if ( dir == IOREQ_WRITE )
>      {
>          if ( !data_is_addr )
> @@ -153,17 +145,9 @@ static int hvmemul_do_io(
>      switch ( rc )
>      {
>      case X86EMUL_OKAY:
> -    case X86EMUL_RETRY:
> -        *reps = p.count;
>          p.state = STATE_IORESP_READY;
> -        if ( !vio->mmio_retry )
> -        {
> -            hvm_io_assist(&p);
> -            vio->io_state = HVMIO_none;
> -        }
> -        else
> -            /* Defer hvm_io_assist() invocation to hvm_do_resume(). */
> -            vio->io_state = HVMIO_handle_mmio_awaiting_completion;
> +        hvm_io_assist(&p);
> +        vio->io_state = HVMIO_none;
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> @@ -294,17 +278,67 @@ int hvmemul_do_io_addr(
>      bool_t is_mmio, paddr_t addr, unsigned long *reps,
>      unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
>  {
> -    struct page_info *ram_page;
> +    struct vcpu *v = current;
> +    unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
> +    struct page_info *ram_page[2];
> +    int nr_pages = 0;
> +    unsigned long count;
>      int rc;
> 
> -    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
> +
> +    rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
>      if ( rc != X86EMUL_OKAY )
> -        return rc;
> +        goto out;
> +
> +    nr_pages++;
> +
> +    /* Determine how many reps will fit within this page */
> +    for ( count = 0; count < *reps; count++ )
> +    {
> +        paddr_t start, end;
> +
> +        if ( df )
> +        {
> +            start = ram_gpa - count * size;
> +            end = ram_gpa + size - 1;
> +        }
> +        else
> +        {
> +            start = ram_gpa;
> +            end = ram_gpa + (count + 1) * size - 1;
> +        }
> +
> +        if ( paddr_to_pfn(start) != ram_gmfn ||
> +             paddr_to_pfn(end) != ram_gmfn )
> +            break;
> +    }
> +
> +    if ( count == 0 )
> +    {
> +        /*
> +         * This access must span two pages, so grab a reference to
> +         * the next page and do a single rep.
> +         */
> +        rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
> +                                  &ram_page[nr_pages]);
> +        if ( rc != X86EMUL_OKAY )
> +            goto out;
> 
> -    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
> +        nr_pages++;
> +        count = 1;
> +    }
> +
> +    rc = hvmemul_do_io(is_mmio, addr, &count, size, dir, df, 1,
>                         (uint64_t)ram_gpa);
> +    if ( rc == X86EMUL_OKAY )
> +    {
> +        v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
> +        *reps = count;
> +    }
> 
> -    hvmemul_release_page(ram_page);
> + out:
> +    while ( --nr_pages >= 0 )
> +        hvmemul_release_page(ram_page[nr_pages]);
> 
>      return rc;
>  }
> @@ -1554,7 +1588,6 @@ static int _hvm_emulate_one(struct
> hvm_emulate_ctxt *hvmemul_ctxt,
>      }
> 
>      hvmemul_ctxt->exn_pending = 0;
> -    vio->mmio_retrying = vio->mmio_retry;
>      vio->mmio_retry = 0;
> 
>      if ( cpu_has_vmx )
> diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
> index b8bcd0b..c0c9254 100644
> --- a/xen/arch/x86/hvm/intercept.c
> +++ b/xen/arch/x86/hvm/intercept.c
> @@ -155,7 +155,6 @@ static const struct hvm_io_ops portio_ops = {
>  static int process_io_intercept(struct vcpu *v, ioreq_t *p,
>                                  struct hvm_io_handler *handler)
>  {
> -    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
>      const struct hvm_io_ops *ops = handler->ops;
>      int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
>      uint64_t data;
> @@ -165,23 +164,12 @@ static int process_io_intercept(struct vcpu *v,
> ioreq_t *p,
>      {
>          for ( i = 0; i < p->count; i++ )
>          {
> -            if ( vio->mmio_retrying )
> -            {
> -                if ( vio->mmio_large_read_bytes != p->size )
> -                    return X86EMUL_UNHANDLEABLE;
> -                memcpy(&data, vio->mmio_large_read, p->size);
> -                vio->mmio_large_read_bytes = 0;
> -                vio->mmio_retrying = 0;
> -            }
> -            else
> -            {
> -                addr = (p->type == IOREQ_TYPE_COPY) ?
> -                    p->addr + step * i :
> -                    p->addr;
> -                rc = ops->read(handler, v, addr, p->size, &data);
> -                if ( rc != X86EMUL_OKAY )
> -                    break;
> -            }
> +            addr = (p->type == IOREQ_TYPE_COPY) ?
> +                p->addr + step * i :
> +                p->addr;
> +            rc = ops->read(handler, v, addr, p->size, &data);
> +            if ( rc != X86EMUL_OKAY )
> +                break;
> 
>              if ( p->data_is_ptr )
>              {
> @@ -190,14 +178,12 @@ static int process_io_intercept(struct vcpu *v,
> ioreq_t *p,
>                  {
>                  case HVMCOPY_okay:
>                      break;
> -                case HVMCOPY_gfn_paged_out:
> -                case HVMCOPY_gfn_shared:
> -                    rc = X86EMUL_RETRY;
> -                    break;
>                  case HVMCOPY_bad_gfn_to_mfn:
>                      /* Drop the write as real hardware would. */
>                      continue;
>                  case HVMCOPY_bad_gva_to_gfn:
> +                case HVMCOPY_gfn_paged_out:
> +                case HVMCOPY_gfn_shared:
>                      ASSERT(0);
>                      /* fall through */
>                  default:
> @@ -210,13 +196,6 @@ static int process_io_intercept(struct vcpu *v,
> ioreq_t *p,
>              else
>                  p->data = data;
>          }
> -
> -        if ( rc == X86EMUL_RETRY )
> -        {
> -            vio->mmio_retry = 1;
> -            vio->mmio_large_read_bytes = p->size;
> -            memcpy(vio->mmio_large_read, &data, p->size);
> -        }
>      }
>      else /* p->dir == IOREQ_WRITE */
>      {
> @@ -229,14 +208,12 @@ static int process_io_intercept(struct vcpu *v,
> ioreq_t *p,
>                  {
>                  case HVMCOPY_okay:
>                      break;
> -                case HVMCOPY_gfn_paged_out:
> -                case HVMCOPY_gfn_shared:
> -                    rc = X86EMUL_RETRY;
> -                    break;
>                  case HVMCOPY_bad_gfn_to_mfn:
>                      data = ~0;
>                      break;
>                  case HVMCOPY_bad_gva_to_gfn:
> +                case HVMCOPY_gfn_paged_out:
> +                case HVMCOPY_gfn_shared:
>                      ASSERT(0);
>                      /* fall through */
>                  default:
> @@ -256,15 +233,6 @@ static int process_io_intercept(struct vcpu *v,
> ioreq_t *p,
>              if ( rc != X86EMUL_OKAY )
>                  break;
>          }
> -
> -        if ( rc == X86EMUL_RETRY )
> -            vio->mmio_retry = 1;
> -    }
> -
> -    if ( i != 0 )
> -    {
> -        p->count = i;
> -        rc = X86EMUL_OKAY;
>      }
> 
>      return rc;
> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-
> x86/hvm/vcpu.h
> index dd08416..97d78bd 100644
> --- a/xen/include/asm-x86/hvm/vcpu.h
> +++ b/xen/include/asm-x86/hvm/vcpu.h
> @@ -74,7 +74,7 @@ struct hvm_vcpu_io {
>       * For string instruction emulation we need to be able to signal a
>       * necessary retry through other than function return codes.
>       */
> -    bool_t mmio_retry, mmio_retrying;
> +    bool_t mmio_retry;
> 
>      unsigned long msix_unmask_address;
> 
> --
> 1.7.10.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops
  2015-06-11 15:42 ` [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops Paul Durrant
@ 2015-06-17 12:43   ` Jan Beulich
  2015-06-17 12:45     ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 12:43 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> The struct just contains three methods and no data, so the name
> hvm_mmio_ops more accurately reflects its content. A subsequent patch
> introduces a new structure which more accurately warrants the name
> hvm_mmio_handler so doing the rename in this purely cosmetic patch avoids
> conflating functional and cosmetic changes in a single patch.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
albeit

> @@ -182,21 +182,21 @@ int hvm_mmio_intercept(ioreq_t *p)
>  
>      for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
>      {
> -        hvm_mmio_check_t check_handler =
> -            hvm_mmio_handlers[i]->check_handler;
> +        hvm_mmio_check_t check =
> +            hvm_mmio_handlers[i]->check;

Does this still not fit on a single line?
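
I.e. presumably just:

        hvm_mmio_check_t check = hvm_mmio_handlers[i]->check;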

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops
  2015-06-17 12:43   ` Jan Beulich
@ 2015-06-17 12:45     ` Paul Durrant
  0 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 12:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 13:44
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler
> to hvm_mmio_ops
> 
> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> > The struct just contains three methods and no data, so the name
> > hvm_mmio_ops more accurately reflects its content. A subsequent patch
> > introduces a new structure which more accurately warrants the name
> > hvm_mmio_handler so doing the rename in this purely cosmetic patch
> avoids
> > conflating functional and cosmetic changes in a single patch.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> albeit
> 
> > @@ -182,21 +182,21 @@ int hvm_mmio_intercept(ioreq_t *p)
> >
> >      for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
> >      {
> > -        hvm_mmio_check_t check_handler =
> > -            hvm_mmio_handlers[i]->check_handler;
> > +        hvm_mmio_check_t check =
> > +            hvm_mmio_handlers[i]->check;
> 
> Does this still not fit on a single line?
> 

Probably does. I'll pull the change into v3 of the series if that's the case.

> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
  2015-06-11 15:42 ` [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io() Paul Durrant
@ 2015-06-17 13:31   ` Jan Beulich
  2015-06-17 13:54     ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 13:31 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -51,41 +51,23 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
>  }
>  
>  static int hvmemul_do_io(
> -    int is_mmio, paddr_t addr, unsigned long *reps, int size,
> -    paddr_t ram_gpa, int dir, int df, void *p_data)
> +    int is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
> +    uint8_t dir, bool_t df, bool_t data_is_addr, uint64_t data)

is_mmio surely can become bool_t too?

>  {
>      struct vcpu *curr = current;
> -    struct hvm_vcpu_io *vio;
> +    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      ioreq_t p = {
>          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
>          .addr = addr,
>          .size = size,
>          .dir = dir,
>          .df = df,
> -        .data = ram_gpa,
> -        .data_is_ptr = (p_data == NULL),
> +        .data = data,
> +        .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
>      };
> -    unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
> -    p2m_type_t p2mt;
> -    struct page_info *ram_page;
> +    void *p_data = (void *)data;

While presumably compiling fine, casting uint64_t to a pointer looks
bogus without an intermediate cast to (unsigned) long or (u)intptr_t.
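
E.g. (sketch):

    void *p_data = (void *)(uintptr_t)data;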

> @@ -190,7 +141,12 @@ static int hvmemul_do_io(
>      p.count = *reps;
>  
>      if ( dir == IOREQ_WRITE )
> +    {
> +        if ( !data_is_addr )
> +            memcpy(&p.data, p_data, size);
> +
>          hvmtrace_io_assist(is_mmio, &p);
> +    }

Why would you need to do this _only_ here (i.e. either not at all,
or not just for tracing purposes)? Or is this just to _restore_ the
original value for tracing (in which case the question would be
whether it indeed can get modified)?

> -int hvmemul_do_pio(
> -    unsigned long port, unsigned long *reps, int size,
> -    paddr_t ram_gpa, int dir, int df, void *p_data)
> +int hvmemul_do_io_buffer(

static?

> +    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
> +    uint8_t dir, bool_t df, uint8_t *buffer)
>  {
> -    return hvmemul_do_io(0, port, reps, size, ram_gpa, dir, df, p_data);
> +    int rc;
> +
> +    BUG_ON(buffer == NULL);
> +
> +    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
> +                       (uint64_t)buffer);

Same remark regarding a cast between uint64_t and a pointer.

> +static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
> +{
> +    struct vcpu *v = current;
> +    struct domain *d = v->domain;

curr and currd please (albeit I don't see the need for the former as
a local variable).

> +static void hvmemul_release_page(struct page_info *page)

inline?

> +int hvmemul_do_io_addr(

static?

> +    bool_t is_mmio, paddr_t addr, unsigned long *reps, 
> +    unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
> +{
> +    struct page_info *ram_page;
> +    int rc;
> +
> +    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
> +    if ( rc != X86EMUL_OKAY )
> +        return rc;
> +
> +    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
> +                       (uint64_t)ram_gpa);

Pointless cast.

> +/*
> + * Perform I/O between <port> and <buffer>. <dir> indicates the
> + * direction: IOREQ_READ means a read from <port> to <buffer> and
> + * IOREQ_WRITE means a write from <buffer> to <port>. Each access has
> + * width <size> and up to *<reps> accesses will be performed. If
> + * X86EMUL_OKAY is returned then <reps> will be updated with the number
> + * of accesses actually performed.
> + *
> + * NOTE: If *<reps> is greater than 1, each access will use the
> + *       <buffer> pointer; there is no implicit iteration over a
> + *       block of memory starting at <buffer>.
> + */
> +int hvmemul_do_pio_buffer(uint16_t port,
> +                          unsigned long *reps,

Considering the comment - can't you instead drop the reps parameter
here then?

> @@ -1142,7 +1236,8 @@ static int hvmemul_read_io(
>  {
>      unsigned long reps = 1;
>      *val = 0;
> -    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
> +    return hvmemul_do_pio_buffer(port, &reps, bytes, IOREQ_READ,
> +                                 (uint8_t *)val);
>  }
>  
>  static int hvmemul_write_io(
> @@ -1152,7 +1247,8 @@ static int hvmemul_write_io(
>      struct x86_emulate_ctxt *ctxt)
>  {
>      unsigned long reps = 1;
> -    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
> +    return hvmemul_do_pio_buffer(port, &reps, bytes, IOREQ_WRITE,
> +                                 (uint8_t *)&val);

To avoid such bogus casts, please have hvmemul_do_pio_buffer()
take void * instead.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
  2015-06-17 13:31   ` Jan Beulich
@ 2015-06-17 13:54     ` Paul Durrant
  2015-06-17 14:47       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 13:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 14:31
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
> 
> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -51,41 +51,23 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t
> *p)
> >  }
> >
> >  static int hvmemul_do_io(
> > -    int is_mmio, paddr_t addr, unsigned long *reps, int size,
> > -    paddr_t ram_gpa, int dir, int df, void *p_data)
> > +    int is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
> > +    uint8_t dir, bool_t df, bool_t data_is_addr, uint64_t data)
> 
> is_mmio surely can become bool_t too?
> 

Yes, it can.

> >  {
> >      struct vcpu *curr = current;
> > -    struct hvm_vcpu_io *vio;
> > +    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> >      ioreq_t p = {
> >          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
> >          .addr = addr,
> >          .size = size,
> >          .dir = dir,
> >          .df = df,
> > -        .data = ram_gpa,
> > -        .data_is_ptr = (p_data == NULL),
> > +        .data = data,
> > +        .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
> >      };
> > -    unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
> > -    p2m_type_t p2mt;
> > -    struct page_info *ram_page;
> > +    void *p_data = (void *)data;
> 
> While presumably compiling fine, casting uint64_t to a pointer looks
> bogus without an intermediate cast to (unsigned) long or (u)intptr_t.
> 

Ok. I may just change the function's arg to a uintptr_t then.

> > @@ -190,7 +141,12 @@ static int hvmemul_do_io(
> >      p.count = *reps;
> >
> >      if ( dir == IOREQ_WRITE )
> > +    {
> > +        if ( !data_is_addr )
> > +            memcpy(&p.data, p_data, size);
> > +
> >          hvmtrace_io_assist(is_mmio, &p);
> > +    }
> 
> Why would you need to do this _only_ here (i.e. either not at all,
> or not just for tracing purposes)? Or is this just to _restore_ the
> original value for tracing (in which case the question would be
> whether it indeed can get modified)?

Not sure I follow. If hvmemul_do_io() is called with !data_is_addr then the data arg will be a pointer to a buffer rather than a gpa. So the gpa that was first copied into the ioreq (i.e. p) needs to be overwritten with the actual data if it's a write.
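
Roughly (illustrative sketch - p.data is initialised from 'data' either
way, and the write path then replaces that placeholder with the real
bytes):

    /* data_is_addr: p.data already holds the gpa - nothing to do. */
    if ( !data_is_addr && dir == IOREQ_WRITE )
        memcpy(&p.data, (void *)(uintptr_t)data, size);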

> 
> > -int hvmemul_do_pio(
> > -    unsigned long port, unsigned long *reps, int size,
> > -    paddr_t ram_gpa, int dir, int df, void *p_data)
> > +int hvmemul_do_io_buffer(
> 
> static?

Ok.

> 
> > +    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
> > +    uint8_t dir, bool_t df, uint8_t *buffer)
> >  {
> > -    return hvmemul_do_io(0, port, reps, size, ram_gpa, dir, df, p_data);
> > +    int rc;
> > +
> > +    BUG_ON(buffer == NULL);
> > +
> > +    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
> > +                       (uint64_t)buffer);
> 
> Same remark regarding a cast between uint64_t and a pointer.
> 

Ok.

> > +static int hvmemul_acquire_page(unsigned long gmfn, struct page_info
> **page)
> > +{
> > +    struct vcpu *v = current;
> > +    struct domain *d = v->domain;
> 
> curr and currd please (albeit I don't see the need for the former as
> a local variable).
> 

Ok.

> > +static void hvmemul_release_page(struct page_info *page)
> 
> inline?
> 

Probably, but I was hoping the compiler would make that decision.

> > +int hvmemul_do_io_addr(
> 
> static?
> 

Ok.

> > +    bool_t is_mmio, paddr_t addr, unsigned long *reps,
> > +    unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
> > +{
> > +    struct page_info *ram_page;
> > +    int rc;
> > +
> > +    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
> > +    if ( rc != X86EMUL_OKAY )
> > +        return rc;
> > +
> > +    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
> > +                       (uint64_t)ram_gpa);
> 
> Pointless cast.
> 

Yes, I guess.

> > +/*
> > + * Perform I/O between <port> and <buffer>. <dir> indicates the
> > + * direction: IOREQ_READ means a read from <port> to <buffer> and
> > + * IOREQ_WRITE means a write from <buffer> to <port>. Each access has
> > + * width <size> and up to *<reps> accesses will be performed. If
> > + * X86EMUL_OKAY is returned then <reps> will be updated with the
> number
> > + * of accesses actually performed.
> > + *
> > + * NOTE: If *<reps> is greater than 1, each access will use the
> > + *       <buffer> pointer; there is no implicit interation over a
> > + *       block of memory starting at <buffer>.
> > + */
> > +int hvmemul_do_pio_buffer(uint16_t port,
> > +                          unsigned long *reps,
> 
> Considering the comment - can't you instead drop the reps parameter
> here then?
> 

No. A rep stos does multiple port writes from the same buffer pointer. 

> > @@ -1142,7 +1236,8 @@ static int hvmemul_read_io(
> >  {
> >      unsigned long reps = 1;
> >      *val = 0;
> > -    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
> > +    return hvmemul_do_pio_buffer(port, &reps, bytes, IOREQ_READ,
> > +                                 (uint8_t *)val);
> >  }
> >
> >  static int hvmemul_write_io(
> > @@ -1152,7 +1247,8 @@ static int hvmemul_write_io(
> >      struct x86_emulate_ctxt *ctxt)
> >  {
> >      unsigned long reps = 1;
> > -    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
> > +    return hvmemul_do_pio_buffer(port, &reps, bytes, IOREQ_WRITE,
> > +                                 (uint8_t *)&val);
> 
> To avoid such bogus casts, please have hvmemul_do_pio_buffer()
> take void * instead.
> 

Ok.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts
  2015-06-11 15:42 ` [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
@ 2015-06-17 14:22   ` Jan Beulich
  2015-06-17 14:40     ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 14:22 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> - fail2:
> + fail6:
>      rtc_deinit(d);
>      stdvga_deinit(d);
>      vioapic_deinit(d);
> - fail1:
> + fail5:
>      if ( is_hardware_domain(d) )
>          xfree(d->arch.hvm_domain.io_bitmap);
> -    xfree(d->arch.hvm_domain.io_handler);
> + fail4:
> +    xfree(d->arch.hvm_domain.portio_handler);
> + fail3:
> +    xfree(d->arch.hvm_domain.mmio_handler);
> + fail2:
>      xfree(d->arch.hvm_domain.params);
> + fail1:
>   fail0:

We certainly don't want two consecutive labels. And with struct
domain starting out zero-initialized you don't need these many
earlier labels either (xfree(NULL) is perfectly fine).
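
I.e. something along the lines of (sketch):

 fail2:
    rtc_deinit(d);
    stdvga_deinit(d);
    vioapic_deinit(d);
 fail1:
    if ( is_hardware_domain(d) )
        xfree(d->arch.hvm_domain.io_bitmap);
    xfree(d->arch.hvm_domain.portio_handler); /* xfree(NULL) is fine */
    xfree(d->arch.hvm_domain.mmio_handler);
    xfree(d->arch.hvm_domain.params);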

> @@ -86,31 +175,40 @@ static int hvm_mmio_access(struct vcpu *v,
>              }
>              else
>              {
> -                rc = read(v, p->addr + step * i, p->size, &data);
> +                addr = (p->type == IOREQ_TYPE_COPY) ?
> +                    p->addr + step * i :
> +                    p->addr;
> +                rc = ops->read(handler, v, addr, p->size, &data);

Why this IOREQ_TYPE_COPY dependency that wasn't there before?

>                  if ( rc != X86EMUL_OKAY )
>                      break;
>              }
> -            switch ( hvm_copy_to_guest_phys(p->data + step * i,
> -                                            &data, p->size) )
> +
> +            if ( p->data_is_ptr )
>              {
> -            case HVMCOPY_okay:
> -                break;
> -            case HVMCOPY_gfn_paged_out:
> -            case HVMCOPY_gfn_shared:
> -                rc = X86EMUL_RETRY;
> -                break;
> -            case HVMCOPY_bad_gfn_to_mfn:
> -                /* Drop the write as real hardware would. */
> -                continue;
> -            case HVMCOPY_bad_gva_to_gfn:
> -                ASSERT(0);
> -                /* fall through */
> -            default:
> -                rc = X86EMUL_UNHANDLEABLE;
> -                break;
> +                switch ( hvm_copy_to_guest_phys(p->data + step * i,
> +                                                &data, p->size) )
> +                {
> +                case HVMCOPY_okay:
> +                    break;
> +                case HVMCOPY_gfn_paged_out:
> +                case HVMCOPY_gfn_shared:
> +                    rc = X86EMUL_RETRY;
> +                    break;
> +                case HVMCOPY_bad_gfn_to_mfn:
> +                    /* Drop the write as real hardware would. */
> +                    continue;
> +                case HVMCOPY_bad_gva_to_gfn:
> +                    ASSERT(0);

ASSERT_UNREACHABLE() please if you already touch such code.

> @@ -163,240 +270,182 @@ static int hvm_mmio_access(struct vcpu *v,
>      return rc;
>  }
>  
> -bool_t hvm_mmio_internal(paddr_t gpa)
> -{
> -    struct vcpu *curr = current;
> -    unsigned int i;
> -
> -    for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
> -        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
> -            return 1;
> +static DEFINE_RCU_READ_LOCK(intercept_rcu_lock);
>  
> -    return 0;
> -}
> -
> -int hvm_mmio_intercept(ioreq_t *p)
> +struct hvm_io_handler *hvm_find_io_handler(struct vcpu *v,

I suppose v == current here, i.e. it should be named curr. And I
guess the function should return a pointer to const.

> +                                           ioreq_t *p)
>  {
> -    struct vcpu *v = current;
> -    int i;
> +    struct domain *d = v->domain;
> +    struct hvm_io_handler *handler;
>  
> -    for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
> -    {
> -        hvm_mmio_check_t check =
> -            hvm_mmio_handlers[i]->check;
> +    rcu_read_lock(&intercept_rcu_lock);
>  
> -        if ( check(v, p->addr) )
> -        {
> -            if ( unlikely(p->count > 1) &&
> -                 !check(v, unlikely(p->df)
> -                        ? p->addr - (p->count - 1L) * p->size
> -                        : p->addr + (p->count - 1L) * p->size) )
> -                p->count = 1;
> -
> -            return hvm_mmio_access(
> -                v, p,
> -                hvm_mmio_handlers[i]->read,
> -                hvm_mmio_handlers[i]->write);
> -        }
> -    }
> -
> -    return X86EMUL_UNHANDLEABLE;
> -}
> -
> -static int process_portio_intercept(portio_action_t action, ioreq_t *p)
> -{
> -    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
> -    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
> -    uint32_t data;
> -
> -    if ( !p->data_is_ptr )
> +    list_for_each_entry( handler,
> +                         &d->arch.hvm_domain.io_handler.list,
> +                         list_entry )
>      {
> -        if ( p->dir == IOREQ_READ )
> -        {
> -            if ( vio->mmio_retrying )
> -            {
> -                if ( vio->mmio_large_read_bytes != p->size )
> -                    return X86EMUL_UNHANDLEABLE;
> -                memcpy(&data, vio->mmio_large_read, p->size);
> -                vio->mmio_large_read_bytes = 0;
> -                vio->mmio_retrying = 0;
> -            }
> -            else
> -                rc = action(IOREQ_READ, p->addr, p->size, &data);
> -            p->data = data;
> -        }
> -        else
> -        {
> -            data = p->data;
> -            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
> -        }
> -        return rc;
> -    }
> +        const struct hvm_io_ops *ops = handler->ops;
> +        uint64_t start, end;
>  
> -    if ( p->dir == IOREQ_READ )
> -    {
> -        for ( i = 0; i < p->count; i++ )
> +        if ( handler->type != p->type )
> +            continue;
> +
> +        switch ( handler->type )
>          {
> -            if ( vio->mmio_retrying )
> +        case IOREQ_TYPE_PIO:
> +            start = p->addr;
> +            end = p->addr + p->size;
> +            break;
> +        case IOREQ_TYPE_COPY:
> +            if ( p->df )
>              {
> -                if ( vio->mmio_large_read_bytes != p->size )
> -                    return X86EMUL_UNHANDLEABLE;
> -                memcpy(&data, vio->mmio_large_read, p->size);
> -                vio->mmio_large_read_bytes = 0;
> -                vio->mmio_retrying = 0;
> +                start = (p->addr - (p->count - 1) * p->size);

You're re-introducing a problem here that I fixed not so long ago:
The way it's coded here the multiplication is a 32-bit unsigned one,
but we need it to be a 64-bit signed one. I.e. you need 1L here
(and elsewhere) instead of just 1.
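
I.e. (the L widens the whole product to 64-bit signed before the
subtraction from p->addr):

                start = p->addr - (p->count - 1L) * p->size;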

> +                end = p->addr + p->size;
>              }
>              else
>              {
> -                rc = action(IOREQ_READ, p->addr, p->size, &data);
> -                if ( rc != X86EMUL_OKAY )
> -                    break;
> -            }
> -            switch ( hvm_copy_to_guest_phys(p->data + step * i,
> -                                            &data, p->size) )
> -            {
> -            case HVMCOPY_okay:
> -                break;
> -            case HVMCOPY_gfn_paged_out:
> -            case HVMCOPY_gfn_shared:
> -                rc = X86EMUL_RETRY;
> -                break;
> -            case HVMCOPY_bad_gfn_to_mfn:
> -                /* Drop the write as real hardware would. */
> -                continue;
> -            case HVMCOPY_bad_gva_to_gfn:
> -                ASSERT(0);
> -                /* fall through */
> -            default:
> -                rc = X86EMUL_UNHANDLEABLE;
> -                break;
> +                start = p->addr;
> +                end = p->addr + p->count * p->size;

Same here - one of the multiplicands needs to be extended to long.

> +done:

Labels indented by at least one blank please.

> +void register_mmio_handler(struct domain *d, const struct hvm_mmio_ops *ops)
> +{
> +    struct hvm_mmio_handler *mmio_handler;
> +    unsigned int i;
> +
> +    for ( i = 0; i < NR_MMIO_HANDLERS; i++ )
>      {
> -        if ( type != handler->hdl_list[i].type )
> -            continue;
> -        addr = handler->hdl_list[i].addr;
> -        size = handler->hdl_list[i].size;
> -        if ( (p->addr >= addr) &&
> -             ((p->addr + p->size) <= (addr + size)) )
> -        {
> -            if ( type == HVM_PORTIO )
> -                return process_portio_intercept(
> -                    handler->hdl_list[i].action.portio, p);
> +        mmio_handler = &d->arch.hvm_domain.mmio_handler[i];
>  
> -            if ( unlikely(p->count > 1) &&
> -                 (unlikely(p->df)
> -                  ? p->addr - (p->count - 1L) * p->size < addr
> -                  : p->addr + p->count * 1L * p->size - 1 >= addr + size) )
> -                p->count = 1;
> +        if ( mmio_handler->io_handler.ops == NULL )
> +            break;
> +    }
>  
> -            return handler->hdl_list[i].action.mmio(p);
> -        }
> +    BUG_ON(i == NR_MMIO_HANDLERS);
> +
> +    mmio_handler->io_handler.type = IOREQ_TYPE_COPY;
> +    mmio_handler->io_handler.ops = &mmio_ops;
> +
> +    hvm_register_io_handler(d, &mmio_handler->io_handler, 0);

So you're building a linked list of elements you could go through as
an array? And the ->io_handler initialization is the same for all array
elements. This doesn't look like clear simplification to me.

> +    mmio_handler->ops = ops;

And why is this being done after registration?

And is _all_ of this really usefully per-domain anyway?

> -void relocate_io_handler(
> -    struct domain *d, unsigned long old_addr, unsigned long new_addr,
> -    unsigned long size, int type)
> +bool_t hvm_mmio_internal(paddr_t gpa)
>  {
> -    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
> -    int i;
> -
> -    for ( i = 0; i < handler->num_slot; i++ )
> -        if ( (handler->hdl_list[i].addr == old_addr) &&
> -             (handler->hdl_list[i].size == size) &&
> -             (handler->hdl_list[i].type == type) )
> -            handler->hdl_list[i].addr = new_addr;
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_COPY,
> +        .addr = gpa
> +    };
> +
> +    return (hvm_find_io_handler(current, &p) != NULL);

Pointless parentheses.

> -typedef int (*portio_action_t)(
> -    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
> -typedef int (*mmio_action_t)(ioreq_t *);
> -struct io_handler {
> -    int                 type;
> -    unsigned long       addr;
> -    unsigned long       size;
> -    union {
> -        portio_action_t portio;
> -        mmio_action_t   mmio;
> -        void           *ptr;
> -    } action;
> -};
> -
> -struct hvm_io_handler {
> -    int     num_slot;
> -    struct  io_handler hdl_list[MAX_IO_HANDLER];
> -};
> -
>  struct hvm_mmio_ops {
>      hvm_mmio_check_t check;
>      hvm_mmio_read_t  read;
>      hvm_mmio_write_t write;
>  };
>  
> -extern const struct hvm_mmio_ops hpet_mmio_ops;
> -extern const struct hvm_mmio_ops vlapic_mmio_ops;
> -extern const struct hvm_mmio_ops vioapic_mmio_ops;
> -extern const struct hvm_mmio_ops msixtbl_mmio_ops;
> -extern const struct hvm_mmio_ops iommu_mmio_ops;

As a result they should all become static in their source files.
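
E.g. in hpet.c (sketch; member names from memory):

    static const struct hvm_mmio_ops hpet_mmio_ops = {
        .check = hpet_range,
        .read  = hpet_read,
        .write = hpet_write
    };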

> +#define NR_MMIO_HANDLERS 5
>  
> -#define HVM_MMIO_HANDLER_NR 5
> +struct hvm_mmio_handler {
> +    const struct hvm_mmio_ops *ops;
> +    struct hvm_io_handler     io_handler;
> +};
>  
> -int hvm_io_intercept(ioreq_t *p, int type);
> -void register_io_handler(
> -    struct domain *d, unsigned long addr, unsigned long size,
> -    void *action, int type);
> -void relocate_io_handler(
> -    struct domain *d, unsigned long old_addr, unsigned long new_addr,
> -    unsigned long size, int type);
> +typedef int (*portio_action_t)(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
> +
> +#define NR_PORTIO_HANDLERS 16
> +
> +struct hvm_portio_handler {
> +    uint64_t              start, end;

Why uint64_t? (Note that this is simply the first place I spotted this -
the problem appears to exist in all of the port I/O handling.)

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept
  2015-06-11 15:42 ` [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept Paul Durrant
@ 2015-06-17 14:36   ` Jan Beulich
  2015-06-17 14:46     ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 14:36 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -208,185 +208,113 @@ void hvm_io_assist(ioreq_t *p)
>      }
>  }
>  
> -static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
> +static bool_t dpci_portio_accept(struct hvm_io_handler *io_handler,
> +                                 struct vcpu *v,
> +                                 uint64_t addr,
> +                                 uint32_t size)
>  {
> -    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
> -    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
> -    uint32_t data = 0;
> -
> -    for ( i = 0; i < p->count; i++ )
> -    {
> -        if ( vio->mmio_retrying )
> -        {
> -            if ( vio->mmio_large_read_bytes != p->size )
> -                return X86EMUL_UNHANDLEABLE;
> -            memcpy(&data, vio->mmio_large_read, p->size);
> -            vio->mmio_large_read_bytes = 0;
> -            vio->mmio_retrying = 0;
> -        }
> -        else switch ( p->size )
> -        {
> -        case 1:
> -            data = inb(mport);
> -            break;
> -        case 2:
> -            data = inw(mport);
> -            break;
> -        case 4:
> -            data = inl(mport);
> -            break;
> -        default:
> -            BUG();
> -        }
> +    struct hvm_iommu *hd = domain_hvm_iommu(v->domain);
> +    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
> +    struct g2m_ioport *g2m_ioport;
> +    uint32_t start, end;
> +    uint32_t gport = addr, mport;
>  
> -        if ( p->data_is_ptr )
> -        {
> -            switch ( hvm_copy_to_guest_phys(p->data + step * i,
> -                                            &data, p->size) )
> -            {
> -            case HVMCOPY_okay:
> -                break;
> -            case HVMCOPY_gfn_paged_out:
> -            case HVMCOPY_gfn_shared:
> -                rc = X86EMUL_RETRY;
> -                break;
> -            case HVMCOPY_bad_gfn_to_mfn:
> -                /* Drop the write as real hardware would. */
> -                continue;
> -            case HVMCOPY_bad_gva_to_gfn:
> -                ASSERT(0);
> -                /* fall through */
> -            default:
> -                rc = X86EMUL_UNHANDLEABLE;
> -                break;
> -            }
> -            if ( rc != X86EMUL_OKAY)
> -                break;
> -        }
> -        else
> -            p->data = data;
> -    }
>  
> -    if ( rc == X86EMUL_RETRY )
> +    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
>      {
> -        vio->mmio_retry = 1;
> -        vio->mmio_large_read_bytes = p->size;
> -        memcpy(vio->mmio_large_read, &data, p->size);
> +        start = g2m_ioport->gport;
> +        end = start + g2m_ioport->np;
> +        if ( (gport >= start) && (gport < end) )
> +            goto found;
>      }
>  
> -    if ( i != 0 )
> +    return 0;
> +
> + found:
> +    mport = (gport - start) + g2m_ioport->mport;
> +
> +    if ( !ioports_access_permitted(current->domain, mport,

Isn't v == current here (which of course should be made obvious
again by naming it curr)? And then - is this check needed here at
all? I realize you just move it, but the permission check should be
(and is being) done in the XEN_DOMCTL_ioport_mapping handler.

> +void register_dpci_portio_handler(struct domain *d)
> +{
> +    struct hvm_io_handler *handler = &d->arch.hvm_domain.dpci_handler;
>  
> -    switch ( p->dir )
> -    {
> -    case IOREQ_READ:
> -        rc = dpci_ioport_read(mport, p);
> -        break;
> -    case IOREQ_WRITE:
> -        rc = dpci_ioport_write(mport, p);
> -        break;
> -    default:
> -        gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p->dir);
> -        rc = X86EMUL_UNHANDLEABLE;
> -    }
> +    handler->type = IOREQ_TYPE_PIO;
> +    handler->ops = &dpci_portio_ops;
>  
> -    return rc;
> +    hvm_register_io_handler(d, handler, 1);

Again registering a pointer into struct domain, with invariant
initialization.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts
  2015-06-17 14:22   ` Jan Beulich
@ 2015-06-17 14:40     ` Paul Durrant
  2015-06-17 14:55       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 14:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 15:23
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v2 03/17] x86/hvm: unify internal portio and mmio
> intercepts
> 
> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> > - fail2:
> > + fail6:
> >      rtc_deinit(d);
> >      stdvga_deinit(d);
> >      vioapic_deinit(d);
> > - fail1:
> > + fail5:
> >      if ( is_hardware_domain(d) )
> >          xfree(d->arch.hvm_domain.io_bitmap);
> > -    xfree(d->arch.hvm_domain.io_handler);
> > + fail4:
> > +    xfree(d->arch.hvm_domain.portio_handler);
> > + fail3:
> > +    xfree(d->arch.hvm_domain.mmio_handler);
> > + fail2:
> >      xfree(d->arch.hvm_domain.params);
> > + fail1:
> >   fail0:
> 
> We certainly don't want two consecutive labels. And with struct
> domain starting out zero-initialized you don't need these many
> earlier labels either (xfree(NULL) is perfectly fine).
> 

Ok. I'll see if I can condense some of them.

> > @@ -86,31 +175,40 @@ static int hvm_mmio_access(struct vcpu *v,
> >              }
> >              else
> >              {
> > -                rc = read(v, p->addr + step * i, p->size, &data);
> > +                addr = (p->type == IOREQ_TYPE_COPY) ?
> > +                    p->addr + step * i :
> > +                    p->addr;
> > +                rc = ops->read(handler, v, addr, p->size, &data);
> 
> Why this IOREQ_TYPE_COPY dependency that wasn't there before?
> 

Because the unified function is used for both port and memory mapped I/O, and it's only correct to increment the addr value in the mmio case (i.e. when the ioreq type is 'copy').

> >                  if ( rc != X86EMUL_OKAY )
> >                      break;
> >              }
> > -            switch ( hvm_copy_to_guest_phys(p->data + step * i,
> > -                                            &data, p->size) )
> > +
> > +            if ( p->data_is_ptr )
> >              {
> > -            case HVMCOPY_okay:
> > -                break;
> > -            case HVMCOPY_gfn_paged_out:
> > -            case HVMCOPY_gfn_shared:
> > -                rc = X86EMUL_RETRY;
> > -                break;
> > -            case HVMCOPY_bad_gfn_to_mfn:
> > -                /* Drop the write as real hardware would. */
> > -                continue;
> > -            case HVMCOPY_bad_gva_to_gfn:
> > -                ASSERT(0);
> > -                /* fall through */
> > -            default:
> > -                rc = X86EMUL_UNHANDLEABLE;
> > -                break;
> > +                switch ( hvm_copy_to_guest_phys(p->data + step * i,
> > +                                                &data, p->size) )
> > +                {
> > +                case HVMCOPY_okay:
> > +                    break;
> > +                case HVMCOPY_gfn_paged_out:
> > +                case HVMCOPY_gfn_shared:
> > +                    rc = X86EMUL_RETRY;
> > +                    break;
> > +                case HVMCOPY_bad_gfn_to_mfn:
> > +                    /* Drop the write as real hardware would. */
> > +                    continue;
> > +                case HVMCOPY_bad_gva_to_gfn:
> > +                    ASSERT(0);
> 
> ASSERT_UNREACHABLE() please if you already touch such code.
> 

Ok. I'll change them as I see them.

> > @@ -163,240 +270,182 @@ static int hvm_mmio_access(struct vcpu *v,
> >      return rc;
> >  }
> >
> > -bool_t hvm_mmio_internal(paddr_t gpa)
> > -{
> > -    struct vcpu *curr = current;
> > -    unsigned int i;
> > -
> > -    for ( i = 0; i < HVM_MMIO_HANDLER_NR; ++i )
> > -        if ( hvm_mmio_handlers[i]->check(curr, gpa) )
> > -            return 1;
> > +static DEFINE_RCU_READ_LOCK(intercept_rcu_lock);
> >
> > -    return 0;
> > -}
> > -
> > -int hvm_mmio_intercept(ioreq_t *p)
> > +struct hvm_io_handler *hvm_find_io_handler(struct vcpu *v,
> 
> I suppose v == current here, i.e. it should be named curr. And I
> guess the function should return a pointer to const.
> 

Ok.

> > +                                           ioreq_t *p)
> >  {
> > -    struct vcpu *v = current;
> > -    int i;
> > +    struct domain *d = v->domain;
> > +    struct hvm_io_handler *handler;
> >
> > -    for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
> > -    {
> > -        hvm_mmio_check_t check =
> > -            hvm_mmio_handlers[i]->check;
> > +    rcu_read_lock(&intercept_rcu_lock);
> >
> > -        if ( check(v, p->addr) )
> > -        {
> > -            if ( unlikely(p->count > 1) &&
> > -                 !check(v, unlikely(p->df)
> > -                        ? p->addr - (p->count - 1L) * p->size
> > -                        : p->addr + (p->count - 1L) * p->size) )
> > -                p->count = 1;
> > -
> > -            return hvm_mmio_access(
> > -                v, p,
> > -                hvm_mmio_handlers[i]->read,
> > -                hvm_mmio_handlers[i]->write);
> > -        }
> > -    }
> > -
> > -    return X86EMUL_UNHANDLEABLE;
> > -}
> > -
> > -static int process_portio_intercept(portio_action_t action, ioreq_t *p)
> > -{
> > -    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
> > -    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
> > -    uint32_t data;
> > -
> > -    if ( !p->data_is_ptr )
> > +    list_for_each_entry( handler,
> > +                         &d->arch.hvm_domain.io_handler.list,
> > +                         list_entry )
> >      {
> > -        if ( p->dir == IOREQ_READ )
> > -        {
> > -            if ( vio->mmio_retrying )
> > -            {
> > -                if ( vio->mmio_large_read_bytes != p->size )
> > -                    return X86EMUL_UNHANDLEABLE;
> > -                memcpy(&data, vio->mmio_large_read, p->size);
> > -                vio->mmio_large_read_bytes = 0;
> > -                vio->mmio_retrying = 0;
> > -            }
> > -            else
> > -                rc = action(IOREQ_READ, p->addr, p->size, &data);
> > -            p->data = data;
> > -        }
> > -        else
> > -        {
> > -            data = p->data;
> > -            rc = action(IOREQ_WRITE, p->addr, p->size, &data);
> > -        }
> > -        return rc;
> > -    }
> > +        const struct hvm_io_ops *ops = handler->ops;
> > +        uint64_t start, end;
> >
> > -    if ( p->dir == IOREQ_READ )
> > -    {
> > -        for ( i = 0; i < p->count; i++ )
> > +        if ( handler->type != p->type )
> > +            continue;
> > +
> > +        switch ( handler->type )
> >          {
> > -            if ( vio->mmio_retrying )
> > +        case IOREQ_TYPE_PIO:
> > +            start = p->addr;
> > +            end = p->addr + p->size;
> > +            break;
> > +        case IOREQ_TYPE_COPY:
> > +            if ( p->df )
> >              {
> > -                if ( vio->mmio_large_read_bytes != p->size )
> > -                    return X86EMUL_UNHANDLEABLE;
> > -                memcpy(&data, vio->mmio_large_read, p->size);
> > -                vio->mmio_large_read_bytes = 0;
> > -                vio->mmio_retrying = 0;
> > +                start = (p->addr - (p->count - 1) * p->size);
> 
> You're re-introducing a problem here that I fixed not so long ago:
> The way it's coded here the multiplication is a 32-bit unsigned one,
> but we need it to be a 64-bit signed one. I.e. you need 1L here
> (and elsewhere) instead of just 1.

Ok. That's clearly a bit subtle though, since I missed it. Would it be more obvious to have something like:

start = p->addr - (uint64_t)p->size * (p->count - 1);

> 
> > +                end = p->addr + p->size;
> >              }
> >              else
> >              {
> > -                rc = action(IOREQ_READ, p->addr, p->size, &data);
> > -                if ( rc != X86EMUL_OKAY )
> > -                    break;
> > -            }
> > -            switch ( hvm_copy_to_guest_phys(p->data + step * i,
> > -                                            &data, p->size) )
> > -            {
> > -            case HVMCOPY_okay:
> > -                break;
> > -            case HVMCOPY_gfn_paged_out:
> > -            case HVMCOPY_gfn_shared:
> > -                rc = X86EMUL_RETRY;
> > -                break;
> > -            case HVMCOPY_bad_gfn_to_mfn:
> > -                /* Drop the write as real hardware would. */
> > -                continue;
> > -            case HVMCOPY_bad_gva_to_gfn:
> > -                ASSERT(0);
> > -                /* fall through */
> > -            default:
> > -                rc = X86EMUL_UNHANDLEABLE;
> > -                break;
> > +                start = p->addr;
> > +                end = p->addr + p->count * p->size;
> 
> Same here - one of the multiplicands needs to be extended to long.
> 

Ok.

> > +done:
> 
> Labels indented by at least one blank please.
> 
> > +void register_mmio_handler(struct domain *d, const struct
> hvm_mmio_ops *ops)
> > +{
> > +    struct hvm_mmio_handler *mmio_handler;
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < NR_MMIO_HANDLERS; i++ )
> >      {
> > -        if ( type != handler->hdl_list[i].type )
> > -            continue;
> > -        addr = handler->hdl_list[i].addr;
> > -        size = handler->hdl_list[i].size;
> > -        if ( (p->addr >= addr) &&
> > -             ((p->addr + p->size) <= (addr + size)) )
> > -        {
> > -            if ( type == HVM_PORTIO )
> > -                return process_portio_intercept(
> > -                    handler->hdl_list[i].action.portio, p);
> > +        mmio_handler = &d->arch.hvm_domain.mmio_handler[i];
> >
> > -            if ( unlikely(p->count > 1) &&
> > -                 (unlikely(p->df)
> > -                  ? p->addr - (p->count - 1L) * p->size < addr
> > -                  : p->addr + p->count * 1L * p->size - 1 >= addr + size) )
> > -                p->count = 1;
> > +        if ( mmio_handler->io_handler.ops == NULL )
> > +            break;
> > +    }
> >
> > -            return handler->hdl_list[i].action.mmio(p);
> > -        }
> > +    BUG_ON(i == NR_MMIO_HANDLERS);
> > +
> > +    mmio_handler->io_handler.type = IOREQ_TYPE_COPY;
> > +    mmio_handler->io_handler.ops = &mmio_ops;
> > +
> > +    hvm_register_io_handler(d, &mmio_handler->io_handler, 0);
> 
> So you're building a linked list of elements you could go through as
> an array? And the ->io_handler initialization is the same for all array
> elements. This doesn't look like clear simplification to me.
> 

In the process loop, because it works on both mmio and portio, I use a generic io_handler struct that I can then use container_of() to get back at the outer portio or mmio struct when I need to. I agree it looks complex, but it's a relatively small price to pay for removing the almost totally duplicated code we'd otherwise have in separate process loops for mmio and portio.
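For example (illustrative only - the helper name is made up, but the
field layout matches the patch):

    static struct hvm_mmio_handler *
    mmio_handler_from_io_handler(struct hvm_io_handler *io_handler)
    {
        /* Recover the outer mmio-specific struct from the generic
         * element embedded within it. */
        return container_of(io_handler, struct hvm_mmio_handler,
                            io_handler);
    }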

> > +    mmio_handler->ops = ops;
> 
> And why is this being done after registration?

Possibly it doesn't need to be.

> 
> And is _all_ of this really usefully per-domain anyway?
> 

Maybe the portio handlers could go global too but it seems more flexible to make things per-domain.

> > -void relocate_io_handler(
> > -    struct domain *d, unsigned long old_addr, unsigned long new_addr,
> > -    unsigned long size, int type)
> > +bool_t hvm_mmio_internal(paddr_t gpa)
> >  {
> > -    struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
> > -    int i;
> > -
> > -    for ( i = 0; i < handler->num_slot; i++ )
> > -        if ( (handler->hdl_list[i].addr == old_addr) &&
> > -             (handler->hdl_list[i].size == size) &&
> > -             (handler->hdl_list[i].type == type) )
> > -            handler->hdl_list[i].addr = new_addr;
> > +    ioreq_t p = {
> > +        .type = IOREQ_TYPE_COPY,
> > +        .addr = gpa
> > +    };
> > +
> > +    return (hvm_find_io_handler(current, &p) != NULL);
> 
> Pointless parentheses.

Ok.

> 
> > -typedef int (*portio_action_t)(
> > -    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
> > -typedef int (*mmio_action_t)(ioreq_t *);
> > -struct io_handler {
> > -    int                 type;
> > -    unsigned long       addr;
> > -    unsigned long       size;
> > -    union {
> > -        portio_action_t portio;
> > -        mmio_action_t   mmio;
> > -        void           *ptr;
> > -    } action;
> > -};
> > -
> > -struct hvm_io_handler {
> > -    int     num_slot;
> > -    struct  io_handler hdl_list[MAX_IO_HANDLER];
> > -};
> > -
> >  struct hvm_mmio_ops {
> >      hvm_mmio_check_t check;
> >      hvm_mmio_read_t  read;
> >      hvm_mmio_write_t write;
> >  };
> >
> > -extern const struct hvm_mmio_ops hpet_mmio_ops;
> > -extern const struct hvm_mmio_ops vlapic_mmio_ops;
> > -extern const struct hvm_mmio_ops vioapic_mmio_ops;
> > -extern const struct hvm_mmio_ops msixtbl_mmio_ops;
> > -extern const struct hvm_mmio_ops iommu_mmio_ops;
> 
> As a result they should all become static in their source files.
> 

Ok.

> > +#define NR_MMIO_HANDLERS 5
> >
> > -#define HVM_MMIO_HANDLER_NR 5
> > +struct hvm_mmio_handler {
> > +    const struct hvm_mmio_ops *ops;
> > +    struct hvm_io_handler     io_handler;
> > +};
> >
> > -int hvm_io_intercept(ioreq_t *p, int type);
> > -void register_io_handler(
> > -    struct domain *d, unsigned long addr, unsigned long size,
> > -    void *action, int type);
> > -void relocate_io_handler(
> > -    struct domain *d, unsigned long old_addr, unsigned long new_addr,
> > -    unsigned long size, int type);
> > +typedef int (*portio_action_t)(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
> > +
> > +#define NR_PORTIO_HANDLERS 16
> > +
> > +struct hvm_portio_handler {
> > +    uint64_t              start, end;
> 
> Why uint64_t? (Note that this is simply the place I spotted this first
> at - the problem appears to exist in all of the port I/O handling.)

Because the common process loop and other functions deal in 64-bit values so that they don't have to care whether they are port or mmio addresses. They are cast down to 32 bits before the action is invoked.

  Paul

> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept
  2015-06-17 14:36   ` Jan Beulich
@ 2015-06-17 14:46     ` Paul Durrant
  2015-06-17 14:58       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 14:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 15:37
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with
> standard portio intercept
> 
> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -208,185 +208,113 @@ void hvm_io_assist(ioreq_t *p)
> >      }
> >  }
> >
> > -static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
> > +static bool_t dpci_portio_accept(struct hvm_io_handler *io_handler,
> > +                                 struct vcpu *v,
> > +                                 uint64_t addr,
> > +                                 uint32_t size)
> >  {
> > -    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
> > -    int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
> > -    uint32_t data = 0;
> > -
> > -    for ( i = 0; i < p->count; i++ )
> > -    {
> > -        if ( vio->mmio_retrying )
> > -        {
> > -            if ( vio->mmio_large_read_bytes != p->size )
> > -                return X86EMUL_UNHANDLEABLE;
> > -            memcpy(&data, vio->mmio_large_read, p->size);
> > -            vio->mmio_large_read_bytes = 0;
> > -            vio->mmio_retrying = 0;
> > -        }
> > -        else switch ( p->size )
> > -        {
> > -        case 1:
> > -            data = inb(mport);
> > -            break;
> > -        case 2:
> > -            data = inw(mport);
> > -            break;
> > -        case 4:
> > -            data = inl(mport);
> > -            break;
> > -        default:
> > -            BUG();
> > -        }
> > +    struct hvm_iommu *hd = domain_hvm_iommu(v->domain);
> > +    struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
> > +    struct g2m_ioport *g2m_ioport;
> > +    uint32_t start, end;
> > +    uint32_t gport = addr, mport;
> >
> > -        if ( p->data_is_ptr )
> > -        {
> > -            switch ( hvm_copy_to_guest_phys(p->data + step * i,
> > -                                            &data, p->size) )
> > -            {
> > -            case HVMCOPY_okay:
> > -                break;
> > -            case HVMCOPY_gfn_paged_out:
> > -            case HVMCOPY_gfn_shared:
> > -                rc = X86EMUL_RETRY;
> > -                break;
> > -            case HVMCOPY_bad_gfn_to_mfn:
> > -                /* Drop the write as real hardware would. */
> > -                continue;
> > -            case HVMCOPY_bad_gva_to_gfn:
> > -                ASSERT(0);
> > -                /* fall through */
> > -            default:
> > -                rc = X86EMUL_UNHANDLEABLE;
> > -                break;
> > -            }
> > -            if ( rc != X86EMUL_OKAY)
> > -                break;
> > -        }
> > -        else
> > -            p->data = data;
> > -    }
> >
> > -    if ( rc == X86EMUL_RETRY )
> > +    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
> >      {
> > -        vio->mmio_retry = 1;
> > -        vio->mmio_large_read_bytes = p->size;
> > -        memcpy(vio->mmio_large_read, &data, p->size);
> > +        start = g2m_ioport->gport;
> > +        end = start + g2m_ioport->np;
> > +        if ( (gport >= start) && (gport < end) )
> > +            goto found;
> >      }
> >
> > -    if ( i != 0 )
> > +    return 0;
> > +
> > + found:
> > +    mport = (gport - start) + g2m_ioport->mport;
> > +
> > +    if ( !ioports_access_permitted(current->domain, mport,
> 
> Isn't v == current here (which of course should be made obvious
> again by naming it curr)? And then - is this check needed here at
> all? I realize you just move it, but the permission check should be
> (and is being) done in the XEN_DOMCTL_ioport_mapping handler.
> 

Ok. I'll happily drop the check.

> > +void register_dpci_portio_handler(struct domain *d)
> > +{
> > +    struct hvm_io_handler *handler = &d-
> >arch.hvm_domain.dpci_handler;
> >
> > -    switch ( p->dir )
> > -    {
> > -    case IOREQ_READ:
> > -        rc = dpci_ioport_read(mport, p);
> > -        break;
> > -    case IOREQ_WRITE:
> > -        rc = dpci_ioport_write(mport, p);
> > -        break;
> > -    default:
> > -        gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p->dir);
> > -        rc = X86EMUL_UNHANDLEABLE;
> > -    }
> > +    handler->type = IOREQ_TYPE_PIO;
> > +    handler->ops = &dpci_portio_ops;
> >
> > -    return rc;
> > +    hvm_register_io_handler(d, handler, 1);
> 
> Again registering a pointer into struct domain, with invariant
> initialization.
> 

Agreed, but I don't see any other way of unifying this with the main process loop. Do you have an alternative suggestion?

  Paul

> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
  2015-06-17 13:54     ` Paul Durrant
@ 2015-06-17 14:47       ` Jan Beulich
  2015-06-17 14:55         ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 14:47 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel@lists.xenproject.org

>>> On 17.06.15 at 15:54, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 17 June 2015 14:31
>> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
>> > @@ -190,7 +141,12 @@ static int hvmemul_do_io(
>> >      p.count = *reps;
>> >
>> >      if ( dir == IOREQ_WRITE )
>> > +    {
>> > +        if ( !data_is_addr )
>> > +            memcpy(&p.data, p_data, size);
>> > +
>> >          hvmtrace_io_assist(is_mmio, &p);
>> > +    }
>> 
>> Why would you need to do this _only_ here (i.e. either not at all,
>> or not just for tracing purposes)? Or is this just to _restore_ the
>> original value for tracing (in which case the question would be
>> whether it indeed can get modified)?
> 
> Not sure I follow. If hvmemul_do_io is called with !data_is_addr then the 
> data arg will be a pointer to a buffer, rather than a gpa. So, the gpa that 
> was first copied into the ioreq (i.e. p) needs to be overwritten with the 
> actual data if it's a write.

Ah, it's being moved here from about 100 lines up (in the original
code) - I simply forgot about that deletion by the time I got here.

>> > +static void hvmemul_release_page(struct page_info *page)
>> 
>> inline?
>> 
> 
> Probably, but I was hoping the compiler would make that decision.

Likely it will, but for functions as simple as this one we can help it.
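Presumably it's no more than (a sketch - assuming the body simply drops
the reference taken when the page was acquired):

    static inline void hvmemul_release_page(struct page_info *page)
    {
        put_page(page); /* assumed: undo the earlier page acquisition */
    }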

>> > +/*
>> > + * Perform I/O between <port> and <buffer>. <dir> indicates the
>> > + * direction: IOREQ_READ means a read from <port> to <buffer> and
>> > + * IOREQ_WRITE means a write from <buffer> to <port>. Each access has
>> > + * width <size> and up to *<reps> accesses will be performed. If
>> > + * X86EMUL_OKAY is returned then <reps> will be updated with the
>> number
>> > + * of accesses actually performed.
>> > + *
>> > + * NOTE: If *<reps> is greater than 1, each access will use the
>> > + *       <buffer> pointer; there is no implicit interation over a
>> > + *       block of memory starting at <buffer>.
>> > + */
>> > +int hvmemul_do_pio_buffer(uint16_t port,
>> > +                          unsigned long *reps,
>> 
>> Considering the comment - can't you instead drop the reps parameter
>> here then?
>> 
> 
> No. A rep stos does multiple port writes from the same buffer pointer. 

A REP STOS doesn't do any port writes at all. REP OUTS does, but
it accesses "a block of memory", which the comment specifically says
doesn't happen here. I.e. I still think either the comment is wrong (or
at least misleading) or the function could be simplified.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
  2015-06-17 14:47       ` Jan Beulich
@ 2015-06-17 14:55         ` Paul Durrant
  2015-06-17 14:59           ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 14:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 15:48
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: RE: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
> 
> >>> On 17.06.15 at 15:54, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 17 June 2015 14:31
> >> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> >> > @@ -190,7 +141,12 @@ static int hvmemul_do_io(
> >> >      p.count = *reps;
> >> >
> >> >      if ( dir == IOREQ_WRITE )
> >> > +    {
> >> > +        if ( !data_is_addr )
> >> > +            memcpy(&p.data, p_data, size);
> >> > +
> >> >          hvmtrace_io_assist(is_mmio, &p);
> >> > +    }
> >>
> >> Why would you need to do this _only_ here (i.e. either not at all,
> >> or not just for tracing purposes)? Or is this just to _restore_ the
> >> original value for tracing (in which case the question would be
> >> whether it indeed can get modified)?
> >
> > Not sure I follow. If hvmemul_do_io is  called with !data_is_addr then the
> > data arg will be a pointer to a buffer, rather than a gpa. So, the gpa that
> > was first copied into the ioreq (i.e. p) needs to be overwritten with the
> > actual data if it's a write.
> 
> Ah, it's being moved here from about 100 lines up (in the original
> code) - I simply forgot about that deletion by the time I got here.
> 
> >> > +static void hvmemul_release_page(struct page_info *page)
> >>
> >> inline?
> >>
> >
> > Probably, but I was hoping the compiler would make that decision.
> 
> Likely it will, but for functions as simple as this one we can help it.
> 
> >> > +/*
> >> > + * Perform I/O between <port> and <buffer>. <dir> indicates the
> >> > + * direction: IOREQ_READ means a read from <port> to <buffer> and
> >> > + * IOREQ_WRITE means a write from <buffer> to <port>. Each access
> has
> >> > + * width <size> and up to *<reps> accesses will be performed. If
> >> > + * X86EMUL_OKAY is returned then <reps> will be updated with the
> >> number
> >> > + * of accesses actually performed.
> >> > + *
> >> > + * NOTE: If *<reps> is greater than 1, each access will use the
> >> > + *       <buffer> pointer; there is no implicit iteration over a
> >> > + *       block of memory starting at <buffer>.
> >> > + */
> >> > +int hvmemul_do_pio_buffer(uint16_t port,
> >> > +                          unsigned long *reps,
> >>
> >> Considering the comment - can't you instead drop the reps parameter
> >> here then?
> >>
> >
> > No. A rep stos does multiple port writes from the same buffer pointer.
> 
> A REP STOS doesn't do any port writes at all. REP OUTS does, but
> it accesses "a block of memory", which the comment specifically says
> doesn't happen here. I.e. I still think either the comment is wrong (or
> at least misleading) or the function could be simplified.
> 

Hmm. Good point. I'm sure I ran into a case where something was trying to do multi-rep port I/O from a buffer, but you're right... it makes no sense.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts
  2015-06-17 14:40     ` Paul Durrant
@ 2015-06-17 14:55       ` Jan Beulich
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 14:55 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel@lists.xenproject.org

>>> On 17.06.15 at 16:40, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 17 June 2015 15:23
>> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
>> > +        case IOREQ_TYPE_COPY:
>> > +            if ( p->df )
>> >              {
>> > -                if ( vio->mmio_large_read_bytes != p->size )
>> > -                    return X86EMUL_UNHANDLEABLE;
>> > -                memcpy(&data, vio->mmio_large_read, p->size);
>> > -                vio->mmio_large_read_bytes = 0;
>> > -                vio->mmio_retrying = 0;
>> > +                start = (p->addr - (p->count - 1) * p->size);
>> 
>> You're re-introducing a problem here that I fixed not so long ago:
>> The way it's coded here the multiplication is a 32-bit unsigned one,
>> but we need it to be a 64-bit signed one. I.e. you need 1L here
>> (and elsewhere) instead of just 1.
> 
> Ok. That's clearly a bit subtle though, since I missed it. Would it be more 
> obvious to have something like:
> 
> start = p->addr - (uint64_t)p->size * (p->count - 1);

Depends on the person looking at it: I generally advocate for
avoiding casts whenever possible, hence the 1L conversion I did
a while back. (Apart from that, your code would break again if it
were used in a hypothetical 128-bit environment, while my variant
would - assuming long would then be 128 bits wide - still work.)
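To spell the subtlety out (p->count and p->size are 32-bit ioreq_t
fields, p->addr a 64-bit one; illustrative fragments, not patch code):

    start = p->addr - (p->count - 1) * p->size;  /* product evaluated as
                                                    unsigned 32-bit, so it
                                                    wraps before being
                                                    widened for the
                                                    subtraction */
    start = p->addr - (p->count - 1L) * p->size; /* 1L makes the whole
                                                    product (signed) long,
                                                    i.e. 64-bit on LP64
                                                    targets */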

>> And is _all_ of this really usefully per-domain anyway?
>> 
> 
> Maybe the portio handlers could go global too but it seems more flexible to 
> make things per-domain.

Maybe, in the long run.

>> > +#define NR_MMIO_HANDLERS 5
>> >
>> > -#define HVM_MMIO_HANDLER_NR 5
>> > +struct hvm_mmio_handler {
>> > +    const struct hvm_mmio_ops *ops;
>> > +    struct hvm_io_handler     io_handler;
>> > +};
>> >
>> > -int hvm_io_intercept(ioreq_t *p, int type);
>> > -void register_io_handler(
>> > -    struct domain *d, unsigned long addr, unsigned long size,
>> > -    void *action, int type);
>> > -void relocate_io_handler(
>> > -    struct domain *d, unsigned long old_addr, unsigned long new_addr,
>> > -    unsigned long size, int type);
>> > +typedef int (*portio_action_t)(
>> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val);
>> > +
>> > +#define NR_PORTIO_HANDLERS 16
>> > +
>> > +struct hvm_portio_handler {
>> > +    uint64_t              start, end;
>> 
>> Why uint64_t? (Note that this is simply the place I spotted this first
>> at - the problem appears to exist in all of the port I/O handling.)
> 
> Because the common process loop and other functions deal in 64-bit values so 
> that they don't have to care whether they are port or mmio addresses. 
> They are cast down to 32 bits before the action is invoked.

There's nothing "common" here - even if common code deals with
uint64_t-s, these fields could be uint32_t, unsigned int, or even
uint16_t. No need for them to take up more space than needed.
Same for function parameters - unless their type is dictated by a
structure field they want to be stored into, port-I/O specific
functions should take unsigned int port numbers to produce
better code.
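i.e. something like (a sketch - the widths are the point, not the names):

    struct hvm_portio_handler {
        uint32_t        start, end; /* port space tops out at 0x10000 */
        portio_action_t action;
    };

    static bool_t portio_in_range(const struct hvm_portio_handler *h,
                                  unsigned int port, unsigned int size)
    {
        return (port >= h->start) && ((port + size) <= h->end);
    }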

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept
  2015-06-17 14:46     ` Paul Durrant
@ 2015-06-17 14:58       ` Jan Beulich
  2015-06-17 15:17         ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 14:58 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel@lists.xenproject.org

>>> On 17.06.15 at 16:46, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 17 June 2015 15:37
>> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
>> > + found:
>> > +    mport = (gport - start) + g2m_ioport->mport;
>> > +
>> > +    if ( !ioports_access_permitted(current->domain, mport,
>> 
>> Isn't v == current here (which of course should be made obvious
>> again by naming it curr)? And then - is this check needed here at
>> all? I realize you just move it, but the permission check should be
>> (and is being) done in the XEN_DOMCTL_ioport_mapping handler.
>> 
> 
> Ok. I'll happily drop the check.

Just please make sure you mention this in the description.

>> > +void register_dpci_portio_handler(struct domain *d)
>> > +{
>> > +    struct hvm_io_handler *handler = &d-
>> >arch.hvm_domain.dpci_handler;
>> >
>> > -    switch ( p->dir )
>> > -    {
>> > -    case IOREQ_READ:
>> > -        rc = dpci_ioport_read(mport, p);
>> > -        break;
>> > -    case IOREQ_WRITE:
>> > -        rc = dpci_ioport_write(mport, p);
>> > -        break;
>> > -    default:
>> > -        gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p->dir);
>> > -        rc = X86EMUL_UNHANDLEABLE;
>> > -    }
>> > +    handler->type = IOREQ_TYPE_PIO;
>> > +    handler->ops = &dpci_portio_ops;
>> >
>> > -    return rc;
>> > +    hvm_register_io_handler(d, handler, 1);
>> 
>> Again registering a pointer into struct domain, with invariant
>> initialization.
> 
> Agreed, but I don't see any other way of unifying this with the main process 
> loop. Do you have an alternative suggestion?

Since the list elements are all part of struct domain, why can't this
be an array instead (presumably yielding cheaper iteration over it
at once)?
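Roughly this shape (a sketch; the array and count field names here are
hypothetical):

    for ( i = 0; i < d->arch.hvm_domain.io_handler_count; i++ )
    {
        struct hvm_io_handler *handler =
            &d->arch.hvm_domain.io_handler[i];

        if ( handler->type == p->type )
            return handler; /* plus the range check, as in the list walk */
    }

    return NULL;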

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io()
  2015-06-17 14:55         ` Paul Durrant
@ 2015-06-17 14:59           ` Jan Beulich
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 14:59 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel@lists.xenproject.org

>>> On 17.06.15 at 16:55, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 17 June 2015 15:48
>> >>> On 17.06.15 at 15:54, <Paul.Durrant@citrix.com> wrote:
>> >>  -----Original Message-----
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: 17 June 2015 14:31
>> >> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
>> >> > +/*
>> >> > + * Perform I/O between <port> and <buffer>. <dir> indicates the
>> >> > + * direction: IOREQ_READ means a read from <port> to <buffer> and
>> >> > + * IOREQ_WRITE means a write from <buffer> to <port>. Each access
>> has
>> >> > + * width <size> and up to *<reps> accesses will be performed. If
>> >> > + * X86EMUL_OKAY is returned then <reps> will be updated with the
>> >> number
>> >> > + * of accesses actually performed.
>> >> > + *
>> >> > + * NOTE: If *<reps> is greater than 1, each access will use the
>> >> > + *       <buffer> pointer; there is no implicit interation over a
>> >> > + *       block of memory starting at <buffer>.
>> >> > + */
>> >> > +int hvmemul_do_pio_buffer(uint16_t port,
>> >> > +                          unsigned long *reps,
>> >>
>> >> Considering the comment - can't you instead drop the reps parameter
>> >> here then?
>> >>
>> >
>> > No. A rep stos does multiple port writes from the same buffer pointer.
>> 
>> A REP STOS doesn't do any port writes at all. REP OUTS does, but
>> it accesses "a block of memory", which the comment specifically says
>> doesn't happen here. I.e. I still think either the comment is wrong (or
>> at least misleading) or the function could be simplified.
>> 
> 
> Hmm. Good point. I'm sure I ran into a case where something was trying to 
> do multi-rep port I/O from a buffer, but you're right... it makes no sense.

IDE drivers are the most common ones to use such accesses. It really all
depends on whether REP INS and REP OUTS can make it here.
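For reference, the block-of-memory behaviour in question: REP OUTS
advances (R|E)SI through the buffer on each iteration, e.g. (illustrative
GNU inline asm, not code from the series; buf, count and port stand for
the driver's buffer pointer, word count and data port):

    /* One 16-bit port write per iteration, the source pointer walking
     * through the buffer - the pattern an IDE PIO-mode driver generates. */
    asm volatile ( "rep outsw"
                   : "+S" (buf), "+c" (count)
                   : "d" (port)
                   : "memory" );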

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept
  2015-06-17 14:58       ` Jan Beulich
@ 2015-06-17 15:17         ` Paul Durrant
  0 siblings, 0 replies; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 15:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 15:58
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: RE: [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with
> standard portio intercept
> 
> >>> On 17.06.15 at 16:46, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 17 June 2015 15:37
> >> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> >> > + found:
> >> > +    mport = (gport - start) + g2m_ioport->mport;
> >> > +
> >> > +    if ( !ioports_access_permitted(current->domain, mport,
> >>
> >> Isn't v == current here (which of course should be made obvious
> >> again by naming it curr)? And then - is this check needed here at
> >> all? I realize you just move it, but the permission check should be
> >> (and is being) done in the XEN_DOMCTL_ioport_mapping handler.
> >>
> >
> > Ok. I'll happily drop the check.
> 
> Just please make sure you mention this in the description.
> 

Ok.

> >> > +void register_dpci_portio_handler(struct domain *d)
> >> > +{
> >> > +    struct hvm_io_handler *handler = &d-
> >> >arch.hvm_domain.dpci_handler;
> >> >
> >> > -    switch ( p->dir )
> >> > -    {
> >> > -    case IOREQ_READ:
> >> > -        rc = dpci_ioport_read(mport, p);
> >> > -        break;
> >> > -    case IOREQ_WRITE:
> >> > -        rc = dpci_ioport_write(mport, p);
> >> > -        break;
> >> > -    default:
> >> > -        gdprintk(XENLOG_ERR, "Error: couldn't handle p->dir = %d", p-
> >dir);
> >> > -        rc = X86EMUL_UNHANDLEABLE;
> >> > -    }
> >> > +    handler->type = IOREQ_TYPE_PIO;
> >> > +    handler->ops = &dpci_portio_ops;
> >> >
> >> > -    return rc;
> >> > +    hvm_register_io_handler(d, handler, 1);
> >>
> >> Again registering a pointer into struct domain, with invariant
> >> initialization.
> >
> > Agreed, but I don't see any other way of unifying this with the main
> process
> > loop. Do you have an alternative suggestion?
> 
> Since the list elements are all part of struct domain, why can't this
> be an array instead (presumably yielding cheaper iteration over it
> at once)?
> 

I had envisaged that maybe the msixtbl code might be adapted to use the list too (rather than iterating over its own list), but I never got as far as that. I'll use an array for now.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
  2015-06-11 15:42 ` [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept Paul Durrant
@ 2015-06-17 15:50   ` Jan Beulich
  2015-06-17 16:30     ` Paul Durrant
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2015-06-17 15:50 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> It's clear from the following check in hvmemul_rep_movs:
> 
>     if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
>          (sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
>         return X86EMUL_UNHANDLEABLE;
> 
> that mmio <-> mmio copy is not handled. This means the code in the
> stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
> hvm_copy_to/from_guest_phys() fails is never going to be executed.

Looking at the code I agree with the analysis, but I doubt the code
was put there originally just for fun, and I wonder whether some of
the slowness we "gained" over the years in boot loader graphics
performance isn't related to this code having got hooked off. And in
the end we should get to the point where MMIO -> MMIO transfers
get emulated efficiently anyway, so I wonder whether this patch
moves us in the right direction.

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -273,6 +273,15 @@ static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
>          return X86EMUL_RETRY;
>      }
>  
> +    /* This code should not be reached if the gmfn is not RAM */
> +    if ( p2m_is_mmio(p2mt) )
> +    {
> +        domain_crash(d);

Log a message at least? Silent domain death is rather ugly to analyze.

> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -275,7 +275,8 @@ static uint8_t stdvga_mem_readb(uint64_t addr)
>      return ret;
>  }
>  
> -static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
> +static int stdvga_mem_read(struct vcpu *v, unsigned long addr,
> +                           unsigned long size, unsigned long *p_data)
>  {
>      uint64_t data = 0;
>  
> @@ -313,7 +314,9 @@ static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
>          break;
>      }
>  
> -    return data;
> +    *p_data = data;

Even if sizeof(long) == sizeof(uint64_t), please try to use
consistent types (i.e. the new function parameter should be of
uint64_t * type).

> +    if (!hvm_buffered_io_send(&p))
> +        return X86EMUL_UNHANDLEABLE;

Coding style.

> +static int stdvga_mem_check(struct vcpu *v, unsigned long addr)
>  {
> +    struct hvm_hw_stdvga *s = &v->domain->arch.hvm_domain.stdvga;
> +    int rc;
>  
>      spin_lock(&s->lock);
>  
> +    rc = s->stdvga && s->cache &&
> +        (addr >= VGA_MEM_BASE) &&
> +        (addr < (VGA_MEM_BASE + VGA_MEM_SIZE));

This (compared with the code being removed) makes clear that the
check() mechanism lacks the size (it did so before, but with your
generalization I think this needs to be corrected).
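i.e. the hook wants a size parameter, along the lines of (a sketch only):

    static int stdvga_mem_check(struct vcpu *v, unsigned long addr,
                                unsigned long size)
    {
        struct hvm_hw_stdvga *s = &v->domain->arch.hvm_domain.stdvga;
        int rc;

        spin_lock(&s->lock);

        rc = s->stdvga && s->cache &&
             (addr >= VGA_MEM_BASE) &&
             ((addr + size) <= (VGA_MEM_BASE + VGA_MEM_SIZE));

        spin_unlock(&s->lock);

        return rc;
    }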

> +const struct hvm_mmio_ops stdvga_mem_ops = {

static

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
  2015-06-17 15:50   ` Jan Beulich
@ 2015-06-17 16:30     ` Paul Durrant
  2015-06-18  6:23       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Durrant @ 2015-06-17 16:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 17 June 2015 16:51
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with
> standard mmio intercept
> 
> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
> > It's clear from the following check in hvmemul_rep_movs:
> >
> >     if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
> >          (sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
> >         return X86EMUL_UNHANDLEABLE;
> >
> > that mmio <-> mmio copy is not handled. This means the code in the
> > stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
> > hvm_copy_to/from_guest_phys() fails is never going to be executed.
> 
> Looking at the code I agree with the analysis, but I doubt the code
> was put there originally just for fun, and I wonder whether some of
> the slowness we "gained" over the years in boot loader graphics
> performance isn't related to this code having got hooked off. And in
> the end we should get to the point where MMIO -> MMIO transfers
> get emulated efficiently anyway, so I wonder whether this patch
> moves us in the right direction.
> 

I think removing the stdvga code is the right thing to do. If we want to handle MMIO <-> MMIO then this needs to be done in a generic fashion. Given that (upstream) QEMU translates the 'data_is_addr' addresses through its memory map as well as the actual ioreq addr values, I believe it will actually do MMIO <-> MMIO moves as it stands, so it's probably only Xen's code that needs modifying, and I think this can be done fairly easily after this series. I'll see if I can come up with something.

> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -273,6 +273,15 @@ static int hvmemul_acquire_page(unsigned long
> gmfn, struct page_info **page)
> >          return X86EMUL_RETRY;
> >      }
> >
> > +    /* This code should not be reached if the gmfn is not RAM */
> > +    if ( p2m_is_mmio(p2mt) )
> > +    {
> > +        domain_crash(d);
> 
> Log a message at least? Silent domain death is rather ugly to analyze.

You do get a file+line (since domain_crash is actually a macro with a printk in it too) but I can also log something.
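For reference, domain_crash() is roughly this shape (simplified):

    #define domain_crash(d) do {                                          \
        printk("domain_crash called from %s:%d\n", __FILE__, __LINE__);   \
        __domain_crash(d);                                                \
    } while (0)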

> 
> > --- a/xen/arch/x86/hvm/stdvga.c
> > +++ b/xen/arch/x86/hvm/stdvga.c
> > @@ -275,7 +275,8 @@ static uint8_t stdvga_mem_readb(uint64_t addr)
> >      return ret;
> >  }
> >
> > -static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
> > +static int stdvga_mem_read(struct vcpu *v, unsigned long addr,
> > +                           unsigned long size, unsigned long *p_data)
> >  {
> >      uint64_t data = 0;
> >
> > @@ -313,7 +314,9 @@ static uint64_t stdvga_mem_read(uint64_t addr,
> uint64_t size)
> >          break;
> >      }
> >
> > -    return data;
> > +    *p_data = data;
> 
> Even if sizeof(long) == sizeof(uint64_t), please try to use
> consistent types (i.e. the new function parameter should be of
> uint64_t * type).
> 

Ok.

> > +    if (!hvm_buffered_io_send(&p))
> > +        return X86EMUL_UNHANDLEABLE;
> 
> Coding style.
> 

True.

> > +static int stdvga_mem_check(struct vcpu *v, unsigned long addr)
> >  {
> > +    struct hvm_hw_stdvga *s = &v->domain->arch.hvm_domain.stdvga;
> > +    int rc;
> >
> >      spin_lock(&s->lock);
> >
> > +    rc = s->stdvga && s->cache &&
> > +        (addr >= VGA_MEM_BASE) &&
> > +        (addr < (VGA_MEM_BASE + VGA_MEM_SIZE));
> 
> This (compared with the code being removed) makes clear that the
> check() mechanism lacks the size (it did so before, but with your
> generalization I think this needs to be corrected).
> 
> > +const struct hvm_mmio_ops stdvga_mem_ops = {
> 
> static
> 

Ok.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
  2015-06-17 16:30     ` Paul Durrant
@ 2015-06-18  6:23       ` Jan Beulich
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Beulich @ 2015-06-18  6:23 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir(Xen.org), xen-devel@lists.xenproject.org

>>> On 17.06.15 at 18:30, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 17 June 2015 16:51
>> >>> On 11.06.15 at 17:42, <paul.durrant@citrix.com> wrote:
>> > --- a/xen/arch/x86/hvm/emulate.c
>> > +++ b/xen/arch/x86/hvm/emulate.c
>> > @@ -273,6 +273,15 @@ static int hvmemul_acquire_page(unsigned long
>> gmfn, struct page_info **page)
>> >          return X86EMUL_RETRY;
>> >      }
>> >
>> > +    /* This code should not be reached if the gmfn is not RAM */
>> > +    if ( p2m_is_mmio(p2mt) )
>> > +    {
>> > +        domain_crash(d);
>> 
>> Log a message at least? Silent domain death is rather ugly to analyze.
> 
> You do get a file+line (since domain_crash is actually a macro with a printk 
> in it too) but I can also log something.

Ah, right, I forgot about that. No extra message necessary then
I guess.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-06-18  6:23 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-11 15:42 [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Paul Durrant
2015-06-11 15:42 ` [PATCH v2 01/17] x86/hvm: simplify hvmemul_do_io() Paul Durrant
2015-06-17 13:31   ` Jan Beulich
2015-06-17 13:54     ` Paul Durrant
2015-06-17 14:47       ` Jan Beulich
2015-06-17 14:55         ` Paul Durrant
2015-06-17 14:59           ` Jan Beulich
2015-06-11 15:42 ` [PATCH v2 02/17] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops Paul Durrant
2015-06-17 12:43   ` Jan Beulich
2015-06-17 12:45     ` Paul Durrant
2015-06-11 15:42 ` [PATCH v2 03/17] x86/hvm: unify internal portio and mmio intercepts Paul Durrant
2015-06-17 14:22   ` Jan Beulich
2015-06-17 14:40     ` Paul Durrant
2015-06-17 14:55       ` Jan Beulich
2015-06-11 15:42 ` [PATCH v2 04/17] x86/hvm: unify dpci portio intercept with standard portio intercept Paul Durrant
2015-06-17 14:36   ` Jan Beulich
2015-06-17 14:46     ` Paul Durrant
2015-06-17 14:58       ` Jan Beulich
2015-06-17 15:17         ` Paul Durrant
2015-06-11 15:42 ` [PATCH v2 05/17] x86/hvm: unify stdvga mmio intercept with standard mmio intercept Paul Durrant
2015-06-17 15:50   ` Jan Beulich
2015-06-17 16:30     ` Paul Durrant
2015-06-18  6:23       ` Jan Beulich
2015-06-11 15:42 ` [PATCH v2 06/17] x86/hvm: revert 82ed8716b "fix direct PCI port I/O emulation retry Paul Durrant
2015-06-17 10:54   ` Paul Durrant
2015-06-11 15:42 ` [PATCH v2 07/17] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io() Paul Durrant
2015-06-11 15:42 ` [PATCH v2 08/17] x86/hvm: split I/O completion handling from state model Paul Durrant
2015-06-11 15:42 ` [PATCH v2 09/17] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io() Paul Durrant
2015-06-11 15:42 ` [PATCH v2 10/17] x86/hvm: remove HVMIO_dispatched I/O state Paul Durrant
2015-06-11 15:42 ` [PATCH v2 11/17] x86/hvm: remove hvm_io_state enumeration Paul Durrant
2015-06-11 15:42 ` [PATCH v2 12/17] x86/hvm: use ioreq_t to track in-flight state Paul Durrant
2015-06-11 15:42 ` [PATCH v2 13/17] x86/hvm: only acquire RAM pages for emulation when we need to Paul Durrant
2015-06-11 15:42 ` [PATCH v2 14/17] x86/hvm: remove extraneous parameter from hvmtrace_io_assist() Paul Durrant
2015-06-11 15:42 ` [PATCH v2 15/17] x86/hvm: make sure translated MMIO reads or writes fall within a page Paul Durrant
2015-06-11 15:43 ` [PATCH v2 16/17] x86/hvm: remove multiple open coded 'chunking' loops Paul Durrant
2015-06-11 15:43 ` [PATCH v2 17/17] x86/hvm: track large memory mapped accesses by buffer offset Paul Durrant
2015-06-12 10:44 ` [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Fabio Fantoni
2015-06-12 11:45   ` Paul Durrant
