All the mail mirrored from lore.kernel.org
* [PATCH v1 0/2] vhost: memslot handling improvements
@ 2023-02-16 11:47 David Hildenbrand
  2023-02-16 11:47 ` [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure David Hildenbrand
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-02-16 11:47 UTC (permalink / raw
  To: qemu-devel
  Cc: David Hildenbrand, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert

Following up on my previous work to make virtio-mem consume multiple
memslots dynamically [1] that requires precise accounting between used vs.
reserved memslots, I realized that vhost makes this extra hard by
filtering out some memory region sections (so they don't consume a
memslot) in the vhost-user case, which messes up the whole memslot
accounting.

This series fixes what I found to be broken and prepares for more work on
[1]. Further, it cleans up the merge checks that I consider unnecessary.

[1] https://lkml.kernel.org/r/20211027124531.57561-8-david@redhat.com

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>

David Hildenbrand (2):
  vhost: Defer filtering memory sections until building the vhost memory
    structure
  vhost: Remove vhost_backend_can_merge() callback

 hw/virtio/vhost-user.c            | 14 -----
 hw/virtio/vhost-vdpa.c            |  1 -
 hw/virtio/vhost.c                 | 85 ++++++++++++++++++++-----------
 include/hw/virtio/vhost-backend.h |  4 --
 4 files changed, 56 insertions(+), 48 deletions(-)

-- 
2.39.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 11:47 [PATCH v1 0/2] vhost: memslot handling improvements David Hildenbrand
@ 2023-02-16 11:47 ` David Hildenbrand
  2023-02-16 12:04   ` Michael S. Tsirkin
  2023-03-07 10:51   ` Igor Mammedov
  2023-02-16 11:47 ` [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback David Hildenbrand
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-02-16 11:47 UTC (permalink / raw
  To: qemu-devel
  Cc: David Hildenbrand, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert, Tiwei Bie

Having multiple devices, some filtering memslots and some not, messes up
the "used_memslot" accounting. If a device that filters out fewer memory
sections is added after a device that filters out more, we are in trouble,
because our memslot checks stop working reliably. For example, hotplugging
a device that filters out fewer memslots might pass the checks based on max
vs. used memslots, but can still run out of memslots when it gets notified
about all memory sections.
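
To illustrate, here is a minimal, self-contained sketch of that failure
mode (hypothetical names, numbers, and checks, not QEMU's actual helpers):

#include <stdbool.h>
#include <stdio.h>

#define MAX_MEMSLOTS 4u

/* Global accounting, updated by whichever device committed last. */
static unsigned int used_memslots;

/* A device that filters sections reports fewer used memslots. */
static unsigned int commit_device(unsigned int n_sections, bool filters,
                                  unsigned int n_filtered)
{
    used_memslots = filters ? n_sections - n_filtered : n_sections;
    return used_memslots;
}

int main(void)
{
    /* 5 memory sections in total; a filtering device ignores 2 of them. */
    commit_device(5, true, 2);                  /* used_memslots == 3 */

    /* Hotplug check for a new, non-filtering device: looks fine... */
    printf("hotplug check passes: %s\n",
           used_memslots <= MAX_MEMSLOTS ? "yes" : "no");

    /* ...but once the new device gets notified about all 5 sections: */
    unsigned int needed = commit_device(5, false, 0);
    printf("slots actually needed: %u (max %u) -> %s\n", needed, MAX_MEMSLOTS,
           needed <= MAX_MEMSLOTS ? "ok" : "out of memslots");
    return 0;
}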

Further, in the context of memory devices it will soon be helpful to know
that a RAM memory region section will consume a memslot and will be
accounted for in the used vs. free memslots, so that we can implement
reservation of memslots for memory devices properly. Whether a device
filters it out and would theoretically still have a free memslot is then
hidden internally, making the overall vhost memslot accounting easier.

Let's filter the memslots only when creating the vhost memory array,
accounting all RAM && !ROM memory regions as "used_memslots", even if
vhost-user isn't interested in anonymous RAM regions because it needs
an fd.

When a device actually filters out regions (which should happen rarely
in practice), we might detect a layout change although only filtered
regions changed. We won't bother optimizing that for now.

Note: we cannot simply filter out the regions and count them as
"filtered" on top of the used count, because filtered regions could get
merged, resulting in a smaller effective number of memslots (for example,
two adjacent filtered sections might merge into a single region, so
counting each of them as one used memslot would overestimate). Further,
we won't touch the HMP/QMP virtio introspection output.

Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections")
Cc: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/vhost.c | 79 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 24 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index eb8c4c378c..b7fb960fa9 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -219,8 +219,13 @@ static void vhost_log_sync_range(struct vhost_dev *dev,
     int i;
     /* FIXME: this is N^2 in number of sections */
     for (i = 0; i < dev->n_mem_sections; ++i) {
-        MemoryRegionSection *section = &dev->mem_sections[i];
-        vhost_sync_dirty_bitmap(dev, section, first, last);
+        MemoryRegionSection *mrs = &dev->mem_sections[i];
+
+        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
+            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) {
+            continue;
+        }
+        vhost_sync_dirty_bitmap(dev, mrs, first, last);
     }
 }
 
@@ -503,12 +508,6 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
             return false;
         }
 
-        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
-            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, section)) {
-            trace_vhost_reject_section(mr->name, 2);
-            return false;
-        }
-
         trace_vhost_section(mr->name);
         return true;
     } else {
@@ -525,6 +524,43 @@ static void vhost_begin(MemoryListener *listener)
     dev->n_tmp_sections = 0;
 }
 
+static void vhost_realloc_vhost_memory(struct vhost_dev *dev,
+                                       unsigned int nregions)
+{
+    const size_t size = offsetof(struct vhost_memory, regions) +
+                        nregions * sizeof dev->mem->regions[0];
+
+    dev->mem = g_realloc(dev->mem, size);
+    dev->mem->nregions = nregions;
+}
+
+static void vhost_rebuild_vhost_memory(struct vhost_dev *dev)
+{
+    unsigned int nregions = 0;
+    int i;
+
+    vhost_realloc_vhost_memory(dev, dev->n_mem_sections);
+    for (i = 0; i < dev->n_mem_sections; i++) {
+        struct MemoryRegionSection *mrs = dev->mem_sections + i;
+        struct vhost_memory_region *cur_vmr;
+
+        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
+            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) {
+            continue;
+        }
+        cur_vmr = dev->mem->regions + nregions;
+        nregions++;
+
+        cur_vmr->guest_phys_addr = mrs->offset_within_address_space;
+        cur_vmr->memory_size     = int128_get64(mrs->size);
+        cur_vmr->userspace_addr  =
+            (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
+            mrs->offset_within_region;
+        cur_vmr->flags_padding   = 0;
+    }
+    vhost_realloc_vhost_memory(dev, nregions);
+}
+
 static void vhost_commit(MemoryListener *listener)
 {
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
@@ -532,7 +568,6 @@ static void vhost_commit(MemoryListener *listener)
     MemoryRegionSection *old_sections;
     int n_old_sections;
     uint64_t log_size;
-    size_t regions_size;
     int r;
     int i;
     bool changed = false;
@@ -564,23 +599,19 @@ static void vhost_commit(MemoryListener *listener)
         goto out;
     }
 
-    /* Rebuild the regions list from the new sections list */
-    regions_size = offsetof(struct vhost_memory, regions) +
-                       dev->n_mem_sections * sizeof dev->mem->regions[0];
-    dev->mem = g_realloc(dev->mem, regions_size);
-    dev->mem->nregions = dev->n_mem_sections;
+    /*
+     * Globally track the used memslots *without* device specific
+     * filtering. This way, we always know how many memslots are required
+     * when devices with differing filtering requirements get mixed, and
+     * all RAM memory regions of memory devices will consume memslots.
+     */
     used_memslots = dev->mem->nregions;
-    for (i = 0; i < dev->n_mem_sections; i++) {
-        struct vhost_memory_region *cur_vmr = dev->mem->regions + i;
-        struct MemoryRegionSection *mrs = dev->mem_sections + i;
 
-        cur_vmr->guest_phys_addr = mrs->offset_within_address_space;
-        cur_vmr->memory_size     = int128_get64(mrs->size);
-        cur_vmr->userspace_addr  =
-            (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
-            mrs->offset_within_region;
-        cur_vmr->flags_padding   = 0;
-    }
+    /*
+     * Rebuild the regions list from the new sections list, filtering out all
+     * sections that this device is not interested in.
+     */
+    vhost_rebuild_vhost_memory(dev);
 
     if (!dev->started) {
         goto out;
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback
  2023-02-16 11:47 [PATCH v1 0/2] vhost: memslot handling improvements David Hildenbrand
  2023-02-16 11:47 ` [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure David Hildenbrand
@ 2023-02-16 11:47 ` David Hildenbrand
  2023-03-07 10:25   ` Igor Mammedov
  2023-02-16 16:04 ` [PATCH v1 0/2] vhost: memslot handling improvements Stefan Hajnoczi
  2023-02-17 14:20 ` Michael S. Tsirkin
  3 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2023-02-16 11:47 UTC (permalink / raw
  To: qemu-devel
  Cc: David Hildenbrand, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert

Checking whether the memory regions are equal is sufficient: if they are
equal, then most certainly the contained fd is equal.
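
As a self-contained illustration of why the separate fd comparison is
redundant once the MemoryRegion pointers match (hypothetical types, not
QEMU code):

#include <assert.h>
#include <stdio.h>

typedef struct MemRegion { int fd; } MemRegion;       /* stands in for a MemoryRegion */
typedef struct Section   { MemRegion *mr; } Section;  /* stands in for a section */

static int fd_of(const Section *s) { return s->mr->fd; }

int main(void)
{
    MemRegion shmem = { .fd = 42 };
    Section prev = { .mr = &shmem }, next = { .mr = &shmem };

    /* The old code compared the fds; comparing the regions is enough. */
    if (next.mr == prev.mr) {
        /* Same region object -> same backing fd, no separate lookup needed. */
        assert(fd_of(&next) == fd_of(&prev));
        printf("merge allowed: fd %d == fd %d\n", fd_of(&next), fd_of(&prev));
    }
    return 0;
}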

The whole vhost-user memslot handling is suboptimal and overly
complicated. We shouldn't have to look up a RAM memory region we got
notified about in vhost_user_get_mr_data() using a host pointer. But that
requires a bigger rework -- especially an alternative vhost_set_mem_table()
backend call that simply consumes MemoryRegionSections.

For now, let's just drop vhost_backend_can_merge().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/vhost-user.c            | 14 --------------
 hw/virtio/vhost-vdpa.c            |  1 -
 hw/virtio/vhost.c                 |  6 +-----
 include/hw/virtio/vhost-backend.h |  4 ----
 4 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e68daa35d4..4bfaf559a7 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2195,19 +2195,6 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
     return -ENOTSUP;
 }
 
-static bool vhost_user_can_merge(struct vhost_dev *dev,
-                                 uint64_t start1, uint64_t size1,
-                                 uint64_t start2, uint64_t size2)
-{
-    ram_addr_t offset;
-    int mfd, rfd;
-
-    (void)vhost_user_get_mr_data(start1, &offset, &mfd);
-    (void)vhost_user_get_mr_data(start2, &offset, &rfd);
-
-    return mfd == rfd;
-}
-
 static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
 {
     VhostUserMsg msg;
@@ -2704,7 +2691,6 @@ const VhostOps user_ops = {
         .vhost_set_vring_enable = vhost_user_set_vring_enable,
         .vhost_requires_shm_log = vhost_user_requires_shm_log,
         .vhost_migration_done = vhost_user_migration_done,
-        .vhost_backend_can_merge = vhost_user_can_merge,
         .vhost_net_set_mtu = vhost_user_net_set_mtu,
         .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
         .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 542e003101..9ab7bc8718 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1317,7 +1317,6 @@ const VhostOps vdpa_ops = {
         .vhost_set_config = vhost_vdpa_set_config,
         .vhost_requires_shm_log = NULL,
         .vhost_migration_done = NULL,
-        .vhost_backend_can_merge = NULL,
         .vhost_net_set_mtu = NULL,
         .vhost_set_iotlb_callback = NULL,
         .vhost_send_device_iotlb_msg = NULL,
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b7fb960fa9..9d8662aa98 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -733,11 +733,7 @@ static void vhost_region_add_section(struct vhost_dev *dev,
             size_t offset = mrs_gpa - prev_gpa_start;
 
             if (prev_host_start + offset == mrs_host &&
-                section->mr == prev_sec->mr &&
-                (!dev->vhost_ops->vhost_backend_can_merge ||
-                 dev->vhost_ops->vhost_backend_can_merge(dev,
-                    mrs_host, mrs_size,
-                    prev_host_start, prev_size))) {
+                section->mr == prev_sec->mr) {
                 uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
                 need_add = false;
                 prev_sec->offset_within_address_space =
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index c5ab49051e..abf1601ba2 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -86,9 +86,6 @@ typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
 typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
 typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
                                        char *mac_addr);
-typedef bool (*vhost_backend_can_merge_op)(struct vhost_dev *dev,
-                                           uint64_t start1, uint64_t size1,
-                                           uint64_t start2, uint64_t size2);
 typedef int (*vhost_vsock_set_guest_cid_op)(struct vhost_dev *dev,
                                             uint64_t guest_cid);
 typedef int (*vhost_vsock_set_running_op)(struct vhost_dev *dev, int start);
@@ -160,7 +157,6 @@ typedef struct VhostOps {
     vhost_set_vring_enable_op vhost_set_vring_enable;
     vhost_requires_shm_log_op vhost_requires_shm_log;
     vhost_migration_done_op vhost_migration_done;
-    vhost_backend_can_merge_op vhost_backend_can_merge;
     vhost_vsock_set_guest_cid_op vhost_vsock_set_guest_cid;
     vhost_vsock_set_running_op vhost_vsock_set_running;
     vhost_set_iotlb_callback_op vhost_set_iotlb_callback;
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 11:47 ` [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure David Hildenbrand
@ 2023-02-16 12:04   ` Michael S. Tsirkin
  2023-02-16 12:10     ` David Hildenbrand
  2023-03-07 10:51   ` Igor Mammedov
  1 sibling, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-02-16 12:04 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert, Tiwei Bie

On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote:
> Having multiple devices, some filtering memslots and some not filtering
> memslots, messes up the "used_memslot" accounting. If we'd have a device
> the filters out less memory sections after a device that filters out more,
> we'd be in trouble, because our memslot checks stop working reliably.
> For example, hotplugging a device that filters out less memslots might end
> up passing the checks based on max vs. used memslots, but can run out of
> memslots when getting notified about all memory sections.
> 
> Further, it will be helpful in memory device context in the near future
> to know that a RAM memory region section will consume a memslot, and be
> accounted for in the used vs. free memslots, such that we can implement
> reservation of memslots for memory devices properly. Whether a device
> filters this out and would theoretically still have a free memslot is
> then hidden internally, making overall vhost memslot accounting easier.
> 
> Let's filter the memslots when creating the vhost memory array,
> accounting all RAM && !ROM memory regions as "used_memslots" even if
> vhost_user isn't interested in anonymous RAM regions, because it needs
> an fd.
> 
> When a device actually filters out regions (which should happen rarely
> in practice), we might detect a layout change although only filtered
> regions changed. We won't bother about optimizing that for now.

That caused trouble in the past when using VGA because it is playing
with mappings in weird ways.
I think we have to optimize it, sorry.


> Note: we cannot simply filter out the region and count them as
> "filtered" to add them to used, because filtered regions could get
> merged and result in a smaller effective number of memslots. Further,
> we won't touch the hmp/qmp virtio introspection output.
> 
> Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections")
> Cc: Tiwei Bie <tiwei.bie@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

I didn't review this yet but maybe you can answer:
will this create more slots for the backend?
Because some backends are limited in # of slots and breaking them is
not a good idea.

Thanks!

> ---
>  hw/virtio/vhost.c | 79 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 55 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index eb8c4c378c..b7fb960fa9 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -219,8 +219,13 @@ static void vhost_log_sync_range(struct vhost_dev *dev,
>      int i;
>      /* FIXME: this is N^2 in number of sections */
>      for (i = 0; i < dev->n_mem_sections; ++i) {
> -        MemoryRegionSection *section = &dev->mem_sections[i];
> -        vhost_sync_dirty_bitmap(dev, section, first, last);
> +        MemoryRegionSection *mrs = &dev->mem_sections[i];
> +
> +        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
> +            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) {
> +            continue;
> +        }
> +        vhost_sync_dirty_bitmap(dev, mrs, first, last);
>      }
>  }
>  
> @@ -503,12 +508,6 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
>              return false;
>          }
>  
> -        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
> -            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, section)) {
> -            trace_vhost_reject_section(mr->name, 2);
> -            return false;
> -        }
> -
>          trace_vhost_section(mr->name);
>          return true;
>      } else {
> @@ -525,6 +524,43 @@ static void vhost_begin(MemoryListener *listener)
>      dev->n_tmp_sections = 0;
>  }
>  
> +static void vhost_realloc_vhost_memory(struct vhost_dev *dev,
> +                                       unsigned int nregions)
> +{
> +    const size_t size = offsetof(struct vhost_memory, regions) +
> +                        nregions * sizeof dev->mem->regions[0];
> +
> +    dev->mem = g_realloc(dev->mem, size);
> +    dev->mem->nregions = nregions;
> +}
> +
> +static void vhost_rebuild_vhost_memory(struct vhost_dev *dev)
> +{
> +    unsigned int nregions = 0;
> +    int i;
> +
> +    vhost_realloc_vhost_memory(dev, dev->n_mem_sections);
> +    for (i = 0; i < dev->n_mem_sections; i++) {
> +        struct MemoryRegionSection *mrs = dev->mem_sections + i;
> +        struct vhost_memory_region *cur_vmr;
> +
> +        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
> +            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) {
> +            continue;
> +        }
> +        cur_vmr = dev->mem->regions + nregions;
> +        nregions++;
> +
> +        cur_vmr->guest_phys_addr = mrs->offset_within_address_space;
> +        cur_vmr->memory_size     = int128_get64(mrs->size);
> +        cur_vmr->userspace_addr  =
> +            (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> +            mrs->offset_within_region;
> +        cur_vmr->flags_padding   = 0;
> +    }
> +    vhost_realloc_vhost_memory(dev, nregions);
> +}
> +
>  static void vhost_commit(MemoryListener *listener)
>  {
>      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> @@ -532,7 +568,6 @@ static void vhost_commit(MemoryListener *listener)
>      MemoryRegionSection *old_sections;
>      int n_old_sections;
>      uint64_t log_size;
> -    size_t regions_size;
>      int r;
>      int i;
>      bool changed = false;
> @@ -564,23 +599,19 @@ static void vhost_commit(MemoryListener *listener)
>          goto out;
>      }
>  
> -    /* Rebuild the regions list from the new sections list */
> -    regions_size = offsetof(struct vhost_memory, regions) +
> -                       dev->n_mem_sections * sizeof dev->mem->regions[0];
> -    dev->mem = g_realloc(dev->mem, regions_size);
> -    dev->mem->nregions = dev->n_mem_sections;
> +    /*
> +     * Globally track the used memslots *without* device specific
> +     * filtering. This way, we always know how many memslots are required
> +     * when devices with differing filtering requirements get mixed, and
> +     * all RAM memory regions of memory devices will consume memslots.
> +     */
>      used_memslots = dev->mem->nregions;
> -    for (i = 0; i < dev->n_mem_sections; i++) {
> -        struct vhost_memory_region *cur_vmr = dev->mem->regions + i;
> -        struct MemoryRegionSection *mrs = dev->mem_sections + i;
>  
> -        cur_vmr->guest_phys_addr = mrs->offset_within_address_space;
> -        cur_vmr->memory_size     = int128_get64(mrs->size);
> -        cur_vmr->userspace_addr  =
> -            (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> -            mrs->offset_within_region;
> -        cur_vmr->flags_padding   = 0;
> -    }
> +    /*
> +     * Rebuild the regions list from the new sections list, filtering out all
> +     * sections that this device is not interested in.
> +     */
> +    vhost_rebuild_vhost_memory(dev);
>  
>      if (!dev->started) {
>          goto out;
> -- 
> 2.39.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 12:04   ` Michael S. Tsirkin
@ 2023-02-16 12:10     ` David Hildenbrand
  2023-02-16 12:13       ` David Hildenbrand
  2023-02-16 12:21       ` Michael S. Tsirkin
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-02-16 12:10 UTC (permalink / raw
  To: Michael S. Tsirkin
  Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert, Tiwei Bie

On 16.02.23 13:04, Michael S. Tsirkin wrote:
> On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote:
>> Having multiple devices, some filtering memslots and some not filtering
>> memslots, messes up the "used_memslot" accounting. If we'd have a device
>> the filters out less memory sections after a device that filters out more,
>> we'd be in trouble, because our memslot checks stop working reliably.
>> For example, hotplugging a device that filters out less memslots might end
>> up passing the checks based on max vs. used memslots, but can run out of
>> memslots when getting notified about all memory sections.
>>
>> Further, it will be helpful in memory device context in the near future
>> to know that a RAM memory region section will consume a memslot, and be
>> accounted for in the used vs. free memslots, such that we can implement
>> reservation of memslots for memory devices properly. Whether a device
>> filters this out and would theoretically still have a free memslot is
>> then hidden internally, making overall vhost memslot accounting easier.
>>
>> Let's filter the memslots when creating the vhost memory array,
>> accounting all RAM && !ROM memory regions as "used_memslots" even if
>> vhost_user isn't interested in anonymous RAM regions, because it needs
>> an fd.
>>
>> When a device actually filters out regions (which should happen rarely
>> in practice), we might detect a layout change although only filtered
>> regions changed. We won't bother about optimizing that for now.
> 
> That caused trouble in the past when using VGA because it is playing
> with mappings in weird ways.
> I think we have to optimize it, sorry.

We still filter them out, just later.

>> Note: we cannot simply filter out the region and count them as
>> "filtered" to add them to used, because filtered regions could get
>> merged and result in a smaller effective number of memslots. Further,
>> we won't touch the hmp/qmp virtio introspection output.
>>
>> Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections")
>> Cc: Tiwei Bie <tiwei.bie@intel.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> I didn't review this yet but maybe you can answer:
> will this create more slots for the backend?
> Because some backends are limited in # of slots and breaking them is
> not a good idea.

It restores the handling we had before 988a27754bbb. RAM without an fd 
should be rare for vhost-user setups (where we actually filter) I assume?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 12:10     ` David Hildenbrand
@ 2023-02-16 12:13       ` David Hildenbrand
  2023-02-16 12:22         ` Michael S. Tsirkin
  2023-02-16 12:21       ` Michael S. Tsirkin
  1 sibling, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2023-02-16 12:13 UTC (permalink / raw
  To: Michael S. Tsirkin
  Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert, Tiwei Bie

On 16.02.23 13:10, David Hildenbrand wrote:
> On 16.02.23 13:04, Michael S. Tsirkin wrote:
>> On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote:
>>> Having multiple devices, some filtering memslots and some not filtering
>>> memslots, messes up the "used_memslot" accounting. If we'd have a device
>>> the filters out less memory sections after a device that filters out more,
>>> we'd be in trouble, because our memslot checks stop working reliably.
>>> For example, hotplugging a device that filters out less memslots might end
>>> up passing the checks based on max vs. used memslots, but can run out of
>>> memslots when getting notified about all memory sections.
>>>
>>> Further, it will be helpful in memory device context in the near future
>>> to know that a RAM memory region section will consume a memslot, and be
>>> accounted for in the used vs. free memslots, such that we can implement
>>> reservation of memslots for memory devices properly. Whether a device
>>> filters this out and would theoretically still have a free memslot is
>>> then hidden internally, making overall vhost memslot accounting easier.
>>>
>>> Let's filter the memslots when creating the vhost memory array,
>>> accounting all RAM && !ROM memory regions as "used_memslots" even if
>>> vhost_user isn't interested in anonymous RAM regions, because it needs
>>> an fd.
>>>
>>> When a device actually filters out regions (which should happen rarely
>>> in practice), we might detect a layout change although only filtered
>>> regions changed. We won't bother about optimizing that for now.
>>
>> That caused trouble in the past when using VGA because it is playing
>> with mappings in weird ways.
>> I think we have to optimize it, sorry.
> 
> We still filter them out, just later.

To be precise, we still filter out all DIRTY_MEMORY_VGA as we used to. 
Only the device-specific filtering (vhost-user) is modified.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 12:10     ` David Hildenbrand
  2023-02-16 12:13       ` David Hildenbrand
@ 2023-02-16 12:21       ` Michael S. Tsirkin
  2023-02-16 12:39         ` David Hildenbrand
  1 sibling, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-02-16 12:21 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert, Tiwei Bie

On Thu, Feb 16, 2023 at 01:10:54PM +0100, David Hildenbrand wrote:
> On 16.02.23 13:04, Michael S. Tsirkin wrote:
> > On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote:
> > > Having multiple devices, some filtering memslots and some not filtering
> > > memslots, messes up the "used_memslot" accounting. If we'd have a device
> > > the filters out less memory sections after a device that filters out more,
> > > we'd be in trouble, because our memslot checks stop working reliably.
> > > For example, hotplugging a device that filters out less memslots might end
> > > up passing the checks based on max vs. used memslots, but can run out of
> > > memslots when getting notified about all memory sections.
> > > 
> > > Further, it will be helpful in memory device context in the near future
> > > to know that a RAM memory region section will consume a memslot, and be
> > > accounted for in the used vs. free memslots, such that we can implement
> > > reservation of memslots for memory devices properly. Whether a device
> > > filters this out and would theoretically still have a free memslot is
> > > then hidden internally, making overall vhost memslot accounting easier.
> > > 
> > > Let's filter the memslots when creating the vhost memory array,
> > > accounting all RAM && !ROM memory regions as "used_memslots" even if
> > > vhost_user isn't interested in anonymous RAM regions, because it needs
> > > an fd.
> > > 
> > > When a device actually filters out regions (which should happen rarely
> > > in practice), we might detect a layout change although only filtered
> > > regions changed. We won't bother about optimizing that for now.
> > 
> > That caused trouble in the past when using VGA because it is playing
> > with mappings in weird ways.
> > I think we have to optimize it, sorry.
> 
> We still filter them out, just later.


The issue is sending lots of unnecessary system calls to update the kernel,
which goes through a slow RCU.

> > > Note: we cannot simply filter out the region and count them as
> > > "filtered" to add them to used, because filtered regions could get
> > > merged and result in a smaller effective number of memslots. Further,
> > > we won't touch the hmp/qmp virtio introspection output.
> > > 
> > > Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections")
> > > Cc: Tiwei Bie <tiwei.bie@intel.com>
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > 
> > I didn't review this yet but maybe you can answer:
> > will this create more slots for the backend?
> > Because some backends are limited in # of slots and breaking them is
> > not a good idea.
> 
> It restores the handling we had before 988a27754bbb. RAM without an fd
> should be rare for vhost-user setups (where we actually filter) I assume?

Hmm, I guess so.

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 12:13       ` David Hildenbrand
@ 2023-02-16 12:22         ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-02-16 12:22 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert, Tiwei Bie

On Thu, Feb 16, 2023 at 01:13:07PM +0100, David Hildenbrand wrote:
> On 16.02.23 13:10, David Hildenbrand wrote:
> > On 16.02.23 13:04, Michael S. Tsirkin wrote:
> > > On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote:
> > > > Having multiple devices, some filtering memslots and some not filtering
> > > > memslots, messes up the "used_memslot" accounting. If we'd have a device
> > > > the filters out less memory sections after a device that filters out more,
> > > > we'd be in trouble, because our memslot checks stop working reliably.
> > > > For example, hotplugging a device that filters out less memslots might end
> > > > up passing the checks based on max vs. used memslots, but can run out of
> > > > memslots when getting notified about all memory sections.
> > > > 
> > > > Further, it will be helpful in memory device context in the near future
> > > > to know that a RAM memory region section will consume a memslot, and be
> > > > accounted for in the used vs. free memslots, such that we can implement
> > > > reservation of memslots for memory devices properly. Whether a device
> > > > filters this out and would theoretically still have a free memslot is
> > > > then hidden internally, making overall vhost memslot accounting easier.
> > > > 
> > > > Let's filter the memslots when creating the vhost memory array,
> > > > accounting all RAM && !ROM memory regions as "used_memslots" even if
> > > > vhost_user isn't interested in anonymous RAM regions, because it needs
> > > > an fd.
> > > > 
> > > > When a device actually filters out regions (which should happen rarely
> > > > in practice), we might detect a layout change although only filtered
> > > > regions changed. We won't bother about optimizing that for now.
> > > 
> > > That caused trouble in the past when using VGA because it is playing
> > > with mappings in weird ways.
> > > I think we have to optimize it, sorry.
> > 
> > We still filter them out, just later.
> 
> To be precise, we still filter out all DIRTY_MEMORY_VGA as we used to. Only
> the device-specific filtering (vhost-user) is modified.


Oh good so the VGA use-case is unaffected.

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 12:21       ` Michael S. Tsirkin
@ 2023-02-16 12:39         ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-02-16 12:39 UTC (permalink / raw
  To: Michael S. Tsirkin
  Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert, Tiwei Bie

On 16.02.23 13:21, Michael S. Tsirkin wrote:
> On Thu, Feb 16, 2023 at 01:10:54PM +0100, David Hildenbrand wrote:
>> On 16.02.23 13:04, Michael S. Tsirkin wrote:
>>> On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote:
>>>> Having multiple devices, some filtering memslots and some not filtering
>>>> memslots, messes up the "used_memslot" accounting. If we'd have a device
>>>> the filters out less memory sections after a device that filters out more,
>>>> we'd be in trouble, because our memslot checks stop working reliably.
>>>> For example, hotplugging a device that filters out less memslots might end
>>>> up passing the checks based on max vs. used memslots, but can run out of
>>>> memslots when getting notified about all memory sections.
>>>>
>>>> Further, it will be helpful in memory device context in the near future
>>>> to know that a RAM memory region section will consume a memslot, and be
>>>> accounted for in the used vs. free memslots, such that we can implement
>>>> reservation of memslots for memory devices properly. Whether a device
>>>> filters this out and would theoretically still have a free memslot is
>>>> then hidden internally, making overall vhost memslot accounting easier.
>>>>
>>>> Let's filter the memslots when creating the vhost memory array,
>>>> accounting all RAM && !ROM memory regions as "used_memslots" even if
>>>> vhost_user isn't interested in anonymous RAM regions, because it needs
>>>> an fd.
>>>>
>>>> When a device actually filters out regions (which should happen rarely
>>>> in practice), we might detect a layout change although only filtered
>>>> regions changed. We won't bother about optimizing that for now.
>>>
>>> That caused trouble in the past when using VGA because it is playing
>>> with mappings in weird ways.
>>> I think we have to optimize it, sorry.
>>
>> We still filter them out, just later.
> 
> 
> The issue is sending lots of unnecessary system calls to update the kernel which
> goes through a slow RCU.

I don't think this is the case when deferring the device-specific 
filtering. As discussed, the generic filtering (ignore !ram, ignore rom, 
ignore VGA) remains in place because that is identical for all devices.
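
A standalone sketch of these two filtering stages (made-up field and
function names, not QEMU's actual code): the generic filter runs for every
device when sections are collected, while the device-specific vhost-user
filter only runs when the per-device region table is built, so it no
longer affects the global used_memslots count:

#include <stdbool.h>
#include <stdio.h>

typedef struct Section {
    bool is_ram, is_rom, vga_logging;
    int fd;                       /* -1 for anonymous RAM without an fd */
} Section;

/* Identical for all devices: RAM only, no ROM, no VGA-style dirty logging. */
static bool generic_filter(const Section *s)
{
    return s->is_ram && !s->is_rom && !s->vga_logging;
}

/* Device specific (vhost-user): needs an fd to share the memory. */
static bool vhost_user_filter(const Section *s)
{
    return s->fd >= 0;
}

int main(void)
{
    const Section sections[] = {
        { .is_ram = true, .fd = 10 },                       /* fd-backed RAM */
        { .is_ram = true, .vga_logging = true, .fd = 11 },  /* VGA: dropped early */
        { .is_ram = true, .fd = -1 },                       /* anon RAM: counted,
                                                               but not sent */
    };
    unsigned int used = 0, in_table = 0;

    for (unsigned int i = 0; i < sizeof(sections) / sizeof(sections[0]); i++) {
        if (!generic_filter(&sections[i])) {
            continue;             /* never reaches the global accounting */
        }
        used++;                   /* global used_memslots, unfiltered */
        if (vhost_user_filter(&sections[i])) {
            in_table++;           /* what the vhost-user backend sees */
        }
    }
    printf("used_memslots=%u, regions sent to vhost-user=%u\n", used, in_table);
    return 0;
}
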

> 
>>>> Note: we cannot simply filter out the region and count them as
>>>> "filtered" to add them to used, because filtered regions could get
>>>> merged and result in a smaller effective number of memslots. Further,
>>>> we won't touch the hmp/qmp virtio introspection output.
>>>>
>>>> Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections")
>>>> Cc: Tiwei Bie <tiwei.bie@intel.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> I didn't review this yet but maybe you can answer:
>>> will this create more slots for the backend?
>>> Because some backends are limited in # of slots and breaking them is
>>> not a good idea.
>>
>> It restores the handling we had before 988a27754bbb. RAM without an fd
>> should be rare for vhost-user setups (where we actually filter) I assume?
> 
> Hmm, I guess so.

At least with simplistic QEMU invocations using vhost-user (and proper
shared memory as the backend) I don't see any such filtering happening,
because everything that is RAM is properly fd-based.

IMHO the chance of breaking a sane VM setup is very small.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/2] vhost: memslot handling improvements
  2023-02-16 11:47 [PATCH v1 0/2] vhost: memslot handling improvements David Hildenbrand
  2023-02-16 11:47 ` [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure David Hildenbrand
  2023-02-16 11:47 ` [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback David Hildenbrand
@ 2023-02-16 16:04 ` Stefan Hajnoczi
  2023-02-17 13:48   ` David Hildenbrand
  2023-02-17 14:20 ` Michael S. Tsirkin
  3 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2023-02-16 16:04 UTC (permalink / raw
  To: David Hildenbrand; +Cc: qemu-devel, Michael S. Tsirkin, Dr . David Alan Gilbert


On Thu, Feb 16, 2023 at 12:47:50PM +0100, David Hildenbrand wrote:
> Following up on my previous work to make virtio-mem consume multiple
> memslots dynamically [1] that requires precise accounting between used vs.
> reserved memslots, I realized that vhost makes this extra hard by
> filtering out some memory region sections (so they don't consume a
> memslot) in the vhost-user case, which messes up the whole memslot
> accounting.
> 
> This series fixes what I found to be broken and prepares for more work on
> [1]. Further, it cleanes up the merge checks that I consider unnecessary.
> 
> [1] https://lkml.kernel.org/r/20211027124531.57561-8-david@redhat.com
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> David Hildenbrand (2):
>   vhost: Defer filtering memory sections until building the vhost memory
>     structure
>   vhost: Remove vhost_backend_can_merge() callback
> 
>  hw/virtio/vhost-user.c            | 14 -----
>  hw/virtio/vhost-vdpa.c            |  1 -
>  hw/virtio/vhost.c                 | 85 ++++++++++++++++++++-----------
>  include/hw/virtio/vhost-backend.h |  4 --
>  4 files changed, 56 insertions(+), 48 deletions(-)
> 
> -- 
> 2.39.1
> 

I'm not familiar enough with the memslot code to review this properly,
but overall it looks okay:

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/2] vhost: memslot handling improvements
  2023-02-16 16:04 ` [PATCH v1 0/2] vhost: memslot handling improvements Stefan Hajnoczi
@ 2023-02-17 13:48   ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-02-17 13:48 UTC (permalink / raw
  To: Stefan Hajnoczi; +Cc: qemu-devel, Michael S. Tsirkin, Dr . David Alan Gilbert

On 16.02.23 17:04, Stefan Hajnoczi wrote:
> Acked-by: Stefan Hajnoczi<stefanha@redhat.com>

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/2] vhost: memslot handling improvements
  2023-02-16 11:47 [PATCH v1 0/2] vhost: memslot handling improvements David Hildenbrand
                   ` (2 preceding siblings ...)
  2023-02-16 16:04 ` [PATCH v1 0/2] vhost: memslot handling improvements Stefan Hajnoczi
@ 2023-02-17 14:20 ` Michael S. Tsirkin
  2023-02-17 14:27   ` David Hildenbrand
  2023-03-07 11:14   ` Igor Mammedov
  3 siblings, 2 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2023-02-17 14:20 UTC (permalink / raw
  To: David Hildenbrand; +Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Feb 16, 2023 at 12:47:50PM +0100, David Hildenbrand wrote:
> Following up on my previous work to make virtio-mem consume multiple
> memslots dynamically [1] that requires precise accounting between used vs.
> reserved memslots, I realized that vhost makes this extra hard by
> filtering out some memory region sections (so they don't consume a
> memslot) in the vhost-user case, which messes up the whole memslot
> accounting.
> 
> This series fixes what I found to be broken and prepares for more work on
> [1]. Further, it cleanes up the merge checks that I consider unnecessary.
> 
> [1] https://lkml.kernel.org/r/20211027124531.57561-8-david@redhat.com
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>


Igor worked on memslots a lot previously and he asked for
a bit of time to review this, so I'll wait a bit before
applying.

> David Hildenbrand (2):
>   vhost: Defer filtering memory sections until building the vhost memory
>     structure
>   vhost: Remove vhost_backend_can_merge() callback
> 
>  hw/virtio/vhost-user.c            | 14 -----
>  hw/virtio/vhost-vdpa.c            |  1 -
>  hw/virtio/vhost.c                 | 85 ++++++++++++++++++++-----------
>  include/hw/virtio/vhost-backend.h |  4 --
>  4 files changed, 56 insertions(+), 48 deletions(-)
> 
> -- 
> 2.39.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/2] vhost: memslot handling improvements
  2023-02-17 14:20 ` Michael S. Tsirkin
@ 2023-02-17 14:27   ` David Hildenbrand
  2023-03-07 11:14   ` Igor Mammedov
  1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-02-17 14:27 UTC (permalink / raw
  To: Michael S. Tsirkin; +Cc: qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 17.02.23 15:20, Michael S. Tsirkin wrote:
> On Thu, Feb 16, 2023 at 12:47:50PM +0100, David Hildenbrand wrote:
>> Following up on my previous work to make virtio-mem consume multiple
>> memslots dynamically [1] that requires precise accounting between used vs.
>> reserved memslots, I realized that vhost makes this extra hard by
>> filtering out some memory region sections (so they don't consume a
>> memslot) in the vhost-user case, which messes up the whole memslot
>> accounting.
>>
>> This series fixes what I found to be broken and prepares for more work on
>> [1]. Further, it cleanes up the merge checks that I consider unnecessary.
>>
>> [1] https://lkml.kernel.org/r/20211027124531.57561-8-david@redhat.com
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> Igor worked on memslots a lot previously and he asked for
> a bit of time to review this, so I'll wait a bit before
> applying.

Sure, no need to rush. I'm still working on the other bits of the 
virtio-mem approach.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback
  2023-02-16 11:47 ` [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback David Hildenbrand
@ 2023-03-07 10:25   ` Igor Mammedov
  2023-03-07 11:16     ` Igor Mammedov
  0 siblings, 1 reply; 22+ messages in thread
From: Igor Mammedov @ 2023-03-07 10:25 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On Thu, 16 Feb 2023 12:47:52 +0100
David Hildenbrand <david@redhat.com> wrote:

> Checking whether the memory regions are equal is sufficient: if they are
> equal, then most certainly the contained fd is equal.
sounds reasonable to me.

> 
> The whole vhost-user memslot handling is suboptimal and overly
> complicated. We shouldn't have to lookup a RAM memory regions we got
> notified about in vhost_user_get_mr_data() using a host pointer. But that

While on janitor duty, can you fix up the following?

vhost_user_get_mr_data() -> memory_region_from_host() ->
  qemu_ram_block_from_host()

The doc comment of qemu_ram_block_from_host() seems to be out of
sync (ram_addr no longer exists).

> requires a bigger rework -- especially an alternative vhost_set_mem_table()
> backend call that simply consumes MemoryRegionSections.

Just skimming through the usage of vhost_user_get_mr_data(), it looks like
we are first collecting MemoryRegionSections into tmp_sections, then in
vhost_commit() we convert them into a vhost_memory_region list, and then,
all over the place, we try hard to convert addresses from the latter back
to the MemoryRegions we lost during the tmp_sections conversion.

To me it looks like we should drop the conversion to vhost_dev::mem
and use vhost_dev::mem_sections directly, to get rid of the data
duplication and the back-and-forth addr<->mr conversion.

> For now, let's just drop vhost_backend_can_merge().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/virtio/vhost-user.c            | 14 --------------
>  hw/virtio/vhost-vdpa.c            |  1 -
>  hw/virtio/vhost.c                 |  6 +-----
>  include/hw/virtio/vhost-backend.h |  4 ----
>  4 files changed, 1 insertion(+), 24 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e68daa35d4..4bfaf559a7 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2195,19 +2195,6 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
>      return -ENOTSUP;
>  }
>  
> -static bool vhost_user_can_merge(struct vhost_dev *dev,
> -                                 uint64_t start1, uint64_t size1,
> -                                 uint64_t start2, uint64_t size2)
> -{
> -    ram_addr_t offset;
> -    int mfd, rfd;
> -
> -    (void)vhost_user_get_mr_data(start1, &offset, &mfd);
> -    (void)vhost_user_get_mr_data(start2, &offset, &rfd);
> -
> -    return mfd == rfd;
> -}
> -
>  static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>  {
>      VhostUserMsg msg;
> @@ -2704,7 +2691,6 @@ const VhostOps user_ops = {
>          .vhost_set_vring_enable = vhost_user_set_vring_enable,
>          .vhost_requires_shm_log = vhost_user_requires_shm_log,
>          .vhost_migration_done = vhost_user_migration_done,
> -        .vhost_backend_can_merge = vhost_user_can_merge,
>          .vhost_net_set_mtu = vhost_user_net_set_mtu,
>          .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
>          .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 542e003101..9ab7bc8718 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1317,7 +1317,6 @@ const VhostOps vdpa_ops = {
>          .vhost_set_config = vhost_vdpa_set_config,
>          .vhost_requires_shm_log = NULL,
>          .vhost_migration_done = NULL,
> -        .vhost_backend_can_merge = NULL,
>          .vhost_net_set_mtu = NULL,
>          .vhost_set_iotlb_callback = NULL,
>          .vhost_send_device_iotlb_msg = NULL,
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b7fb960fa9..9d8662aa98 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -733,11 +733,7 @@ static void vhost_region_add_section(struct vhost_dev *dev,
>              size_t offset = mrs_gpa - prev_gpa_start;
>  
>              if (prev_host_start + offset == mrs_host &&
> -                section->mr == prev_sec->mr &&
> -                (!dev->vhost_ops->vhost_backend_can_merge ||
> -                 dev->vhost_ops->vhost_backend_can_merge(dev,
> -                    mrs_host, mrs_size,
> -                    prev_host_start, prev_size))) {
> +                section->mr == prev_sec->mr) {
>                  uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
>                  need_add = false;
>                  prev_sec->offset_within_address_space =
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index c5ab49051e..abf1601ba2 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -86,9 +86,6 @@ typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>  typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
>  typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
>                                         char *mac_addr);
> -typedef bool (*vhost_backend_can_merge_op)(struct vhost_dev *dev,
> -                                           uint64_t start1, uint64_t size1,
> -                                           uint64_t start2, uint64_t size2);
>  typedef int (*vhost_vsock_set_guest_cid_op)(struct vhost_dev *dev,
>                                              uint64_t guest_cid);
>  typedef int (*vhost_vsock_set_running_op)(struct vhost_dev *dev, int start);
> @@ -160,7 +157,6 @@ typedef struct VhostOps {
>      vhost_set_vring_enable_op vhost_set_vring_enable;
>      vhost_requires_shm_log_op vhost_requires_shm_log;
>      vhost_migration_done_op vhost_migration_done;
> -    vhost_backend_can_merge_op vhost_backend_can_merge;
>      vhost_vsock_set_guest_cid_op vhost_vsock_set_guest_cid;
>      vhost_vsock_set_running_op vhost_vsock_set_running;
>      vhost_set_iotlb_callback_op vhost_set_iotlb_callback;



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-02-16 11:47 ` [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure David Hildenbrand
  2023-02-16 12:04   ` Michael S. Tsirkin
@ 2023-03-07 10:51   ` Igor Mammedov
  2023-03-07 12:46     ` David Hildenbrand
  1 sibling, 1 reply; 22+ messages in thread
From: Igor Mammedov @ 2023-03-07 10:51 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert, Tiwei Bie

On Thu, 16 Feb 2023 12:47:51 +0100
David Hildenbrand <david@redhat.com> wrote:

> Having multiple devices, some filtering memslots and some not filtering
> memslots, messes up the "used_memslot" accounting. If we'd have a device
> the filters out less memory sections after a device that filters out more,
> we'd be in trouble, because our memslot checks stop working reliably.
> For example, hotplugging a device that filters out less memslots might end
> up passing the checks based on max vs. used memslots, but can run out of
> memslots when getting notified about all memory sections.

A hypothetical example of such a case would be appreciated
(I don't really get how the above can happen; perhaps a more detailed
explanation would help).
 
> Further, it will be helpful in memory device context in the near future
> to know that a RAM memory region section will consume a memslot, and be
> accounted for in the used vs. free memslots, such that we can implement
> reservation of memslots for memory devices properly. Whether a device
> filters this out and would theoretically still have a free memslot is
> then hidden internally, making overall vhost memslot accounting easier.
> 
> Let's filter the memslots when creating the vhost memory array,
> accounting all RAM && !ROM memory regions as "used_memslots" even if
> vhost_user isn't interested in anonymous RAM regions, because it needs
> an fd.
> 
> When a device actually filters out regions (which should happen rarely
> in practice), we might detect a layout change although only filtered
> regions changed. We won't bother about optimizing that for now.
> 
> Note: we cannot simply filter out the region and count them as
> "filtered" to add them to used, because filtered regions could get
> merged and result in a smaller effective number of memslots. Further,
> we won't touch the hmp/qmp virtio introspection output.
What output exactly are you talking about?

PS:
If we drop vhost_dev::mem, the bulk of this patch would go away.

Side questions:
do we have MemoryRegionSection merging on QEMU's KVM side?
Also, does KVM itself merge memslots?
 
> Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections")
> Cc: Tiwei Bie <tiwei.bie@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/vhost.c | 79 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 55 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index eb8c4c378c..b7fb960fa9 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -219,8 +219,13 @@ static void vhost_log_sync_range(struct vhost_dev *dev,
>      int i;
>      /* FIXME: this is N^2 in number of sections */
>      for (i = 0; i < dev->n_mem_sections; ++i) {
> -        MemoryRegionSection *section = &dev->mem_sections[i];
> -        vhost_sync_dirty_bitmap(dev, section, first, last);
> +        MemoryRegionSection *mrs = &dev->mem_sections[i];
> +
> +        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
> +            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) {
> +            continue;
> +        }
> +        vhost_sync_dirty_bitmap(dev, mrs, first, last);
>      }
>  }
>  
> @@ -503,12 +508,6 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
>              return false;
>          }
>  
> -        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
> -            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, section)) {
> -            trace_vhost_reject_section(mr->name, 2);
> -            return false;
> -        }
> -
>          trace_vhost_section(mr->name);
>          return true;
>      } else {
> @@ -525,6 +524,43 @@ static void vhost_begin(MemoryListener *listener)
>      dev->n_tmp_sections = 0;
>  }
>  
> +static void vhost_realloc_vhost_memory(struct vhost_dev *dev,
> +                                       unsigned int nregions)
> +{
> +    const size_t size = offsetof(struct vhost_memory, regions) +
> +                        nregions * sizeof dev->mem->regions[0];
> +
> +    dev->mem = g_realloc(dev->mem, size);
> +    dev->mem->nregions = nregions;
> +}
> +
> +static void vhost_rebuild_vhost_memory(struct vhost_dev *dev)
> +{
> +    unsigned int nregions = 0;
> +    int i;
> +
> +    vhost_realloc_vhost_memory(dev, dev->n_mem_sections);
> +    for (i = 0; i < dev->n_mem_sections; i++) {
> +        struct MemoryRegionSection *mrs = dev->mem_sections + i;
> +        struct vhost_memory_region *cur_vmr;
> +
> +        if (dev->vhost_ops->vhost_backend_mem_section_filter &&
> +            !dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) {
> +            continue;
> +        }
> +        cur_vmr = dev->mem->regions + nregions;
> +        nregions++;
> +
> +        cur_vmr->guest_phys_addr = mrs->offset_within_address_space;
> +        cur_vmr->memory_size     = int128_get64(mrs->size);
> +        cur_vmr->userspace_addr  =
> +            (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> +            mrs->offset_within_region;
> +        cur_vmr->flags_padding   = 0;
> +    }
> +    vhost_realloc_vhost_memory(dev, nregions);
> +}
> +
>  static void vhost_commit(MemoryListener *listener)
>  {
>      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> @@ -532,7 +568,6 @@ static void vhost_commit(MemoryListener *listener)
>      MemoryRegionSection *old_sections;
>      int n_old_sections;
>      uint64_t log_size;
> -    size_t regions_size;
>      int r;
>      int i;
>      bool changed = false;
> @@ -564,23 +599,19 @@ static void vhost_commit(MemoryListener *listener)
>          goto out;
>      }
>  
> -    /* Rebuild the regions list from the new sections list */
> -    regions_size = offsetof(struct vhost_memory, regions) +
> -                       dev->n_mem_sections * sizeof dev->mem->regions[0];
> -    dev->mem = g_realloc(dev->mem, regions_size);
> -    dev->mem->nregions = dev->n_mem_sections;
> +    /*
> +     * Globally track the used memslots *without* device specific
> +     * filtering. This way, we always know how many memslots are required
> +     * when devices with differing filtering requirements get mixed, and
> +     * all RAM memory regions of memory devices will consume memslots.
> +     */
>      used_memslots = dev->mem->nregions;
> -    for (i = 0; i < dev->n_mem_sections; i++) {
> -        struct vhost_memory_region *cur_vmr = dev->mem->regions + i;
> -        struct MemoryRegionSection *mrs = dev->mem_sections + i;
>  
> -        cur_vmr->guest_phys_addr = mrs->offset_within_address_space;
> -        cur_vmr->memory_size     = int128_get64(mrs->size);
> -        cur_vmr->userspace_addr  =
> -            (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> -            mrs->offset_within_region;
> -        cur_vmr->flags_padding   = 0;
> -    }
> +    /*
> +     * Rebuild the regions list from the new sections list, filtering out all
> +     * sections that this device is not interested in.
> +     */
> +    vhost_rebuild_vhost_memory(dev);
>  
>      if (!dev->started) {
>          goto out;



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/2] vhost: memslot handling improvements
  2023-02-17 14:20 ` Michael S. Tsirkin
  2023-02-17 14:27   ` David Hildenbrand
@ 2023-03-07 11:14   ` Igor Mammedov
  2023-03-08 10:08     ` David Hildenbrand
  1 sibling, 1 reply; 22+ messages in thread
From: Igor Mammedov @ 2023-03-07 11:14 UTC (permalink / raw
  To: Michael S. Tsirkin, Dr . David Alan Gilbert
  Cc: David Hildenbrand, qemu-devel, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On Fri, 17 Feb 2023 09:20:27 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Feb 16, 2023 at 12:47:50PM +0100, David Hildenbrand wrote:
> > Following up on my previous work to make virtio-mem consume multiple
> > memslots dynamically [1] that requires precise accounting between used vs.
> > reserved memslots, I realized that vhost makes this extra hard by
> > filtering out some memory region sections (so they don't consume a
> > memslot) in the vhost-user case, which messes up the whole memslot
> > accounting.
> > 
> > This series fixes what I found to be broken and prepares for more work on
> > [1]. Further, it cleanes up the merge checks that I consider unnecessary.
> > 
> > [1] https://lkml.kernel.org/r/20211027124531.57561-8-david@redhat.com
> > 
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>  
> 
> 
> Igor worked on memslots a lot previously and he asked for
> a bit of time to review this, so I'll wait a bit before
> applying.

I've reviewed it as much as I could.
(That said, the vhost mem map code has mostly been rewritten by dgilbert
since I last touched it, so his review would be
more valuable in this case than mine)

> 
> > David Hildenbrand (2):
> >   vhost: Defer filtering memory sections until building the vhost memory
> >     structure
> >   vhost: Remove vhost_backend_can_merge() callback
> > 
> >  hw/virtio/vhost-user.c            | 14 -----
> >  hw/virtio/vhost-vdpa.c            |  1 -
> >  hw/virtio/vhost.c                 | 85 ++++++++++++++++++++-----------
> >  include/hw/virtio/vhost-backend.h |  4 --
> >  4 files changed, 56 insertions(+), 48 deletions(-)
> > 
> > -- 
> > 2.39.1  
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback
  2023-03-07 10:25   ` Igor Mammedov
@ 2023-03-07 11:16     ` Igor Mammedov
  2023-03-07 11:24       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Igor Mammedov @ 2023-03-07 11:16 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On Tue, 7 Mar 2023 11:25:48 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Thu, 16 Feb 2023 12:47:52 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
> > Checking whether the memory regions are equal is sufficient: if they are
> > equal, then most certainly the contained fd is equal.  
> sounds reasonable to me.
> 
> > 
> > The whole vhost-user memslot handling is suboptimal and overly
> > complicated. We shouldn't have to look up a RAM memory region we got
> > notified about in vhost_user_get_mr_data() using a host pointer. But that  
> 
> While on janitor duty, can you fix up the following?
> 
> vhost_user_get_mr_data() -> memory_region_from_host -> 
>  -> qemu_ram_block_from_host()  
> the qemu_ram_block_from_host() doc comment seems to be out of
> sync (ram_addr no longer exists)
> 
> > requires a bigger rework -- especially an alternative vhost_set_mem_table()
> > backend call that simply consumes MemoryRegionSections.  
> 
> Just skimming through the usage of vhost_user_get_mr_data(): it looks like
> we first collect MemoryRegionSection-s into tmp_sections, then on
> vhost_commit we convert them into a vhost_memory_region list, and then all
> over the place we try hard to convert addresses from the latter back to
> the MemoryRegions we lost during the tmp_sections conversion.
> 
> To me it looks like we should drop conversion to vhost_dev::mem
> and replace its usage with vhost_dev::mem_sections directly
> to get rid of data duplication and back and forth addr<->mr conversion.
> 
> > For now, let's just drop vhost_backend_can_merge().
> > 
> > Signed-off-by: David Hildenbrand <david@redhat.com>  
> 
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> 
> > ---
> >  hw/virtio/vhost-user.c            | 14 --------------
> >  hw/virtio/vhost-vdpa.c            |  1 -
> >  hw/virtio/vhost.c                 |  6 +-----
> >  include/hw/virtio/vhost-backend.h |  4 ----
> >  4 files changed, 1 insertion(+), 24 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index e68daa35d4..4bfaf559a7 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -2195,19 +2195,6 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
> >      return -ENOTSUP;
> >  }
> >  
> > -static bool vhost_user_can_merge(struct vhost_dev *dev,
> > -                                 uint64_t start1, uint64_t size1,
> > -                                 uint64_t start2, uint64_t size2)
> > -{
> > -    ram_addr_t offset;
> > -    int mfd, rfd;
> > -
> > -    (void)vhost_user_get_mr_data(start1, &offset, &mfd);
> > -    (void)vhost_user_get_mr_data(start2, &offset, &rfd);
> > -
> > -    return mfd == rfd;
> > -}
> > -
> >  static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
> >  {
> >      VhostUserMsg msg;
> > @@ -2704,7 +2691,6 @@ const VhostOps user_ops = {
> >          .vhost_set_vring_enable = vhost_user_set_vring_enable,
> >          .vhost_requires_shm_log = vhost_user_requires_shm_log,
> >          .vhost_migration_done = vhost_user_migration_done,
> > -        .vhost_backend_can_merge = vhost_user_can_merge,
> >          .vhost_net_set_mtu = vhost_user_net_set_mtu,
> >          .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
> >          .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 542e003101..9ab7bc8718 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1317,7 +1317,6 @@ const VhostOps vdpa_ops = {
> >          .vhost_set_config = vhost_vdpa_set_config,
> >          .vhost_requires_shm_log = NULL,
> >          .vhost_migration_done = NULL,
> > -        .vhost_backend_can_merge = NULL,
> >          .vhost_net_set_mtu = NULL,
> >          .vhost_set_iotlb_callback = NULL,
> >          .vhost_send_device_iotlb_msg = NULL,
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index b7fb960fa9..9d8662aa98 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -733,11 +733,7 @@ static void vhost_region_add_section(struct vhost_dev *dev,
> >              size_t offset = mrs_gpa - prev_gpa_start;
> >  
> >              if (prev_host_start + offset == mrs_host &&
> > -                section->mr == prev_sec->mr &&
> > -                (!dev->vhost_ops->vhost_backend_can_merge ||
> > -                 dev->vhost_ops->vhost_backend_can_merge(dev,

another question: can it really happen, i.e. having 2 abutting memory sections
with the same memory region? If yes, then when/why?


> > -                    mrs_host, mrs_size,
> > -                    prev_host_start, prev_size))) {
> > +                section->mr == prev_sec->mr) {
> >                  uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
> >                  need_add = false;
> >                  prev_sec->offset_within_address_space =
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index c5ab49051e..abf1601ba2 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -86,9 +86,6 @@ typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
> >  typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
> >  typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
> >                                         char *mac_addr);
> > -typedef bool (*vhost_backend_can_merge_op)(struct vhost_dev *dev,
> > -                                           uint64_t start1, uint64_t size1,
> > -                                           uint64_t start2, uint64_t size2);
> >  typedef int (*vhost_vsock_set_guest_cid_op)(struct vhost_dev *dev,
> >                                              uint64_t guest_cid);
> >  typedef int (*vhost_vsock_set_running_op)(struct vhost_dev *dev, int start);
> > @@ -160,7 +157,6 @@ typedef struct VhostOps {
> >      vhost_set_vring_enable_op vhost_set_vring_enable;
> >      vhost_requires_shm_log_op vhost_requires_shm_log;
> >      vhost_migration_done_op vhost_migration_done;
> > -    vhost_backend_can_merge_op vhost_backend_can_merge;
> >      vhost_vsock_set_guest_cid_op vhost_vsock_set_guest_cid;
> >      vhost_vsock_set_running_op vhost_vsock_set_running;
> >      vhost_set_iotlb_callback_op vhost_set_iotlb_callback;  
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback
  2023-03-07 11:16     ` Igor Mammedov
@ 2023-03-07 11:24       ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-03-07 11:24 UTC (permalink / raw
  To: Igor Mammedov
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On 07.03.23 12:16, Igor Mammedov wrote:
> On Tue, 7 Mar 2023 11:25:48 +0100
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
>> On Thu, 16 Feb 2023 12:47:52 +0100
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> Checking whether the memory regions are equal is sufficient: if they are
>>> equal, then most certainly the contained fd is equal.
>> sounds reasonable to me.
>>
>>>
>>> The whole vhost-user memslot handling is suboptimal and overly
>>> complicated. We shouldn't have to look up a RAM memory region we got
>>> notified about in vhost_user_get_mr_data() using a host pointer. But that
>>
>> While on janitor duty, can you fix up the following?
>>
>> vhost_user_get_mr_data() -> memory_region_from_host ->
>>   -> qemu_ram_block_from_host()
>> the qemu_ram_block_from_host() doc comment seems to be out of
>> sync (ram_addr no longer exists)
>>
>>> requires a bigger rework -- especially an alternative vhost_set_mem_table()
>>> backend call that simply consumes MemoryRegionSections.
>>
>> Just skimming through the usage of vhost_user_get_mr_data(): it looks like
>> we first collect MemoryRegionSection-s into tmp_sections, then on
>> vhost_commit we convert them into a vhost_memory_region list, and then all
>> over the place we try hard to convert addresses from the latter back to
>> the MemoryRegions we lost during the tmp_sections conversion.
>>
>> To me it looks like we should drop conversion to vhost_dev::mem
>> and replace its usage with vhost_dev::mem_sections directly
>> to get rid of data duplication and back and forth addr<->mr conversion.
>>
>>> For now, let's just drop vhost_backend_can_merge().
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
>>
>>> ---
>>>   hw/virtio/vhost-user.c            | 14 --------------
>>>   hw/virtio/vhost-vdpa.c            |  1 -
>>>   hw/virtio/vhost.c                 |  6 +-----
>>>   include/hw/virtio/vhost-backend.h |  4 ----
>>>   4 files changed, 1 insertion(+), 24 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index e68daa35d4..4bfaf559a7 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -2195,19 +2195,6 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
>>>       return -ENOTSUP;
>>>   }
>>>   
>>> -static bool vhost_user_can_merge(struct vhost_dev *dev,
>>> -                                 uint64_t start1, uint64_t size1,
>>> -                                 uint64_t start2, uint64_t size2)
>>> -{
>>> -    ram_addr_t offset;
>>> -    int mfd, rfd;
>>> -
>>> -    (void)vhost_user_get_mr_data(start1, &offset, &mfd);
>>> -    (void)vhost_user_get_mr_data(start2, &offset, &rfd);
>>> -
>>> -    return mfd == rfd;
>>> -}
>>> -
>>>   static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>>>   {
>>>       VhostUserMsg msg;
>>> @@ -2704,7 +2691,6 @@ const VhostOps user_ops = {
>>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
>>>           .vhost_requires_shm_log = vhost_user_requires_shm_log,
>>>           .vhost_migration_done = vhost_user_migration_done,
>>> -        .vhost_backend_can_merge = vhost_user_can_merge,
>>>           .vhost_net_set_mtu = vhost_user_net_set_mtu,
>>>           .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
>>>           .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 542e003101..9ab7bc8718 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -1317,7 +1317,6 @@ const VhostOps vdpa_ops = {
>>>           .vhost_set_config = vhost_vdpa_set_config,
>>>           .vhost_requires_shm_log = NULL,
>>>           .vhost_migration_done = NULL,
>>> -        .vhost_backend_can_merge = NULL,
>>>           .vhost_net_set_mtu = NULL,
>>>           .vhost_set_iotlb_callback = NULL,
>>>           .vhost_send_device_iotlb_msg = NULL,
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index b7fb960fa9..9d8662aa98 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -733,11 +733,7 @@ static void vhost_region_add_section(struct vhost_dev *dev,
>>>               size_t offset = mrs_gpa - prev_gpa_start;
>>>   
>>>               if (prev_host_start + offset == mrs_host &&
>>> -                section->mr == prev_sec->mr &&
>>> -                (!dev->vhost_ops->vhost_backend_can_merge ||
>>> -                 dev->vhost_ops->vhost_backend_can_merge(dev,
> 
> another question: can it really happen, i.e. having 2 abutting memory sections
> with the same memory region? If yes, then when/why?

Unfortunately yes, because vhost relies on some hacks (sorry, but
that's what it is) to make huge pages work. The following commit
contains some details:

commit 76525114736e8f669766e69b715fa59ce8648aae
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Thu Jan 16 20:24:14 2020 +0000

     vhost: Only align sections for vhost-user
     
     I added hugepage alignment code in c1ece84e7c9 to deal with
     vhost-user + postcopy which needs aligned pages when using userfault.
     However, on x86 the lower 2MB of address space tends to be shotgun'd
     with small fragments around the 512-640k range - e.g. video RAM, and
     with HyperV synic pages tend to sit around there - again splitting
     it up.  The alignment code complains with a 'Section rounded to ...'
     error and gives up.


Otherwise it wouldn't be needed, because flatview simplification code already
merges what's reasonable.

[I'll reply to your other mail regarding that shortly]

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-03-07 10:51   ` Igor Mammedov
@ 2023-03-07 12:46     ` David Hildenbrand
  2023-03-08 12:30       ` Igor Mammedov
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2023-03-07 12:46 UTC (permalink / raw
  To: Igor Mammedov
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert, Tiwei Bie

On 07.03.23 11:51, Igor Mammedov wrote:
> On Thu, 16 Feb 2023 12:47:51 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Having multiple devices, some filtering memslots and some not filtering
>> memslots, messes up the "used_memslot" accounting. If we'd have a device
>> the filters out less memory sections after a device that filters out more,
>> we'd be in trouble, because our memslot checks stop working reliably.
>> For example, hotplugging a device that filters out less memslots might end
>> up passing the checks based on max vs. used memslots, but can run out of
>> memslots when getting notified about all memory sections.
> 
> a hypothetical example of such a case would be appreciated
> (I don't really get how the above can happen, perhaps a more detailed
> explanation would help)

Thanks for asking! AFAICT, it's mostly about first hot-adding a vhost device
that filters (and messes up used_memslots), and then messing with memslots that
get filtered out:

$ sudo rmmod vhost
$ sudo modprobe vhost max_mem_regions=4

// startup guest with virtio-net device
...

// hotplug a NVDIMM, resulting in used_memslots=4
echo "object_add memory-backend-ram,id=mem0,size=128M" | sudo nc -U /var/tmp/mon_src; echo ""
echo "device_add nvdimm,id=nvdimm0,memdev=mem0" | sudo nc -U /var/tmp/mon_src

// hotplug vhost-user device that overwrites "used_memslots=3"
echo "device_add vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,bus=root" | sudo nc -U /var/tmp/mon_src

// hotplug another NVDIMM
echo "object_add memory-backend-ram,id=mem1,size=128M" | sudo nc -U /var/tmp/mon_src; echo ""
echo "device_add pc-dimm,id=nvdimm1,memdev=mem1" | sudo nc -U /var/tmp/mon_src

// vhost will fail to update the memslots
vhost_set_mem_table failed: Argument list too long (7)


So we tricked used_memslots to be smaller than it actually has to be, because
we're ignoring the memslots filtered out by the vhost-user device.
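
To spell out the accounting problem, here is a minimal, self-contained C
sketch (purely illustrative, not QEMU code; the struct and the filter are
made up): a slot count driven by a filtering device undercounts what a
non-filtering device will actually need.

#include <stdbool.h>
#include <stdio.h>

struct section { bool has_fd; };        /* stand-in for a RAM section */

/* vhost-user-style filter: only fd-backed sections are accepted */
static bool user_filter(const struct section *s)
{
    return s->has_fd;
}

static unsigned count_slots(const struct section *secs, unsigned n,
                            bool (*filter)(const struct section *))
{
    unsigned used = 0;

    for (unsigned i = 0; i < n; i++) {
        if (!filter || filter(&secs[i])) {
            used++;
        }
    }
    return used;
}

int main(void)
{
    /* three fd-backed sections plus one anonymous RAM section */
    struct section secs[4] = { {true}, {true}, {true}, {false} };

    printf("filtered count:   %u\n", count_slots(secs, 4, user_filter));
    printf("unfiltered count: %u\n", count_slots(secs, 4, NULL));
    return 0;
}

In the reproducer above, the global used_memslots ends up being the smaller,
filtered number from the vhost-user device, so the next hotplug passes the
max-vs-used check, while the in-kernel vhost device is then notified about
all sections and runs over its limit -- hence the vhost_set_mem_table
failure.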


Now, this is all far from relevant in practice as of now I think, and
usually would indicate user errors already (memory that's not shared with
vhost-user?).

It might get more relevant when virtio-mem dynamically adds/removes memslots and
relies on precise tracking of used vs. free memslots.


But maybe I should just ignore that case and live a happy life instead, it's
certainly hard to even trigger right now :)

>   
>> Further, it will be helpful in memory device context in the near future
>> to know that a RAM memory region section will consume a memslot, and be
>> accounted for in the used vs. free memslots, such that we can implement
>> reservation of memslots for memory devices properly. Whether a device
>> filters this out and would theoretically still have a free memslot is
>> then hidden internally, making overall vhost memslot accounting easier.
>>
>> Let's filter the memslots when creating the vhost memory array,
>> accounting all RAM && !ROM memory regions as "used_memslots" even if
>> vhost_user isn't interested in anonymous RAM regions, because it needs
>> an fd.
>>
>> When a device actually filters out regions (which should happen rarely
>> in practice), we might detect a layout change although only filtered
>> regions changed. We won't bother about optimizing that for now.
>>
>> Note: we cannot simply filter out the region and count them as
>> "filtered" to add them to used, because filtered regions could get
>> merged and result in a smaller effective number of memslots. Further,
>> we won't touch the hmp/qmp virtio introspection output.
> What output exactly are you talking about?

hw/virtio/virtio-qmp.c:qmp_x_query_virtio_status

Prints hdev->n_mem_sections and hdev->n_tmp_sections. I won't be
touching that (debug) output.

> 
> PS:
> If we drop vhost_dev::mem, the bulk of this patch would go away

Yes, unfortunately we can't I think.

> 
> side questions:
> do we have MemorySection merging on qemu's kvm side?

Yes, we properly merge in flatview_simplify(). It's all about handling holes
in huge pages IIUC.

> also, does KVM merge memslots?

No, for good reasons not. Mapping more than we're instructed to map via a notifier
is kind-of hacky already. But I guess there is no easy way around it (e.g., if
mapping that part of memory doesn't work, we'd have to bounce the reads/writes
through QEMU instead).

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/2] vhost: memslot handling improvements
  2023-03-07 11:14   ` Igor Mammedov
@ 2023-03-08 10:08     ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-03-08 10:08 UTC (permalink / raw
  To: Igor Mammedov, Michael S. Tsirkin, Dr . David Alan Gilbert
  Cc: qemu-devel, Stefan Hajnoczi

On 07.03.23 12:14, Igor Mammedov wrote:
> On Fri, 17 Feb 2023 09:20:27 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>> On Thu, Feb 16, 2023 at 12:47:50PM +0100, David Hildenbrand wrote:
>>> Following up on my previous work to make virtio-mem consume multiple
>>> memslots dynamically [1] that requires precise accounting between used vs.
>>> reserved memslots, I realized that vhost makes this extra hard by
>>> filtering out some memory region sections (so they don't consume a
>>> memslot) in the vhost-user case, which messes up the whole memslot
>>> accounting.
>>>
>>> This series fixes what I found to be broken and prepares for more work on
>>> [1]. Further, it cleanes up the merge checks that I consider unnecessary.
>>>
>>> [1] https://lkml.kernel.org/r/20211027124531.57561-8-david@redhat.com
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>
>>
>> Igor worked on memslots a lot previously and he asked for
>> a bit of time to review this, so I'll wait a bit before
>> applying.
> 
> I've reviewed it as much as I could.
> (That said, the vhost mem map code has mostly been rewritten by dgilbert
> since I last touched it, so his review would be
> more valuable in this case than mine)

Thanks for the review! I'll resend (extending the patch description) and 
will move patch #1 last, so we can decide if we want to leave that 
broken in corner cases.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-03-07 12:46     ` David Hildenbrand
@ 2023-03-08 12:30       ` Igor Mammedov
  2023-03-08 15:21         ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Igor Mammedov @ 2023-03-08 12:30 UTC (permalink / raw
  To: David Hildenbrand
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert, Tiwei Bie

On Tue, 7 Mar 2023 13:46:36 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 07.03.23 11:51, Igor Mammedov wrote:
> > On Thu, 16 Feb 2023 12:47:51 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> Having multiple devices, some filtering memslots and some not filtering
> >> memslots, messes up the "used_memslot" accounting. If we'd have a device
> >> the filters out less memory sections after a device that filters out more,
> >> we'd be in trouble,

it should say why/when it happens (in the example you've provided,
it's caused by a mix of in-kernel vhost and vhost-user devices)

> >> because our memslot checks stop working reliably.
> >> For example, hotplugging a device that filters out less memslots might end
> >> up passing the checks based on max vs. used memslots, but can run out of
> >> memslots when getting notified about all memory sections.  
> > 
> > a hypothetical example of such a case would be appreciated
> > (I don't really get how the above can happen, perhaps a more detailed
> > explanation would help)
> 
> Thanks for asking! AFAICT, it's mostly about first hot-adding a vhost device
> that filters (and messes up used_memslots), and then messing with memslots that
> get filtered out:
> 
> $ sudo rmmod vhost
> $ sudo modprobe vhost max_mem_regions=4
> 
> // startup guest with virtio-net device
> ...
> 
> // hotplug a NVDIMM, resulting in used_memslots=4
> echo "object_add memory-backend-ram,id=mem0,size=128M" | sudo nc -U /var/tmp/mon_src; echo ""
> echo "device_add nvdimm,id=nvdimm0,memdev=mem0" | sudo nc -U /var/tmp/mon_src
> 
> // hotplug vhost-user device that overwrites "used_memslots=3"
> echo "device_add vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,bus=root" | sudo nc -U /var/tmp/mon_src
> 
> // hotplug another NVDIMM
> echo "object_add memory-backend-ram,id=mem1,size=128M" | sudo nc -U /var/tmp/mon_src; echo ""
> echo "device_add pc-dimm,id=nvdimm1,memdev=mem1" | sudo nc -U /var/tmp/mon_src
> 
> // vhost will fail to update the memslots
> vhost_set_mem_table failed: Argument list too long (7)
> 
> 
> So we tricked used_memslots to be smaller than it actually has to be, because
> we're ignoring the memslots filtered out by the vhost-user device.
> 
> 
> Now, this is all far from relevant in practice as of now I think, and
> usually would indicate user errors already (memory that's not shared with
> vhost-user?).

well vhost-user device_add should fail if it can't get hands on all RAM
(if it doesn't we have a bug somewhere else)

> 
> It might get more relevant when virtio-mem dynamically adds/removes memslots and
> relies on precise tracking of used vs. free memslots.
> 
> 
> But maybe I should just ignore that case and live a happy life instead, it's
> certainly hard to even trigger right now :)
> >     
> >> Further, it will be helpful in memory device context in the near future
> >> to know that a RAM memory region section will consume a memslot, and be
> >> accounted for in the used vs. free memslots, such that we can implement
> >> reservation of memslots for memory devices properly. Whether a device
> >> filters this out and would theoretically still have a free memslot is
> >> then hidden internally, making overall vhost memslot accounting easier.
> >>

> >> Let's filter the memslots when creating the vhost memory array,
> >> accounting all RAM && !ROM memory regions as "used_memslots" even if
> >> vhost_user isn't interested in anonymous RAM regions, because it needs
> >> an fd.

that would regress existing setups: where it was previously possible
to start with N DIMMs, after this patch the same VM could fail to
start if N was close to vhost's limit, in an otherwise perfectly working
configuration. So this approach doesn't seem right.

Perhaps redoing vhost's used_memslots accounting would be
a better approach, right down to introducing reservations you'd
like to have eventually.

Something like:
  1: s/vhost_has_free_slot/vhost_memory_region_limit/
     and maybe the same for kvm_has_free_slot
  then rewrite memory_device_check_addable() moving all
  used_memslots accounting into memory_device core.
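
Roughly something like the following, as a sketch only of the direction I
mean (built on top of what vhost_has_free_slot() does today, using the
existing vhost_devices list and vhost_backend_memslots_limit() callback;
vhost_memory_region_limit() is just the proposed name, not an existing
function, and this is untested):

unsigned int vhost_memory_region_limit(void)
{
    unsigned int limit = UINT_MAX;
    struct vhost_dev *hdev;

    /* the smallest backend limit is what the machine can actually use */
    QLIST_FOREACH(hdev, &vhost_devices, entry) {
        limit = MIN(limit,
                    hdev->vhost_ops->vhost_backend_memslots_limit(hdev));
    }
    return limit;
}

memory_device core would then compare that limit (and the KVM one) against
its own slot accounting, instead of asking vhost whether a single slot
happens to be free.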

> >> When a device actually filters out regions (which should happen rarely
> >> in practice), we might detect a layout change although only filtered
> >> regions changed. We won't bother about optimizing that for now.
> >>
> >> Note: we cannot simply filter out the region and count them as
> >> "filtered" to add them to used, because filtered regions could get
> >> merged and result in a smaller effective number of memslots. Further,
> >> we won't touch the hmp/qmp virtio introspection output.  
> > What output exactly are you talking about?
> 
> hw/virtio/virtio-qmp.c:qmp_x_query_virtio_status
> 
> Prints hdev->n_mem_sections and hdev->n_tmp_sections. I won't be
> touching that (debug) output.
> 
> > 
> > PS:
> > If we drop vhost_dev::mem, the bulk of this patch would go away
> 
> Yes, unfortunately we can't I think.
> 
> > 
> > side questions:
> > do we have MemorySection merging on qemu's kvm side?  
> 
> Yes, we properly merge in flatview_simplify(). It's all about handling holes
> in huge pages IIUC.
> 
> > also, does KVM merge memslots?
> 
> No, for good reasons not. Mapping more than we're instructed to map via a notifier
> is kind-of hacky already. But I guess there is no easy way around it (e.g., if
> mapping that part of memory doesn't work, we'd have to bounce the reads/writes
> through QEMU instead).
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
  2023-03-08 12:30       ` Igor Mammedov
@ 2023-03-08 15:21         ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2023-03-08 15:21 UTC (permalink / raw
  To: Igor Mammedov
  Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi,
	Dr . David Alan Gilbert, Tiwei Bie

>>
>> So we tricked used_memslots to be smaller than it actually has to be, because
>> we're ignoring the memslots filtered out by the vhost-user device.
>>
>>
>> Now, this is all far from relevant in practice as of now I think, and
>> usually would indicate user errors already (memory that's not shared with
>> vhost-user?).
> 
> well vhost-user device_add should fail if it can't get hands on all RAM
> (if it doesn't we have a bug somewhere else)

There are some corner cases already. Like R/O NVDIMMs are completely 
ignored -- I assume because we wouldn't be able to use them for virtio 
queues either way, so it kind-of makes sense I guess.

I have not yet figured out *why* 988a27754bbb ("vhost: allow backends to 
filter memory sections") was included at all. Why should we have 
memory-backend-ram mapped into guest address space that vhost-user 
cannot handle?

Take my NVDIMM example: if we'd use that NVDIMM memory in the guest as
ordinary RAM, it could eventually be used for virtio queues ... and we 
don't even warn the user.

So I agree that hotplugging that device should fail. But it could also 
break some existing setups (we could work around it using compat 
machines I guess).

But we also have to fail hotplugging such a vhost-user device, ... and I 
am not sure where to even put such checks.
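
One option (a sketch only -- the helper name and where exactly to call it
are invented, and memory_region_get_fd()/error_setg() are the existing QEMU
helpers I'd expect to use) would be to reject such sections explicitly when
the vhost-user device is set up, so device_add fails instead of the section
being silently filtered:

static bool vhost_user_check_mem_section(MemoryRegionSection *section,
                                         Error **errp)
{
    if (memory_region_get_fd(section->mr) < 0) {
        error_setg(errp, "vhost-user cannot map memory region '%s': "
                   "it is not backed by a file descriptor (not shared?)",
                   memory_region_name(section->mr));
        return false;
    }
    return true;
}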


> 
>>
>> It might gets more relevant when virtio-mem dynamically adds/removes memslots and
>> relies on precise tracking of used vs. free memslots.
>>
>>
>> But maybe I should just ignore that case and live a happy life instead, it's
>> certainly hard to even trigger right now :)
>>>      
>>>> Further, it will be helpful in memory device context in the near future
>>>> to know that a RAM memory region section will consume a memslot, and be
>>>> accounted for in the used vs. free memslots, such that we can implement
>>>> reservation of memslots for memory devices properly. Whether a device
>>>> filters this out and would theoretically still have a free memslot is
>>>> then hidden internally, making overall vhost memslot accounting easier.
>>>>
> 
>>>> Let's filter the memslots when creating the vhost memory array,
>>>> accounting all RAM && !ROM memory regions as "used_memslots" even if
>>>> vhost_user isn't interested in anonymous RAM regions, because it needs
>>>> an fd.
> 
> that would regress existing setups: where it was previously possible
> to start with N DIMMs, after this patch the same VM could fail to
> start if N was close to vhost's limit, in an otherwise perfectly working
> configuration. So this approach doesn't seem right.

As discussed already with MST, this was the state before that change. So 
I strongly doubt that we would break anything because using 
memory-backend-ram in that setup would already be suspicious.

Again, I did not figure out *why* that change was even included and 
which setups even care about that.

Maybe Tiwei can comment.

> 
> Perhaps redoing vhost's used_memslots accounting would be
> a better approach, right down to introducing reservations you'd
> like to have eventually.

The question of what to do with memory-backend-ram for vhost-user still 
remains. It's independent of the "used_memslot" tracking, because used 
vs. reserved would depend on the vhost-backend filtering demands ... and 
I really wanted to avoid that with this commit. It just makes tracking 
much harder.

> 
> Something like:
>    1: s/vhost_has_free_slot/vhost_memory_region_limit/
>       and maybe the same for kvm_has_free_slot
>    then rewrite memory_device_check_addable() moving all
>    used_memslots accounting into memory_device core.

I do something similar already, however, by querying the free memslots 
from kvm and vhost, not the limits (limits are not very expressive).
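
As a rough sketch of what I have in my WIP (the name vhost_get_free_memslots()
is mine and may still change; this builds on the existing vhost_devices list,
the vhost_backend_memslots_limit() callback and the global used_memslots from
this series):

unsigned int vhost_get_free_memslots(void)
{
    unsigned int free_slots = UINT_MAX;
    struct vhost_dev *hdev;

    QLIST_FOREACH(hdev, &vhost_devices, entry) {
        unsigned int r = hdev->vhost_ops->vhost_backend_memslots_limit(hdev);

        if (r <= used_memslots) {
            return 0;
        }
        free_slots = MIN(free_slots, r - used_memslots);
    }
    return free_slots;
}

Memory device code can then reserve against the minimum of the KVM and vhost
free counts, rather than reasoning about limits.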

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-03-08 15:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-16 11:47 [PATCH v1 0/2] vhost: memslot handling improvements David Hildenbrand
2023-02-16 11:47 ` [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure David Hildenbrand
2023-02-16 12:04   ` Michael S. Tsirkin
2023-02-16 12:10     ` David Hildenbrand
2023-02-16 12:13       ` David Hildenbrand
2023-02-16 12:22         ` Michael S. Tsirkin
2023-02-16 12:21       ` Michael S. Tsirkin
2023-02-16 12:39         ` David Hildenbrand
2023-03-07 10:51   ` Igor Mammedov
2023-03-07 12:46     ` David Hildenbrand
2023-03-08 12:30       ` Igor Mammedov
2023-03-08 15:21         ` David Hildenbrand
2023-02-16 11:47 ` [PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback David Hildenbrand
2023-03-07 10:25   ` Igor Mammedov
2023-03-07 11:16     ` Igor Mammedov
2023-03-07 11:24       ` David Hildenbrand
2023-02-16 16:04 ` [PATCH v1 0/2] vhost: memslot handling improvements Stefan Hajnoczi
2023-02-17 13:48   ` David Hildenbrand
2023-02-17 14:20 ` Michael S. Tsirkin
2023-02-17 14:27   ` David Hildenbrand
2023-03-07 11:14   ` Igor Mammedov
2023-03-08 10:08     ` David Hildenbrand
