* [PATCH v2 0/4] multifd: various fixes
From: Elena Ufimtseva @ 2023-10-11 18:43 UTC
  Cc: quintela, peterx, farosas, leobras, qemu-devel

Hello

While working on and testing various live migration scenarios,
I found a few issues.

This is version 2 of the changes, with a few non-functional
modifications and one dropped patch.
I have dropped patch [1/4] of v1 following the discussion with
Fabiano and his proposed changes:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg995782.html

In the new patchset, [PATCH 1/4] addresses Peter's and Fabiano's
comments, and Reviewed-by tags have been added.

I added [PATCH 2/4] to describe packet_len and next_packet_size
in more detail.

Patches 3 and 4 are unchanged, except that Reviewed-by tags were
added and the discussion of other issues was moved below "---".


Thank you in advance; I look forward to your feedback.
 
Elena Ufimtseva (4):
  migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  multifd: document packet_len, next_packet_size
  multifd: fix counters in multifd_send_thread
  multifd: reset next_packet_len after sending pages

 migration/migration-stats.c |  9 +++++----
 migration/multifd.c         |  9 +++++----
 migration/multifd.h         | 35 ++++++++++++++++++++++++++++++-----
 3 files changed, 40 insertions(+), 13 deletions(-)

-- 
2.34.1




* [PATCH v2 1/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED
From: Elena Ufimtseva @ 2023-10-11 18:43 UTC
  Cc: quintela, peterx, farosas, leobras, qemu-devel

In migration rate limiting, atomic operations are used to read the
rate limit variables and the transferred byte count, and they are
expensive. Check first whether rate_limit_max is equal to
RATE_LIMIT_DISABLED and return false immediately if so.

Note that with this patch we will also stop flushing, by not calling
qemu_fflush() from migration_transferred_bytes(), when the migration
rate is not exceeded.
This should be fine, since the migration thread calls
migration_update_counters() in its loop via migration_rate_limit(),
which calls migration_transferred_bytes() and flushes there.
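
For context, the resulting check order in migration_rate_exceeded()
reads roughly as follows (a sketch assembled from the diff below, not
verbatim source; the initial qemu_file_get_error() guard is assumed
from the hunk's context lines):

    bool migration_rate_exceeded(QEMUFile *f)
    {
        if (qemu_file_get_error(f)) {
            return true;
        }

        /* Cheap comparison first: skip the atomic reads entirely
         * when rate limiting is disabled. */
        uint64_t rate_limit_max = migration_rate_get();
        if (rate_limit_max == RATE_LIMIT_DISABLED) {
            return false;
        }

        /* The atomic reads now only happen when a limit is set. */
        uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
        uint64_t rate_limit_current = migration_transferred_bytes(f);
        uint64_t rate_limit_used = rate_limit_current - rate_limit_start;

        if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) {
            return true;
        }
        return false;
    }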

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 migration/migration-stats.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 84e11e6dd8..4cc989d975 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -24,14 +24,15 @@ bool migration_rate_exceeded(QEMUFile *f)
         return true;
     }
 
+    uint64_t rate_limit_max = migration_rate_get();
+    if (rate_limit_max == RATE_LIMIT_DISABLED) {
+        return false;
+    }
+
     uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
     uint64_t rate_limit_current = migration_transferred_bytes(f);
     uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
-    uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);
 
-    if (rate_limit_max == RATE_LIMIT_DISABLED) {
-        return false;
-    }
     if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) {
         return true;
     }
-- 
2.34.1




* [PATCH v2 2/4] multifd: document packet_len, next_packet_size
From: Elena Ufimtseva @ 2023-10-11 18:43 UTC
  Cc: quintela, peterx, farosas, leobras, qemu-devel

The name next_packet_size is a bit misleading, so add more comments
where it is defined.
The multifd thread sends data in two chunks (see the sketch below):
 - the packet with the offsets of the normal (non-zero) dirty guest
   pages.
   This chunk has length packet_len, and we increment the number of
   packets sent by this thread;
 - the normal (non-zero) dirty guest pages themselves, in iovs.
   The total size of the data pointed to by all the iovs of this
   chunk is next_packet_size. We do not increment packet_num for
   this thread when sending the actual pages.

When compression is enabled, next_packet_size instead indicates the
size of the compressed buffer on the source and the destination.
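
A simplified sketch of the two-chunk send in multifd_send_thread()
(based on migration/multifd.c; locking, the zero-copy path, and
error handling are omitted, so this is not verbatim source):

    Error *local_err = NULL;

    multifd_send_fill_packet(p);          /* header: flags, page offsets */
    p->iov[0].iov_base = p->packet;
    p->iov[0].iov_len = p->packet_len;    /* chunk 1: fixed-size header */
    p->num_packets++;                     /* counted once per header */

    /* chunks 2..n: the dirty pages themselves; the iovs are set up by
     * the send_prepare hook, and their total length across all iovs
     * is next_packet_size */
    qio_channel_writev_all(p->c, p->iov, p->iovs_num, &local_err);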

Would it be helpful to rename it to data_size or dirty_data_size?

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/multifd.h | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index a835643b48..37da9b68c2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -45,7 +45,13 @@ typedef struct {
     uint32_t pages_alloc;
     /* non zero pages */
     uint32_t normal_pages;
-    /* size of the next packet that contains pages */
+    /*
+     * Amount of data to be sent to the destination.
+     * It is calculated as:
+     *  - the number of normal (non-zero) dirty guest pages
+     *    multiplied by page_size when compression is off;
+     *  - the size of the compressed data to be received otherwise.
+     */
     uint32_t next_packet_size;
     uint64_t packet_num;
     uint64_t unused[4];    /* Reserved for future use */
@@ -79,11 +85,18 @@ typedef struct {
     QIOChannel *c;
     /* is the yank function registered */
     bool registered_yank;
-    /* packet allocated len */
+    /*
+     * Allocated length of a packet to be transferred: the size of
+     * the MultiFDPacket struct plus the size of the array of guest
+     * page offsets (page_count * sizeof(uint64_t)).
+     */
     uint32_t packet_len;
     /* guest page size */
     uint32_t page_size;
-    /* number of pages in a full packet */
+    /*
+     * maximum number of dirty pages in a full packet calculated as
+     * MULTIFD_PACKET_SIZE / qemu_target_page_size()
+     */
     uint32_t page_count;
     /* multifd flags for sending ram */
     int write_flags;
@@ -116,7 +129,13 @@ typedef struct {
 
     /* pointer to the packet */
     MultiFDPacket_t *packet;
-    /* size of the next packet that contains pages */
+    /*
+     * Amount of data to be sent to the destination.
+     * It is calculated as:
+     *  - the number of normal (non-zero) dirty guest pages
+     *    multiplied by page_size when compression is off;
+     *  - the size of the compressed data to be received otherwise.
+     */
     uint32_t next_packet_size;
     /* packets sent through this channel */
     uint64_t num_packets;
@@ -171,7 +190,13 @@ typedef struct {
 
     /* pointer to the packet */
     MultiFDPacket_t *packet;
-    /* size of the next packet that contains pages */
+    /*
+     * Amount of data to be received by the destination.
+     * It is calculated as:
+     *  - the number of normal (non-zero) dirty guest pages
+     *    multiplied by page_size when compression is off;
+     *  - the size of the compressed data to be received otherwise.
+     */
     uint32_t next_packet_size;
     /* packets sent through this channel */
     uint64_t num_packets;
-- 
2.34.1




* [PATCH v2 3/4] multifd: fix counters in multifd_send_thread
From: Elena Ufimtseva @ 2023-10-11 18:43 UTC
  Cc: quintela, peterx, farosas, leobras, qemu-devel

The previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
("migration/multifd: Compute transferred bytes correctly")
removed the accounting for packet_len in the non-RDMA case, but
next_packet_size only accounts for the pages (normal_pages *
PAGE_SIZE), not for the packet header that is sent as iov[0].
The packet_len part should be added back to account for the size
of MultiFDPacket and the array of page offsets.
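
Illustratively, with 4 KiB target pages and the default 512 KiB
multifd packet, so page_count == 128 (assumed values, not taken from
this patch), the per-writev accounting becomes:

    /* header: the struct plus one 64-bit offset per page slot */
    uint32_t packet_len = sizeof(MultiFDPacket_t)
                          + 128 * sizeof(uint64_t);
    /* payload: only the normal (non-zero) pages actually sent */
    uint32_t next_packet_size = normal_num * 4096;

    /* both parts now count toward the transfer statistics */
    stat64_add(&mig_stats.multifd_bytes, next_packet_size + packet_len);
    stat64_add(&mig_stats.transferred, next_packet_size + packet_len);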

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..e6e0013c16 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -714,8 +714,6 @@ static void *multifd_send_thread(void *opaque)
                 if (ret != 0) {
                     break;
                 }
-                stat64_add(&mig_stats.multifd_bytes, p->packet_len);
-                stat64_add(&mig_stats.transferred, p->packet_len);
             } else {
                 /* Send header using the same writev call */
                 p->iov[0].iov_len = p->packet_len;
@@ -728,8 +726,10 @@ static void *multifd_send_thread(void *opaque)
                 break;
             }
 
-            stat64_add(&mig_stats.multifd_bytes, p->next_packet_size);
-            stat64_add(&mig_stats.transferred, p->next_packet_size);
+            stat64_add(&mig_stats.multifd_bytes,
+                       p->next_packet_size + p->packet_len);
+            stat64_add(&mig_stats.transferred,
+                       p->next_packet_size + p->packet_len);
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
-- 
2.34.1




* [PATCH v2 4/4] multifd: reset next_packet_len after sending pages
From: Elena Ufimtseva @ 2023-10-11 18:43 UTC
  Cc: quintela, peterx, farosas, leobras, qemu-devel

Sometimes multifd sends just a sync packet with no pages
(normal_num is 0). In that case the old value of next_packet_size
is preserved and accounted for again, although only packet_len
bytes are actually transferred.
Reset it to 0 after it has been sent and accounted for.
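
A sketch of the failure mode this fixes (the page count of 64 is
hypothetical, chosen only for illustration):

    /* iteration 1: 64 normal pages are sent */
    p->next_packet_size = 64 * p->page_size; /* accounted: header + pages */

    /* iteration 2: sync-only packet, normal_num == 0 */
    /* without the reset, the stale 64 * page_size value is accounted
     * again, although only p->packet_len bytes go on the wire */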

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index e6e0013c16..c45f5015f8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque)
                        p->next_packet_size + p->packet_len);
             stat64_add(&mig_stats.transferred,
                        p->next_packet_size + p->packet_len);
+            p->next_packet_size = 0;
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
-- 
2.34.1




* Re: [PATCH v2 2/4] multifd: document packet_len, next_packet_size
From: Fabiano Rosas @ 2023-10-11 19:04 UTC
  To: Elena Ufimtseva; +Cc: quintela, peterx, leobras, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> The name next_packet_size is a bit misleading, so add more comments
> where it is defined.
> The multifd thread sends data in two chunks:
>  - the packet with the offsets of the normal (non-zero) dirty guest
>    pages.
>    This chunk has length packet_len, and we increment the number of
>    packets sent by this thread;
>  - the normal (non-zero) dirty guest pages themselves, in iovs.
>    The total size of the data pointed to by all the iovs of this
>    chunk is next_packet_size. We do not increment packet_num for
>    this thread when sending the actual pages.
>
> When compression is enabled, next_packet_size instead indicates the
> size of the compressed buffer on the source and the destination.
>
> Would it be helpful to rename it to data_size or dirty_data_size?
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  migration/multifd.h | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index a835643b48..37da9b68c2 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -45,7 +45,13 @@ typedef struct {
>      uint32_t pages_alloc;
>      /* non zero pages */
>      uint32_t normal_pages;
> -    /* size of the next packet that contains pages */
> +    /*
> +     * Amount of data to be sent to the destination.
> +     * It is calculated as:
> +     *  - the number of normal (non-zero) dirty guest pages
> +     *    multiplied by page_size when compression is off;
> +     *  - the size of the compressed data to be received otherwise.
> +     */
>      uint32_t next_packet_size;

So maybe "payload_size"? This one, not the "next".

Let's see what other people think, but I'm in favor of just renaming
instead of documenting. Later on, the maths may change and the comment
might get out of sync with the code.



* Re: [PATCH v2 1/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED
From: Juan Quintela @ 2023-10-16  7:23 UTC
  To: Elena Ufimtseva; +Cc: peterx, farosas, leobras, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> In migration rate limiting, atomic operations are used to read the
> rate limit variables and the transferred byte count, and they are
> expensive. Check first whether rate_limit_max is equal to
> RATE_LIMIT_DISABLED and return false immediately if so.
>
> Note that with this patch we will also stop flushing, by not calling
> qemu_fflush() from migration_transferred_bytes(), when the migration
> rate is not exceeded.
> This should be fine, since the migration thread calls
> migration_update_counters() in its loop via migration_rate_limit(),
> which calls migration_transferred_bytes() and flushes there.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.




* Re: [PATCH v2 3/4] multifd: fix counters in multifd_send_thread
From: Juan Quintela @ 2023-10-16  7:25 UTC
  To: Elena Ufimtseva; +Cc: peterx, farosas, leobras, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> The previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
> ("migration/multifd: Compute transferred bytes correctly")
> removed the accounting for packet_len in the non-RDMA case, but
> next_packet_size only accounts for the pages (normal_pages *
> PAGE_SIZE), not for the packet header that is sent as iov[0].
> The packet_len part should be added back to account for the size
> of MultiFDPacket and the array of page offsets.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.




* Re: [PATCH v2 4/4] multifd: reset next_packet_len after sending pages
From: Juan Quintela @ 2023-10-16  7:26 UTC
  To: Elena Ufimtseva; +Cc: peterx, farosas, leobras, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> Sometimes multifd sends just a sync packet with no pages
> (normal_num is 0). In that case the old value of next_packet_size
> is preserved and accounted for again, although only packet_len
> bytes are actually transferred.
> Reset it to 0 after it has been sent and accounted for.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>


