QEMU-Devel Archive mirror
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: "Peter Xu" <peterx@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, armbru@redhat.com,
	Claudio Fontana <cfontana@suse.de>, Jim Fehlig <jfehlig@suse.com>
Subject: Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
Date: Wed, 08 May 2024 17:39:53 -0300	[thread overview]
Message-ID: <87cypwkpgm.fsf@suse.de> (raw)
In-Reply-To: <ZjvDCA9QvTI-zFf9@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
>> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
>> > Allow multifd to use an fdset when migrating to a file. This is useful
>> > for the scenario where the management layer wants to have control over
>> > the migration file.
>> > 
>> > By receiving the file descriptors directly, QEMU can delegate some
>> > high level operating system operations to the management layer (such
>> > as mandatory access control). The management layer might also want to
>> > add its own headers before the migration stream.
>> > 
>> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
>> > mapped-ram. The requirements for the fdset mechanism are:
>> > 
>> > On the migration source side:
>> > 
>> > - the fdset must contain two fds that are not duplicates between
>> >   themselves;
>> > - if direct-io is to be used, exactly one of the fds must have the
>> >   O_DIRECT flag set;
>> > - the file must be opened with WRONLY both times.
>> > 
>> > On the migration destination side:
>> > 
>> > - the fdset must contain one fd;
>> > - the file must be opened with RDONLY.
>> > 
>> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> > ---
>> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
>> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
>> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
>> >  3 files changed, 60 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
>> > index 54385a23e5..50f6096470 100644
>> > --- a/docs/devel/migration/main.rst
>> > +++ b/docs/devel/migration/main.rst
>> > @@ -47,6 +47,24 @@ over any transport.
>> >    QEMU interference. Note that QEMU does not flush cached file
>> >    data/metadata at the end of migration.
>> >  
>> > +  The file migration also supports using a file that has already been
>> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
>> > +  (see add-fd QMP command documentation). This method allows a
>> > +  management application to have control over the migration file
>> > +  opening operation. There are, however, strict requirements to this
>> > +  interface:
>> > +
>> > +  On the migration source side:
>> > +    - if the multifd capability is to be used, the fdset must contain
>> > +      two file descriptors that are not duplicates between themselves;
>> > +    - if the direct-io capability is to be used, exactly one of the
>> > +      file descriptors must have the O_DIRECT flag set;
>> > +    - the file must be opened with WRONLY.
>> > +
>> > +  On the migration destination side:
>> > +    - the fdset must contain one file descriptor;
>> > +    - the file must be opened with RDONLY.
>> > +
>> >  In addition, support is included for migration using RDMA, which
>> >  transports the page data using ``RDMA``, where the hardware takes care of
>> >  transporting the pages, and the load on the CPU is much lower.  While the
>> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
>> > index fa4cefd9fc..e6505511f0 100644
>> > --- a/docs/devel/migration/mapped-ram.rst
>> > +++ b/docs/devel/migration/mapped-ram.rst
>> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
>> >  sequential stream. Having the pages at fixed offsets also allows the
>> >  usage of O_DIRECT for save/restore of the migration stream as the
>> >  pages are ensured to be written respecting O_DIRECT alignment
>> > -restrictions (direct-io support not yet implemented).
>> > +restrictions.
>> >  
>> >  Usage
>> >  -----
>> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
>> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
>> >  the source side before migrating.
>> >  
>> > +For best performance enable the ``direct-io`` capability as well:
>> > +
>> > +    ``migrate_set_capability direct-io on``
>> > +
>> >  Use-cases
>> >  ---------
>> >  
>> > diff --git a/migration/file.c b/migration/file.c
>> > index b9265b14dd..3bc8bc7463 100644
>> > --- a/migration/file.c
>> > +++ b/migration/file.c
>> > @@ -17,6 +17,7 @@
>> >  #include "io/channel-file.h"
>> >  #include "io/channel-socket.h"
>> >  #include "io/channel-util.h"
>> > +#include "monitor/monitor.h"
>> >  #include "options.h"
>> >  #include "trace.h"
>> >  
>> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
>> >      }
>> >  }
>> >  
>> > +/*
>> > + * With multifd, due to the behavior of the dup() system call, we need
>> > + * the fdset to have two non-duplicate fds so we can enable direct IO
>> > + * in the secondary channels without affecting the main channel.
>> > + */
>> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >                               Error **errp)
>> >  {
>> > +    FdsetInfoList *fds_info;
>> > +    FdsetFdInfoList *fd_info;
>> >      const char *fdset_id_str;
>> > +    int nfds = 0;
>> >  
>> >      *fdset_id = -1;
>> >  
>> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >          return false;
>> >      }
>> >  
>> > +    if (!migrate_multifd() || !migrate_direct_io()) {
>> > +        return true;
>> > +    }
>> > +
>> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
>> > +         fds_info = fds_info->next) {
>> > +
>> > +        if (*fdset_id != fds_info->value->fdset_id) {
>> > +            continue;
>> > +        }
>> > +
>> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
>> > +            if (nfds++ > 2) {
>> > +                break;
>> > +            }
>> > +        }
>> > +    }
>> > +
>> > +    if (nfds != 2) {
>> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
>> > +                   "got %d", nfds);
>> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
>> > +        *fdset_id = -1;
>> > +        return false;
>> > +    }
>> > +
>> >      return true;
>> >  }
>> 
>> Related to my thoughts in an earlier patch, where I say that use of fdsets
>> ought to be transparent to QEMU code, I'm not a fan of having this logic
>> in migration code.
>> 
>> IIUC, the migration code will call  qio_channel_file_new_path twice,
>> once with O_DIRECT and once without. This should trigger two calls
>> into monitor_fdset_dup_fd_add with different flags. If we're matching
>> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
>> provided, are we not able to report an error there ?
>
> Right, this sounds working.

It works, but due to how low-level fdset is, it's difficult to match the
low level error to anything meaningful we can report to the user. I'll
have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not
very useful.

-1 with no errno
-1 with EACCES (should actually be EBADF)
-1 with ENOENT

There has been some discusstion around this before actually:

https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html

Or, you know, let the management layer figure it out. We seem to be
heading in this direction already. I imagine once the code is written to
interact with QEMU, it would not have any reason to change, so it might
be ok to replace some of the code I'm adding in this series with
documentation and call it a day. I don't like this approach very much,
but it would definitely make this series way simpler.

>
> For a real sanity check, we may want to somehow check the two fds returned
> from qio_channel_file_new_path() to point to the same file underneath.
>
> What mentioned in the other thread (kcmp with KCMP_FILE) might not work, as
> the whole purpose of having two fds is to make sure they have different
> struct file to back the fd (and only one of them has O_DIRECT).  fstat()
> might work in this case over the st_ino field, etc. maybe fstatfs() too but
> perhaps that's over cautious.  Just a pain to use two fds as a start..
>
> Thanks,


  reply	other threads:[~2024-05-08 20:40 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
2024-05-03 16:02   ` Peter Xu
2024-05-16 21:46     ` Fabiano Rosas
2024-05-08  7:17   ` Daniel P. Berrangé
2024-05-16 22:00     ` Fabiano Rosas
2024-05-17  7:33       ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 2/9] migration: Fix file migration with fdset Fabiano Rosas
2024-05-03 16:23   ` Peter Xu
2024-05-03 19:56     ` Fabiano Rosas
2024-05-03 21:04       ` Peter Xu
2024-05-03 21:31         ` Fabiano Rosas
2024-05-03 21:56           ` Peter Xu
2024-05-08  8:02     ` Daniel P. Berrangé
2024-05-08 12:49       ` Peter Xu
2024-05-08  8:00   ` Daniel P. Berrangé
2024-05-08 20:45     ` Fabiano Rosas
2024-04-26 14:20 ` [PATCH 3/9] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
2024-05-03 16:47   ` Peter Xu
2024-05-03 20:36     ` Fabiano Rosas
2024-05-03 21:08       ` Peter Xu
2024-05-08  8:10       ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
2024-04-26 14:33   ` Markus Armbruster
2024-05-03 18:05   ` Peter Xu
2024-05-03 20:49     ` Fabiano Rosas
2024-05-03 21:16       ` Peter Xu
2024-05-14 14:10         ` Markus Armbruster
2024-05-14 17:57           ` Fabiano Rosas
2024-05-15  7:17             ` Markus Armbruster
2024-05-15 12:51               ` Fabiano Rosas
2024-05-08  8:25   ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 5/9] migration/multifd: Add direct-io support Fabiano Rosas
2024-05-03 18:29   ` Peter Xu
2024-05-03 20:54     ` Fabiano Rosas
2024-05-03 21:18       ` Peter Xu
2024-05-08  8:27   ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
2024-05-03 18:38   ` Peter Xu
2024-05-03 21:05     ` Fabiano Rosas
2024-05-03 21:25       ` Peter Xu
2024-05-08  8:34   ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 7/9] monitor: fdset: Match against O_DIRECT Fabiano Rosas
2024-05-03 18:53   ` Peter Xu
2024-05-03 21:19     ` Fabiano Rosas
2024-05-03 22:16       ` Peter Xu
2024-04-26 14:20 ` [PATCH 8/9] migration: Add support for fdset with multifd + file Fabiano Rosas
2024-05-08  8:53   ` Daniel P. Berrangé
2024-05-08 18:23     ` Peter Xu
2024-05-08 20:39       ` Fabiano Rosas [this message]
2024-05-09  8:08         ` Daniel P. Berrangé
2024-05-17 22:43           ` Fabiano Rosas
2024-05-18  8:36             ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
2024-05-08  8:56   ` Daniel P. Berrangé
2024-05-02 20:01 ` [PATCH 0/9] migration/mapped-ram: Add direct-io support Peter Xu
2024-05-02 20:34   ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cypwkpgm.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cfontana@suse.de \
    --cc=jfehlig@suse.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).