From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38356)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Zf3tR-0001me-Po
	for qemu-devel@nongnu.org; Thu, 24 Sep 2015 06:36:50 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Zf3tM-0003Ak-EP
	for qemu-devel@nongnu.org; Thu, 24 Sep 2015 06:36:49 -0400
Received: from mx1.redhat.com ([209.132.183.28]:41859)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Zf3tM-0003A9-7J
	for qemu-devel@nongnu.org; Thu, 24 Sep 2015 06:36:44 -0400
Date: Thu, 24 Sep 2015 11:36:38 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20150924103637.GB31664@work-vm>
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com>
	<1434450415-11339-42-git-send-email-dgilbert@redhat.com>
	<87zj2yepcq.fsf@neno.neno>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87zj2yepcq.fsf@neno.neno>
Subject: Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming
	postcopy
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Juan Quintela <quintela@redhat.com>
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, liang.z.li@intel.com, qemu-devel@nongnu.org, luis@cs.umu.se, amit.shah@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Userfault doesn't work with mlock; mlock is designed to nail down pages
> > so they don't move, userfault is designed to tell you when they're not
> > there.
> >
> > munlock the pages we userfault protect before postcopy.
> > mlock everything again at the end if mlock is enabled.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/sysemu/sysemu.h  |  1 +
> >  migration/postcopy-ram.c | 24 ++++++++++++++++++++++++
> >  2 files changed, 25 insertions(+)
> >
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 1af2ea0..c1f3da4 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -171,6 +171,7 @@ extern int boot_menu;
> >  extern bool boot_strict;
> >  extern uint8_t *boot_splash_filedata;
> >  extern size_t boot_splash_filedata_size;
> > +extern bool enable_mlock;
> >  extern uint8_t qemu_extra_params_fw[2];
> >  extern QEMUClockType rtc_clock;
> >  extern const char *mem_path;
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 7eb1fb9..be7e5f2 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -85,6 +85,11 @@ static bool ufd_version_check(int ufd)
> >      return true;
> >  }
> >  
> > +/*
> > + * Note: This has the side effect of munlock'ing all of RAM, that's
> > + * normally fine since if the postcopy succeeds it gets turned back on at the
> > + * end.
> > + */
> >  bool postcopy_ram_supported_by_host(void)
> >  {
> >      long pagesize = getpagesize();
> > @@ -113,6 +118,15 @@ bool postcopy_ram_supported_by_host(void)
> >      }
> >  
> >      /*
> > +     * userfault and mlock don't go together; we'll put it back later if
> > +     * it was enabled.
> > +     */
> > +    if (munlockall()) {
> > +        error_report("%s: munlockall: %s", __func__,  strerror(errno));
> 
> 
> why is this not proteced by enable_mlock?

Because there's no harm in doing the 'unlock', and if something else somewhere
other than the enable_mlock had enabled it then postcopy would break.

> > +        return -1;
> > +    }
> > +
> > +    /*
> >       *  We need to check that the ops we need are supported on anon memory
> >       *  To do that we need to register a chunk and see the flags that
> >       *  are returned.
> > @@ -303,6 +317,16 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >          mis->have_fault_thread = false;
> >      }
> >  
> > +    if (enable_mlock) {
> > +        if (os_mlock() < 0) {
> > +            error_report("mlock: %s", strerror(errno));
> > +            /*
> > +             * It doesn't feel right to fail at this point, we have a valid
> > +             * VM state.
> > +             */
> 
> realtime_init() exit in case of os_mlock() fails, so current code is:
> 
> - we start qemu with mlock requset
> - we mlock memory
> - we start postcopy
> - we munlock memory
> - we mlock memory
> 
> I wmill really, really preffer having a check if memory is mlocked, and
> it that case, just abort migration altogether.

Although it does look like users want the two together.

>  Or better still, wait to
> enable mlock *until* we have finished postcopy, no?

I think that is likely to:
   a) Produce a longer downtime
    Lets follow your summary points above:

    - we start qemu with mlock requset
    - we mlock memory
      !! This can be expensive - we might have to do some swap and the kernel
         has to make sure it has this available.  But at the end we have enough
         memory.    Anyway, this isn't on any critical path; the source is still
         running.
    - we start postcopy
    - we munlock memory
      !! OK, that's not good, but....
    - we mlock memory
      !! There's a pretty good chance that most of the memory we force allocated
         during the 1st mlock is still available, so this should be faster than the
         1st mlock.

    If we flip it so we do just the mlock at the end of postcopy, it's got a much
    higher chance of needing to swap stuff in.


   b) the main mlock for realtime happens very early in startup in vl.c;
     and that's way before the destination knows it's about to have
     a postcopy migration incoming.

Dave
> 
      
> 
> Later, Juan.
> 
> > +        }
> > +    }
> > +
> >      postcopy_state_set(mis, POSTCOPY_INCOMING_END);
> >      migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK