From: Ian Campbell <ian.campbell@citrix.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	dslutz@verizon.com, xen-devel@lists.xen.org
Subject: Re: QEMU bumping memory bug analysis
Date: Mon, 8 Jun 2015 14:52:27 +0100	[thread overview]
Message-ID: <1433771547.7108.505.camel@citrix.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1506081338480.19838@kaball.uk.xensource.com>

On Mon, 2015-06-08 at 14:22 +0100, Stefano Stabellini wrote:
> On Mon, 8 Jun 2015, Ian Campbell wrote:
> > On Mon, 2015-06-08 at 12:40 +0100, Stefano Stabellini wrote:
> > 
> > > > > I disagree that libxl should be the arbitrator of a property that is
> > > > > stored, maintained and enforced by Xen. Xen should be the arbitrator.
> > > > 
> > > > That's not what "arbitrator" means: an arbitrator decides what the value
> > > > should be, but that doesn't necessarily imply that it either stores,
> > > > maintains, or enforces that value.
> > > 
> > > The way you describe it, it looks like some kind of host wide memory
> > > policy manager,
> > 
> > Not necessarily.
> > 
> > We need to make a distinction between the entity which enforces
> > maxpages, that (or those) which decide on what maxpages should be and
> > tells Xen.
> > 
> > Xen is the enforcer. It is also therefore by necessity the holder of the
> > canonical value for the maxpages, since it has to be.
> > 
> > Xen cannot be the second, it simply doesn't have the necessary
> > information to make a decision about a domain's memory limit, it just
> > blindly does whatever it is told by the last thing to tell it something.
> > 
> > There are lots of ways the desired value of maxpages can be decided on,
> > given a bunch of entities all with some level of responsibility for the
> > maxpages of a given domain:
> > 
> >      1. One single entity which decides on the limit and tells Xen. I
> >         think this is the xapi model (not squeezed/balloond, they just
> >         ask xapi to do things AFAIK, so are a red-herring). The unwary
> >         might think "libxl" was in this category, but it isn't since it
> >         is in fact multiple instances of the library.
> >      2. A group of entities which cooperate as a distributed system such
> >         that any one of them can recalculate (from the shared
> >         state, or by intra entity communication) the current desired
> >         value of max pages and propagate it to Xen. Prior to QEMU
> >         starting to play with the max pages the multiple libxl's on a
> >         system fell into this case, coordinating via xenstore
> >         transactions and libxl userdata.
> 
> The current libxl/xl code doesn't use xenstore or userdata to coordinate
> maxmem values. Xenstore is used to store the memory "target" for the
> in-guest balloon driver. Userdata is used to store a few QEMU
> properties.
> 
> Maxmem is set at runtime by libxl in two cases:
> 
> - when the user calls xl mem-max
> xl mem-max -> libxl_domain_setmaxmem -> xc_domain_setmaxmem
> 
> libxl_domain_setmaxmem does a xenstore read to check the target value,
> not the maxmem value. It doesn't allow the user to lower maxmem below
> the current target value. Arguably this doesn't make too much sense.
> 
> - when libxl_set_memory_target is called with enforce = 1, that happens
> when the user executes xl mem-set
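The mem-max path described above can be modelled abstractly as follows. This is a sketch, not the real libxl C code: the function and parameter names are illustrative, and a plain dict stands in for xenstore.

```python
# Abstract model of the check in libxl_domain_setmaxmem described above:
# read the in-guest balloon "target" from xenstore and refuse to lower
# maxmem below it; otherwise hand the value straight to Xen
# (xc_domain_setmaxmem). All names here are illustrative.

def set_maxmem(xenstore, set_hypervisor_maxmem, domid, new_maxmem_kb):
    target_kb = int(xenstore[f"/local/domain/{domid}/memory/target"])
    if new_maxmem_kb < target_kb:
        raise ValueError("cannot set maxmem below the current balloon target")
    set_hypervisor_maxmem(domid, new_maxmem_kb)  # straight through to Xen
```

Note that nothing in this path consults the current maxmem or any other entity's requirements, which is the "no coordination" point being made.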
> 
> 
> There is no coordination as far as I can tell. Libxl is just exposing
> xc_domain_setmaxmem straight to the user. There is no locking, no
> synchronization, nothing. Maybe this is where our different views come
> from: I am not seeing libxl doing any arbitrating at all, at least today.

I think this is consistent with what I describe as #2, except that as it
happens there is no need for communication because other than the new
value for the "user requested max memory size" all the other factors are
hardcoded constants (LIBXL_MAXMEM_CONSTANT and friends), so there wasn't
actually any need to communicate in order to be able to recalculate the
new correct value for max pages.
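Concretely, the "nop coordination" works because the calculation is a pure function of shared inputs plus compiled-in constants; a rough sketch (the constant's value here is illustrative, mirroring the role of LIBXL_MAXMEM_CONSTANT, not its actual definition):

```python
# Model #2 with hardcoded constants: any libxl instance can independently
# recompute the same maxpages value from the shared user-requested maximum,
# so no explicit communication between instances is needed.

LIBXL_MAXMEM_CONSTANT_KB = 1024  # illustrative slack added on top

def recompute_maxmem_kb(user_max_kb, video_kb=0):
    # A pure function of shared state: every instance derives the same answer.
    return user_max_kb + video_kb + LIBXL_MAXMEM_CONSTANT_KB
```

Two instances compiled with different constants would disagree on the recomputed value, which is exactly the hypothetical breakage mentioned below.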

If for example we supported multiple versions of libxl in parallel and
they might have different constants then this would be broken. Luckily
we don't support that.

> >      3. A group of entities which operate in isolation by only ever
> >         increasing or decreasing the max pages according to their own
> >         requirements, without reference to anyone else. When QEMU
> >         entered the fray, and with the various libxl fixes since, you
> >         might think we are implementing this model, but we aren't
> >         because the hypervisor interface is only "set", not
> >         "increment/decrement" and so there is a racy read/modify/write
> >         cycle in every entity now.
> 
> I don't think this is true: QEMU only sets maxmem at domain creation,
> before "xl create" even returns. I think that is safe. We had an email
> exchange at the time, I explained this behaviour and the general opinion
> was that it is acceptable. I don't understand why it is not anymore.

At the time we didn't appreciate the race inherent in this, I think. Or
else the argument is complex enough that it is going to need explaining
again (and again). In which case it really ought to be clearly written
down somewhere. The third option is that we ended up here by a series of
plausible looking changes (the frog in boiling water effect).

Anyway, without locking, the current scheme is not #3 as defined above:
due to the R/M/W race it is just something a bit like #3 (but broken).
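The race can be shown with a toy model (not the real code paths): two entities each read the shared maxpages, add their own delta, and write back through a set-only interface.

```python
# Why a set-only hypercall breaks model #3: with an unlucky interleaving,
# one entity's read-modify-write clobbers the other's update. An
# increment/decrement interface would not lose either adjustment.

def racy_interleaving(maxpages, delta_a, delta_b):
    read_a = maxpages  # entity A reads...
    read_b = maxpages  # ...and entity B reads before A writes (the window)
    maxpages = read_a + delta_a  # A's "set"
    maxpages = read_b + delta_b  # B's "set" silently discards A's update
    return maxpages

def atomic_increments(maxpages, delta_a, delta_b):
    return maxpages + delta_a + delta_b  # both adjustments survive
```

For example, racy_interleaving(100, 10, 20) yields 120, losing A's 10 pages, where 130 was intended.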

Maybe there is an argument around the locking in libxl being sufficient
at start of day; I'm not sure. It certainly doesn't look like
libxl_set_memory_target(enforce=1) does anything to interlock against
domain creation, so if it were called halfway through I'm not sure it
would do the right thing. IOW, I think libxl_set_memory_target is now
racy against parallel invocations of itself, against
libxl_domain_setmaxmem, and during creation.

I think/hope the xs transaction makes libxl_set_memory_target safe
against itself, but I'm not sure: if the transaction fails and we go
around again we might use a stale ptr.max_memkb (because the lost
transaction might have been another libxl_set_memory_target).
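The stale-value worry can be sketched like this (illustrative names, not the real libxl structures): the bug is reading max_memkb once, outside the retry loop, so a retried transaction acts on data from before the first attempt.

```python
# Sketch of the stale-pointer concern above. read_max_memkb and
# run_transaction stand in for the xenstore read and the commit/retry
# of a xenstore transaction; run_transaction returns False on conflict.

def set_target_buggy(read_max_memkb, run_transaction, new_target_kb):
    cached_max = read_max_memkb()  # read ONCE, outside the retry loop
    while not run_transaction(cached_max, new_target_kb):
        pass  # a retry reuses the stale cached_max

def set_target_fixed(read_max_memkb, run_transaction, new_target_kb):
    while True:
        fresh_max = read_max_memkb()  # re-read inside every attempt
        if run_transaction(fresh_max, new_target_kb):
            return
```

If the value changes between attempts (because the lost transaction was another writer), only the second variant sees the fresh value on retry.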

> >      4. Xen exposes multiple "maxpages" variables corresponding to the
> >         various entities which have a need to control this (or multiple
> >         per entity). Xen adds all of those up to come up with the actual
> >         maxpages.
> >      5. I thought there was one more but I've forgotten what it was.
> > 
> > We used to have #2, we now have a broken version of #3 (broken because
> > it is racy due to the xen interface used).
> 
> I don't think we had #2, we always had a working version of #3 and we
> still have, except for migration being broken.

Before QEMU got involved #2 and #3 were for most practical purposes the
same, I suppose (given the trivial nop coordination in #2 described
above).
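For comparison, option #4 above would remove the R/M/W entirely: each entity only ever overwrites its own slot, and Xen enforces the sum. A rough, purely illustrative sketch:

```python
# Model #4: Xen keeps one limit slot per entity and sums them, so no
# entity ever needs to read-modify-write a shared value. Illustrative
# only; no such interface exists in Xen today.

class MaxpagesSlots:
    def __init__(self):
        self.slots = {}  # entity name -> pages that entity requires

    def set_slot(self, entity, pages):
        self.slots[entity] = pages  # whole-slot replacement, no R/M/W

    def effective_maxpages(self):
        return sum(self.slots.values())  # what Xen would enforce
```

Concurrent writers cannot clobber each other here because they write disjoint slots, which is what makes this model race-free without locking.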

Adding QEMU strictly at start of day into the mix didn't inherently
break that (if we assume the libxl creation lock is sufficient, which
I'm not sure of); it was the subsequent changes to libxl to use a R/M/W
cycle for updates where things started to get much more fishy, IMHO.

> I think libxc just needs to properly save/restore maxmem. I understand
> that given the current interfaces it is difficult to do, but rethinking
> the world because we cannot save/restore maxmem in a simple manner
> doesn't seem the right answer to me.

I think save/restore is just what has caused us to notice that the
current model has issues, you are correct that we could apply a band aid
to (perhaps) fix save/restore with the current model. I don't think that
changes the fact that the model has issues though.

Ian.
