From: Don Slutz <dslutz@verizon.com>
To: George Dunlap <george.dunlap@eu.citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Wei Liu <wei.liu2@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: QEMU bumping memory bug analysis
Date: Mon, 08 Jun 2015 12:06:44 -0400	[thread overview]
Message-ID: <5575BD94.2010408@one.verizon.com> (raw)
In-Reply-To: <5575B6D0.8010407@eu.citrix.com>

On 06/08/15 11:37, George Dunlap wrote:
> On 06/08/2015 04:01 PM, Don Slutz wrote:
>> On 06/08/15 10:20, George Dunlap wrote:
>>> And at the moment, pages in the p2m are allocated by a number of entities:
>>> * In the libxc domain builder.
>>> * In the guest balloon driver
>>> * And now, in qemu, to allocate extra memory for virtual ROMs.
>>
>> This is not correct.  QEMU and hvmloader both allocate pages for their
>> use.  LIBXL_MAXMEM_CONSTANT allows QEMU and hvmloader to allocate some
>> pages.  The QEMU change only comes into play after LIBXL_MAXMEM_CONSTANT
>> has been reached.
> 
> Thanks -- so the correct statement here is (in time order):
> 
> Pages in the p2m are allocated by a number of entities:
> * In the libxc domain builder
> * In qemu
> * In hvmloader
> * In the guest balloon driver
> 

That is my understanding.  As Ian C pointed out, there is a file:

docs/misc/libxl_memory.txt

which attempts to describe this.

>>> For the first two, it's libxl that sets maxmem, based in its calculation
>>> of the size of virtual RAM plus various other bits that will be needed.
>>>  Having qemu *also* set maxmem was always the wrong thing to do, IMHO.
>>>
>>
>> It does it for all 3 (4?) because it adds LIBXL_MAXMEM_CONSTANT.
> 
> So the correct statement is:
> 
> In the past, libxl has set maxmem for all of those, based on its
> calculation of virtual RAM plus various other bits that might be needed
> (including pages needed by qemu or hvmloader).
> 
> The change as of qemu $WHATEVER is that now qemu also sets it if it
> finds that libxl didn't give it enough "slack".  That was always the
> wrong thing to do, IMHO.
> 

Ok.

>>> In theory, from the interface perspective, what libxl promises to
>>> provide is virtual RAM.  When you say "memory=8192" in a domain config,
>>> that means (or should mean) 8192MiB of virtual RAM, exclusive of video
>>> RAM, virtual ROMs, and magic pages.  Then when you say "xl mem-set
>>> 4096", it should again be aiming at giving the VM the equivalent of
>>> 4096MiB of virtual RAM, exclusive of video RAM, &c &c.
>>
>>
>> Not what is currently done.  virtual video RAM is subtracted from "memory=".
> 
> Right.
> 
> After I sent this, it occurred to me that there were two sensible
> interpretations of "memory=".  The first is, "This is how much virtual
> RAM to give the guest.  Please allocate non-RAM pages in addition to
> this."  The second is, "This is the total amount of host RAM I want the
> guest to use.  Please take non-RAM pages from this total amount."
> 
> In reality we apparently do neither of these. :-)
> 
> I think both break the "principle of least surprise" in different ways,
> but I suspect that admins on the whole would rather have the second
> interpretation, as I think it makes their lives a bit easier.
> 

Before I knew as much about this as I do now, I had assumed that the
second interpretation was what libxl did.  Normally video RAM is the
largest amount, so the smaller deltas (LIBXL_MAXMEM_CONSTANT, 1MiB,
and LIBXL_HVM_EXTRA_MEMORY, 2MiB) simply went unnoticed.

There is also shadow memory, which needs to be included in the above.

>>> We already have the problem that the balloon driver at the moment
>>> doesn't actually know how big the guest RAM is, but is being told
>>> to make a balloon exactly big enough to bring the total RAM down to a
>>> specific target.
>>>
>>> I think we do need to have some place in the middle that actually knows
>>> how much memory is actually needed for the different sub-systems, so it
>>> can calculate and set maxmem appropriately.  libxl is the obvious place.
>>
>> Maybe.  So you want libxl to know the details of balloon overhead?  How
>> about the different sizes of all possible Option ROMs in all QEMU
>> versions?  What about hvmloader's usage of memory?
> 
> I'm not sure what you mean by "balloon overhead", but if you mean "guest
> pages wasted keeping track of pages which have been ballooned out", then
> no, that's not what I mean.  Neither libxl nor the balloon driver keep
> track of that at the moment.
> 

I was trying to refer to:

NOTE: Because of the way ballooning works, the guest has to allocate
memory to keep track of maxmem pages, regardless of how much memory it
actually has available to it.  A guest with maxmem=262144 and
memory=8096 will report significantly less memory available for use than
a system with maxmem=8096 memory=8096 due to the memory overhead of
having to track the unused pages.

(from xl.cfg man page).

> I think that qemu needs to tell libxl how much memory it is using for
> all of its needs -- including option ROMs.  (See my example below.)  For
> older qemus we can just make some assumptions like we always have.
> 

I am happy with this.  Note: I think libxl could determine this number
now without QEMU changes.  However, it does depend on no other thread
changing a "starting" domain's memory while libxl is calculating this.

> I do think it would make sense to have the hvmloader amount listed
> somewhere explicitly.  I'm not sure how often hvmloader may need to
> change the amount it uses for itself.
> 

hvmloader uses yet another method: if
xc_domain_populate_physmap_exact() fails, it reduces guest RAM (if my
memory is correct).

>>> What about this:
>>> * Libxl has a maximum amount of RAM that qemu is *allowed* to use to set
>>> up virtual ROMs, video ram for virtual devices, &c
>>> * At start-of-day, it sets maxpages to PAGES(virtual RAM)+PAGES(magic) +
>>> max_qemu_pages
>>> * Qemu allocates as many pages as it needs for option ROMS, and writes
>>> the amount that it actually did use into a special node in xenstore.
>>> * When the domain is unpaused, libxl will set maxpages to PAGES(virtual
>>> RAM) + PAGES(magic) + actual_qemu_pages that it gets from xenstore.
>>>
>>
>> I think this does match what Wei Liu said:
> 
> The suggestion you quote below is that the *user* should have to put in
> some number in the config file, not that qemu should write the number
> into xenstore.
> 
> The key distinction of my suggestion was to set maxpages purposely high,
> wait for qemu to use what it needs, then to reduce it down to what was
> needed.
> 

Sorry, I did not get that.

   -Don Slutz

>  -George
> 

Thread overview: 38+ messages
2015-06-05 16:43 QEMU bumping memory bug analysis Wei Liu
2015-06-05 16:58 ` Ian Campbell
2015-06-05 17:13   ` Stefano Stabellini
2015-06-05 19:06     ` Wei Liu
2015-06-05 17:17   ` Andrew Cooper
2015-06-05 17:39   ` Wei Liu
2015-06-05 17:10 ` Stefano Stabellini
2015-06-05 18:10   ` Wei Liu
2015-06-08 11:39     ` Stefano Stabellini
2015-06-08 12:14       ` Andrew Cooper
2015-06-08 13:01         ` Stefano Stabellini
2015-06-08 13:33           ` Jan Beulich
2015-06-08 13:10       ` Wei Liu
2015-06-08 13:27         ` Stefano Stabellini
2015-06-08 13:32           ` Wei Liu
2015-06-08 13:38             ` Stefano Stabellini
2015-06-08 13:44               ` Andrew Cooper
2015-06-08 13:45                 ` Stefano Stabellini
2015-06-05 18:49   ` Ian Campbell
2015-06-08 11:40     ` Stefano Stabellini
2015-06-08 12:11       ` Ian Campbell
2015-06-08 13:22         ` Stefano Stabellini
2015-06-08 13:52           ` Ian Campbell
2015-06-08 14:20           ` George Dunlap
2015-06-08 15:01             ` Don Slutz
2015-06-08 15:37               ` George Dunlap
2015-06-08 16:06                 ` Don Slutz [this message]
2015-06-09 10:00                   ` George Dunlap
2015-06-09 10:17                     ` Wei Liu
2015-06-09 10:14                 ` Stefano Stabellini
2015-06-09 11:20                   ` George Dunlap
2015-06-16 16:44                     ` Stefano Stabellini
2015-06-09 12:45                   ` Ian Campbell
2015-06-17 13:35                     ` Stefano Stabellini
2015-06-08 14:53         ` Konrad Rzeszutek Wilk
2015-06-08 15:20           ` George Dunlap
2015-06-08 15:42             ` Konrad Rzeszutek Wilk
2015-06-08 14:14   ` George Dunlap
