From mboxrd@z Thu Jan 1 00:00:00 1970
From: Don Slutz
Subject: Re: QEMU bumping memory bug analysis
Date: Mon, 08 Jun 2015 12:06:44 -0400
Message-ID: <5575BD94.2010408@one.verizon.com>
References: <20150605164354.GK29102@zion.uk.xensource.com>
 <1433530180.3342.17.camel@citrix.com>
 <1433765498.7108.480.camel@citrix.com>
 <5575A4C5.5070702@eu.citrix.com>
 <5575AE47.3020208@one.verizon.com>
 <5575B6D0.8010407@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5575B6D0.8010407@eu.citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: George Dunlap, Stefano Stabellini, Ian Campbell
Cc: Ian Jackson, Andrew Cooper, Wei Liu, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 06/08/15 11:37, George Dunlap wrote:
> On 06/08/2015 04:01 PM, Don Slutz wrote:
>> On 06/08/15 10:20, George Dunlap wrote:
>>> And at the moment, pages in the p2m are allocated by a number of entities:
>>> * In the libxc domain builder.
>>> * In the guest balloon driver
>>> * And now, in qemu, to allocate extra memory for virtual ROMs.
>>
>> This is not correct.  QEMU and hvmloader both allocate pages for their
>> use.  LIBXL_MAXMEM_CONSTANT allows QEMU and hvmloader to allocate some
>> pages.  The QEMU change only comes into play after LIBXL_MAXMEM_CONSTANT
>> has been reached.
>
> Thanks -- so the correct statement here is (in time order):
>
> Pages in the p2m are allocated by a number of entities:
> * In the libxc domain builder
> * In qemu
> * In hvmloader
> * In the guest balloon driver
>

That is my understanding.  As Ian C pointed out, there is a file:

docs/misc/libxl_memory.txt

that attempts to describe this.

>>> For the first two, it's libxl that sets maxmem, based on its calculation
>>> of the size of virtual RAM plus various other bits that will be needed.
>>> Having qemu *also* set maxmem was always the wrong thing to do, IMHO.
>>>
>>
>> It does it for all 3 (4?) because it adds LIBXL_MAXMEM_CONSTANT.
>
> So the correct statement is:
>
> In the past, libxl has set maxmem for all of those, based on its
> calculation of virtual RAM plus various other bits that might be needed
> (including pages needed by qemu or hvmloader).
>
> The change as of qemu $WHATEVER is that now qemu also sets it if it
> finds that libxl didn't give it enough "slack".  That was always the
> wrong thing to do, IMHO.
>

Ok.

>>> In theory, from the interface perspective, what libxl promises to
>>> provide is virtual RAM.  When you say "memory=8192" in a domain config,
>>> that means (or should mean) 8192MiB of virtual RAM, exclusive of video
>>> RAM, virtual ROMs, and magic pages.  Then when you say "xl mem-set
>>> 4096", it should again be aiming at giving the VM the equivalent of
>>> 4096MiB of virtual RAM, exclusive of video RAM, &c &c.
>>
>>
>> Not what is currently done.  Virtual video RAM is subtracted from "memory=".
>
> Right.
>
> After I sent this, it occurred to me that there were two sensible
> interpretations of "memory=".  The first is, "This is how much virtual
> RAM to give the guest.  Please allocate non-RAM pages in addition to
> this."  The second is, "This is the total amount of host RAM I want the
> guest to use.  Please take non-RAM pages from this total amount."
>
> In reality we apparently do neither of these. :-)
>
> I think both break the "principle of least surprise" in different ways,
> but I suspect that admins on the whole would rather have the second
> interpretation, as I think it makes their lives a bit easier.
>

Before I knew as much about this as I currently do, I had assumed that
the second interpretation was what libxl did.  Normally video RAM is the
largest amount, so the smaller deltas (LIBXL_MAXMEM_CONSTANT, 1MiB, and
LIBXL_HVM_EXTRA_MEMORY, 2MiB) just were not noticed.
There is also shadow memory, which needs to be included in the above.

>>> We already have the problem that the balloon driver at the moment
>>> doesn't actually know how big the guest RAM is, nor , but is being told
>>> to make a balloon exactly big enough to bring the total RAM down to a
>>> specific target.
>>>
>>> I think we do need to have some place in the middle that actually knows
>>> how much memory is actually needed for the different sub-systems, so it
>>> can calculate and set maxmem appropriately.  libxl is the obvious place.
>>
>> Maybe.  So you want libxl to know the detail of balloon overhead?  How
>> about the different sizes of all possible Option ROMs in all QEMU
>> versions?  What about hvmloader usage of memory?
>
> I'm not sure what you mean by "balloon overhead", but if you mean "guest
> pages wasted keeping track of pages which have been ballooned out", then
> no, that's not what I mean.  Neither libxl nor the balloon driver keep
> track of that at the moment.
>

I was trying to refer to:

    NOTE: Because of the way ballooning works, the guest has to allocate
    memory to keep track of maxmem pages, regardless of how much memory
    it actually has available to it.  A guest with maxmem=262144 and
    memory=8096 will report significantly less memory available for use
    than a system with maxmem=8096 memory=8096 due to the memory
    overhead of having to track the unused pages.

(from xl.cfg man page).

> I think that qemu needs to tell libxl how much memory it is using for
> all of its needs -- including option ROMs.  (See my example below.)  For
> older qemus we can just make some assumptions like we always have.
>

I am happy with this.  Note: I think libxl could determine this number
now without QEMU changes.  However, it does depend on no other thread
changing a "starting" domain's memory while libxl is calculating this.

> I do think it would make sense to have the hvmloader amount listed
> somewhere explicitly.
> I'm not sure how often hvmloader may need to
> change the amount it uses for itself.
>

hvmloader uses yet another method.  If
xc_domain_populate_physmap_exact() fails, it reduces guest RAM (if my
memory is correct).

>>> What about this:
>>> * Libxl has a maximum amount of RAM that qemu is *allowed* to use to set
>>> up virtual ROMs, video ram for virtual devices, &c
>>> * At start-of-day, it sets maxpages to PAGES(virtual RAM)+PAGES(magic) +
>>> max_qemu_pages
>>> * Qemu allocates as many pages as it needs for option ROMS, and writes
>>> the amount that it actually did use into a special node in xenstore.
>>> * When the domain is unpaused, libxl will set maxpages to PAGES(virtual
>>> RAM) + PAGES(magic) + actual_qemu_pages that it gets from xenstore.
>>>
>>
>> I think this does match what Wei Liu said:
>
> The suggestion you quote below is that the *user* should have to put in
> some number in the config file, not that qemu should write the number
> into xenstore.
>
> The key distinction of my suggestion was to set maxpages purposely high,
> wait for qemu to use what it needs, then to reduce it down to what was
> needed.
>

Sorry, I did not get that.

   -Don Slutz

> -George
>
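
For reference, the two-step maxpages scheme George describes above
(set the limit purposely high, let qemu allocate, then shrink to what
was actually used) can be sketched roughly as follows.  This is only an
illustration in Python, not libxl code: the function names, the 4 KiB
page size, and the idea of passing the xenstore value in as a plain
argument are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KiB pages (an assumption of this sketch)


def pages(nbytes):
    """Round a byte count up to whole pages."""
    return (nbytes + PAGE_SIZE - 1) // PAGE_SIZE


def initial_maxpages(ram_bytes, magic_pages, max_qemu_pages):
    """Start-of-day limit: virtual RAM + magic pages + the most qemu may use.

    All three parameters are hypothetical inputs; libxl would derive them
    from the domain config and its own constants.
    """
    return pages(ram_bytes) + magic_pages + max_qemu_pages


def final_maxpages(ram_bytes, magic_pages, max_qemu_pages, actual_qemu_pages):
    """Limit at unpause: shrink to what qemu reported it actually used.

    actual_qemu_pages stands in for the value qemu would write to xenstore.
    """
    if actual_qemu_pages > max_qemu_pages:
        raise ValueError("qemu reported more pages than it was allowed")
    return pages(ram_bytes) + magic_pages + actual_qemu_pages


# Made-up numbers: 8192 MiB of virtual RAM, 8 magic pages,
# an allowance of 1024 pages for qemu, of which qemu used 300.
ram = 8192 * 1024 * 1024
start_limit = initial_maxpages(ram, 8, 1024)
unpause_limit = final_maxpages(ram, 8, 1024, 300)
```

With these made-up inputs the limit at unpause is lower than the
start-of-day limit by exactly the unused part of qemu's allowance
(1024 - 300 pages), which is the "reduce it down to what was needed"
step in George's description.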