From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	dslutz@verizon.com, xen-devel@lists.xen.org
Subject: Re: QEMU bumping memory bug analysis
Date: Mon, 8 Jun 2015 13:14:58 +0100	[thread overview]
Message-ID: <55758742.5060602@citrix.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1506081211210.19838@kaball.uk.xensource.com>

On 08/06/15 12:39, Stefano Stabellini wrote:
> On Fri, 5 Jun 2015, Wei Liu wrote:
>> On Fri, Jun 05, 2015 at 06:10:17PM +0100, Stefano Stabellini wrote:
>>> On Fri, 5 Jun 2015, Wei Liu wrote:
>>>> Hi all
>>>>
>>>> This bug is now considered a blocker for 4.6 release.
>>>>
>>>> The premises of the problem remain the same (George's translated
>>>> version):
>>>>
>>>> 1. QEMU may need extra pages from Xen to implement option ROMS, and so at
>>>>    the moment it calls set_max_mem() to increase max_pages so that it can
>>>>    allocate more pages to the guest.  libxl doesn't know what max_pages a
>>>>    domain needs prior to qemu start-up.
>>>>
>>>> 2. Libxl doesn't know max_pages even after qemu start-up, because there
>>>>    is no mechanism to communicate between qemu and libxl.
>>> I might not know what the right design for the overall solution is, but
>>> I do know that libxl shouldn't have its own state tracking for
>>> max_pages, because max_pages is kept, maintained and enforced by Xen.
>>>
>>> Ian might still remember, but at the beginning of the xl/libxl project,
>>> we had few simple design principles. One of which was that we should not
>>> have two places where we keep track of the same thing. If Xen keeps
>>> track of something, libxl should avoid it.
>>>
>>> In this specific case, libxl can ask Xen at any time what max_pages is
>>> for the domain, so I don't think that libxl should store it or have its
>>> own tracking for it.
>>>
>>> Even if QEMU called into libxl to change max_pages, I don't think that
>>> libxl should store max_pages anywhere. It is already stored in Xen and
>>> can be retrieved at any time.
>>>
>> I think you're talking about keeping track of that record permanently. I
>> only care about getting the right value at the right time and transferring
>> it to the other end. Getting the value whenever needed is OK.
>>
>>>> 3. QEMU calls xc_domain_setmaxmem to increase max_pages by N pages.
>>>>    Those pages are only accounted for in the hypervisor.  Libxl
>>>>    (currently) does not extract that value from the hypervisor.
>>>>
>>>> Several solutions were proposed:
>>>>
>>>> 1. Add a new record type in libxc migration stream and call setmaxmem
>>>>    in the middle of xc migration stream.
>>>>
The main objection is that calling xc_domain_setmaxmem in the middle of
the xc migration stream is a layering violation. It also prevents us from
disaggregating domain construction to a less privileged domain.
>>> It seems to me that max_pages is one of the memory properties of the
>>> domain, so it should be saved and restored together with the rest of
>>> memory.
>>>
>> #1 was actually referring to Don's patch. When processing a libxc record
>> in the middle of the stream, we should not alter the size of memory.
>> It's not safe, because we don't know whether that record arrives early
>> enough, i.e. before we exceed the limit.
>>
>> The only safe way of doing it is to mandate that specific record at
>> the beginning of libxc stream, which might have other implications.
> I see, that makes sense
>
>
>>>> 2. Use libxl toolstack save restore blob to transmit max_pages
>>>>    information to remote end.
>>>>
>>>> This is considered a bodge and has been proven not to work because
>>>> toolstack blob restore happens after xc_domain_restore.
>>> Saving and restoring max_pages in libxl seems to me like a layering
>>> violation. I would avoid 2. 3. 4. and 5.
>>>
>> No, not really, if we follow the principle of "libxl is the
>> arbitrator": it needs thorough information on every aspect of
>> the domain so that it can set up limits.
> I am not sure I buy this "libxl is the arbitrator" concept. I am not
> seeing libxl as adding much value in this context.
>
>  
>>>> 3. Add a libxl layer that wraps necessary information, take over
>>>>    Andrew's work on libxl migration v2.  Having a libxl layer that's not
>>>>    part of migration v2 is a waste of effort.
>>>>
>>>> There are several obstacles for libxl migration v2 at the moment. Libxl
>>>> layer in migration v2 still has unresolved issues. It has
>>>> inter-dependency with Remus / COLO.
>>>>
>>>> Most importantly it doesn't inherently solve the problem. It still
>>>> requires the current libxl JSON blob to contain information about max
>>>> pages (or information used to derive max pages).
>>>>
>>>> Andrew, correct me if I'm wrong.
>>>>
>>>> 4. Add a non-user-configurable field in the current libxl JSON structure
>>>>    record max pages information.
>>>>
>>>> This is not desirable. All fields in libxl JSON should be user
>>>> configurable.
>>>>
>>>> 5. Add a user-configurable field in the current libxl JSON structure to
>>>>    record how much more memory this domain needs. The admin is required
>>>>    to fill in that value manually. In the meantime we revert the change
>>>>    in QEMU and declare QEMU with that change buggy.
>>> QEMU 2.3.0 was released with that change in it, so it is not quite
>>> possible to revert it. Also I think it is the right change for QEMU.
>>>
>> It has security implications. Here is my reply copied from my mail to
>> Ian:
>>
>> I'm considering removing the need for xc_domain_setmaxmem regardless of
>> this bug, because it is going to cause problems for QEMU upstream stubdoms
>> with a strict XSM policy and a deprivileged QEMU (which may not have the
>> privilege to call setmaxmem).
> QEMU running in the stubdom should be able to set the maxmem for its
> target domain, but not for the others.

At the moment, set_max_mem is the only method the toolstack has of
putting a hard upper bound on a domain's memory usage (along with a few
other limits such as the shadow memory size and the PoD cache).

In a disaggregated case, no deprivileged entity should be able to play
with this limit.  Being able to do so renders the security model moot,
as a compromised stubdom could force a host OOM.

~Andrew
