From: Stefano Stabellini
Subject: Re: QEMU bumping memory bug analysis
Date: Mon, 8 Jun 2015 12:39:52 +0100
References: <20150605164354.GK29102@zion.uk.xensource.com>
 <20150605181014.GM29102@zion.uk.xensource.com>
In-Reply-To: <20150605181014.GM29102@zion.uk.xensource.com>
To: Wei Liu
Cc: Ian Campbell, Stefano Stabellini, George Dunlap, Andrew Cooper,
 Ian Jackson, dslutz@verizon.com, xen-devel@lists.xen.org

On Fri, 5 Jun 2015, Wei Liu wrote:
> On Fri, Jun 05, 2015 at 06:10:17PM +0100, Stefano Stabellini wrote:
> > On Fri, 5 Jun 2015, Wei Liu wrote:
> > > Hi all,
> > >
> > > This bug is now considered a blocker for the 4.6 release.
> > >
> > > The premises of the problem remain the same (George's translated
> > > version):
> > >
> > > 1. QEMU may need extra pages from Xen to implement option ROMs,
> > >    and so at the moment it calls set_max_mem() to increase
> > >    max_pages so that it can allocate more pages to the guest.
> > >    libxl doesn't know what max_pages a domain needs prior to QEMU
> > >    start-up.
> > >
> > > 2. libxl doesn't know max_pages even after QEMU start-up, because
> > >    there is no mechanism for QEMU and libxl to communicate.
> >
> > I might not know what the right design for the overall solution is,
> > but I do know that libxl shouldn't have its own state tracking for
> > max_pages, because max_pages is kept, maintained and enforced by
> > Xen.
> >
> > Ian might still remember, but at the beginning of the xl/libxl
> > project we had a few simple design principles. One of them was that
> > we should not have two places that keep track of the same thing. If
> > Xen keeps track of something, libxl should avoid tracking it too.
> >
> > In this specific case, libxl can ask Xen at any time what max_pages
> > is for the domain, so I don't think that libxl should store it or
> > have its own tracking for it.
> >
> > Even if QEMU called into libxl to change max_pages, I don't think
> > that libxl should store max_pages anywhere. It is already stored in
> > Xen and can be retrieved at any time.
>
> I think you're talking about keeping that record permanently. I only
> care about getting the right value at the right time and transferring
> it to the other end.

Getting the value whenever needed is OK.
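
For example, reading the limit back from the hypervisor on demand is a
single hypercall away. A minimal sketch (illustration only, with error
handling trimmed and 4 KiB pages assumed; real libxl code would go
through its own wrappers):

    #include <stdio.h>
    #include <stdint.h>
    #include <xenctrl.h>

    /* Print the hypervisor-enforced memory limit of a domain. */
    static int print_max_pages(uint32_t domid)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        xc_dominfo_t info;

        if (!xch)
            return -1;

        /* xc_domain_getinfo() returns the number of domains found,
         * starting from the requested domid, so check that we really
         * got the domain we asked for. */
        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
            info.domid != domid) {
            xc_interface_close(xch);
            return -1;
        }

        /* max_memkb is the limit set via XEN_DOMCTL_max_mem;
         * max_pages is simply that value divided by the page size
         * in KiB (4 on x86). */
        printf("domain %u: max_memkb=%lu max_pages=%lu\n",
               domid, info.max_memkb, info.max_memkb / 4);

        xc_interface_close(xch);
        return 0;
    }

The point being that the authoritative value is always available from
Xen, so a second copy in libxl buys us nothing.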

> > > 3. QEMU calls xc_domain_setmaxmem to increase max_pages by N
> > >    pages. Those pages are only accounted for in the hypervisor.
> > >    libxl (currently) does not extract that value from the
> > >    hypervisor.
> > >
> > > Several solutions were proposed:
> > >
> > > 1. Add a new record type to the libxc migration stream and call
> > >    setmaxmem in the middle of the xc migration stream.
> > >
> > >    The main objections are that calling xc_domain_setmaxmem in
> > >    the middle of the xc migration stream is a layering violation,
> > >    and that it prevents us from disaggregating domain
> > >    construction to a less privileged domain.
> >
> > It seems to me that max_pages is one of the memory properties of
> > the domain, so it should be saved and restored together with the
> > rest of memory.
>
> #1 was actually referring to Don's patch. When processing a libxc
> record in the middle of the stream, we should not alter the size of
> memory. It's not safe, because we don't know whether that record
> comes early enough, before we exceed the limit.
>
> The only safe way of doing it is to mandate that specific record at
> the beginning of the libxc stream, which might have other
> implications.

I see, that makes sense.

> > > 2. Use the libxl toolstack save/restore blob to transmit
> > >    max_pages information to the remote end.
> > >
> > >    This is considered a bodge and has been proven not to work,
> > >    because the toolstack blob is restored after
> > >    xc_domain_restore.
> >
> > Saving and restoring max_pages in libxl seems to me like a layering
> > violation. I would avoid 2, 3, 4 and 5.
>
> No, not really, if we follow the principle of "libxl is the
> arbitrator": it needs to have thorough information on every aspect of
> the domain and to set up the limits.

I am not sure I buy this "libxl is the arbitrator" concept. I am not
seeing libxl as adding much value in this context.

> > > 3. Add a libxl layer that wraps the necessary information, taking
> > >    over Andrew's work on libxl migration v2. Having a libxl layer
> > >    that's not part of migration v2 is a waste of effort.
> > >
> > >    There are several obstacles for libxl migration v2 at the
> > >    moment. The libxl layer in migration v2 still has unresolved
> > >    issues, and it has inter-dependencies with Remus / COLO.
> > >
> > >    Most importantly, it doesn't inherently solve the problem. It
> > >    still requires the current libxl JSON blob to contain
> > >    information about max_pages (or information used to derive
> > >    max_pages).
> > >
> > >    Andrew, correct me if I'm wrong.
> > >
> > > 4. Add a non-user-configurable field to the current libxl JSON
> > >    structure to record max_pages information.
> > >
> > >    This is not desirable. All fields in the libxl JSON should be
> > >    user configurable.
> > >
> > > 5. Add a user-configurable field to the current libxl JSON
> > >    structure to record how much more memory this domain needs.
> > >    The admin is required to fill in that value manually. In the
> > >    meantime we revert the change in QEMU and declare QEMU with
> > >    that change buggy.
> >
> > QEMU 2.3.0 was released with that change in it, so it is not quite
> > possible to revert it. Also, I think it is the right change for
> > QEMU.
>
> It has security implications. Here is my reply, copied from my mail
> to Ian:
>
> I'm considering removing the xc_domain_setmaxmem call regardless of
> this bug, because it's going to cause problems in the QEMU upstream
> stubdom with a strict XSM policy and a deprivileged QEMU (which may
> not have the privilege to call setmaxmem).

QEMU running in the stubdom should be able to set the maxmem for its
target domain, but not for the others.

> The security implication as it is now is big enough. One malicious
> guest that controls QEMU has a vector to DoS the hypervisor by
> setting its own max_pages to -1.

Is that an issue in the implementation of XEN_DOMCTL_max_mem on the
Xen side? Could you please give me a bit more info on this security
issue?
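
Just so that we are looking at the same thing, my understanding is
that the pattern at issue boils down to something like this (a sketch
of the idea only, not QEMU's actual code; bump_max_mem() is an
invented name):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Raise the target domain's memory limit by extra_kb.  Any
     * process holding a sufficiently privileged xc handle can do
     * this: Xen records the new limit but does not itself judge
     * whether the increase is reasonable, so a compromised device
     * model could pass an absurd value (e.g. (uint64_t)-1) and
     * effectively remove the cap. */
    static int bump_max_mem(xc_interface *xch, uint32_t domid,
                            uint64_t extra_kb)
    {
        xc_dominfo_t info;

        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
            info.domid != domid)
            return -1;

        /* XEN_DOMCTL_max_mem under the hood. */
        return xc_domain_setmaxmem(xch, domid,
                                   info.max_memkb + extra_kb);
    }

If that is the vector you mean, then it sounds like the check belongs
in Xen or in the XSM policy, which is where the limit is enforced in
the first place.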

> > > No response to this so far. But in fact I consider this the most
> > > viable solution.
> > >
> > > It's a simple enough solution that is achievable within the 4.6
> > > time frame. It doesn't prevent us from doing useful work in the
> > > future (a disaggregated architecture with a stricter security
> > > policy). It provides a way to work around a buggy QEMU (the admin
> > > sets that value to prevent QEMU from bumping the memory limit).
> > > It's orthogonal to migration v2, which means it won't be blocked
> > > by migration v2 or block migration v2.
> > >
> > > I tend to go with solution 5. Speak up if you don't agree with my
> > > analysis or you think I missed some aspects.

I disagree.

> > > For the long term we need to:
> > >
> > > 1. Establish libxl as the arbitrator of how many pages a domain
> > >    can have. Anything else doing stuff behind the arbitrator's
> > >    back is considered buggy. This principle probably applies to
> > >    other aspects of a domain as well.
> >
> > I disagree that libxl should be the arbitrator of a property that
> > is stored, maintained and enforced by Xen. Xen should be the
> > arbitrator.
>
> But it would be a burden for Xen once the arbitration goes beyond
> the issue we discuss here. It certainly doesn't have as much
> information as libxl has. In the long run we would still end up
> having libxl do most of the work, so we might as well establish the
> principle now.

I would like to see some concrete examples of how libxl adds value in
this context, instead of just adding layers of indirection.

> > > 2. Work out a way for QEMU and libxl to communicate. This can be
> > >    expanded to cover other components in a Xen setup, but
> > >    currently we only have QEMU.
> >
> > Even if QEMU called into libxl to change maxmem, I don't think that
> > libxl should store maxmem anywhere. It is already stored in Xen.
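
To illustrate what I mean: even if we did add such a libxl entry point
for QEMU to call, I would expect it to look roughly like the sketch
below (entirely hypothetical; dm_request_extra_memory() and
DM_OVERHEAD_LIMIT_KB are invented names, and no such libxl API exists
today):

    #include <stdint.h>
    #include <libxl.h>

    /* Hypothetical cap on device model overhead, e.g. coming from
     * the user-configurable field proposed as solution 5. */
    #define DM_OVERHEAD_LIMIT_KB (16 * 1024)

    /* Hypothetical: the device model asks for extra_kb above the
     * current limit; libxl applies policy and forwards the request,
     * while Xen keeps the one and only copy of max_pages. */
    static int dm_request_extra_memory(libxl_ctx *ctx, uint32_t domid,
                                       uint64_t extra_kb)
    {
        libxl_dominfo info;
        int rc;

        if (extra_kb > DM_OVERHEAD_LIMIT_KB)
            return -1;    /* policy: refuse oversized requests */

        libxl_dominfo_init(&info);
        if (libxl_domain_info(ctx, &info, domid)) {
            libxl_dominfo_dispose(&info);
            return -1;
        }

        /* libxl_domain_setmaxmem() ends up in XEN_DOMCTL_max_mem, so
         * the authoritative value still lives in Xen. */
        rc = libxl_domain_setmaxmem(ctx, domid,
                                    info.max_memkb + extra_kb);
        libxl_dominfo_dispose(&info);
        return rc;
    }

Even then, libxl would only validate and forward the request; the
value itself remains in Xen, which is exactly my point.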