From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Liu
Subject: Re: QEMU bumping memory bug analysis
Date: Mon, 8 Jun 2015 14:10:00 +0100
Message-ID: <20150608131000.GC29102@zion.uk.xensource.com>
References: <20150605164354.GK29102@zion.uk.xensource.com>
 <20150605181014.GM29102@zion.uk.xensource.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefano Stabellini
Cc: Wei Liu, Ian Campbell, George Dunlap, Andrew Cooper, Ian Jackson,
 dslutz@verizon.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Mon, Jun 08, 2015 at 12:39:52PM +0100, Stefano Stabellini wrote:
[...]
> > > > 3. Add a libxl layer that wraps the necessary information, and take
> > > > over Andrew's work on libxl migration v2. Having a libxl layer
> > > > that's not part of migration v2 is a waste of effort.
> > > >
> > > > There are several obstacles for libxl migration v2 at the moment.
> > > > The libxl layer in migration v2 still has unresolved issues. It has
> > > > inter-dependencies with Remus / COLO.
> > > >
> > > > Most importantly, it doesn't inherently solve the problem. It still
> > > > requires the current libxl JSON blob to contain information about
> > > > max pages (or information used to derive max pages).
> > > >
> > > > Andrew, correct me if I'm wrong.
> > > >
> > > > 4. Add a non-user-configurable field to the current libxl JSON
> > > > structure to record max pages information.
> > > >
> > > > This is not desirable. All fields in libxl JSON should be user
> > > > configurable.
> > > >
> > > > 5. Add a user-configurable field to the current libxl JSON structure
> > > > to record how much more memory this domain needs. The admin is
> > > > required to fill in that value manually.
> > > > In the meantime we revert the change in
> > > > QEMU and declare QEMU with that change buggy.
> > >
> > > QEMU 2.3.0 was released with that change in it, so it is not really
> > > possible to revert it. Also I think it is the right change for QEMU.
> > >
> >
> > It has security implications. Here is my reply, copied from my mail to
> > Ian:
> >
> > I'm considering removing the xc_domain_setmaxmem calls regardless of
> > this bug, because they are going to cause problems for QEMU upstream
> > stubdom with a strict XSM policy and a deprivileged QEMU (which may
> > not have the privilege to call setmaxmem).
>
> QEMU running in the stubdom should be able to set the maxmem for its
> target domain, but not for the others.
>

Right, but this is still not safe. I will explain below.

> > The security implication as it is now is big enough. One malicious
> > guest that controls QEMU has a vector to DoS the hypervisor by
> > setting its own max_pages to -1.
>
> Is that an issue in the implementation of XEN_DOMCTL_max_mem on the Xen
> side? Could you please give me a bit more info on this security issue?
>

Consider a malicious guest that has gained control of its QEMU. It then
calls xc_domain_setmaxmem to set its own max_pages inside the hypervisor
to -1, and starts calling populate_physmap inside the guest kernel. This
is going to make the hypervisor OOM.

XEN_DOMCTL_max_mem only sets the limit of a domain at the moment. I
don't think there is a sensible way of distinguishing a valid setmaxmem
call from a malicious setmaxmem call issued by QEMU. It's an untrusted
code base after all.

> > > > No response to this so far. But in fact I consider this the most
> > > > viable solution.
> > > >
> > > > It's a simple enough solution that is achievable within the 4.6
> > > > time frame. It doesn't prevent us from doing useful work in the
> > > > future (disaggregated architecture with stricter security policy).
> > > > It provides
> > > > a way to work around buggy QEMU (the admin sets that value to
> > > > prevent QEMU from bumping the memory limit). It's orthogonal to
> > > > migration v2, which means it won't be blocked by migration v2 or
> > > > block migration v2.
> > > >
> > > > I tend to go with solution 5. Speak up if you don't agree with my
> > > > analysis or you think I missed some aspects.
> > >
> > > I disagree
> > >
> > > >
> > > > For the long term we need to:
> > > >
> > > > 1. Establish libxl as the arbitrator of how many pages a domain
> > > > can have. Anything else doing stuff behind the arbitrator's back
> > > > is considered buggy. This principle probably applies to other
> > > > aspects of a domain as well.
> > >
> > > I disagree that libxl should be the arbitrator of a property that is
> > > stored, maintained and enforced by Xen. Xen should be the arbitrator.
> > >
> >
> > But it would be a burden for Xen once the arbitration goes beyond the
> > issue we discuss here. Xen certainly doesn't have as much information
> > as libxl has. In the long run we would still end up having libxl do
> > most of the work, so we might as well establish the principle now.
>
> I would like to see some concrete examples of how libxl is adding value
> in this context instead of just adding layers of indirection.
>

No, not layers of indirection. The fact that the initial max_pages is
calculated in libxl suggests that libxl already has the responsibility
to arbitrate that value.

Wei.