Date: Wed, 17 Jun 2015 12:46:09 +0200
From: "Michael S. Tsirkin"
To: Igor Mammedov
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH 3/5] vhost: support upto 509 memory regions
Message-ID: <20150617123842-mutt-send-email-mst@redhat.com>
References: <1434472419-148742-1-git-send-email-imammedo@redhat.com>
 <1434472419-148742-4-git-send-email-imammedo@redhat.com>
 <20150616231201-mutt-send-email-mst@redhat.com>
 <20150617000056.481e4a96@igors-macbook-pro.local>
 <20150617083202-mutt-send-email-mst@redhat.com>
 <20150617092802.5c8d8475@igors-macbook-pro.local>
 <20150617092910-mutt-send-email-mst@redhat.com>
 <20150617105421.71751f44@nial.brq.redhat.com>
 <20150617110711-mutt-send-email-mst@redhat.com>
 <20150617123742.5c3fec30@nial.brq.redhat.com>
In-Reply-To: <20150617123742.5c3fec30@nial.brq.redhat.com>

On Wed, Jun 17, 2015 at 12:37:42PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 12:11:09 +0200
> "Michael S. Tsirkin" wrote:
> 
> > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > > On Wed, 17 Jun 2015 09:39:06 +0200
> > > "Michael S. Tsirkin" wrote:
> > > 
> > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > > "Michael S. Tsirkin" wrote:
> > > > > 
> > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > > "Michael S. Tsirkin" wrote:
> > > > > > > 
> > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > > since commit
> > > > > > > > > 1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > > 
> > > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > > registering hotplugged memory.
> > > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > > memory regions.
> > > > > > > > > 
> > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > > 
> > > > > > > > It was 64, not 65.
> > > > > > > > 
> > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Igor Mammedov
> > > > > > > > 
> > > > > > > > Still thinking about this: can you reorder this to
> > > > > > > > be the last patch in the series please?
> > > > > > > sure
> > > > > > > > 
> > > > > > > > Also - 509?
> > > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > > KVM's allotment of memory slots for userspace side.
> > > > > > 
> > > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > > why we need to match this exactly.
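
(For reference, the 509 on the KVM side comes from the x86 header after the
commit quoted in the patch description; roughly, as of kernels of that era,
so treat the exact surrounding context as approximate:

#define KVM_USER_MEM_SLOTS	509
/* memory slots that are not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS	3
#define KVM_MEM_SLOTS_NUM	(KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)

i.e. 512 slots in total, 3 of which are reserved for internal use, leaving
509 visible to userspace.)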
> > > > > np, I can cap it at a safe 300 slots but it's unlikely that it
> > > > > would cut off 1 extra hop since it's capped by QEMU
> > > > > at 256+[initial fragmented memory]
> > > > 
> > > > But what's the point? We allocate 32 bytes per slot.
> > > > 300*32 = 9600 which is more than 8K, so we are doing
> > > > an order-3 allocation anyway.
> > > > If we could cap it at 8K (256 slots) that would make sense
> > > > since we could avoid wasting vmalloc space.
> > > 256 is the number of hotpluggable slots and there is no way
> > > to predict how initial memory would be fragmented
> > > (i.e. the number of slots it would take); if we guess wrong
> > > we are back to square one with crashing userspace.
> > > So I'd stay consistent with KVM's limit of 509 since
> > > it's only a limit, i.e. not the actual number of allocated slots.
> > > 
> > > > I'm still not very happy with the whole approach,
> > > > giving userspace the ability to allocate 4 whole pages
> > > > of kernel memory like this.
> > > I'm working in parallel so that userspace won't take so
> > > many slots but it won't prevent its current versions
> > > crashing due to the kernel limitation.
> > 
> > Right but at least it's not a regression. If we promise userspace to
> > support a ton of regions, we can't take it back later, and I'm concerned
> > about the memory usage.
> > 
> > I think it's already safe to merge the binary lookup patches, and maybe
> > cache and vmalloc, so that the remaining patch will be small.
> It isn't a regression with switching to binary search and increasing
> slots to 509 either; performance wise it's more on the improvement side.
> And I was thinking about memory usage as well, that's why I've dropped
> the faster radix tree in favor of the more compact array; I can't do better
> on the kernel side of the fix.
> 
> Yes, we will give userspace the ability to use more slots and lock up
> more memory if it's not able to consolidate memory regions, but
> that leaves an option for the user to run a guest with vhost performance
> vs crashing it at runtime.

Crashing is entirely QEMU's own doing in not handling
the error gracefully.

> Userspace/targets that could consolidate memory regions should
> do so, and I'm working on that as well, but that doesn't mean
> that users shouldn't have a choice.

It's a fairly unusual corner case; I'm not yet convinced we need
to quickly add support for it when just waiting a bit longer will
get us an equivalent (or even more efficient) fix in userspace.

> So far it's a kernel limitation and this patch fixes crashes
> that users see now, with the rest of the patches ensuring performance
> does not regress.

When I say regression I refer to an option to limit the array size
again after userspace started using the larger size.

> > > > > > > > I think if we are changing this, it'd be nice to
> > > > > > > > create a way for userspace to discover the support
> > > > > > > > and the # of regions supported.
> > > > > > > That was my first idea before extending KVM's memslots:
> > > > > > > teach the kernel to tell QEMU this number so that QEMU
> > > > > > > at least would be able to check if a new memory slot could
> > > > > > > be added, but I was redirected to the simpler solution
> > > > > > > of just extending it vs overdoing things.
> > > > > > > Currently QEMU supports up to ~250 memslots, so 509
> > > > > > > is about twice what we need, so it should work for the near
> > > > > > > future.
> > > > > > 
> > > > > > Yes but old kernels are still around. Would be nice if you
> > > > > > can detect them.
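
Just to make the detection point concrete, something along the lines of the
sketch below would let QEMU probe the limit and fall back to the old
hard-coded 64 on kernels that predate such an interface. To be clear, no such
ioctl exists today; the request name and number are made up for illustration
only (only the VHOST_VIRTIO ioctl magic is real):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

#define VHOST_VIRTIO 0xAF
/* Hypothetical request: report the region-table limit as a uint32_t. */
#define VHOST_GET_MEM_MAX_NREGIONS _IOR(VHOST_VIRTIO, 0x70, uint32_t)

static int vhost_probe_max_nregions(int vhost_fd, uint32_t *max)
{
	if (ioctl(vhost_fd, VHOST_GET_MEM_MAX_NREGIONS, max) < 0) {
		/* Old kernel: the ioctl is unknown, assume the legacy limit. */
		*max = 64;
		return -1;
	}
	return 0;
}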
> > > > > > > but eventually we might still teach kernel and QEMU
> > > > > > > to make things more robust.
> > > > > > 
> > > > > > A new ioctl would be easy to add, I think it's a good
> > > > > > idea generally.
> > > > > I can try to do something like this on top of this series.
> > > > > 
> > > > > > > > > 
> > > > > > > > > ---
> > > > > > > > >  drivers/vhost/vhost.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > index 99931a0..6a18c92 100644
> > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > @@ -30,7 +30,7 @@
> > > > > > > > >  #include "vhost.h"
> > > > > > > > >  
> > > > > > > > >  enum {
> > > > > > > > > -	VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > > > > > +	VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > > > > > >  	VHOST_MEMORY_F_LOG = 0x1,
> > > > > > > > >  };
> > > > > > > > > 
> > > > > > > > > -- 
> > > > > > > > > 1.8.3.1
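
As a footnote on the memory-usage concern above: each entry in the table
passed via VHOST_SET_MEM_TABLE is a fixed 32-byte struct vhost_memory_region
(four __u64 fields) plus an 8-byte struct vhost_memory header, so the cost of
the various limits discussed in this thread is easy to redo. A minimal
userspace sketch (struct layout as in include/uapi/linux/vhost.h, with stdint
types instead of __u64 to keep it self-contained):

#include <stdint.h>
#include <stdio.h>

struct vhost_memory_region {
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t flags_padding;	/* no flags are currently specified */
};

int main(void)
{
	/* region counts mentioned in the thread: old limit, QEMU's hotplug
	 * ceiling, the proposed cap, and KVM_USER_MEM_SLOTS */
	const unsigned int limits[] = { 64, 256, 300, 509 };

	for (unsigned int i = 0; i < 4; i++)
		printf("%3u regions -> %zu bytes\n", limits[i],
		       (size_t)8 + limits[i] * sizeof(struct vhost_memory_region));
	return 0;
}

This prints 2056, 8200, 9608 and 16296 bytes, i.e. going from 64 to 509
regions grows the per-device table from well under a page to roughly four
pages.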