Date: Wed, 17 Jun 2015 12:46:09 +0200
From: "Michael S. Tsirkin"
To: Igor Mammedov
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH 3/5] vhost: support upto 509 memory regions
Message-ID: <20150617123842-mutt-send-email-mst@redhat.com>
References: <1434472419-148742-1-git-send-email-imammedo@redhat.com>
 <1434472419-148742-4-git-send-email-imammedo@redhat.com>
 <20150616231201-mutt-send-email-mst@redhat.com>
 <20150617000056.481e4a96@igors-macbook-pro.local>
 <20150617083202-mutt-send-email-mst@redhat.com>
 <20150617092802.5c8d8475@igors-macbook-pro.local>
 <20150617092910-mutt-send-email-mst@redhat.com>
 <20150617105421.71751f44@nial.brq.redhat.com>
 <20150617110711-mutt-send-email-mst@redhat.com>
 <20150617123742.5c3fec30@nial.brq.redhat.com>
In-Reply-To: <20150617123742.5c3fec30@nial.brq.redhat.com>

On Wed, Jun 17, 2015 at 12:37:42PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 12:11:09 +0200
> "Michael S. Tsirkin" wrote:
> 
> > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > > On Wed, 17 Jun 2015 09:39:06 +0200
> > > "Michael S. Tsirkin" wrote:
> > > 
> > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > > "Michael S. Tsirkin" wrote:
> > > > > 
> > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > > "Michael S. Tsirkin" wrote:
> > > > > > > 
> > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > > since commit
> > > > > > > > > 1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > > 
> > > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > > registering hotplugged memory.
> > > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > > memory regions.
> > > > > > > > > 
> > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > > 
> > > > > > > > It was 64, not 65.
> > > > > > > > 
> > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Igor Mammedov
> > > > > > > > 
> > > > > > > > Still thinking about this: can you reorder this to
> > > > > > > > be the last patch in the series please?
> > > > > > > sure
> > > > > > > > 
> > > > > > > > Also - 509?
> > > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > > KVM's allotment of memory slots for userspace side.
> > > > > > 
> > > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > > why we need to match this exactly.
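
(For reference, the 509 on the KVM side comes from the x86 header after the
commit quoted in the patch description; roughly, as of kernels of that era,
so treat the exact surrounding context as approximate:

#define KVM_USER_MEM_SLOTS	509
/* memory slots that are not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS	3
#define KVM_MEM_SLOTS_NUM	(KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)

i.e. 512 slots in total, 3 of which are reserved for internal use, leaving
509 visible to userspace.)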
> > > > > np, I can cap it at a safe 300 slots but it's unlikely that it
> > > > > would cut off 1 extra hop since it's capped by QEMU
> > > > > at 256+[initial fragmented memory]
> > > > 
> > > > But what's the point? We allocate 32 bytes per slot.
> > > > 300*32 = 9600 which is more than 8K, so we are doing
> > > > an order-3 allocation anyway.
> > > > If we could cap it at 8K (256 slots) that would make sense
> > > > since we could avoid wasting vmalloc space.
> > > 256 is the number of hotpluggable slots and there is no way
> > > to predict how initial memory would be fragmented
> > > (i.e. the number of slots it would take); if we guess wrong
> > > we are back to square one with crashing userspace.
> > > So I'd stay consistent with KVM's limit of 509 since
> > > it's only a limit, i.e. not the actual number of allocated slots.
> > > 
> > > > I'm still not very happy with the whole approach,
> > > > giving userspace the ability to allocate 4 whole pages
> > > > of kernel memory like this.
> > > I'm working in parallel so that userspace won't take so
> > > many slots but it won't prevent its current versions
> > > crashing due to the kernel limitation.
> > 
> > Right but at least it's not a regression. If we promise userspace to
> > support a ton of regions, we can't take it back later, and I'm concerned
> > about the memory usage.
> > 
> > I think it's already safe to merge the binary lookup patches, and maybe
> > cache and vmalloc, so that the remaining patch will be small.
> It isn't a regression with switching to binary search and increasing
> slots to 509 either; performance wise it's more on the improvement side.
> And I was thinking about memory usage as well, that's why I've dropped
> the faster radix tree in favor of the more compact array; I can't do better
> on the kernel side of the fix.
> 
> Yes, we will give userspace the ability to use more slots and lock up
> more memory if it's not able to consolidate memory regions, but
> that leaves an option for the user to run a guest with vhost performance
> vs crashing it at runtime.

Crashing is entirely QEMU's own doing in not handling
the error gracefully.

> Userspace/targets that could consolidate memory regions should
> do so, and I'm working on that as well, but that doesn't mean
> that users shouldn't have a choice.

It's a fairly unusual corner case; I'm not yet convinced we need
to quickly add support for it when just waiting a bit longer will
get us an equivalent (or even more efficient) fix in userspace.

> So far it's a kernel limitation and this patch fixes crashes
> that users see now, with the rest of the patches ensuring performance
> does not regress.

When I say regression I refer to an option to limit the array size
again after userspace started using the larger size.

> > > > > > > > I think if we are changing this, it'd be nice to
> > > > > > > > create a way for userspace to discover the support
> > > > > > > > and the # of regions supported.
> > > > > > > That was my first idea before extending KVM's memslots:
> > > > > > > teach the kernel to tell QEMU this number so that QEMU
> > > > > > > at least would be able to check if a new memory slot could
> > > > > > > be added, but I was redirected to the simpler solution
> > > > > > > of just extending it vs overdoing things.
> > > > > > > Currently QEMU supports up to ~250 memslots, so 509
> > > > > > > is about twice what we need, so it should work for the near
> > > > > > > future.
> > > > > > 
> > > > > > Yes but old kernels are still around. Would be nice if you
> > > > > > can detect them.
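
Just to make the detection point concrete, something along the lines of the
sketch below would let QEMU probe the limit and fall back to the old
hard-coded 64 on kernels that predate such an interface. To be clear, no such
ioctl exists today; the request name and number are made up for illustration
only (only the VHOST_VIRTIO ioctl magic is real):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

#define VHOST_VIRTIO 0xAF
/* Hypothetical request: report the region-table limit as a uint32_t. */
#define VHOST_GET_MEM_MAX_NREGIONS _IOR(VHOST_VIRTIO, 0x70, uint32_t)

static int vhost_probe_max_nregions(int vhost_fd, uint32_t *max)
{
	if (ioctl(vhost_fd, VHOST_GET_MEM_MAX_NREGIONS, max) < 0) {
		/* Old kernel: the ioctl is unknown, assume the legacy limit. */
		*max = 64;
		return -1;
	}
	return 0;
}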
> > > > > > > but eventually we might still teach kernel and QEMU
> > > > > > > to make things more robust.
> > > > > > 
> > > > > > A new ioctl would be easy to add, I think it's a good
> > > > > > idea generally.
> > > > > I can try to do something like this on top of this series.
> > > > > 
> > > > > > > > > 
> > > > > > > > > ---
> > > > > > > > >  drivers/vhost/vhost.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > index 99931a0..6a18c92 100644
> > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > @@ -30,7 +30,7 @@
> > > > > > > > >  #include "vhost.h"
> > > > > > > > >  
> > > > > > > > >  enum {
> > > > > > > > > -	VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > > > > > +	VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > > > > > >  	VHOST_MEMORY_F_LOG = 0x1,
> > > > > > > > >  };
> > > > > > > > > 
> > > > > > > > > -- 
> > > > > > > > > 1.8.3.1
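
As a footnote on the memory-usage concern above: each entry in the table
passed via VHOST_SET_MEM_TABLE is a fixed 32-byte struct vhost_memory_region
(four __u64 fields) plus an 8-byte struct vhost_memory header, so the cost of
the various limits discussed in this thread is easy to redo. A minimal
userspace sketch (struct layout as in include/uapi/linux/vhost.h, with stdint
types instead of __u64 to keep it self-contained):

#include <stdint.h>
#include <stdio.h>

struct vhost_memory_region {
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t flags_padding;	/* no flags are currently specified */
};

int main(void)
{
	/* region counts mentioned in the thread: old limit, QEMU's hotplug
	 * ceiling, the proposed cap, and KVM_USER_MEM_SLOTS */
	const unsigned int limits[] = { 64, 256, 300, 509 };

	for (unsigned int i = 0; i < 4; i++)
		printf("%3u regions -> %zu bytes\n", limits[i],
		       (size_t)8 + limits[i] * sizeof(struct vhost_memory_region));
	return 0;
}

This prints 2056, 8200, 9608 and 16296 bytes, i.e. going from 64 to 509
regions grows the per-device table from well under a page to roughly four
pages.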