Date: Wed, 17 Jun 2015 00:19:15 +0200
From: Igor Mammedov
To: "Michael S. Tsirkin"
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH 0/5] vhost: support upto 509 memory regions
Message-ID: <20150617001915.23f062b0@igors-macbook-pro.local>
In-Reply-To: <20150616231505-mutt-send-email-mst@redhat.com>
References: <1434472419-148742-1-git-send-email-imammedo@redhat.com>
 <20150616231505-mutt-send-email-mst@redhat.com>

On Tue, 16 Jun 2015 23:16:07 +0200
"Michael S. Tsirkin" wrote:

> On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote:
> > Series extends vhost to support up to 509 memory regions,
> > and adds some vhost:translate_desc() performance improvements
> > so it won't regress when memslots are increased to 509.
> >
> > It fixes a running VM crashing during memory hotplug due
> > to vhost refusing to accept more than 64 memory regions.
> >
> > It's only a host kernel side fix to make it work with QEMU
> > versions that support memory hotplug. But I'll continue
> > to work on a QEMU side solution to reduce the number of memory
> > regions to make things even better.
>
> I'm concerned userspace work will be harder, in particular,
> performance gains will be harder to measure.
It appears so, so far.

> How about a flag to disable caching?
I've tried to measure the cost of a cache miss, but without much luck:
the difference between the version with the cache and the version with
caching removed was within the margin of error (±10ns), i.e. not
measurable on my 5min/10*10^6 test workload.

Also, I'm concerned that adding an extra fetch+branch for the flag
check will make things worse for the likely path of a cache hit, so
I'd avoid it if possible.

Or do you mean a simple global per-module flag to disable it, wrapped
in a static key so that skipping the cache is a cheap jump?

> > Performance-wise, for a guest with 3 memory regions (in my case)
> > and netperf's UDP_RR workload, translate_desc() execution time
> > takes the following share of the total workload:
> >
> > Memory      | 1G RAM | cached | non cached
> > regions #   |   3    |   53   |     53
> > ------------------------------------------
> > upstream    |  0.3%  |   -    |    3.5%
> > ------------------------------------------
> > this series |  0.2%  |  0.5%  |    0.7%
> >
> > where the "non cached" column reflects a thrashing workload
> > with constant cache misses. More details on timing are in
> > the respective patches.
> >
> > Igor Mammedov (5):
> >   vhost: use binary search instead of linear in find_region()
> >   vhost: extend memory regions allocation to vmalloc
> >   vhost: support upto 509 memory regions
> >   vhost: add per VQ memory region caching
> >   vhost: translate_desc: optimization for desc.len < region size
> >
> >  drivers/vhost/vhost.c | 95 +++++++++++++++++++++++++++++++++++++--------------
> >  drivers/vhost/vhost.h |  1 +
> >  2 files changed, 71 insertions(+), 25 deletions(-)
> >
> > --
> > 1.8.3.1
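The last patch title suggests a fast path in translate_desc() for descriptors that lie entirely within one region. One plausible reading, sketched here in userspace C under the assumption that a guest range is translated into host (address, length) segments: if the whole descriptor fits in the first region found, emit a single segment and skip the splitting loop. struct mem_region, struct seg, lookup() and translate_range() are hypothetical names, not the series' code, and lookup() is a linear scan only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one guest memory region. */
struct mem_region {
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
};

/* Hypothetical host-side segment, similar in spirit to a struct iovec. */
struct seg {
	uint64_t host_addr;
	uint64_t len;
};

/* Linear lookup, only to keep the sketch short. */
static const struct mem_region *lookup(const struct mem_region *regions,
				       int n, uint64_t addr)
{
	assert(regions != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		const struct mem_region *r = &regions[i];

		if (addr >= r->guest_phys_addr &&
		    addr - r->guest_phys_addr < r->memory_size)
			return r;
	}
	return NULL;
}

/*
 * Translate the guest range [addr, addr + len) into host segments.
 * Returns the number of segments written, or -1 if part of the range is
 * unmapped or more than max_segs segments would be needed.
 */
static int translate_range(const struct mem_region *regions, int n,
			   uint64_t addr, uint64_t len,
			   struct seg *out, int max_segs)
{
	const struct mem_region *r = lookup(regions, n, addr);
	int count = 0;

	if (!r || max_segs < 1)
		return -1;

	/* Fast path: the whole range fits inside the first region. */
	if (len <= r->memory_size - (addr - r->guest_phys_addr)) {
		out[0].host_addr = r->userspace_addr + (addr - r->guest_phys_addr);
		out[0].len = len;
		return 1;
	}

	/* Slow path: split the range at region boundaries. */
	while (len > 0) {
		uint64_t off, avail, chunk;

		if (!r || count >= max_segs)
			return -1;
		off = addr - r->guest_phys_addr;
		avail = r->memory_size - off;
		chunk = len < avail ? len : avail;
		out[count].host_addr = r->userspace_addr + off;
		out[count].len = chunk;
		count++;
		addr += chunk;
		len -= chunk;
		if (len > 0)
			r = lookup(regions, n, addr);
	}
	return count;
}
```

Since guest regions are usually far larger than a single packet buffer, the fast path would typically be the common case, and the loop only runs when a buffer straddles a region boundary.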