From: Wanpeng Li <wanpeng.li@hotmail.com>
To: Bandan Das
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: nVMX: nested VPID emulation
Date: Tue, 15 Sep 2015 18:18:32 +0800

On 9/15/15 12:08 AM, Bandan Das wrote:
> Wanpeng Li writes:
>
>> VPID is used to tag the address space and avoid a TLB flush. Currently L0
>> uses the same VPID to run L1 and all its guests. KVM flushes the VPID when
>> switching between L1 and L2.
>>
>> This patch advertises VPID to the L1 hypervisor so that the address spaces
>> of L1 and L2 can be tagged separately and the TLB flush on every switch
>> between L1 and L2 can be avoided. This patch gets a ~3x performance
>> improvement for lmbench 8p/64k ctxsw.
> A TLB flush does context invalidation, and while that should result in
> some improvement, I never expected a 3x improvement for any workload!
> Interesting :)

The result still looks good when testing v2.
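For reference, the gain comes from L1 and L2 no longer sharing one hardware
tag: with separate VPIDs their TLB entries can coexist, so switching between
them no longer has to invalidate anything. A minimal sketch of the idea (the
vpid02 name below is only illustrative, standing for a tag set aside for L2,
not necessarily what v2 ends up calling it):

/*
 * Sketch only: pick the hardware VPID to load into the VMCS.
 * vpid02 stands for a tag allocated for L2; if there is none, L2
 * falls back to sharing L1's vpid, which is what forces the flush
 * on every L1<->L2 switch today.
 */
static u16 hw_vpid_for(struct vcpu_vmx *vmx, u16 vpid02)
{
        if (is_guest_mode(&vmx->vcpu) && vpid02)
                return vpid02;          /* L2 runs with its own tag */
        return vmx->vpid;               /* shared tag, flush on switch */
}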
>
>> Signed-off-by: Wanpeng Li
>> ---
>>   arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++++++++-------
>>   1 file changed, 32 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index da1590e..06bc31e 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -1157,6 +1157,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
>>          return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
>>  }
>>
>> +static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
>> +{
>> +        return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
>> +}
>> +
>>  static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
>>  {
>>          return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
>> @@ -2471,6 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>>                  SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>>                  SECONDARY_EXEC_RDTSCP |
>>                  SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
>> +                SECONDARY_EXEC_ENABLE_VPID |
>>                  SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>                  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>                  SECONDARY_EXEC_WBINVD_EXITING |
>> @@ -4160,7 +4166,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
>>          int vpid;
>>
>>          vmx->vpid = 0;
>> -        if (!enable_vpid)
>> +        if (!enable_vpid || is_guest_mode(&vmx->vcpu))
>>                  return;
>>          spin_lock(&vmx_vpid_lock);
>>          vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
>> @@ -6738,6 +6744,14 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>>          }
>>          vmcs12 = kmap(page);
>>          vmcs12->launch_state = 0;
>> +        if (enable_vpid) {
>> +                if (nested_cpu_has_vpid(vmcs12)) {
>> +                        spin_lock(&vmx_vpid_lock);
>> +                        if (vmcs12->virtual_processor_id != 0)
>> +                                __clear_bit(vmcs12->virtual_processor_id, vmx_vpid_bitmap);
>> +                        spin_unlock(&vmx_vpid_lock);
>> +                }
>> +        }
>>          kunmap(page);
>>          nested_release_page(page);
> I don't think this is enough, we should also check for set "nested" bits
> in free_vpid() and clear them. There should be some sort of a mapping
> between the nested guest vpid and the actual vpid so that we can just
> clear those bits.

Agreed.

>
>> @@ -9189,6 +9203,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  {
>>          struct vcpu_vmx *vmx = to_vmx(vcpu);
>>          u32 exec_control;
>> +        int vpid;
>>
>>          vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>>          vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>> @@ -9438,13 +9453,21 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>          else
>>                  vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>>
>> +
> Empty space here.
>
>>          if (enable_vpid) {
>> -                /*
>> -                 * Trivially support vpid by letting L2s share their parent
>> -                 * L1's vpid. TODO: move to a more elaborate solution, giving
>> -                 * each L2 its own vpid and exposing the vpid feature to L1.
>> -                 */
>> -                vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>> +                if (nested_cpu_has_vpid(vmcs12)) {
>> +                        if (vmcs12->virtual_processor_id == 0) {
> Ok, so if we advertise vpid to the nested hypervisor, isn't it going to
> attempt writing this field when setting up? At least that's what Linux
> does, no?

Agreed, I do the allocation of vpid02 during initialization in v2.
>
>> +                                spin_lock(&vmx_vpid_lock);
>> +                                vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
>> +                                if (vpid < VMX_NR_VPIDS)
>> +                                        __set_bit(vpid, vmx_vpid_bitmap);
>> +                                spin_unlock(&vmx_vpid_lock);
>> +                                vmcs_write16(VIRTUAL_PROCESSOR_ID, vpid);
>> +                        } else
>> +                                vmcs_write16(VIRTUAL_PROCESSOR_ID, vmcs12->virtual_processor_id);
>> +                } else
>> +                        vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>> +
> I guess L1 shouldn't know what vpid L0 chose to run L2. If L1 vmreads,
> it should get what it expects for the value of vpid, not the one L0 chose.

Agreed.

>
>>                  vmx_flush_tlb(vcpu);
>>          }
> So, this isn't removed? I thought it's not needed anymore?

Please review v2. :-)

Regards,
Wanpeng Li