[patch 0/4] VMX: configure posted interrupt descriptor when assigning device


* [patch 0/4] VMX: configure posted interrupt descriptor when assigning device
@ 2021-05-07 13:06 Marcelo Tosatti
  2021-05-07 13:06 ` [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-07 13:06 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson

Configuration of the posted interrupt descriptor is incorrect when devices
are hotplugged to the guest (and vcpus are halted).

See patch 4 for details.

---

v2: rather than using a potentially racy IPI (vs vcpu->cpu switches),
    kick the vcpus when assigning a device and let the blocked per-CPU
    list manipulation happen locally at ->pre_block and ->post_block
    (Sean Christopherson).




^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops
  2021-05-07 13:06 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device Marcelo Tosatti
@ 2021-05-07 13:06 ` Marcelo Tosatti
  2021-05-07 19:16   ` Peter Xu
  2021-05-07 13:06 ` [patch 2/4] KVM: add arch specific vcpu_check_block callback Marcelo Tosatti
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-07 13:06 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson,
	Marcelo Tosatti

Add a start_assignment hook to kvm_x86_ops, which is called when 
kvm_arch_start_assignment is done.

The hook is required to update the wakeup vector of a sleeping vCPU
when a device is assigned to the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/include/asm/kvm_host.h
===================================================================
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -1322,6 +1322,7 @@ struct kvm_x86_ops {
 
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
+	void (*start_assignment)(struct kvm *kvm, int device_count);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
 	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
 
Index: kvm/arch/x86/kvm/svm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm/svm.c
+++ kvm/arch/x86/kvm/svm/svm.c
@@ -4601,6 +4601,7 @@ static struct kvm_x86_ops svm_x86_ops __
 	.deliver_posted_interrupt = svm_deliver_avic_intr,
 	.dy_apicv_has_pending_interrupt = svm_dy_apicv_has_pending_interrupt,
 	.update_pi_irte = svm_update_pi_irte,
+	.start_assignment = NULL,
 	.setup_mce = svm_setup_mce,
 
 	.smi_allowed = svm_smi_allowed,
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7732,6 +7732,7 @@ static struct kvm_x86_ops vmx_x86_ops __
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
+	.start_assignment = NULL,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -11295,7 +11295,10 @@ bool kvm_arch_can_dequeue_async_page_pre
 
 void kvm_arch_start_assignment(struct kvm *kvm)
 {
-	atomic_inc(&kvm->arch.assigned_device_count);
+	int ret;
+
+	ret = atomic_inc_return(&kvm->arch.assigned_device_count);
+	static_call_cond(kvm_x86_start_assignment)(kvm, ret);
 }
 EXPORT_SYMBOL_GPL(kvm_arch_start_assignment);
 
Index: kvm/arch/x86/include/asm/kvm-x86-ops.h
===================================================================
--- kvm.orig/arch/x86/include/asm/kvm-x86-ops.h
+++ kvm/arch/x86/include/asm/kvm-x86-ops.h
@@ -99,6 +99,7 @@ KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
 KVM_X86_OP_NULL(vcpu_unblocking)
 KVM_X86_OP_NULL(update_pi_irte)
+KVM_X86_OP_NULL(start_assignment)
 KVM_X86_OP_NULL(apicv_post_state_restore)
 KVM_X86_OP_NULL(dy_apicv_has_pending_interrupt)
 KVM_X86_OP_NULL(set_hv_timer)




* [patch 2/4] KVM: add arch specific vcpu_check_block callback
  2021-05-07 13:06 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device Marcelo Tosatti
  2021-05-07 13:06 ` [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
@ 2021-05-07 13:06 ` Marcelo Tosatti
  2021-05-07 13:06 ` [patch 3/4] KVM: x86: implement kvm_arch_vcpu_check_block callback Marcelo Tosatti
  2021-05-07 13:06 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
  3 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-07 13:06 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson,
	Marcelo Tosatti

Add callback in kvm_vcpu_check_block, so that architectures
can direct a vcpu to exit the vcpu block loop without requiring
events that would unhalt it.
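
To make the intent concrete, here is a minimal userspace sketch of the
check order, not the kernel code itself. The struct blk_vcpu type, its
fields and the blk_* function names are hypothetical stand-ins; the
unhalt_requested flag only models the KVM_REQ_UNHALT request made on the
runnable path. The point is that the arch hook can end the block loop
without flagging the vcpu as unhalted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state consulted by the block loop. */
struct blk_vcpu {
	bool runnable;          /* an event that unhalts the vcpu is pending */
	bool signal_pending;    /* a signal is pending for the vcpu thread */
	bool arch_exit;         /* arch-specific reason to leave the loop */
	bool unhalt_requested;  /* stands in for KVM_REQ_UNHALT */
};

/* Arch callback: nonzero means "leave the block loop". */
static int blk_arch_check_block(const struct blk_vcpu *v)
{
	return v->arch_exit;
}

/* Returns -EINTR when the vcpu should stop blocking, 0 to keep blocking. */
static int blk_check_block(struct blk_vcpu *v)
{
	if (v->runnable) {
		/* Only this path marks the vcpu as unhalted. */
		v->unhalt_requested = true;
		return -EINTR;
	}
	if (v->signal_pending)
		return -EINTR;
	if (blk_arch_check_block(v))
		return -EINTR;

	return 0;
}
```

With arch_exit set and nothing else pending, blk_check_block() returns
-EINTR and unhalt_requested stays false.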

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -971,6 +971,13 @@ static inline int kvm_arch_flush_remote_
 }
 #endif
 
+#ifndef __KVM_HAVE_ARCH_VCPU_CHECK_BLOCK
+static inline int kvm_arch_vcpu_check_block(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+#endif
+
 #ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
 void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -2794,6 +2794,8 @@ static int kvm_vcpu_check_block(struct k
 		goto out;
 	if (signal_pending(current))
 		goto out;
+	if (kvm_arch_vcpu_check_block(vcpu))
+		goto out;
 
 	ret = 0;
 out:




* [patch 3/4] KVM: x86: implement kvm_arch_vcpu_check_block callback
  2021-05-07 13:06 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device Marcelo Tosatti
  2021-05-07 13:06 ` [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
  2021-05-07 13:06 ` [patch 2/4] KVM: add arch specific vcpu_check_block callback Marcelo Tosatti
@ 2021-05-07 13:06 ` Marcelo Tosatti
  2021-05-07 13:06 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
  3 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-07 13:06 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson,
	Marcelo Tosatti

Implement kvm_arch_vcpu_check_block for x86. The next patch adds an
implementation of kvm_x86_ops.vcpu_check_block for VMX.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/include/asm/kvm_host.h
===================================================================
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -1320,6 +1320,8 @@ struct kvm_x86_ops {
 	void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
 	void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
 
+	int (*vcpu_check_block)(struct kvm_vcpu *vcpu);
+
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 	void (*start_assignment)(struct kvm *kvm, int device_count);
@@ -1801,6 +1803,15 @@ static inline bool kvm_irq_is_postable(s
 		irq->delivery_mode == APIC_DM_LOWEST);
 }
 
+#define __KVM_HAVE_ARCH_VCPU_CHECK_BLOCK
+static inline int kvm_arch_vcpu_check_block(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops.vcpu_check_block)
+		return static_call(kvm_x86_vcpu_check_block)(vcpu);
+
+	return 0;
+}
+
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
 	static_call_cond(kvm_x86_vcpu_blocking)(vcpu);
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7727,6 +7727,7 @@ static struct kvm_x86_ops vmx_x86_ops __
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
+	.vcpu_check_block = NULL,
 
 	.pmu_ops = &intel_pmu_ops,
 	.nested_ops = &vmx_nested_ops,
Index: kvm/arch/x86/include/asm/kvm-x86-ops.h
===================================================================
--- kvm.orig/arch/x86/include/asm/kvm-x86-ops.h
+++ kvm/arch/x86/include/asm/kvm-x86-ops.h
@@ -98,6 +98,7 @@ KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
 KVM_X86_OP_NULL(vcpu_unblocking)
+KVM_X86_OP_NULL(vcpu_check_block)
 KVM_X86_OP_NULL(update_pi_irte)
 KVM_X86_OP_NULL(start_assignment)
 KVM_X86_OP_NULL(apicv_post_state_restore)
Index: kvm/arch/x86/kvm/svm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm/svm.c
+++ kvm/arch/x86/kvm/svm/svm.c
@@ -4517,6 +4517,7 @@ static struct kvm_x86_ops svm_x86_ops __
 	.vcpu_put = svm_vcpu_put,
 	.vcpu_blocking = svm_vcpu_blocking,
 	.vcpu_unblocking = svm_vcpu_unblocking,
+	.vcpu_check_block = NULL,
 
 	.update_exception_bitmap = svm_update_exception_bitmap,
 	.get_msr_feature = svm_get_msr_feature,




* [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-07 13:06 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2021-05-07 13:06 ` [patch 3/4] KVM: x86: implement kvm_arch_vcpu_check_block callback Marcelo Tosatti
@ 2021-05-07 13:06 ` Marcelo Tosatti
  2021-05-07 17:22   ` Sean Christopherson
  3 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-07 13:06 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson, Pei Zhang,
	Marcelo Tosatti

For VMX, when a vcpu enters HLT emulation, pi_pre_block will:

1) Add vcpu to per-cpu list of blocked vcpus.

2) Program the posted-interrupt descriptor "notification vector" 
to POSTED_INTR_WAKEUP_VECTOR

With interrupt remapping, an interrupt will set the PIR bit for the 
vector programmed for the device on the CPU, test-and-set the 
ON bit on the posted interrupt descriptor, and if the ON bit is clear
generate an interrupt for the notification vector.

This way, the target CPU wakes upon a device interrupt and wakes up
the target vcpu.

The problem is that pi_pre_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. It's possible for the
following to happen:

1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
(notification vector not programmed, so pcpu P remains in idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
RCU watchdog NMIs all vCPUs.

To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is 
properly reprogrammed to the wakeup vector.
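
The sequence above can be replayed with a small userspace model. This is
only a sketch under simplifying assumptions: struct toy_vcpu, the toy_*
functions and the two vector values are hypothetical, and the model keeps
just the ON bit, the notification vector and a "blocked" flag. The
wake_on_assignment argument stands in for the start_assignment callback
waking the vcpu so that it blocks again with a device assigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vector numbers; the real values live in kernel headers. */
enum { TOY_NOTIFICATION_VECTOR = 1, TOY_WAKEUP_VECTOR = 2 };

struct toy_vcpu {
	bool on;        /* ON bit of the posted-interrupt descriptor */
	int nv;         /* notification vector of the descriptor */
	bool blocked;   /* vcpu is halted in the block loop */
};

/* Blocking path: wakeup vector is programmed only with a device assigned. */
static void toy_block(struct toy_vcpu *v, int assigned_devices)
{
	if (assigned_devices > 0)
		v->nv = TOY_WAKEUP_VECTOR;
	v->blocked = true;
}

/* Device interrupt: test-and-set ON, notify only if ON was clear. */
static void toy_device_interrupt(struct toy_vcpu *v)
{
	bool was_on = v->on;

	v->on = true;
	if (!was_on && v->nv == TOY_WAKEUP_VECTOR)
		v->blocked = false;     /* wakeup handler unblocks the vcpu */
}

/* Guest IPI: the sender skips the kick when ON is already set. */
static void toy_guest_ipi(struct toy_vcpu *v)
{
	bool was_on = v->on;

	v->on = true;
	if (!was_on)
		v->blocked = false;     /* kick reaches the blocked vcpu */
}

/* Returns true if vcpu V is still blocked after steps 1) to 4). */
static bool toy_vcpu_stuck(bool wake_on_assignment)
{
	struct toy_vcpu v = { .on = false, .nv = TOY_NOTIFICATION_VECTOR };
	int assigned_devices = 0;

	toy_block(&v, assigned_devices);         /* 1) HLT, no device yet */
	assigned_devices = 1;                    /* 2) device assigned */
	if (wake_on_assignment) {
		v.blocked = false;               /* woken out of the loop */
		toy_block(&v, assigned_devices); /* blocks again, NV updated */
	}
	toy_device_interrupt(&v);                /* 3) device interrupt */
	toy_guest_ipi(&v);                       /* 4) IPI from another vcpu */

	return v.blocked;
}
```

In this model toy_vcpu_stuck(false) reports the vcpu as stuck, while
toy_vcpu_stuck(true) does not.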

Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v2: add vmx_pi_start_assignment to vmx's kvm_x86_ops

Index: kvm/arch/x86/kvm/vmx/posted_intr.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
+++ kvm/arch/x86/kvm/vmx/posted_intr.c
@@ -203,6 +203,25 @@ void pi_post_block(struct kvm_vcpu *vcpu
 	local_irq_enable();
 }
 
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return 0;
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
+		return 0;
+
+	return 1;
+}
+
 /*
  * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
  */
@@ -236,6 +255,26 @@ bool pi_has_pending_interrupt(struct kvm
 		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
 }
 
+void vmx_pi_start_assignment(struct kvm *kvm, int device_count)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	/* only care about first device assignment */
+	if (device_count != 1)
+		return;
+
+	/* Update wakeup vector and add vcpu to blocked_vcpu_list */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_vcpu_apicv_active(vcpu))
+			continue;
+
+		kvm_vcpu_kick(vcpu);
+	}
+}
 
 /*
  * pi_update_irte - set IRTE for Posted-Interrupts
Index: kvm/arch/x86/kvm/vmx/posted_intr.h
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
+++ kvm/arch/x86/kvm/vmx/posted_intr.h
@@ -95,5 +95,7 @@ void __init pi_init_cpu(int cpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
+void vmx_pi_start_assignment(struct kvm *kvm, int device_count);
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7727,13 +7727,13 @@ static struct kvm_x86_ops vmx_x86_ops __
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
-	.vcpu_check_block = NULL,
+	.vcpu_check_block = vmx_vcpu_check_block,
 
 	.pmu_ops = &intel_pmu_ops,
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
-	.start_assignment = NULL,
+	.start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,




* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-07 13:06 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
@ 2021-05-07 17:22   ` Sean Christopherson
  2021-05-07 19:29     ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2021-05-07 17:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Alex Williamson, Pei Zhang

On Fri, May 07, 2021, Marcelo Tosatti wrote:
> Index: kvm/arch/x86/kvm/vmx/posted_intr.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
> +++ kvm/arch/x86/kvm/vmx/posted_intr.c
> @@ -203,6 +203,25 @@ void pi_post_block(struct kvm_vcpu *vcpu
>  	local_irq_enable();
>  }
>  
> +int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
> +{
> +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return 0;
> +
> +	if (!kvm_vcpu_apicv_active(vcpu))
> +		return 0;
> +
> +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
> +		return 0;
> +
> +	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
> +		return 0;
> +
> +	return 1;

IIUC, the logic is to bail out of the block loop if the VM has an assigned
device, but the blocking vCPU didn't reconfigure the PI.NV to the wakeup vector,
i.e. the assigned device came along after the initial check in vcpu_block().
That makes sense, but can you add a comment somewhere in/above this function?

> +}
> +
>  /*
>   * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
>   */
> @@ -236,6 +255,26 @@ bool pi_has_pending_interrupt(struct kvm
>  		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
>  }
>  
> +void vmx_pi_start_assignment(struct kvm *kvm, int device_count)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return;
> +
> +	/* only care about first device assignment */
> +	if (device_count != 1)
> +		return;
> +
> +	/* Update wakeup vector and add vcpu to blocked_vcpu_list */

Can you expand this comment, too?  Specifically, I think what you're saying is
that the wakeup will cause the vCPU to bail out of kvm_vcpu_block() and go back
through vcpu_block() and thus pi_pre_block().

> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_vcpu_apicv_active(vcpu))
> +			continue;
> +
> +		kvm_vcpu_kick(vcpu);

Actually, can't we avoid the full kick and instead just do kvm_vcpu_wake_up()?
If the vCPU is in guest mode, i.e. kvm_arch_vcpu_should_kick() returns true,
then by definition it can't be blocking.  And if it is about to block, it's
guaranteed to see the assigned device.

> +	}
> +}


* Re: [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops
  2021-05-07 13:06 ` [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
@ 2021-05-07 19:16   ` Peter Xu
  2021-05-10 17:53     ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2021-05-07 19:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Paolo Bonzini, Alex Williamson, Sean Christopherson

On Fri, May 07, 2021 at 10:06:10AM -0300, Marcelo Tosatti wrote:
> Add a start_assignment hook to kvm_x86_ops, which is called when 
> kvm_arch_start_assignment is done.
> 
> The hook is required to update the wakeup vector of a sleeping vCPU
> when a device is assigned to the guest.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Index: kvm/arch/x86/include/asm/kvm_host.h
> ===================================================================
> --- kvm.orig/arch/x86/include/asm/kvm_host.h
> +++ kvm/arch/x86/include/asm/kvm_host.h
> @@ -1322,6 +1322,7 @@ struct kvm_x86_ops {
>  
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
> +	void (*start_assignment)(struct kvm *kvm, int device_count);

I'm wondering what the hook could do with the device_count besides comparing it
against 1...

If we can't think of any, perhaps we can directly make it an enablement hook
instead (so we avoid calling the hook at all when count>1)?

   /* Called when the first assignment registers (count from 0 to 1) */
   void (*enable_assignment)(struct kvm *kvm);
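
A rough userspace sketch of that idea, using C11 atomics in place of the
kernel's atomic_t helpers. All names here (toy_assigned_device_count,
toy_enable_assignment, toy_start_assignment, toy_hook_calls) are made up
for illustration, and the hook body is just a counter:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int toy_assigned_device_count;
static int toy_hook_calls;

/* Hypothetical enablement hook: runs once, for the first device. */
static void toy_enable_assignment(void)
{
	toy_hook_calls++;
}

static void toy_start_assignment(void)
{
	/* atomic_fetch_add() returns the old value: 0 means "0 to 1". */
	if (atomic_fetch_add(&toy_assigned_device_count, 1) == 0)
		toy_enable_assignment();
}
```

Calling toy_start_assignment() three times runs the hook once and leaves
the count at 3, so the callee no longer needs a device_count argument.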

-- 
Peter Xu



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-07 17:22   ` Sean Christopherson
@ 2021-05-07 19:29     ` Peter Xu
  2021-05-07 22:08       ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2021-05-07 19:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marcelo Tosatti, kvm, Paolo Bonzini, Alex Williamson, Pei Zhang

On Fri, May 07, 2021 at 05:22:07PM +0000, Sean Christopherson wrote:
> On Fri, May 07, 2021, Marcelo Tosatti wrote:
> > Index: kvm/arch/x86/kvm/vmx/posted_intr.c
> > ===================================================================
> > --- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
> > +++ kvm/arch/x86/kvm/vmx/posted_intr.c
> > @@ -203,6 +203,25 @@ void pi_post_block(struct kvm_vcpu *vcpu
> >  	local_irq_enable();
> >  }
> >  
> > +int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
> > +{
> > +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return 0;
> > +
> > +	if (!kvm_vcpu_apicv_active(vcpu))
> > +		return 0;
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
> > +		return 0;
> > +
> > +	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
> > +		return 0;
> > +
> > +	return 1;
> 
> IIUC, the logic is to bail out of the block loop if the VM has an assigned
> device, but the blocking vCPU didn't reconfigure the PI.NV to the wakeup vector,
> i.e. the assigned device came along after the initial check in vcpu_block().
> That makes sense, but can you add a comment somewhere in/above this function?

Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
somehow, so that even without customized ->vcpu_check_block we should be able
to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?

-- 
Peter Xu



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-07 19:29     ` Peter Xu
@ 2021-05-07 22:08       ` Marcelo Tosatti
  2021-05-11 14:39         ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-07 22:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: Sean Christopherson, kvm, Paolo Bonzini, Alex Williamson,
	Pei Zhang

On Fri, May 07, 2021 at 03:29:05PM -0400, Peter Xu wrote:
> On Fri, May 07, 2021 at 05:22:07PM +0000, Sean Christopherson wrote:
> > On Fri, May 07, 2021, Marcelo Tosatti wrote:
> > > Index: kvm/arch/x86/kvm/vmx/posted_intr.c
> > > ===================================================================
> > > --- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
> > > +++ kvm/arch/x86/kvm/vmx/posted_intr.c
> > > @@ -203,6 +203,25 @@ void pi_post_block(struct kvm_vcpu *vcpu
> > >  	local_irq_enable();
> > >  }
> > >  
> > > +int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > +
> > > +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> > > +		return 0;
> > > +
> > > +	if (!kvm_vcpu_apicv_active(vcpu))
> > > +		return 0;
> > > +
> > > +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
> > > +		return 0;
> > > +
> > > +	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
> > > +		return 0;
> > > +
> > > +	return 1;
> > 
> > IIUC, the logic is to bail out of the block loop if the VM has an assigned
> > device, but the blocking vCPU didn't reconfigure the PI.NV to the wakeup vector,
> > i.e. the assigned device came along after the initial check in vcpu_block().
> > That makes sense, but can you add a comment somewhere in/above this function?
> 
> Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> somehow, so that even without customized ->vcpu_check_block we should be able
> to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?

static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
{
        int ret = -EINTR;
        int idx = srcu_read_lock(&vcpu->kvm->srcu);

        if (kvm_arch_vcpu_runnable(vcpu)) {
                kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
                goto out;
        }

Don't want to unhalt the vcpu.



* [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-10 17:26 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device (v3) Marcelo Tosatti
@ 2021-05-10 17:26 ` Marcelo Tosatti
  2021-05-24 15:55   ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-10 17:26 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson, Peter Xu,
	Pei Zhang, Marcelo Tosatti

For VMX, when a vcpu enters HLT emulation, pi_pre_block will:

1) Add vcpu to per-cpu list of blocked vcpus.

2) Program the posted-interrupt descriptor "notification vector" 
to POSTED_INTR_WAKEUP_VECTOR

With interrupt remapping, an interrupt will set the PIR bit for the 
vector programmed for the device on the CPU, test-and-set the 
ON bit on the posted interrupt descriptor, and if the ON bit is clear
generate an interrupt for the notification vector.

This way, the target CPU wakes upon a device interrupt and wakes up
the target vcpu.

The problem is that pi_pre_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. It's possible for the
following to happen:

1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
(notification vector not programmed, so pcpu P remains in idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
RCU watchdog NMIs all vCPUs.

To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is 
properly reprogrammed to the wakeup vector.

Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: kvm/arch/x86/kvm/vmx/posted_intr.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
+++ kvm/arch/x86/kvm/vmx/posted_intr.c
@@ -204,6 +204,32 @@ void pi_post_block(struct kvm_vcpu *vcpu
 }
 
 /*
+ * Bail out of the block loop if the VM has an assigned
+ * device, but the blocking vCPU didn't reconfigure the
+ * PI.NV to the wakeup vector, i.e. the assigned device
+ * came along after the initial check in vcpu_block().
+ */
+
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return 0;
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
+		return 0;
+
+	return 1;
+}
+
+/*
  * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
  */
 void pi_wakeup_handler(void)
@@ -236,6 +262,25 @@ bool pi_has_pending_interrupt(struct kvm
 		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
 }
 
+void vmx_pi_start_assignment(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	/*
+	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
+	 * go back through vcpu_block().
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_vcpu_apicv_active(vcpu))
+			continue;
+
+		kvm_vcpu_wake_up(vcpu);
+	}
+}
 
 /*
  * pi_update_irte - set IRTE for Posted-Interrupts
Index: kvm/arch/x86/kvm/vmx/posted_intr.h
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
+++ kvm/arch/x86/kvm/vmx/posted_intr.h
@@ -95,5 +95,7 @@ void __init pi_init_cpu(int cpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
+void vmx_pi_start_assignment(struct kvm *kvm);
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7727,13 +7727,13 @@ static struct kvm_x86_ops vmx_x86_ops __
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
-	.vcpu_check_block = NULL,
+	.vcpu_check_block = vmx_vcpu_check_block,
 
 	.pmu_ops = &intel_pmu_ops,
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
-	.start_assignment = NULL,
+	.start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,




* Re: [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops
  2021-05-07 19:16   ` Peter Xu
@ 2021-05-10 17:53     ` Marcelo Tosatti
  0 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-10 17:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, Paolo Bonzini, Alex Williamson, Sean Christopherson

On Fri, May 07, 2021 at 03:16:00PM -0400, Peter Xu wrote:
> On Fri, May 07, 2021 at 10:06:10AM -0300, Marcelo Tosatti wrote:
> > Add a start_assignment hook to kvm_x86_ops, which is called when 
> > kvm_arch_start_assignment is done.
> > 
> > The hook is required to update the wakeup vector of a sleeping vCPU
> > when a device is assigned to the guest.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Index: kvm/arch/x86/include/asm/kvm_host.h
> > ===================================================================
> > --- kvm.orig/arch/x86/include/asm/kvm_host.h
> > +++ kvm/arch/x86/include/asm/kvm_host.h
> > @@ -1322,6 +1322,7 @@ struct kvm_x86_ops {
> >  
> >  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
> >  			      uint32_t guest_irq, bool set);
> > +	void (*start_assignment)(struct kvm *kvm, int device_count);
> 
> I'm wondering what the hook could do with the device_count besides comparing it
> against 1...
> 
> If we can't think of any, perhaps we can directly make it an enablement hook
> instead (so we avoid calling the hook at all when count>1)?
> 
>    /* Called when the first assignment registers (count from 0 to 1) */
>    void (*enable_assignment)(struct kvm *kvm);

Sure, sounds good, just kept the original name...



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-07 22:08       ` Marcelo Tosatti
@ 2021-05-11 14:39         ` Peter Xu
  2021-05-11 14:51           ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2021-05-11 14:39 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Sean Christopherson, kvm, Paolo Bonzini, Alex Williamson,
	Pei Zhang

On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > somehow, so that even without customized ->vcpu_check_block we should be able
> > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> 
> static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> {
>         int ret = -EINTR;
>         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> 
>         if (kvm_arch_vcpu_runnable(vcpu)) {
>                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
>                 goto out;
>         }
> 
> Don't want to unhalt the vcpu.

Could you elaborate?  It's not obvious to me why we can't do that if
pi_test_on() returns true..  we have pending posted interrupts anyway, so
shouldn't we stop halting?  Thanks!

-- 
Peter Xu



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 14:39         ` Peter Xu
@ 2021-05-11 14:51           ` Marcelo Tosatti
  2021-05-11 16:19             ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-11 14:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: Sean Christopherson, kvm, Paolo Bonzini, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > 
> > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > {
> >         int ret = -EINTR;
> >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > 
> >         if (kvm_arch_vcpu_runnable(vcpu)) {
> >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> >                 goto out;
> >         }
> > 
> > Don't want to unhalt the vcpu.
> 
> Could you elaborate?  It's not obvious to me why we can't do that if
> > pi_test_on() returns true..  we have pending posted interrupts anyway, so
> shouldn't we stop halting?  Thanks!

pi_test_on() only returns true when an interrupt is signalled by the
device. But the sequence of events is:


1. pCPU idles without notification vector configured to wakeup vector.

2. PCI device is hotplugged, assigned device count increases from 0 to 1.

<arbitrary amount of time>

3. device generates interrupt, sets ON bit to true in the posted
interrupt descriptor.

We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
is not set).
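
A toy userspace model of the sequence above may make the gap concrete.
It is only a sketch, not kernel code: the model_* names, the struct
layout and the vector values are invented for illustration, and the
IRQ posting capability and APICv checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative vector numbers; the real values live in the kernel headers. */
enum { MODEL_NOTIFICATION_VECTOR = 1, MODEL_WAKEUP_VECTOR = 2 };

struct model_vm {
	int assigned_devices;	/* assigned device count of the VM */
};

struct model_vcpu {
	bool on;		/* ON bit of the posted interrupt descriptor */
	int nv;			/* notification vector (PI.NV) */
};

/* Step 1: the vCPU blocks; NV is only switched if a device is assigned. */
static void model_block(const struct model_vm *vm, struct model_vcpu *vcpu)
{
	vcpu->nv = vm->assigned_devices > 0 ? MODEL_WAKEUP_VECTOR
					     : MODEL_NOTIFICATION_VECTOR;
}

/* Step 2: hotplug; nothing touches the descriptor of the blocked vCPU. */
static void model_hotplug(struct model_vm *vm)
{
	vm->assigned_devices++;
}

/* Step 3: the device signals an interrupt, which sets ON. */
static void model_device_interrupt(struct model_vcpu *vcpu)
{
	vcpu->on = true;
}

/* An ON-bit test, in the spirit of pi_test_on(). */
static bool model_on_is_set(const struct model_vcpu *vcpu)
{
	return vcpu->on;
}

/* An NV-based test, in the spirit of vmx_vcpu_check_block(). */
static bool model_must_leave_block(const struct model_vm *vm,
				   const struct model_vcpu *vcpu)
{
	return vm->assigned_devices > 0 && vcpu->nv != MODEL_WAKEUP_VECTOR;
}
```

Between step 2 and step 3 the ON test is still false while the NV test
already fires, which is the window we need to catch.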




* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 14:51           ` Marcelo Tosatti
@ 2021-05-11 16:19             ` Peter Xu
  2021-05-11 17:18               ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2021-05-11 16:19 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Sean Christopherson, kvm, Paolo Bonzini, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > 
> > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > {
> > >         int ret = -EINTR;
> > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > 
> > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > >                 goto out;
> > >         }
> > > 
> > > Don't want to unhalt the vcpu.
> > 
> > Could you elaborate?  It's not obvious to me why we can't do that if
> > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > shouldn't we stop halting?  Thanks!
> 
> pi_test_on() only returns true when an interrupt is signalled by the
> device. But the sequence of events is:
> 
> 
> 1. pCPU idles without notification vector configured to wakeup vector.
> 
> 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> 
> <arbitrary amount of time>
> 
> 3. device generates interrupt, sets ON bit to true in the posted
> interrupt descriptor.
> 
> We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> is not set).

Ah yes.. thanks.

Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):

#define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)

We can set it in vmx_pi_start_assignment(), then check+clear it in
kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).

The thing is current vmx_vcpu_check_block() is mostly a sanity check and
copy-paste of the pi checks on a few items, so maybe cleaner to use
KVM_REQ_UNBLOCK, as it might be reused in the future for re-evaluating of
pre-block for similar purpose?

No strong opinion, though.

-- 
Peter Xu



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 16:19             ` Peter Xu
@ 2021-05-11 17:18               ` Marcelo Tosatti
  2021-05-11 21:35                 ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-11 17:18 UTC (permalink / raw)
  To: Peter Xu, Paolo Bonzini
  Cc: Sean Christopherson, kvm, Paolo Bonzini, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 12:19:56PM -0400, Peter Xu wrote:
> On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> > On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > > 
> > > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > > {
> > > >         int ret = -EINTR;
> > > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > 
> > > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > > >                 goto out;
> > > >         }
> > > > 
> > > > Don't want to unhalt the vcpu.
> > > 
> > > Could you elaborate?  It's not obvious to me why we can't do that if
> > > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > > shouldn't we stop halting?  Thanks!
> > 
> > pi_test_on() only returns true when an interrupt is signalled by the
> > device. But the sequence of events is:
> > 
> > 
> > 1. pCPU idles without notification vector configured to wakeup vector.
> > 
> > 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> > 
> > <arbitrary amount of time>
> > 
> > 3. device generates interrupt, sets ON bit to true in the posted
> > interrupt descriptor.
> > 
> > We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> > is not set).
> 
> Ah yes.. thanks.
> 
> Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
> define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):
> 
> #define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
> 
> We can set it in vmx_pi_start_assignment(), then check+clear it in
> kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).

Can't check it in kvm_vcpu_has_events() because that will set
KVM_REQ_UNHALT (which we don't want).

I think KVM_REQ_UNBLOCK will add more lines of code.

> The thing is current vmx_vcpu_check_block() is mostly a sanity check and
> copy-paste of the pi checks on a few items, so maybe cleaner to use
> KVM_REQ_UNBLOCK, as it might be reused in the future for re-evaluating of
> pre-block for similar purpose?
> 
> No strong opinion, though.

Hum... IMHO v3 is quite clean already (although i don't object to your
suggestion).

Paolo, what do you think?





* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 17:18               ` Marcelo Tosatti
@ 2021-05-11 21:35                 ` Peter Xu
  2021-05-11 23:51                   ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2021-05-11 21:35 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Alex Williamson,
	Pei Zhang

[-- Attachment #1: Type: text/plain, Size: 3516 bytes --]

On Tue, May 11, 2021 at 02:18:10PM -0300, Marcelo Tosatti wrote:
> On Tue, May 11, 2021 at 12:19:56PM -0400, Peter Xu wrote:
> > On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> > > On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > > > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > > > 
> > > > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > > > {
> > > > >         int ret = -EINTR;
> > > > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > > 
> > > > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > > > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > > > >                 goto out;
> > > > >         }
> > > > > 
> > > > > Don't want to unhalt the vcpu.
> > > > 
> > > > Could you elaborate?  It's not obvious to me why we can't do that if
> > > > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > > > shouldn't we stop halting?  Thanks!
> > > 
> > > pi_test_on() only returns true when an interrupt is signalled by the
> > > device. But the sequence of events is:
> > > 
> > > 
> > > 1. pCPU idles without notification vector configured to wakeup vector.
> > > 
> > > 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> > > 
> > > <arbitrary amount of time>
> > > 
> > > 3. device generates interrupt, sets ON bit to true in the posted
> > > interrupt descriptor.
> > > 
> > > We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> > > is not set).
> > 
> > Ah yes.. thanks.
> > 
> > Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
> > define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):
> > 
> > #define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
> > 
> > We can set it in vmx_pi_start_assignment(), then check+clear it in
> > kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).
> 
> Can't check it in kvm_vcpu_has_events() because that will set
> KVM_REQ_UNHALT (which we don't want).

I thought it was okay to break the guest HLT? As IMHO the guest code should
always be able to re-run the HLT when interrupted?  As IIUC HLT can easily be
interrupted by e.g., SMIs, according to SDM Vol.2.  Not to mention vfio hotplug
should be rare, and we'll only trigger this once for the 1st device.

> 
> I think KVM_REQ_UNBLOCK will add more lines of code.

It's very possible I overlooked something above... but if breaking HLT
irregularly is okay, I attached one patch that is based on your v3 series; it
just drops vcpu_check_block() and uses KVM_REQ_UNBLOCK instead (not even compile
tested, just to satisfy my own curiosity on how many LOC we can save.. :). It gives me:

 7 files changed, 5 insertions(+), 41 deletions(-)

But again, I could have missed something...

Thanks,

> 
> > The thing is current vmx_vcpu_check_block() is mostly a sanity check and
> > copy-paste of the pi checks on a few items, so maybe cleaner to use
> > KVM_REQ_UNBLOCK, as it might be reused in the future for re-evaluating of
> > pre-block for similar purpose?
> > 
> > No strong opinion, though.
> 
> Hum... IMHO v3 is quite clean already (although i don't object to your
> suggestion).
> 
> Paolo, what do you think?
> 
> 
> 

-- 
Peter Xu

[-- Attachment #2: 0001-replace-vcpu_check_block-hook-with-KVM_REQ_UNBLOCK.patch --]
[-- Type: text/plain, Size: 5567 bytes --]

From 1131248f3c8f1f2715dd49d439c9fab25b4db9b8 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 11 May 2021 17:33:21 -0400
Subject: [PATCH] replace vcpu_check_block() hook with KVM_REQ_UNBLOCK

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    | 12 +-----------
 arch/x86/kvm/svm/svm.c             |  1 -
 arch/x86/kvm/vmx/posted_intr.c     | 27 +--------------------------
 arch/x86/kvm/vmx/posted_intr.h     |  1 -
 arch/x86/kvm/vmx/vmx.c             |  1 -
 arch/x86/kvm/x86.c                 |  3 +++
 7 files changed, 5 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index fc99fb779fd21..e7bef91cee04a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -98,7 +98,6 @@ KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
 KVM_X86_OP_NULL(vcpu_unblocking)
-KVM_X86_OP_NULL(vcpu_check_block)
 KVM_X86_OP_NULL(update_pi_irte)
 KVM_X86_OP_NULL(start_assignment)
 KVM_X86_OP_NULL(apicv_post_state_restore)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5bf7bd0e59582..74ab042e9b146 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -91,6 +91,7 @@
 #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1350,8 +1351,6 @@ struct kvm_x86_ops {
 	void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
 	void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
 
-	int (*vcpu_check_block)(struct kvm_vcpu *vcpu);
-
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 	void (*start_assignment)(struct kvm *kvm);
@@ -1835,15 +1834,6 @@ static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
 		irq->delivery_mode == APIC_DM_LOWEST);
 }
 
-#define __KVM_HAVE_ARCH_VCPU_CHECK_BLOCK
-static inline int kvm_arch_vcpu_check_block(struct kvm_vcpu *vcpu)
-{
-	if (kvm_x86_ops.vcpu_check_block)
-		return static_call(kvm_x86_vcpu_check_block)(vcpu);
-
-	return 0;
-}
-
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
 	static_call_cond(kvm_x86_vcpu_blocking)(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cda5ccb4d9d1b..8b03795cfcd11 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4459,7 +4459,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_put = svm_vcpu_put,
 	.vcpu_blocking = svm_vcpu_blocking,
 	.vcpu_unblocking = svm_vcpu_unblocking,
-	.vcpu_check_block = NULL,
 
 	.update_exception_bitmap = svm_update_exception_bitmap,
 	.get_msr_feature = svm_get_msr_feature,
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 2d0d009965530..0b74d598ebcbd 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -203,32 +203,6 @@ void pi_post_block(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 }
 
-/*
- * Bail out of the block loop if the VM has an assigned
- * device, but the blocking vCPU didn't reconfigure the
- * PI.NV to the wakeup vector, i.e. the assigned device
- * came along after the initial check in vcpu_block().
- */
-
-int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
-{
-	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
-
-	if (!irq_remapping_cap(IRQ_POSTING_CAP))
-		return 0;
-
-	if (!kvm_vcpu_apicv_active(vcpu))
-		return 0;
-
-	if (!kvm_arch_has_assigned_device(vcpu->kvm))
-		return 0;
-
-	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
-		return 0;
-
-	return 1;
-}
-
 /*
  * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
  */
@@ -278,6 +252,7 @@ void vmx_pi_start_assignment(struct kvm *kvm)
 		if (!kvm_vcpu_apicv_active(vcpu))
 			continue;
 
+		kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
 		kvm_vcpu_wake_up(vcpu);
 	}
 }
diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 2aa082fd1c7ab..7f7b2326caf53 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -96,6 +96,5 @@ bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
 void vmx_pi_start_assignment(struct kvm *kvm);
-int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ab68fed8b7e43..639ec3eba9b80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7716,7 +7716,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
-	.vcpu_check_block = vmx_vcpu_check_block,
 
 	.pmu_ops = &intel_pmu_ops,
 	.nested_ops = &vmx_nested_ops,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e6fee59b5dab6..739e1bd59e8a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11177,6 +11177,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	     static_call(kvm_x86_smi_allowed)(vcpu, false)))
 		return true;
 
+	if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.31.1



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 21:35                 ` Peter Xu
@ 2021-05-11 23:51                   ` Marcelo Tosatti
  2021-05-12  0:02                     ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-11 23:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 05:35:41PM -0400, Peter Xu wrote:
> On Tue, May 11, 2021 at 02:18:10PM -0300, Marcelo Tosatti wrote:
> > On Tue, May 11, 2021 at 12:19:56PM -0400, Peter Xu wrote:
> > > On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> > > > On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > > > > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > > > > 
> > > > > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > > > > {
> > > > > >         int ret = -EINTR;
> > > > > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > > > 
> > > > > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > > > > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > > > > >                 goto out;
> > > > > >         }
> > > > > > 
> > > > > > Don't want to unhalt the vcpu.
> > > > > 
> > > > > Could you elaborate?  It's not obvious to me why we can't do that if
> > > > > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > > > > shouldn't we stop halting?  Thanks!
> > > > 
> > > > pi_test_on() only returns true when an interrupt is signalled by the
> > > > device. But the sequence of events is:
> > > > 
> > > > 
> > > > 1. pCPU idles without notification vector configured to wakeup vector.
> > > > 
> > > > 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> > > > 
> > > > <arbitrary amount of time>
> > > > 
> > > > 3. device generates interrupt, sets ON bit to true in the posted
> > > > interrupt descriptor.
> > > > 
> > > > We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> > > > is not set).
> > > 
> > > Ah yes.. thanks.
> > > 
> > > Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
> > > define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):
> > > 
> > > #define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
> > > 
> > > We can set it in vmx_pi_start_assignment(), then check+clear it in
> > > kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).
> > 
> > Can't check it in kvm_vcpu_has_events() because that will set
> > KVM_REQ_UNHALT (which we don't want).
> 
> I thought it was okay to break the guest HLT? 

Intel:

"HLT-HALT

Description

Stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and
SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an
interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer
(CS:EIP) points to the instruction following the HLT instruction."

AMD:

"6.5 Processor Halt
The processor halt instruction (HLT) halts instruction execution, leaving the processor in the halt state.
No registers or machine state are modified as a result of executing the HLT instruction. The processor
remains in the halt state until one of the following occurs:
• A non-maskable interrupt (NMI).
• An enabled, maskable interrupt (INTR).
• Processor reset (RESET).
• Processor initialization (INIT).
• System-management interrupt (SMI)."

The KVM_REQ_UNBLOCK patch will resume execution even without any such event
occurring, so the behaviour would differ from bare metal.

> As IMHO the guest code should
> always be able to re-run the HLT when interrupted?  As IIUC HLT can easily be
> interrupted by e.g., SMIs, according to SDM Vol.2.  

CPU will by default return to HLT'ed state, not continue to the
instruction following HLT, on SMI:

34.10 AUTO HALT RESTART
If the processor is in a HALT state (due to the prior execution of a HLT instruction) when it receives an SMI, the
processor records the fact in the auto HALT restart flag in the saved processor state (see Figure 34-3). (This flag is
located at offset 7F02H and bit 0 in the state save area of the SMRAM.)
If the processor sets the auto HALT restart flag upon entering SMM (indicating that the SMI occurred when the
processor was in the HALT state), the SMI handler has two options:
* It can leave the auto HALT restart flag set, which instructs the RSM instruction to return program control to the
HLT instruction. This option in effect causes the processor to re-enter the HALT state after handling the SMI.
(This is the default operation.)
* It can clear the auto HALT restart flag, which instructs the RSM instruction to return program control to the
instruction following the HLT instruction.
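
The two options quoted above can be written down as a small decision
function. This is a hedged sketch of the documented behaviour only, not
code from KVM or from any SMI handler; the enum and function names are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum model_resume {
	MODEL_RESUME_INTERRUPTED_CODE,	/* CPU was not halted when the SMI hit */
	MODEL_RESUME_AT_HLT,		/* re-enter the HALT state (default) */
	MODEL_RESUME_AFTER_HLT,		/* continue after the HLT instruction */
};

/*
 * Model one SMI round trip: SMM entry records whether the CPU was halted
 * in the auto HALT restart flag, the handler may clear the flag, and RSM
 * picks the resume point from it.
 */
static enum model_resume model_smi_roundtrip(bool halted_at_smi,
					     bool handler_clears_flag)
{
	bool auto_halt_restart = halted_at_smi;

	if (!halted_at_smi)
		return MODEL_RESUME_INTERRUPTED_CODE;

	if (handler_clears_flag)
		auto_halt_restart = false;

	return auto_halt_restart ? MODEL_RESUME_AT_HLT : MODEL_RESUME_AFTER_HLT;
}
```

With a handler that leaves the flag alone, a halted CPU goes back to
HLT, so an SMI by itself does not make the guest run past the HLT.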

> Not to mention vfio hotplug
> should be rare, and we'll only trigger this once for the 1st device.
> 
> > 
> > I think KVM_REQ_UNBLOCK will add more lines of code.
> 
> It's very possible I overlooked something above... but if breaking HLT
> irregularly is okay, I attached one patch that is based on your v3 series; it
> just drops vcpu_check_block() and uses KVM_REQ_UNBLOCK instead (not even compile
> tested, just to satisfy my own curiosity on how many LOC we can save.. :). It gives me:
> 
>  7 files changed, 5 insertions(+), 41 deletions(-)
> 
> But again, I could have missed something...
> 
> Thanks,
> 
> > 
> > > The thing is current vmx_vcpu_check_block() is mostly a sanity check and
> > > copy-paste of the pi checks on a few items, so maybe cleaner to use
> > > KVM_REQ_UNBLOCK, as it might be reused in the future for re-evaluating of
> > > pre-block for similar purpose?
> > > 
> > > No strong opinion, though.
> > 
> > Hum... IMHO v3 is quite clean already (although i don't object to your
> > suggestion).
> > 
> > Paolo, what do you think?
> > 
> > 
> > 
> 
> -- 
> Peter Xu

> From 1131248f3c8f1f2715dd49d439c9fab25b4db9b8 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 11 May 2021 17:33:21 -0400
> Subject: [PATCH] replace vcpu_check_block() hook with KVM_REQ_UNBLOCK
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 -
>  arch/x86/include/asm/kvm_host.h    | 12 +-----------
>  arch/x86/kvm/svm/svm.c             |  1 -
>  arch/x86/kvm/vmx/posted_intr.c     | 27 +--------------------------
>  arch/x86/kvm/vmx/posted_intr.h     |  1 -
>  arch/x86/kvm/vmx/vmx.c             |  1 -
>  arch/x86/kvm/x86.c                 |  3 +++
>  7 files changed, 5 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index fc99fb779fd21..e7bef91cee04a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -98,7 +98,6 @@ KVM_X86_OP_NULL(pre_block)
>  KVM_X86_OP_NULL(post_block)
>  KVM_X86_OP_NULL(vcpu_blocking)
>  KVM_X86_OP_NULL(vcpu_unblocking)
> -KVM_X86_OP_NULL(vcpu_check_block)
>  KVM_X86_OP_NULL(update_pi_irte)
>  KVM_X86_OP_NULL(start_assignment)
>  KVM_X86_OP_NULL(apicv_post_state_restore)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5bf7bd0e59582..74ab042e9b146 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -91,6 +91,7 @@
>  #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
>  #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
>  	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> +#define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
>  
>  #define CR0_RESERVED_BITS                                               \
>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> @@ -1350,8 +1351,6 @@ struct kvm_x86_ops {
>  	void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
>  	void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
>  
> -	int (*vcpu_check_block)(struct kvm_vcpu *vcpu);
> -
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
>  	void (*start_assignment)(struct kvm *kvm);
> @@ -1835,15 +1834,6 @@ static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
>  		irq->delivery_mode == APIC_DM_LOWEST);
>  }
>  
> -#define __KVM_HAVE_ARCH_VCPU_CHECK_BLOCK
> -static inline int kvm_arch_vcpu_check_block(struct kvm_vcpu *vcpu)
> -{
> -	if (kvm_x86_ops.vcpu_check_block)
> -		return static_call(kvm_x86_vcpu_check_block)(vcpu);
> -
> -	return 0;
> -}
> -
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
>  {
>  	static_call_cond(kvm_x86_vcpu_blocking)(vcpu);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cda5ccb4d9d1b..8b03795cfcd11 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4459,7 +4459,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.vcpu_put = svm_vcpu_put,
>  	.vcpu_blocking = svm_vcpu_blocking,
>  	.vcpu_unblocking = svm_vcpu_unblocking,
> -	.vcpu_check_block = NULL,
>  
>  	.update_exception_bitmap = svm_update_exception_bitmap,
>  	.get_msr_feature = svm_get_msr_feature,
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 2d0d009965530..0b74d598ebcbd 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -203,32 +203,6 @@ void pi_post_block(struct kvm_vcpu *vcpu)
>  	local_irq_enable();
>  }
>  
> -/*
> - * Bail out of the block loop if the VM has an assigned
> - * device, but the blocking vCPU didn't reconfigure the
> - * PI.NV to the wakeup vector, i.e. the assigned device
> - * came along after the initial check in vcpu_block().
> - */
> -
> -int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
> -{
> -	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> -
> -	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> -		return 0;
> -
> -	if (!kvm_vcpu_apicv_active(vcpu))
> -		return 0;
> -
> -	if (!kvm_arch_has_assigned_device(vcpu->kvm))
> -		return 0;
> -
> -	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
> -		return 0;
> -
> -	return 1;
> -}
> -
>  /*
>   * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
>   */
> @@ -278,6 +252,7 @@ void vmx_pi_start_assignment(struct kvm *kvm)
>  		if (!kvm_vcpu_apicv_active(vcpu))
>  			continue;
>  
> +		kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
>  		kvm_vcpu_wake_up(vcpu);
>  	}
>  }
> diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
> index 2aa082fd1c7ab..7f7b2326caf53 100644
> --- a/arch/x86/kvm/vmx/posted_intr.h
> +++ b/arch/x86/kvm/vmx/posted_intr.h
> @@ -96,6 +96,5 @@ bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
>  		   bool set);
>  void vmx_pi_start_assignment(struct kvm *kvm);
> -int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
>  
>  #endif /* __KVM_X86_VMX_POSTED_INTR_H */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ab68fed8b7e43..639ec3eba9b80 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7716,7 +7716,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>  
>  	.pre_block = vmx_pre_block,
>  	.post_block = vmx_post_block,
> -	.vcpu_check_block = vmx_vcpu_check_block,
>  
>  	.pmu_ops = &intel_pmu_ops,
>  	.nested_ops = &vmx_nested_ops,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e6fee59b5dab6..739e1bd59e8a9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11177,6 +11177,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>  	     static_call(kvm_x86_smi_allowed)(vcpu, false)))
>  		return true;
>  
> +	if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
> +		return true;
> +
>  	if (kvm_arch_interrupt_allowed(vcpu) &&
>  	    (kvm_cpu_has_interrupt(vcpu) ||
>  	    kvm_guest_apic_has_interrupt(vcpu)))
> -- 
> 2.31.1
> 



* [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 23:57 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device (v4) Marcelo Tosatti
@ 2021-05-11 23:57 ` Marcelo Tosatti
  0 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-11 23:57 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Alex Williamson, Sean Christopherson, Peter Xu,
	Pei Zhang, Marcelo Tosatti

For VMX, when a vcpu enters HLT emulation, pi_pre_block will:

1) Add vcpu to per-cpu list of blocked vcpus.

2) Program the posted-interrupt descriptor "notification vector" 
to POSTED_INTR_WAKEUP_VECTOR

With interrupt remapping, an interrupt will set the PIR bit for the 
vector programmed for the device on the CPU, test-and-set the 
ON bit on the posted interrupt descriptor, and if the ON bit is clear
generate an interrupt for the notification vector.

This way, the target CPU wakes upon a device interrupt and wakes up
the target vcpu.
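
The posting step described above can be modelled in a few lines of
userspace C. This is an illustrative sketch, not the hardware or kernel
implementation: struct model_pi_desc and the model_* helpers are
invented names, and the SN (suppress notification) bit is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_NOTIFICATION (-1)

struct model_pi_desc {
	uint64_t pir[4];	/* one pending bit per vector, 256 vectors */
	bool on;		/* outstanding notification */
	uint8_t nv;		/* notification vector */
};

/*
 * Post one device interrupt: set the PIR bit, test-and-set ON, and
 * report which vector (if any) is raised on the target CPU.
 */
static int model_post_interrupt(struct model_pi_desc *desc, uint8_t vector)
{
	bool on_was_set = desc->on;

	desc->pir[vector / 64] |= (uint64_t)1 << (vector % 64);
	desc->on = true;

	return on_was_set ? MODEL_NO_NOTIFICATION : desc->nv;
}

/* Check whether a vector is pending in the PIR. */
static bool model_pir_test(const struct model_pi_desc *desc, uint8_t vector)
{
	return (desc->pir[vector / 64] >> (vector % 64)) & 1;
}
```

In this model, whatever value nv holds at the time of the first posting
is the only notification sent; once ON is set, later postings only
update the PIR.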

The problem is that pi_pre_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. It's possible for the
following to happen:

1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
(notification vector not programmed, so pcpu P remains in idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
RCU watchdog NMIs all vCPUs.

To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is 
properly reprogrammed to the wakeup vector.

Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: kvm/arch/x86/kvm/vmx/posted_intr.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
+++ kvm/arch/x86/kvm/vmx/posted_intr.c
@@ -204,6 +204,32 @@ void pi_post_block(struct kvm_vcpu *vcpu
 }
 
 /*
+ * Bail out of the block loop if the VM has an assigned
+ * device, but the blocking vCPU didn't reconfigure the
+ * PI.NV to the wakeup vector, i.e. the assigned device
+ * came along after the initial check in vcpu_block().
+ */
+
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return 0;
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
+		return 0;
+
+	return 1;
+}
+
+/*
  * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
  */
 void pi_wakeup_handler(void)
@@ -236,6 +262,25 @@ bool pi_has_pending_interrupt(struct kvm
 		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
 }
 
+void vmx_pi_start_assignment(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	/*
+	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
+	 * go back through vcpu_block().
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_vcpu_apicv_active(vcpu))
+			continue;
+
+		kvm_vcpu_wake_up(vcpu);
+	}
+}
 
 /*
  * pi_update_irte - set IRTE for Posted-Interrupts
Index: kvm/arch/x86/kvm/vmx/posted_intr.h
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
+++ kvm/arch/x86/kvm/vmx/posted_intr.h
@@ -95,5 +95,7 @@ void __init pi_init_cpu(int cpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
+void vmx_pi_start_assignment(struct kvm *kvm);
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7727,11 +7727,13 @@ static struct kvm_x86_ops vmx_x86_ops __
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
+	.vcpu_check_block = vmx_vcpu_check_block,
 
 	.pmu_ops = &intel_pmu_ops,
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
+	.start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,
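
For readers following the logic rather than the diff, here is a minimal
userspace sketch of the decision vmx_vcpu_check_block() makes. It is not
kernel code and not part of the posted patch: struct pi_block_state and
model_check_block() are hypothetical names, and each boolean only stands in
for the kernel predicate named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the real function reads. */
struct pi_block_state {
	bool irq_posting_cap;     /* irq_remapping_cap(IRQ_POSTING_CAP) */
	bool apicv_active;        /* kvm_vcpu_apicv_active(vcpu) */
	bool has_assigned_device; /* kvm_arch_has_assigned_device(vcpu->kvm) */
	bool nv_is_wakeup;        /* pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR */
};

/* Returns 1 when the vCPU should leave the block loop, 0 to keep blocking. */
static int model_check_block(const struct pi_block_state *s)
{
	assert(s != NULL);

	/* Without interrupt posting or APICv there is no PI.NV to fix up. */
	if (!s->irq_posting_cap || !s->apicv_active)
		return 0;

	/* No assigned device: nothing can post an interrupt to this vCPU. */
	if (!s->has_assigned_device)
		return 0;

	/* Already on the wakeup vector: blocking is safe. */
	if (s->nv_is_wakeup)
		return 0;

	/* A device appeared after the vCPU started blocking. */
	return 1;
}
```

Only the last case returns 1: posting is usable, a device is assigned, and
PI.NV is not yet the wakeup vector. That is the hotplug-while-halted window
described in the cover letter.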




* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-11 23:51                   ` Marcelo Tosatti
@ 2021-05-12  0:02                     ` Marcelo Tosatti
  2021-05-12  0:38                       ` Peter Xu
  2021-05-12 14:41                       ` Sean Christopherson
  0 siblings, 2 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-12  0:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 08:51:24PM -0300, Marcelo Tosatti wrote:
> On Tue, May 11, 2021 at 05:35:41PM -0400, Peter Xu wrote:
> > On Tue, May 11, 2021 at 02:18:10PM -0300, Marcelo Tosatti wrote:
> > > On Tue, May 11, 2021 at 12:19:56PM -0400, Peter Xu wrote:
> > > > On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> > > > > On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > > > > > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > > > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > > > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > > > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > > > > > 
> > > > > > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > > > > > {
> > > > > > >         int ret = -EINTR;
> > > > > > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > > > > 
> > > > > > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > > > > > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > > > > > >                 goto out;
> > > > > > >         }
> > > > > > > 
> > > > > > > Don't want to unhalt the vcpu.
> > > > > > 
> > > > > > Could you elaborate?  It's not obvious to me why we can't do that if
> > > > > > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > > > > > shouldn't we stop halting?  Thanks!
> > > > > 
> > > > > pi_test_on() only returns true when an interrupt is signalled by the
> > > > > device. But the sequence of events is:
> > > > > 
> > > > > 
> > > > > 1. pCPU idles without notification vector configured to wakeup vector.
> > > > > 
> > > > > 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> > > > > 
> > > > > <arbitrary amount of time>
> > > > > 
> > > > > 3. device generates interrupt, sets ON bit to true in the posted
> > > > > interrupt descriptor.
> > > > > 
> > > > > We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> > > > > is not set).
> > > > 
> > > > Ah yes.. thanks.
> > > > 
> > > > Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
> > > > define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):
> > > > 
> > > > #define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
> > > > 
> > > > We can set it in vmx_pi_start_assignment(), then check+clear it in
> > > > kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).
> > > 
> > > Can't check it in kvm_vcpu_has_events() because that will set
> > > KVM_REQ_UNHALT (which we don't want).
> > 
> > I thought it was okay to break the guest HLT? 
> 
> Intel:
> 
> "HLT-HALT
> 
> Description
> 
> Stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and
> SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an
> interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer
> (CS:EIP) points to the instruction following the HLT instruction."
> 
> AMD:
> 
> "6.5 Processor Halt
> The processor halt instruction (HLT) halts instruction execution, leaving the processor in the halt state.
> No registers or machine state are modified as a result of executing the HLT instruction. The processor
> remains in the halt state until one of the following occurs:
> • A non-maskable interrupt (NMI).
> • An enabled, maskable interrupt (INTR).
> • Processor reset (RESET).
> • Processor initialization (INIT).
> • System-management interrupt (SMI)."
> 
> The KVM_REQ_UNBLOCK patch will resume execution even any such event

						  even without any such event

> occurring. So the behaviour would be different from baremetal.



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-12  0:02                     ` Marcelo Tosatti
@ 2021-05-12  0:38                       ` Peter Xu
  2021-05-12 11:10                         ` Marcelo Tosatti
  2021-05-12 14:41                       ` Sean Christopherson
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Xu @ 2021-05-12  0:38 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 09:02:59PM -0300, Marcelo Tosatti wrote:
> On Tue, May 11, 2021 at 08:51:24PM -0300, Marcelo Tosatti wrote:
> > On Tue, May 11, 2021 at 05:35:41PM -0400, Peter Xu wrote:
> > > On Tue, May 11, 2021 at 02:18:10PM -0300, Marcelo Tosatti wrote:
> > > > On Tue, May 11, 2021 at 12:19:56PM -0400, Peter Xu wrote:
> > > > > On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> > > > > > On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > > > > > > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > > > > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > > > > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > > > > > > 
> > > > > > > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > > > > > > {
> > > > > > > >         int ret = -EINTR;
> > > > > > > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > > > > > 
> > > > > > > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > > > > > > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > > > > > > >                 goto out;
> > > > > > > >         }
> > > > > > > > 
> > > > > > > > Don't want to unhalt the vcpu.
> > > > > > > 
> > > > > > > Could you elaborate?  It's not obvious to me why we can't do that if
> > > > > > > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > > > > > > shouldn't we stop halting?  Thanks!
> > > > > > 
> > > > > > pi_test_on() only returns true when an interrupt is signalled by the
> > > > > > device. But the sequence of events is:
> > > > > > 
> > > > > > 
> > > > > > 1. pCPU idles without notification vector configured to wakeup vector.
> > > > > > 
> > > > > > 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> > > > > > 
> > > > > > <arbitrary amount of time>
> > > > > > 
> > > > > > 3. device generates interrupt, sets ON bit to true in the posted
> > > > > > interrupt descriptor.
> > > > > > 
> > > > > > We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> > > > > > is not set).
> > > > > 
> > > > > Ah yes.. thanks.
> > > > > 
> > > > > Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
> > > > > define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):
> > > > > 
> > > > > #define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
> > > > > 
> > > > > We can set it in vmx_pi_start_assignment(), then check+clear it in
> > > > > kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).
> > > > 
> > > > Can't check it in kvm_vcpu_has_events() because that will set
> > > > KVM_REQ_UNHALT (which we don't want).
> > > 
> > > I thought it was okay to break the guest HLT? 
> > 
> > Intel:
> > 
> > "HLT-HALT
> > 
> > Description
> > 
> > Stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and
> > SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an
> > interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer
> > (CS:EIP) points to the instruction following the HLT instruction."
> > 
> > AMD:
> > 
> > "6.5 Processor Halt
> > The processor halt instruction (HLT) halts instruction execution, leaving the processor in the halt state.
> > No registers or machine state are modified as a result of executing the HLT instruction. The processor
> > remains in the halt state until one of the following occurs:
> > • A non-maskable interrupt (NMI).
> > • An enabled, maskable interrupt (INTR).
> > • Processor reset (RESET).
> > • Processor initialization (INIT).
> > • System-management interrupt (SMI)."
> > 
> > The KVM_REQ_UNBLOCK patch will resume execution even any such event
> 
> 						  even without any such event
> 
> > occurring. So the behaviour would be different from baremetal.
> 

What if we move that kvm_check_request() into kvm_vcpu_check_block()?

---8<---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 739e1bd59e8a9..e6fee59b5dab6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11177,9 +11177,6 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
             static_call(kvm_x86_smi_allowed)(vcpu, false)))
                return true;
 
-       if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
-               return true;
-
        if (kvm_arch_interrupt_allowed(vcpu) &&
            (kvm_cpu_has_interrupt(vcpu) ||
            kvm_guest_apic_has_interrupt(vcpu)))
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f68035355c08a..fc5f6bffff7fc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2925,6 +2925,10 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
                kvm_make_request(KVM_REQ_UNHALT, vcpu);
                goto out;
        }
+#ifdef CONFIG_X86
+       if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
+               goto out;
+#endif
        if (kvm_cpu_has_pending_timer(vcpu))
                goto out;
        if (signal_pending(current))
---8<---

(The CONFIG_X86 is ugly indeed.. but just to show what I meant, e.g. it can be
 a boolean too I think)

Would this work?

Thanks,

-- 
Peter Xu



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-12  0:38                       ` Peter Xu
@ 2021-05-12 11:10                         ` Marcelo Tosatti
  0 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-12 11:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Alex Williamson,
	Pei Zhang

On Tue, May 11, 2021 at 08:38:16PM -0400, Peter Xu wrote:
> On Tue, May 11, 2021 at 09:02:59PM -0300, Marcelo Tosatti wrote:
> > On Tue, May 11, 2021 at 08:51:24PM -0300, Marcelo Tosatti wrote:
> > > On Tue, May 11, 2021 at 05:35:41PM -0400, Peter Xu wrote:
> > > > On Tue, May 11, 2021 at 02:18:10PM -0300, Marcelo Tosatti wrote:
> > > > > On Tue, May 11, 2021 at 12:19:56PM -0400, Peter Xu wrote:
> > > > > > On Tue, May 11, 2021 at 11:51:57AM -0300, Marcelo Tosatti wrote:
> > > > > > > On Tue, May 11, 2021 at 10:39:11AM -0400, Peter Xu wrote:
> > > > > > > > On Fri, May 07, 2021 at 07:08:31PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > > Wondering whether we should add a pi_test_on() check in kvm_vcpu_has_events()
> > > > > > > > > > somehow, so that even without customized ->vcpu_check_block we should be able
> > > > > > > > > > to break the block loop (as kvm_arch_vcpu_runnable will return true properly)?
> > > > > > > > > 
> > > > > > > > > static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> > > > > > > > > {
> > > > > > > > >         int ret = -EINTR;
> > > > > > > > >         int idx = srcu_read_lock(&vcpu->kvm->srcu);
> > > > > > > > > 
> > > > > > > > >         if (kvm_arch_vcpu_runnable(vcpu)) {
> > > > > > > > >                 kvm_make_request(KVM_REQ_UNHALT, vcpu); <---
> > > > > > > > >                 goto out;
> > > > > > > > >         }
> > > > > > > > > 
> > > > > > > > > Don't want to unhalt the vcpu.
> > > > > > > > 
> > > > > > > > Could you elaborate?  It's not obvious to me why we can't do that if
> > > > > > > > pi_test_on() returns true..  we have pending post interrupts anyways, so
> > > > > > > > shouldn't we stop halting?  Thanks!
> > > > > > > 
> > > > > > > pi_test_on() only returns true when an interrupt is signalled by the
> > > > > > > device. But the sequence of events is:
> > > > > > > 
> > > > > > > 
> > > > > > > 1. pCPU idles without notification vector configured to wakeup vector.
> > > > > > > 
> > > > > > > 2. PCI device is hotplugged, assigned device count increases from 0 to 1.
> > > > > > > 
> > > > > > > <arbitrary amount of time>
> > > > > > > 
> > > > > > > 3. device generates interrupt, sets ON bit to true in the posted
> > > > > > > interrupt descriptor.
> > > > > > > 
> > > > > > > We want to exit kvm_vcpu_block after 2, but before 3 (where ON bit
> > > > > > > is not set).
> > > > > > 
> > > > > > Ah yes.. thanks.
> > > > > > 
> > > > > > Besides the current approach, I'm thinking maybe it'll be cleaner/less LOC to
> > > > > > define a KVM_REQ_UNBLOCK to replace the pre_block hook (in x86's kvm_host.h):
> > > > > > 
> > > > > > #define KVM_REQ_UNBLOCK			KVM_ARCH_REQ(31)
> > > > > > 
> > > > > > We can set it in vmx_pi_start_assignment(), then check+clear it in
> > > > > > kvm_vcpu_has_events() (or make it a bool in kvm_vcpu struct?).
> > > > > 
> > > > > Can't check it in kvm_vcpu_has_events() because that will set
> > > > > KVM_REQ_UNHALT (which we don't want).
> > > > 
> > > > I thought it was okay to break the guest HLT? 
> > > 
> > > Intel:
> > > 
> > > "HLT-HALT
> > > 
> > > Description
> > > 
> > > Stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and
> > > SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an
> > > interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer
> > > (CS:EIP) points to the instruction following the HLT instruction."
> > > 
> > > AMD:
> > > 
> > > "6.5 Processor Halt
> > > The processor halt instruction (HLT) halts instruction execution, leaving the processor in the halt state.
> > > No registers or machine state are modified as a result of executing the HLT instruction. The processor
> > > remains in the halt state until one of the following occurs:
> > > • A non-maskable interrupt (NMI).
> > > • An enabled, maskable interrupt (INTR).
> > > • Processor reset (RESET).
> > > • Processor initialization (INIT).
> > > • System-management interrupt (SMI)."
> > > 
> > > The KVM_REQ_UNBLOCK patch will resume execution even any such event
> > 
> > 						  even without any such event
> > 
> > > occurring. So the behaviour would be different from baremetal.
> > 
> 
> What if we move that kvm_check_request() into kvm_vcpu_check_block()?
> 
> ---8<---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 739e1bd59e8a9..e6fee59b5dab6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11177,9 +11177,6 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>              static_call(kvm_x86_smi_allowed)(vcpu, false)))
>                 return true;
>  
> -       if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
> -               return true;
> -
>         if (kvm_arch_interrupt_allowed(vcpu) &&
>             (kvm_cpu_has_interrupt(vcpu) ||
>             kvm_guest_apic_has_interrupt(vcpu)))
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f68035355c08a..fc5f6bffff7fc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2925,6 +2925,10 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>                 kvm_make_request(KVM_REQ_UNHALT, vcpu);
>                 goto out;
>         }
> +#ifdef CONFIG_X86
> +       if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
> +               goto out;
> +#endif
>         if (kvm_cpu_has_pending_timer(vcpu))
>                 goto out;
>         if (signal_pending(current))
> ---8<---
> 
> (The CONFIG_X86 is ugly indeed.. but just to show what I meant, e.g. it can be
>  a boolean too I think)
> 
> Would this work?

That would work, but vcpu->requests are nicely checked (and processed)
at vcpu_enter_guest, before guest entry. The proposed request does not
follow that pattern.



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-12  0:02                     ` Marcelo Tosatti
  2021-05-12  0:38                       ` Peter Xu
@ 2021-05-12 14:41                       ` Sean Christopherson
  2021-05-12 15:34                         ` Peter Xu
  1 sibling, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2021-05-12 14:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Peter Xu, Paolo Bonzini, kvm, Alex Williamson, Pei Zhang

On Tue, May 11, 2021, Marcelo Tosatti wrote:
> > The KVM_REQ_UNBLOCK patch will resume execution even any such event
> 
> 						  even without any such event
> 
> > occurring. So the behaviour would be different from baremetal.

I agree with Marcelo, we don't want to spuriously unhalt the vCPU.  It's legal,
albeit risky, to do something like

	hlt
	/* #UD to triple fault if this CPU is awakened. */
	ud2

when offlining a CPU, in which case the spurious wake event will crash the guest.


* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-12 14:41                       ` Sean Christopherson
@ 2021-05-12 15:34                         ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2021-05-12 15:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marcelo Tosatti, Paolo Bonzini, kvm, Alex Williamson, Pei Zhang

On Wed, May 12, 2021 at 02:41:56PM +0000, Sean Christopherson wrote:
> On Tue, May 11, 2021, Marcelo Tosatti wrote:
> > > The KVM_REQ_UNBLOCK patch will resume execution even any such event
> > 
> > 						  even without any such event
> > 
> > > occurring. So the behaviour would be different from baremetal.
> 
> I agree with Marcelo, we don't want to spuriously unhalt the vCPU.  It's legal,
> albeit risky, to do something like
> 
> 	hlt
> 	/* #UD to triple fault if this CPU is awakened. */
> 	ud2
> 
> when offlining a CPU, in which case the spurious wake event will crash the guest.

We can avoid that by moving the check+clear of KVM_REQ_UNBLOCK from
kvm_vcpu_has_events() into kvm_vcpu_check_block() as replied in the other
thread.  But I also agree Marcelo's series should work already to fix the bug,
hence no strong opinion on this.

Thanks,

-- 
Peter Xu



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-10 17:26 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
@ 2021-05-24 15:55   ` Paolo Bonzini
  2021-05-24 17:53     ` Marcelo Tosatti
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2021-05-24 15:55 UTC (permalink / raw)
  To: Marcelo Tosatti, kvm
  Cc: Alex Williamson, Sean Christopherson, Peter Xu, Pei Zhang

On 10/05/21 19:26, Marcelo Tosatti wrote:
> +void vmx_pi_start_assignment(struct kvm *kvm)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return;
> +
> +	/*
> +	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
> +	 * go back through vcpu_block().
> +	 */
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_vcpu_apicv_active(vcpu))
> +			continue;
> +
> +		kvm_vcpu_wake_up(vcpu);

Would you still need the check_block callback, if you also added a 
kvm_make_request(KVM_REQ_EVENT)?

In fact, since this is entirely not a hot path, can you just do 
kvm_make_all_cpus_request(kvm, KVM_REQ_EVENT) instead of this loop?

Thanks,

Paolo

> +	}
> +}
>   
>   /*
>    * pi_update_irte - set IRTE for Posted-Interrupts
> Index: kvm/arch/x86/kvm/vmx/posted_intr.h
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
> +++ kvm/arch/x86/kvm/vmx/posted_intr.h
> @@ -95,5 +95,7 @@ void __init pi_init_cpu(int cpu);
>   bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
>   int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
>   		   bool set);
> +void vmx_pi_start_assignment(struct kvm *kvm);
> +int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
>   
>   #endif /* __KVM_X86_VMX_POSTED_INTR_H */
> Index: kvm/arch/x86/kvm/vmx/vmx.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx/vmx.c
> +++ kvm/arch/x86/kvm/vmx/vmx.c
> @@ -7727,13 +7727,13 @@ static struct kvm_x86_ops vmx_x86_ops __
>   
>   	.pre_block = vmx_pre_block,
>   	.post_block = vmx_post_block,
> -	.vcpu_check_block = NULL,
> +	.vcpu_check_block = vmx_vcpu_check_block,
>   
>   	.pmu_ops = &intel_pmu_ops,
>   	.nested_ops = &vmx_nested_ops,
>   
>   	.update_pi_irte = pi_update_irte,
> -	.start_assignment = NULL,
> +	.start_assignment = vmx_pi_start_assignment,
>   
>   #ifdef CONFIG_X86_64
>   	.set_hv_timer = vmx_set_hv_timer,
> 
> 



* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-24 15:55   ` Paolo Bonzini
@ 2021-05-24 17:53     ` Marcelo Tosatti
  2021-05-25 11:58       ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2021-05-24 17:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Alex Williamson, Sean Christopherson, Peter Xu, Pei Zhang

On Mon, May 24, 2021 at 05:55:18PM +0200, Paolo Bonzini wrote:
> On 10/05/21 19:26, Marcelo Tosatti wrote:
> > +void vmx_pi_start_assignment(struct kvm *kvm)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int i;
> > +
> > +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return;
> > +
> > +	/*
> > +	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
> > +	 * go back through vcpu_block().
> > +	 */
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		if (!kvm_vcpu_apicv_active(vcpu))
> > +			continue;
> > +
> > +		kvm_vcpu_wake_up(vcpu);
> 
> Would you still need the check_block callback, if you also added a
> kvm_make_request(KVM_REQ_EVENT)?
> 
> In fact, since this is entirely not a hot path, can you just do
> kvm_make_all_cpus_request(kvm, KVM_REQ_EVENT) instead of this loop?
> 
> Thanks,
> 
> Paolo

Hi Paolo,

Don't think so:

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
        return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
}

static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
{
        int ret = -EINTR;
        int idx = srcu_read_lock(&vcpu->kvm->srcu);

        if (kvm_arch_vcpu_runnable(vcpu)) {
                kvm_make_request(KVM_REQ_UNHALT, vcpu);  <---- don't want KVM_REQ_UNHALT
                goto out;
        }
        if (kvm_cpu_has_pending_timer(vcpu))
                goto out;
        if (signal_pending(current))
                goto out;

        ret = 0;
out:
        srcu_read_unlock(&vcpu->kvm->srcu, idx);
        return ret;
}

See previous discussion:


Date: Wed, 12 May 2021 14:41:56 +0000
From: Sean Christopherson <seanjc@google.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org,
        Alex Williamson <alex.williamson@redhat.com>, Pei Zhang <pezhang@redhat.com>
Subject: Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device

On Tue, May 11, 2021, Marcelo Tosatti wrote:
> > The KVM_REQ_UNBLOCK patch will resume execution even any such event
>
>                                                 even without any such event
>
> > occurring. So the behaviour would be different from baremetal.

I agree with Marcelo, we don't want to spuriously unhalt the vCPU.  It's legal,
albeit risky, to do something like

	hlt
	/* #UD to triple fault if this CPU is awakened. */
	ud2

when offlining a CPU, in which case the spurious wake event will crash the guest.




* Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  2021-05-24 17:53     ` Marcelo Tosatti
@ 2021-05-25 11:58       ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-05-25 11:58 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm, Alex Williamson, Sean Christopherson, Peter Xu, Pei Zhang

On 24/05/21 19:53, Marcelo Tosatti wrote:
> On Mon, May 24, 2021 at 05:55:18PM +0200, Paolo Bonzini wrote:
>> On 10/05/21 19:26, Marcelo Tosatti wrote:
>>> +void vmx_pi_start_assignment(struct kvm *kvm)
>>> +{
>>> +	struct kvm_vcpu *vcpu;
>>> +	int i;
>>> +
>>> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
>>> +		return;
>>> +
>>> +	/*
>>> +	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
>>> +	 * go back through vcpu_block().
>>> +	 */
>>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>>> +		if (!kvm_vcpu_apicv_active(vcpu))
>>> +			continue;
>>> +
>>> +		kvm_vcpu_wake_up(vcpu);
>>
>> Would you still need the check_block callback, if you also added a
>> kvm_make_request(KVM_REQ_EVENT)?
>>
>> In fact, since this is entirely not a hot path, can you just do
>> kvm_make_all_cpus_request(kvm, KVM_REQ_EVENT) instead of this loop?
>>
>> Thanks,
>>
>> Paolo
> 
> Hi Paolo,
> 
> Don't think so:
> 
> static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> {
>          int ret = -EINTR;
>          int idx = srcu_read_lock(&vcpu->kvm->srcu);
> 
>          if (kvm_arch_vcpu_runnable(vcpu)) {
>                  kvm_make_request(KVM_REQ_UNHALT, vcpu);  <---- don't want KVM_REQ_UNHALT

UNHALT is incorrect indeed, but requests don't have to unhalt the vCPU.

This case is somewhat similar to signal_pending(), where the next 
KVM_RUN ioctl resumes the halt.  It's also similar to 
KVM_REQ_PENDING_TIMER.  So you can:

- rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK except in 
arch/powerpc, where instead you add KVM_REQ_PENDING_TIMER to 
arch/powerpc/include/asm/kvm_host.h

- here, you add

	if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
		goto out;

- then vmx_pi_start_assignment only needs to

	if (!irq_remapping_cap(IRQ_POSTING_CAP))
		return;
	kvm_make_all_cpus_request(kvm, KVM_REQ_UNBLOCK);

kvm_arch_vcpu_runnable() would still return false, so the mp_state would 
not change.

Paolo
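
A rough userspace model of the suggestion above, assuming a 32-bit request
mask. It is a sketch, not the kernel implementation: struct model_vcpu, the
REQ_* numbers and the helper names are made up for illustration. It shows
the one property under discussion, namely that a pending UNBLOCK request
ends the block loop with -EINTR without raising UNHALT, so a halted vCPU
stays halted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request numbers; the real ones are defined by KVM. */
enum { REQ_UNBLOCK = 0, REQ_UNHALT = 1 };

struct model_vcpu {
	uint32_t requests;   /* pending request bits */
	bool runnable;       /* stands in for kvm_arch_vcpu_runnable() */
	bool timer_pending;  /* stands in for kvm_cpu_has_pending_timer() */
	bool signal_pending; /* stands in for signal_pending(current) */
};

static void make_request(struct model_vcpu *v, unsigned int req)
{
	assert(req < 32);
	v->requests |= UINT32_C(1) << req;
}

/* Test and clear, like the check+clear discussed in the thread. */
static bool check_request(struct model_vcpu *v, unsigned int req)
{
	uint32_t bit;

	assert(req < 32);
	bit = UINT32_C(1) << req;
	if (!(v->requests & bit))
		return false;
	v->requests &= ~bit;
	return true;
}

/* Returns -EINTR to leave the block loop, 0 to keep blocking. */
static int model_check_block(struct model_vcpu *v)
{
	if (v->runnable) {
		/* A genuine wake event: this is the only path that unhalts. */
		make_request(v, REQ_UNHALT);
		return -EINTR;
	}
	if (check_request(v, REQ_UNBLOCK))
		return -EINTR; /* leave the loop, but stay halted */
	if (v->timer_pending || v->signal_pending)
		return -EINTR;
	return 0;
}
```

In this model the request is consumed by the check, so a second pass through
the loop blocks again unless something else is pending.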



Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
2021-05-07 13:06 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device Marcelo Tosatti
2021-05-07 13:06 ` [patch 1/4] KVM: x86: add start_assignment hook to kvm_x86_ops Marcelo Tosatti
2021-05-07 19:16   ` Peter Xu
2021-05-10 17:53     ` Marcelo Tosatti
2021-05-07 13:06 ` [patch 2/4] KVM: add arch specific vcpu_check_block callback Marcelo Tosatti
2021-05-07 13:06 ` [patch 3/4] KVM: x86: implement kvm_arch_vcpu_check_block callback Marcelo Tosatti
2021-05-07 13:06 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
2021-05-07 17:22   ` Sean Christopherson
2021-05-07 19:29     ` Peter Xu
2021-05-07 22:08       ` Marcelo Tosatti
2021-05-11 14:39         ` Peter Xu
2021-05-11 14:51           ` Marcelo Tosatti
2021-05-11 16:19             ` Peter Xu
2021-05-11 17:18               ` Marcelo Tosatti
2021-05-11 21:35                 ` Peter Xu
2021-05-11 23:51                   ` Marcelo Tosatti
2021-05-12  0:02                     ` Marcelo Tosatti
2021-05-12  0:38                       ` Peter Xu
2021-05-12 11:10                         ` Marcelo Tosatti
2021-05-12 14:41                       ` Sean Christopherson
2021-05-12 15:34                         ` Peter Xu
  -- strict thread matches above, loose matches on Subject: below --
2021-05-10 17:26 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device (v3) Marcelo Tosatti
2021-05-10 17:26 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
2021-05-24 15:55   ` Paolo Bonzini
2021-05-24 17:53     ` Marcelo Tosatti
2021-05-25 11:58       ` Paolo Bonzini
2021-05-11 23:57 [patch 0/4] VMX: configure posted interrupt descriptor when assigning device (v4) Marcelo Tosatti
2021-05-11 23:57 ` [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device Marcelo Tosatti
