[PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices

* [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices
@ 2015-06-08 17:03 ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

From day 1, our timer code has been using a terrible hack: whenever
the guest is scheduled with a timer interrupt pending (i.e. the HW
timer has expired), we restore the timer state with the MASK bit set,
in order to prevent the physical interrupt from firing again. And
again. And again...

This is absolutely silly, for at least two reasons:

- This relies on the device (the timer) having a mask bit that we can
  play with. Not all devices are built like this.

- This expects some behaviour of the guest that only works because
  both the kernel timer code and the KVM counterpart have been written
  by the same idiot (the idiot being me).

The One True Way is to set the GIC active bit when injecting the
interrupt, and to context-switch that active state across the world
switch. This is what this series implements.

We introduce a relatively simple infrastructure enabling the mapping
of a virtual interrupt with its physical counterpart:

- Whenever a virtual interrupt is injected, we look it up in an
  rbtree. If we have a match, the interrupt is injected with the HW
  bit set in the LR, together with the physical interrupt number.

- Across the world switch, we save/restore the active state for these
  interrupts using the irqchip_state API.

- On guest EOI, the HW interrupt is automagically deactivated by the
  GIC, allowing the interrupt to be resampled.
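
As a rough illustration of why the active bit does the job, here is a
toy userspace model. It is not code from this series: the toy_* names
are made up for the example, and the GIC is reduced to two booleans
for a single level-triggered interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one level-triggered physical interrupt. */
struct toy_irq {
	bool line_asserted;	/* the device (e.g. the timer) holds the line high */
	bool active;		/* GIC active state for this interrupt */
};

/* In this model, the interrupt is signalled only while asserted and not active. */
static bool toy_irq_fires(const struct toy_irq *irq)
{
	return irq->line_asserted && !irq->active;
}

/* Injection into the guest: mark the physical interrupt as active. */
static void toy_inject(struct toy_irq *irq)
{
	assert(irq->line_asserted);
	irq->active = true;
}

/* Guest EOI on an LR with the HW bit set: the physical interrupt is deactivated. */
static void toy_guest_eoi(struct toy_irq *irq)
{
	irq->active = false;
}
```

With the line asserted, toy_irq_fires() is true until toy_inject()
sets the active state, and becomes true again after toy_guest_eoi(),
which corresponds to the resampling step. No device mask bit is
involved at any point.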

The timer code is slightly modified to set the active state at the
same time as the injection.

The last patch also allows non-shared devices to have their interrupt
deactivated the same way (in this case we do not context-switch the
active state). This is the first step in the long overdue direction of
the mythical IRQ forwarding thing...

This series is based on v4.1-rc7, and has been tested on Juno (GICv2)
and the FVP Base model (GICv3 host, both GICv2 and GICv3 guests). I'd
appreciate any form of testing, especially in the context of guest
migration (there is obviously some interesting stuff there...).

The code is otherwise available at
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/active-timer

Marc Zyngier (10):
  arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
  arm/arm64: KVM: Move vgic handling to a non-preemptible section
  KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
  KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual
    interrupts
  KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
  KVM: arm/arm64: timer: Allow the timer to control the active state
  KVM: arm/arm64: vgic: Allow non-shared device HW interrupts

 arch/arm/kvm/arm.c                 |  21 +++-
 include/kvm/arm_arch_timer.h       |   3 +
 include/kvm/arm_vgic.h             |  31 +++++-
 include/linux/irqchip/arm-gic-v3.h |   3 +
 include/linux/irqchip/arm-gic.h    |   3 +-
 virt/kvm/arm/arch_timer.c          |  13 ++-
 virt/kvm/arm/vgic-v2.c             |  16 ++-
 virt/kvm/arm/vgic-v3.c             |  21 +++-
 virt/kvm/arm/vgic.c                | 206 ++++++++++++++++++++++++++++++++++++-
 9 files changed, 300 insertions(+), 17 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 01/10] arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:03   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

As we now inject the timer interrupt when we're about to enter
the guest, it makes a lot more sense to make sure this happens
before the vgic code queues the pending interrupts.

Otherwise, we get the interrupt on the following exit, which is
not great for latency (and leads to all kinds of bizarre issues
when used with active interrupts at the HW level).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
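To illustrate the ordering problem, here is a toy model. The toy_*
names and the boolean state are assumptions made for the example, not
the real timer or vgic code: the timer flush makes the interrupt
pending, and the vgic flush queues whatever is pending at that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-vcpu state touched on guest entry. */
struct toy_vcpu {
	bool timer_expired;	/* the virtual timer has fired */
	bool timer_irq_pending;	/* the timer interrupt is known to the toy vgic */
	bool timer_irq_in_lr;	/* the interrupt has been queued for the guest */
};

/* Toy timer flush: inject the interrupt if the timer has expired. */
static void toy_timer_flush(struct toy_vcpu *v)
{
	if (v->timer_expired)
		v->timer_irq_pending = true;
}

/* Toy vgic flush: queue whatever is pending right now. */
static void toy_vgic_flush(struct toy_vcpu *v)
{
	if (v->timer_irq_pending)
		v->timer_irq_in_lr = true;
}

/* Returns true if the guest sees the timer interrupt on this entry. */
static bool toy_enter_guest(struct toy_vcpu *v, bool timer_first)
{
	assert(v);
	if (timer_first) {
		toy_timer_flush(v);
		toy_vgic_flush(v);
	} else {
		toy_vgic_flush(v);
		toy_timer_flush(v);
	}
	return v->timer_irq_in_lr;
}
```

With timer_first set, the interrupt is queued on the same entry. With
the old order, it only becomes visible to the guest on the next pass
through the run loop.
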
 arch/arm/kvm/arm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d9631ec..46db690 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -529,9 +529,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (vcpu->arch.pause)
 			vcpu_pause(vcpu);
 
-		kvm_vgic_flush_hwstate(vcpu);
 		kvm_timer_flush_hwstate(vcpu);
 
+		kvm_vgic_flush_hwstate(vcpu);
+
 		local_irq_disable();
 
 		/*
@@ -544,8 +545,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
-			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
+			kvm_timer_sync_hwstate(vcpu);
 			continue;
 		}
 
@@ -577,9 +578,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * Back from guest
 		 *************************************************************/
 
-		kvm_timer_sync_hwstate(vcpu);
 		kvm_vgic_sync_hwstate(vcpu);
 
+		kvm_timer_sync_hwstate(vcpu);
+
 		ret = handle_exit(vcpu, run, ret);
 	}
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 02/10] arm/arm64: KVM: Move vgic handling to a non-preemptible section
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:03   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

As we're about to introduce some serious GIC-poking to the vgic code,
it is important to make sure that we're going to poke the part of
the GIC that belongs to the CPU we're about to run on (otherwise,
we'd end up with some unexpected interrupts firing)...

Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run
prevents the problem from occurring.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/arm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 46db690..4986300 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -529,8 +529,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (vcpu->arch.pause)
 			vcpu_pause(vcpu);
 
+		/*
+		 * Disarming the timer must be done in a
+		 * preemptible context, as this call may sleep.
+		 */
 		kvm_timer_flush_hwstate(vcpu);
 
+		/*
+		 * Preparing the interrupts to be injected also
+		 * involves poking the GIC, which must be done in a
+		 * non-preemptible context.
+		 */
+		preempt_disable();
 		kvm_vgic_flush_hwstate(vcpu);
 
 		local_irq_disable();
@@ -546,6 +556,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
 			kvm_vgic_sync_hwstate(vcpu);
+			preempt_enable();
 			kvm_timer_sync_hwstate(vcpu);
 			continue;
 		}
@@ -580,6 +591,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		kvm_vgic_sync_hwstate(vcpu);
 
+		preempt_enable();
+
 		kvm_timer_sync_hwstate(vcpu);
 
 		ret = handle_exit(vcpu, run, ret);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 03/10] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:03   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

As we're about to cram more information in the vgic_lr structure
(HW interrupt number and additional state information), we switch
to a layout similar to the HW's:

- use bitfields to save space (we don't need more than 10 bits
  to represent the irq numbers)
- source CPU and HW interrupt can share the same field, as
  an SGI doesn't have a physical line.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
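For illustration only, here is the new layout as a standalone
userspace unit. The make_hw_lr() and make_sgi_lr() helpers are
hypothetical and not part of this patch, and LR_STATE_PENDING is
assumed to be (1 << 0) as in the existing header. The size of the
structure depends on how the compiler lays out the bitfields and the
anonymous union, so the sketch exposes sizeof() instead of assuming a
value.

```c
#include <assert.h>
#include <stddef.h>

#define LR_STATE_PENDING	(1 << 0)	/* assumed, not shown in this diff */
#define LR_STATE_ACTIVE		(1 << 1)
#define LR_EOI_INT		(1 << 2)
#define LR_HW			(1 << 3)

struct vgic_lr {
	unsigned irq:10;
	union {
		unsigned hwirq:10;	/* meaningful when LR_HW is set */
		unsigned source:8;	/* meaningful for SGIs */
	};
	unsigned state:4;
};

/* Hypothetical helper: a pending virtual interrupt backed by a HW interrupt. */
static struct vgic_lr make_hw_lr(unsigned virt_irq, unsigned hwirq)
{
	struct vgic_lr lr = { 0 };

	assert(virt_irq < 1024 && hwirq < 1024);
	lr.irq = virt_irq;
	lr.hwirq = hwirq;
	lr.state = LR_STATE_PENDING | LR_HW;
	return lr;
}

/* Hypothetical helper: a pending SGI, which has a source CPU and no HW line. */
static struct vgic_lr make_sgi_lr(unsigned sgi, unsigned source_cpu)
{
	struct vgic_lr lr = { 0 };

	assert(sgi < 16 && source_cpu < 8);
	lr.irq = sgi;
	lr.source = source_cpu;
	lr.state = LR_STATE_PENDING;
	return lr;
}

/* ABI-dependent; report it rather than hardcode an expectation. */
static size_t vgic_lr_size(void)
{
	return sizeof(struct vgic_lr);
}
```

Only one of hwirq and source is valid for a given descriptor, so
callers have to check LR_HW (or the SGI range) before reading either.
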
 include/kvm/arm_vgic.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 133ea00..4f9fa1d 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -95,11 +95,15 @@ enum vgic_type {
 #define LR_STATE_ACTIVE		(1 << 1)
 #define LR_STATE_MASK		(3 << 0)
 #define LR_EOI_INT		(1 << 2)
+#define LR_HW			(1 << 3)
 
 struct vgic_lr {
-	u16	irq;
-	u8	source;
-	u8	state;
+	unsigned irq:10;
+	union {
+		unsigned hwirq:10;
+		unsigned source:8;
+	};
+	unsigned state:4;
 };
 
 struct vgic_vmcr {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:03   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
field, we can encode that information into the list registers.

This patch provides implementations for both GICv2 and GICv3.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
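As a sketch of the GICv2 encoding, the following standalone unit packs
and unpacks the physical ID using the field positions from this patch.
The toy_lr_* helpers are hypothetical and exist only for the example,
and the constants use 1U/0x3ffU so that a userspace build does not
shift into the sign bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field definitions mirroring the patch, made unsigned for userspace. */
#define GICH_LR_VIRTUALID		(0x3ffU << 0)
#define GICH_LR_PHYSID_CPUID_SHIFT	10
#define GICH_LR_PHYSID_CPUID		(0x3ffU << GICH_LR_PHYSID_CPUID_SHIFT)
#define GICH_LR_PENDING_BIT		(1U << 28)
#define GICH_LR_HW			(1U << 31)

/* Hypothetical helper: build a pending LR value for a HW-backed interrupt. */
static uint32_t toy_lr_encode_hw(uint32_t virt_irq, uint32_t hwirq)
{
	assert(virt_irq <= 0x3ff && hwirq <= 0x3ff);
	return virt_irq | GICH_LR_PENDING_BIT | GICH_LR_HW |
	       (hwirq << GICH_LR_PHYSID_CPUID_SHIFT);
}

/* Hypothetical helper: extract the physical ID, valid only if HW is set. */
static bool toy_lr_decode_hw(uint32_t lr_val, uint32_t *hwirq)
{
	if (!(lr_val & GICH_LR_HW))
		return false;
	*hwirq = (lr_val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
	return true;
}
```

The 10-bit mask is the reason GICH_LR_PHYSID_CPUID grows from 7 to
0x3ff: a 3-bit mask is enough for a source CPU, but not for a
physical interrupt number. The widened field also covers bit 19,
which is GICH_LR_EOI, so the sketch never sets EOI together with HW.
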
 include/linux/irqchip/arm-gic-v3.h |  3 +++
 include/linux/irqchip/arm-gic.h    |  3 ++-
 virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
 virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
 4 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index ffbc034..cf637d6 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -268,9 +268,12 @@
 
 #define ICH_LR_EOI			(1UL << 41)
 #define ICH_LR_GROUP			(1UL << 60)
+#define ICH_LR_HW			(1UL << 61)
 #define ICH_LR_STATE			(3UL << 62)
 #define ICH_LR_PENDING_BIT		(1UL << 62)
 #define ICH_LR_ACTIVE_BIT		(1UL << 63)
+#define ICH_LR_PHYS_ID_SHIFT		32
+#define ICH_LR_PHYS_ID_MASK		(0x3ffUL << ICH_LR_PHYS_ID_SHIFT)
 
 #define ICH_MISR_EOI			(1 << 0)
 #define ICH_MISR_U			(1 << 1)
diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
index 9de976b..ca88dad 100644
--- a/include/linux/irqchip/arm-gic.h
+++ b/include/linux/irqchip/arm-gic.h
@@ -71,11 +71,12 @@
 
 #define GICH_LR_VIRTUALID		(0x3ff << 0)
 #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
-#define GICH_LR_PHYSID_CPUID		(7 << GICH_LR_PHYSID_CPUID_SHIFT)
+#define GICH_LR_PHYSID_CPUID		(0x3ff << GICH_LR_PHYSID_CPUID_SHIFT)
 #define GICH_LR_STATE			(3 << 28)
 #define GICH_LR_PENDING_BIT		(1 << 28)
 #define GICH_LR_ACTIVE_BIT		(1 << 29)
 #define GICH_LR_EOI			(1 << 19)
+#define GICH_LR_HW			(1 << 31)
 
 #define GICH_VMCR_CTRL_SHIFT		0
 #define GICH_VMCR_CTRL_MASK		(0x21f << GICH_VMCR_CTRL_SHIFT)
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index f9b9c7c..8d7b04d 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -48,6 +48,10 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
 		lr_desc.state |= LR_STATE_ACTIVE;
 	if (val & GICH_LR_EOI)
 		lr_desc.state |= LR_EOI_INT;
+	if (val & GICH_LR_HW) {
+		lr_desc.state |= LR_HW;
+		lr_desc.hwirq = (val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
+	}
 
 	return lr_desc;
 }
@@ -55,7 +59,9 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
 static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
 			   struct vgic_lr lr_desc)
 {
-	u32 lr_val = (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) | lr_desc.irq;
+	u32 lr_val;
+
+	lr_val = lr_desc.irq;
 
 	if (lr_desc.state & LR_STATE_PENDING)
 		lr_val |= GICH_LR_PENDING_BIT;
@@ -64,6 +70,14 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
 	if (lr_desc.state & LR_EOI_INT)
 		lr_val |= GICH_LR_EOI;
 
+	if (lr_desc.state & LR_HW) {
+		lr_val |= GICH_LR_HW;
+		lr_val |= (u32)lr_desc.hwirq << GICH_LR_PHYSID_CPUID_SHIFT;
+	}
+
+	if (lr_desc.irq < VGIC_NR_SGIS)
+		lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);
+
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
 }
 
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index dff0602..afbf925 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -67,6 +67,10 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
 		lr_desc.state |= LR_STATE_ACTIVE;
 	if (val & ICH_LR_EOI)
 		lr_desc.state |= LR_EOI_INT;
+	if (val & ICH_LR_HW) {
+		lr_desc.state |= LR_HW;
+		lr_desc.hwirq = (val >> ICH_LR_PHYS_ID_SHIFT) & GENMASK(9, 0);
+	}
 
 	return lr_desc;
 }
@@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
 	 * Eventually we want to make this configurable, so we may revisit
 	 * this in the future.
 	 */
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+	switch (vcpu->kvm->arch.vgic.vgic_model) {
+	case KVM_DEV_TYPE_ARM_VGIC_V3:
 		lr_val |= ICH_LR_GROUP;
-	else
-		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
+		break;
+	case KVM_DEV_TYPE_ARM_VGIC_V2:
+		if (lr_desc.irq < VGIC_NR_SGIS)
+			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
+		break;
+	default:
+		BUG();
+	}
 
 	if (lr_desc.state & LR_STATE_PENDING)
 		lr_val |= ICH_LR_PENDING_BIT;
@@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
 		lr_val |= ICH_LR_ACTIVE_BIT;
 	if (lr_desc.state & LR_EOI_INT)
 		lr_val |= ICH_LR_EOI;
+	if (lr_desc.state & LR_HW) {
+		lr_val |= ICH_LR_HW;
+		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
+	}
 
 	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:04   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

We only set the irq_queued flag for level interrupts, meaning
that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
for all interrupts.

This will allow us to inject edge HW interrupts, for which the
state ACTIVE+PENDING is not allowed.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 78fb820..59ed7a3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -377,7 +377,7 @@ void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
 {
-	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
+	return !vgic_irq_is_queued(vcpu, irq);
 }
 
 /**
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:04   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

In order to be able to feed physical interrupts to a guest, we need
to be able to establish the virtual-physical mapping between the two
worlds.

The mapping is kept in an rbtree, indexed by virtual interrupt number.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
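A minimal userspace sketch of the lookup structure, assuming a plain
unbalanced binary search tree in place of the kernel rbtree. The
toy_map type and helpers are made up for the example, and locking and
removal are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct irq_phys_map, keyed by virt_irq. */
struct toy_map {
	struct toy_map *left, *right;
	uint32_t virt_irq;
	uint32_t phys_irq;
};

/* Insert a mapping, or return the existing node if virt_irq is already mapped. */
static struct toy_map *toy_map_insert(struct toy_map **root,
				      uint32_t virt_irq, uint32_t phys_irq)
{
	struct toy_map **link = root;
	struct toy_map *node;

	assert(root);
	while (*link) {
		if (virt_irq < (*link)->virt_irq)
			link = &(*link)->left;
		else if (virt_irq > (*link)->virt_irq)
			link = &(*link)->right;
		else
			return *link;
	}

	node = calloc(1, sizeof(*node));
	if (!node)
		return NULL;
	node->virt_irq = virt_irq;
	node->phys_irq = phys_irq;
	*link = node;
	return node;
}

/* Look up the mapping for a virtual interrupt, or NULL if there is none. */
static struct toy_map *toy_map_search(struct toy_map *root, uint32_t virt_irq)
{
	while (root) {
		if (virt_irq < root->virt_irq)
			root = root->left;
		else if (virt_irq > root->virt_irq)
			root = root->right;
		else
			return root;
	}
	return NULL;
}
```

In the patch itself there are two roots: private interrupts (below
VGIC_NR_PRIVATE_IRQS) use the per-vcpu tree, everything else uses the
distributor's tree. In this sketch that would simply mean passing a
different root to the same helpers.
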
 include/kvm/arm_vgic.h |  18 ++++++++
 virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4f9fa1d..33d121a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -159,6 +159,14 @@ struct vgic_io_device {
 	struct kvm_io_device dev;
 };
 
+struct irq_phys_map {
+	struct rb_node		node;
+	u32			virt_irq;
+	u32			phys_irq;
+	u32			irq;
+	bool			active;
+};
+
 struct vgic_dist {
 	spinlock_t		lock;
 	bool			in_kernel;
@@ -256,6 +264,10 @@ struct vgic_dist {
 	struct vgic_vm_ops	vm_ops;
 	struct vgic_io_device	dist_iodev;
 	struct vgic_io_device	*redist_iodevs;
+
+	/* Virtual irq to hwirq mapping */
+	spinlock_t		irq_phys_map_lock;
+	struct rb_root		irq_phys_map;
 };
 
 struct vgic_v2_cpu_if {
@@ -307,6 +319,9 @@ struct vgic_cpu {
 		struct vgic_v2_cpu_if	vgic_v2;
 		struct vgic_v3_cpu_if	vgic_v3;
 	};
+
+	/* Protected by the distributor's irq_phys_map_lock */
+	struct rb_root	irq_phys_map;
 };
 
 #define LR_EMPTY	0xff
@@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
+struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
+				       int virt_irq, int irq);
+int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 59ed7a3..c6604f2 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -24,6 +24,7 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
+#include <linux/rbtree.h>
 #include <linux/uaccess.h>
 
 #include <linux/irqchip/arm-gic.h>
@@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
+static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
+						int virt_irq);
 
 static const struct vgic_ops *vgic_ops;
 static const struct vgic_params *vgic;
@@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
+					     int virt_irq)
+{
+	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
+		return &vcpu->arch.vgic_cpu.irq_phys_map;
+	else
+		return &vcpu->kvm->arch.vgic.irq_phys_map;
+}
+
+struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
+				       int virt_irq, int irq)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
+	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct irq_phys_map *new_map;
+	struct irq_desc *desc;
+	struct irq_data *data;
+	int phys_irq;
+
+	desc = irq_to_desc(irq);
+	if (!desc) {
+		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
+		return NULL;
+	}
+
+	data = irq_desc_get_irq_data(desc);
+	while (data->parent_data)
+		data = data->parent_data;
+
+	phys_irq = data->hwirq;
+
+	spin_lock(&dist->irq_phys_map_lock);
+
+	/* Boilerplate rb_tree code */
+	while (*new) {
+		struct irq_phys_map *this;
+
+		this = container_of(*new, struct irq_phys_map, node);
+		parent = *new;
+		if (this->virt_irq < virt_irq)
+			new = &(*new)->rb_left;
+		else if (this->virt_irq > virt_irq)
+			new = &(*new)->rb_right;
+		else {
+			new_map = this;
+			goto out;
+		}
+	}
+
+	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
+	if (!new_map)
+		goto out;
+
+	new_map->virt_irq = virt_irq;
+	new_map->phys_irq = phys_irq;
+	new_map->irq = irq;
+
+	rb_link_node(&new_map->node, parent, new);
+	rb_insert_color(&new_map->node, root);
+
+out:
+	spin_unlock(&dist->irq_phys_map_lock);
+	return new_map;
+}
+
+static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
+						int virt_irq)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
+	struct rb_node *node = root->rb_node;
+	struct irq_phys_map *this = NULL;
+
+	spin_lock(&dist->irq_phys_map_lock);
+
+	while (node) {
+		this = container_of(node, struct irq_phys_map, node);
+
+		if (this->virt_irq < virt_irq)
+			node = node->rb_left;
+		else if (this->virt_irq > virt_irq)
+			node = node->rb_right;
+		else
+			break;
+	}
+
+	spin_unlock(&dist->irq_phys_map_lock);
+	return this;
+}
+
+int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	if (!map)
+		return -EINVAL;
+
+	spin_lock(&dist->irq_phys_map_lock);
+	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
+	spin_unlock(&dist->irq_phys_map_lock);
+
+	kfree(map);
+	return 0;
+}
+
 void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
@@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		goto out_unlock;
 
 	spin_lock_init(&kvm->arch.vgic.lock);
+	spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vgic_model = type;
 	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:04   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

To allow a HW interrupt to be injected into a guest, we look up the
guest virtual interrupt in the irq_phys_map rbtree, and if we have
a match, encode both interrupts in the LR.

We also mark the interrupt as "active" at the host distributor level.

On guest EOI on the virtual interrupt, the host interrupt will be
deactivated.
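
To make "encode both interrupts in the LR" concrete, here is a hedged
sketch of how a pending interrupt could be packed into a GICv2 list
register. The bit positions are my reading of the GICv2 GICH_LR layout;
the SKETCH_* names and sketch_lr_encode_pending() are hypothetical and
are not part of this series, which operates on the abstract struct
vgic_lr instead. With the HW bit set, bit 19 belongs to the physical ID
field, so the EOI maintenance interrupt cannot be requested at the same
time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LR field positions; names are local to this sketch. */
#define SKETCH_LR_VIRT_ID_MASK	0x3ffu		/* bits [9:0] */
#define SKETCH_LR_PHYS_ID_SHIFT	10		/* bits [19:10] when HW == 1 */
#define SKETCH_LR_PHYS_ID_MASK	(0x3ffu << SKETCH_LR_PHYS_ID_SHIFT)
#define SKETCH_LR_EOI		(1u << 19)	/* only when HW == 0 */
#define SKETCH_LR_PENDING	(1u << 28)
#define SKETCH_LR_HW		(1u << 31)

/*
 * Build a list register value for a pending virtual interrupt.
 * phys_irq is NULL when the virtual interrupt has no physical mapping.
 */
static uint32_t sketch_lr_encode_pending(uint32_t virt_irq,
					 int level_triggered,
					 const uint32_t *phys_irq)
{
	uint32_t lr = (virt_irq & SKETCH_LR_VIRT_ID_MASK) | SKETCH_LR_PENDING;

	if (phys_irq) {
		/* Guest EOI deactivates the physical interrupt directly. */
		lr |= SKETCH_LR_HW;
		lr |= (*phys_irq << SKETCH_LR_PHYS_ID_SHIFT) &
		      SKETCH_LR_PHYS_ID_MASK;
	} else if (level_triggered) {
		/* Software interrupt: ask for a maintenance IRQ on EOI. */
		lr |= SKETCH_LR_EOI;
	}

	assert((lr & SKETCH_LR_VIRT_ID_MASK) == virt_irq);
	return lr;
}
```

Passing a non-NULL phys_irq selects the HW path, and passing NULL keeps
the existing software-only behaviour for level-triggered interrupts.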

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/vgic.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index c6604f2..495ac7d 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 	if (!vgic_irq_is_edge(vcpu, irq))
 		vlr.state |= LR_EOI_INT;
 
+	if (vlr.irq >= VGIC_NR_SGIS) {
+		struct irq_phys_map *map;
+		map = vgic_irq_map_search(vcpu, irq);
+
+		if (map) {
+			int ret;
+
+			BUG_ON(!map->active);
+			vlr.hwirq = map->phys_irq;
+			vlr.state |= LR_HW;
+			vlr.state &= ~LR_EOI_INT;
+
+			ret = irq_set_irqchip_state(map->irq,
+						    IRQCHIP_STATE_ACTIVE,
+						    true);
+			vgic_irq_set_queued(vcpu, irq);
+			WARN_ON(ret);
+		}
+	}
+
 	vgic_set_lr(vcpu, lr_nr, vlr);
 	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
@@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 	return level_pending;
 }
 
+/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
+static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
+{
+	struct irq_phys_map *map;
+	int ret;
+
+	if (!(vlr.state & LR_HW))
+		return 0;
+
+	map = vgic_irq_map_search(vcpu, vlr.irq);
+	BUG_ON(!map || !map->active);
+
+	ret = irq_get_irqchip_state(map->irq,
+				    IRQCHIP_STATE_ACTIVE,
+				    &map->active);
+
+	WARN_ON(ret);
+
+	if (map->active) {
+		ret = irq_set_irqchip_state(map->irq,
+					    IRQCHIP_STATE_ACTIVE,
+					    false);
+		WARN_ON(ret);
+		return 0;
+	}
+
+	return 1;
+}
+
 /* Sync back the VGIC state after a guest run */
 static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
@@ -1358,14 +1407,30 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	elrsr = vgic_get_elrsr(vcpu);
 	elrsr_ptr = u64_to_bitmask(&elrsr);
 
-	/* Clear mappings for empty LRs */
-	for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
+	/* Deal with HW interrupts, and clear mappings for empty LRs */
+	for (lr = 0; lr < vgic->nr_lr; lr++) {
 		struct vgic_lr vlr;
 
-		if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
+		if (!test_bit(lr, vgic_cpu->lr_used))
 			continue;
 
 		vlr = vgic_get_lr(vcpu, lr);
+		if (vgic_sync_hwirq(vcpu, vlr)) {
+			/*
+			 * So this is a HW interrupt that the guest
+			 * EOI-ed. Clean the LR state and allow the
+			 * interrupt to be queued again.
+			 */
+			vlr.state &= ~LR_HW;
+			vlr.hwirq = 0;
+			vgic_set_lr(vcpu, lr, vlr);
+			vgic_irq_clear_queued(vcpu, vlr.irq);
+		}
+
+		if (!test_bit(lr, elrsr_ptr))
+			continue;
+
+		clear_bit(lr, vgic_cpu->lr_used);
 
 		BUG_ON(vlr.irq >= dist->nr_irqs);
 		vgic_cpu->vgic_irq_lr_map[vlr.irq] = LR_EMPTY;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:04   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

In order to control the active state of an interrupt, introduce
a pair of accessors allowing the state to be set/queried.

This only affects the logical state, and the HW state will only be
applied at world-switch time.
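
The split between logical and HW state can be modelled in a few lines.
This is only a sketch under simplifying assumptions: struct sketch_irq,
its hw_active field (standing in for the host distributor's active bit)
and sketch_enter_guest() are hypothetical, and the real series only
touches the HW state when the interrupt is queued in an LR.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_irq {
	bool logical_active;	/* what the accessors read and write */
	bool hw_active;		/* stand-in for the distributor's active bit */
};

/* Record the desired state; deliberately no hardware access here. */
static void sketch_set_active(struct sketch_irq *irq, bool active)
{
	assert(irq);
	irq->logical_active = active;
}

static bool sketch_get_active(const struct sketch_irq *irq)
{
	assert(irq);
	return irq->logical_active;
}

/* Guest entry: push the logical state to the (modelled) hardware. */
static void sketch_enter_guest(struct sketch_irq *irq)
{
	assert(irq);
	irq->hw_active = irq->logical_active;
}
```

Setting the state and then reading hw_active before guest entry shows
the deferral: the hardware view only changes at the switch.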

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 include/kvm/arm_vgic.h |  2 ++
 virt/kvm/arm/vgic.c    | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 33d121a..1c653c1 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -349,6 +349,8 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 				       int virt_irq, int irq);
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
+bool vgic_get_phys_irq_active(struct irq_phys_map *map);
+void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 495ac7d..f376b56 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1744,6 +1744,18 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 	return this;
 }
 
+bool vgic_get_phys_irq_active(struct irq_phys_map *map)
+{
+	BUG_ON(!map);
+	return map->active;
+}
+
+void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
+{
+	BUG_ON(!map);
+	map->active = active;
+}
+
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
 {
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 09/10] KVM: arm/arm64: timer: Allow the timer to control the active state
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:04   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

In order to remove the crude hack where we sneak the masked bit
into the timer's control register, make use of the irq_phys_map
API to control the active state of the interrupt.
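
The resulting "should the timer fire" decision can be sketched as a
pure function. This is an illustrative model, not the kernel code:
struct sketch_timer and the SKETCH_CTRL_* names are hypothetical, the
control bits follow my understanding of CNTV_CTL (ENABLE is bit 0,
IMASK is bit 1), and "now" is assumed to already be the guest's view of
the virtual counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CTRL_ENABLE	(1u << 0)
#define SKETCH_CTRL_IT_MASK	(1u << 1)

struct sketch_timer {
	uint32_t cntv_ctl;	/* saved guest control register */
	uint64_t cntv_cval;	/* saved guest compare value */
	bool irq_active;	/* logical active state of the mapped IRQ */
};

/*
 * The timer fires only if it is enabled, not masked by the guest,
 * its interrupt is not already active, and the deadline has passed.
 */
static bool sketch_timer_should_fire(const struct sketch_timer *t,
				     uint64_t now)
{
	assert(t);

	if ((t->cntv_ctl & SKETCH_CTRL_IT_MASK) ||
	    !(t->cntv_ctl & SKETCH_CTRL_ENABLE) ||
	    t->irq_active)
		return false;

	return t->cntv_cval <= now;
}
```

The point of the extra irq_active term is that the guest's own mask bit
is no longer borrowed by the host: an in-flight interrupt is tracked by
its active state instead.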

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 include/kvm/arm_arch_timer.h |  3 +++
 virt/kvm/arm/arch_timer.c    | 13 +++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e596675..9feebf1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -52,6 +52,9 @@ struct arch_timer_cpu {
 
 	/* Timer IRQ */
 	const struct kvm_irq_level	*irq;
+
+	/* VGIC mapping */
+	struct irq_phys_map		*map;
 };
 
 int kvm_timer_hyp_init(void);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 98c95f2..b9fff78 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -64,7 +64,7 @@ static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
 	int ret;
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
-	timer->cntv_ctl |= ARCH_TIMER_CTRL_IT_MASK;
+	vgic_set_phys_irq_active(timer->map, true);
 	ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
 				  timer->irq->irq,
 				  timer->irq->level);
@@ -117,7 +117,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	cycle_t cval, now;
 
 	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
-		!(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE))
+	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
+	    vgic_get_phys_irq_active(timer->map))
 		return false;
 
 	cval = timer->cntv_cval;
@@ -196,6 +197,13 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	 * vcpu timer irq number when the vcpu is reset.
 	 */
 	timer->irq = irq;
+
+	/*
+	 * Tell the VGIC that the virtual interrupt is tied to a
+	 * physical interrupt. We do that once per VCPU.
+	 */
+	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
+	WARN_ON(!timer->map);
 }
 
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
@@ -335,6 +343,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
 	timer_disarm(timer);
+	vgic_unmap_phys_irq(vcpu, timer->map);
 }
 
 void kvm_timer_enable(struct kvm *kvm)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-08 17:04   ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

So far, the only use of the HW interrupt facility is the timer,
implying that the active state is context-switched for each vcpu,
as the device is shared across all vcpus.

This does not work for a device that has been assigned to a VM,
as the guest is entirely in control of that device (the HW is
not shared). In that case, it makes sense to bypass the whole
active state switching, and only track the deactivation of the
interrupt.
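
The difference between the two cases after a guest run can be sketched
as follows. This is a simplified user-space model: struct sketch_hwirq
and sketch_sync_hwirq() are hypothetical, and the hw_active field
stands in for what irq_get_irqchip_state() and irq_set_irqchip_state()
would read and write on the host distributor.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_hwirq {
	bool shared;	/* device shared between vcpus (e.g. the timer) */
	bool active;	/* saved logical state, only valid if shared */
	bool hw_active;	/* stand-in for the distributor's active bit */
};

/*
 * Sync after a guest run. Returns true when the guest has deactivated
 * the interrupt, meaning it may be queued again.
 */
static bool sketch_sync_hwirq(struct sketch_hwirq *irq)
{
	bool active;

	assert(irq);
	active = irq->hw_active;	/* models reading the HW state */

	/* Assigned device: leave the HW state alone, just observe it. */
	if (!irq->shared)
		return !active;

	/* Shared device: save the state and hide it from the host. */
	irq->active = active;
	if (active) {
		irq->hw_active = false;	/* models clearing the HW state */
		return false;
	}

	return true;
}
```

For a shared interrupt the active bit is saved and cleared so the host
sees an inactive line, while for an assigned device the bit is left in
place and only its transition to inactive is reported.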

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 include/kvm/arm_vgic.h    |  5 +++--
 virt/kvm/arm/arch_timer.c |  2 +-
 virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1c653c1..5d47d60 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -164,7 +164,8 @@ struct irq_phys_map {
 	u32			virt_irq;
 	u32			phys_irq;
 	u32			irq;
-	bool			active;
+	bool			shared;
+	bool			active; /* Only valid if shared */
 };
 
 struct vgic_dist {
@@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
-				       int virt_irq, int irq);
+				       int virt_irq, int irq, bool shared);
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
 bool vgic_get_phys_irq_active(struct irq_phys_map *map);
 void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index b9fff78..9544d79 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	 * Tell the VGIC that the virtual interrupt is tied to a
 	 * physical interrupt. We do that once per VCPU.
 	 */
-	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
+	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
 	WARN_ON(!timer->map);
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f376b56..4223166 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 		map = vgic_irq_map_search(vcpu, irq);
 
 		if (map) {
-			int ret;
-
-			BUG_ON(!map->active);
 			vlr.hwirq = map->phys_irq;
 			vlr.state |= LR_HW;
 			vlr.state &= ~LR_EOI_INT;
 
-			ret = irq_set_irqchip_state(map->irq,
-						    IRQCHIP_STATE_ACTIVE,
-						    true);
 			vgic_irq_set_queued(vcpu, irq);
-			WARN_ON(ret);
+
+			if (map->shared) {
+				int ret;
+
+				BUG_ON(!map->active);
+				ret = irq_set_irqchip_state(map->irq,
+							    IRQCHIP_STATE_ACTIVE,
+							    true);
+				WARN_ON(ret);
+			}
 		}
 	}
 
@@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
 {
 	struct irq_phys_map *map;
+	bool active;
 	int ret;
 
 	if (!(vlr.state & LR_HW))
 		return 0;
 
 	map = vgic_irq_map_search(vcpu, vlr.irq);
-	BUG_ON(!map || !map->active);
+	BUG_ON(!map);
+	BUG_ON(map->shared && !map->active);
 
 	ret = irq_get_irqchip_state(map->irq,
 				    IRQCHIP_STATE_ACTIVE,
-				    &map->active);
+				    &active);
 
 	WARN_ON(ret);
 
-	if (map->active) {
+	if (!map->shared)
+		return !active;
+
+	map->active = active;
+
+	if (active) {
 		ret = irq_set_irqchip_state(map->irq,
 					    IRQCHIP_STATE_ACTIVE,
 					    false);
@@ -1663,7 +1673,7 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
 }
 
 struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
-				       int virt_irq, int irq)
+				       int virt_irq, int irq, bool shared)
 {
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
@@ -1710,6 +1720,7 @@ struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 	new_map->virt_irq = virt_irq;
 	new_map->phys_irq = phys_irq;
 	new_map->irq = irq;
+	new_map->shared = shared;
 
 	rb_link_node(&new_map->node, parent, new);
 	rb_insert_color(&new_map->node, root);
@@ -1746,13 +1757,13 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 
 bool vgic_get_phys_irq_active(struct irq_phys_map *map)
 {
-	BUG_ON(!map);
+	BUG_ON(!map || !map->shared);
 	return map->active;
 }
 
 void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
 {
-	BUG_ON(!map);
+	BUG_ON(!map || !map->shared);
 	map->active = active;
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
@ 2015-06-08 17:04   ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-08 17:04 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel
  Cc: Christoffer Dall, Eric Auger, Alex Bennée, Andre Przywara

So far, the only use of the HW interrupt facility is the timer,
implying that the active state is context-switched for each vcpu,
as the device is shared across all vcpus.

This does not work for a device that has been assigned to a VM,
as the guest is entirely in control of that device (the HW is
not shared). In that case, it makes sense to bypass the whole
active state switching, and only track the deactivation of the
interrupt.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 include/kvm/arm_vgic.h    |  5 +++--
 virt/kvm/arm/arch_timer.c |  2 +-
 virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1c653c1..5d47d60 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -164,7 +164,8 @@ struct irq_phys_map {
 	u32			virt_irq;
 	u32			phys_irq;
 	u32			irq;
-	bool			active;
+	bool			shared;
+	bool			active; /* Only valid if shared */
 };
 
 struct vgic_dist {
@@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
-				       int virt_irq, int irq);
+				       int virt_irq, int irq, bool shared);
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
 bool vgic_get_phys_irq_active(struct irq_phys_map *map);
 void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index b9fff78..9544d79 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	 * Tell the VGIC that the virtual interrupt is tied to a
 	 * physical interrupt. We do that once per VCPU.
 	 */
-	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
+	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
 	WARN_ON(!timer->map);
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f376b56..4223166 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 		map = vgic_irq_map_search(vcpu, irq);
 
 		if (map) {
-			int ret;
-
-			BUG_ON(!map->active);
 			vlr.hwirq = map->phys_irq;
 			vlr.state |= LR_HW;
 			vlr.state &= ~LR_EOI_INT;
 
-			ret = irq_set_irqchip_state(map->irq,
-						    IRQCHIP_STATE_ACTIVE,
-						    true);
 			vgic_irq_set_queued(vcpu, irq);
-			WARN_ON(ret);
+
+			if (map->shared) {
+				int ret;
+
+				BUG_ON(!map->active);
+				ret = irq_set_irqchip_state(map->irq,
+							    IRQCHIP_STATE_ACTIVE,
+							    true);
+				WARN_ON(ret);
+			}
 		}
 	}
 
@@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
 {
 	struct irq_phys_map *map;
+	bool active;
 	int ret;
 
 	if (!(vlr.state & LR_HW))
 		return 0;
 
 	map = vgic_irq_map_search(vcpu, vlr.irq);
-	BUG_ON(!map || !map->active);
+	BUG_ON(!map);
+	BUG_ON(map->shared && !map->active);
 
 	ret = irq_get_irqchip_state(map->irq,
 				    IRQCHIP_STATE_ACTIVE,
-				    &map->active);
+				    &active);
 
 	WARN_ON(ret);
 
-	if (map->active) {
+	if (!map->shared)
+		return !active;
+
+	map->active = active;
+
+	if (active) {
 		ret = irq_set_irqchip_state(map->irq,
 					    IRQCHIP_STATE_ACTIVE,
 					    false);
@@ -1663,7 +1673,7 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
 }
 
 struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
-				       int virt_irq, int irq)
+				       int virt_irq, int irq, bool shared)
 {
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
@@ -1710,6 +1720,7 @@ struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 	new_map->virt_irq = virt_irq;
 	new_map->phys_irq = phys_irq;
 	new_map->irq = irq;
+	new_map->shared = shared;
 
 	rb_link_node(&new_map->node, parent, new);
 	rb_insert_color(&new_map->node, root);
@@ -1746,13 +1757,13 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 
 bool vgic_get_phys_irq_active(struct irq_phys_map *map)
 {
-	BUG_ON(!map);
+	BUG_ON(!map || !map->shared);
 	return map->active;
 }
 
 void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
 {
-	BUG_ON(!map);
+	BUG_ON(!map || !map->shared);
 	map->active = active;
 }
 
-- 
2.1.4
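
For readers following the shared/non-shared split, the control flow of
vgic_sync_hwirq() after this patch can be modelled in user space. This is
only a sketch under stated assumptions: struct demo_map and struct demo_hw
are hypothetical stand-ins for struct irq_phys_map and the physical GIC
active bit, and the return value on the shared path is assumed to follow
the same "1 means no longer active" convention as the non-shared path,
since the tail of the function is not visible in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct irq_phys_map; only the flags matter. */
struct demo_map {
	bool shared;
	bool active;	/* only meaningful when shared */
};

/* Hypothetical stand-in for the physical active bit of one interrupt. */
struct demo_hw {
	bool active;
};

/* Models the visible part of vgic_sync_hwirq() for an LR with LR_HW set. */
static int demo_sync_hwirq(struct demo_map *map, struct demo_hw *hw)
{
	bool active;

	/* Counterpart of BUG_ON(map->shared && !map->active). */
	assert(!map->shared || map->active);

	/* Counterpart of irq_get_irqchip_state(..., ACTIVE, &active). */
	active = hw->active;

	/* Assigned device: only report whether the guest deactivated it. */
	if (!map->shared)
		return !active;

	/* Shared device: remember the state for the next world switch... */
	map->active = active;

	/* ...and clear the physical active bit while the guest is out. */
	if (active)
		hw->active = false;

	/* Assumption: same return convention as the non-shared path. */
	return !active;
}
```

A caller would build one demo_map and one demo_hw, call
demo_sync_hwirq(&map, &hw), and inspect both structures afterwards: in the
non-shared case neither is modified, in the shared case the physical
active bit is moved into map->active.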


^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 01/10] arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-09 11:29     ` Alex Bennée
  -1 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 11:29 UTC (permalink / raw)
  To: linux-arm-kernel


Marc Zyngier <marc.zyngier@arm.com> writes:

> As we now inject the timer interrupt when we're about to enter
> the guest, it makes a lot more sense to make sure this happens
> before the vgic code queues the pending interrupts.
>
> Otherwise, we get the interrupt on the following exit, which is
> not great for latency (and leads to all kinds of bizarre issues
> when used with active interrupts at the HW level).
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/kvm/arm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d9631ec..46db690 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -529,9 +529,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (vcpu->arch.pause)
>  			vcpu_pause(vcpu);
>  
> -		kvm_vgic_flush_hwstate(vcpu);
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		kvm_vgic_flush_hwstate(vcpu);
> +
>  		local_irq_disable();
>  
>  		/*
> @@ -544,8 +545,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> -			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
> +			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
>  
> @@ -577,9 +578,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * Back from guest
>  		 *************************************************************/
>  
> -		kvm_timer_sync_hwstate(vcpu);
>  		kvm_vgic_sync_hwstate(vcpu);
>  
> +		kvm_timer_sync_hwstate(vcpu);
> +
>  		ret = handle_exit(vcpu, run, ret);
>  	}

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 01/10] arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
@ 2015-06-09 11:29     ` Alex Bennée
  0 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 11:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, linux-arm-kernel, Christoffer Dall, Eric Auger,
	Andre Przywara


Marc Zyngier <marc.zyngier@arm.com> writes:

> As we now inject the timer interrupt when we're about to enter
> the guest, it makes a lot more sense to make sure this happens
> before the vgic code queues the pending interrupts.
>
> Otherwise, we get the interrupt on the following exit, which is
> not great for latency (and leads to all kinds of bizarre issues
> when used with active interrupts at the HW level).
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/kvm/arm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d9631ec..46db690 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -529,9 +529,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (vcpu->arch.pause)
>  			vcpu_pause(vcpu);
>  
> -		kvm_vgic_flush_hwstate(vcpu);
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		kvm_vgic_flush_hwstate(vcpu);
> +
>  		local_irq_disable();
>  
>  		/*
> @@ -544,8 +545,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> -			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
> +			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
>  
> @@ -577,9 +578,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * Back from guest
>  		 *************************************************************/
>  
> -		kvm_timer_sync_hwstate(vcpu);
>  		kvm_vgic_sync_hwstate(vcpu);
>  
> +		kvm_timer_sync_hwstate(vcpu);
> +
>  		ret = handle_exit(vcpu, run, ret);
>  	}

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 02/10] arm/arm64: KVM: Move vgic handling to a non-preemptible section
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-09 11:38     ` Alex Bennée
  -1 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 11:38 UTC (permalink / raw)
  To: linux-arm-kernel


Marc Zyngier <marc.zyngier@arm.com> writes:

> As we're about to introduce some serious GIC-poking to the vgic code,
> it is important to make sure that we're going to poke the part of
> the GIC that belongs to the CPU we're about to run on (otherwise,
> we'd end up with some unexpected interrupts firing)...
>
> Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run
> prevents the problem from occurring.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm/kvm/arm.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 46db690..4986300 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -529,8 +529,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (vcpu->arch.pause)
>  			vcpu_pause(vcpu);
>  
> +		/*
> +		 * Disarming the timer must be done with in a
> +		 * preemptible context, as this call may sleep.
> +		 */
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		/*
> +		 * Preparing the interrupts to be injected also
> +		 * involves poking the GIC, which must be done in a
> +		 * non-preemptible context.
> +		 */
> +		preempt_disable();
>  		kvm_vgic_flush_hwstate(vcpu);
>  
>  		local_irq_disable();
> @@ -546,6 +556,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
>  			kvm_vgic_sync_hwstate(vcpu);
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
> @@ -580,6 +591,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		kvm_vgic_sync_hwstate(vcpu);
>  
> +		preempt_enable();
> +
>  		kvm_timer_sync_hwstate(vcpu);
>  
>  		ret = handle_exit(vcpu, run, ret);

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 02/10] arm/arm64: KVM: Move vgic handling to a non-preemptible section
@ 2015-06-09 11:38     ` Alex Bennée
  0 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 11:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, linux-arm-kernel, Christoffer Dall, Eric Auger,
	Andre Przywara


Marc Zyngier <marc.zyngier@arm.com> writes:

> As we're about to introduce some serious GIC-poking to the vgic code,
> it is important to make sure that we're going to poke the part of
> the GIC that belongs to the CPU we're about to run on (otherwise,
> we'd end up with some unexpected interrupts firing)...
>
> Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run
> prevents the problem from occurring.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
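
The ordering that results from patches 01 and 02 together can be written
down as a small user-space model. This is a sketch only: the demo_* names
and the trace buffer are made up for illustration, and the preempt counter
merely stands in for preempt_disable()/preempt_enable().

```c
#include <assert.h>
#include <string.h>

static char demo_trace[128];	/* records the order of the steps */
static int demo_preempt_count;	/* > 0 means "preemption disabled" */

static void demo_step(const char *name)
{
	strcat(demo_trace, name);
	strcat(demo_trace, ";");
}

/* One pass through the run loop, in the order the quoted diff sets up. */
static void demo_enter_and_exit_guest(void)
{
	demo_trace[0] = '\0';

	/* The timer flush may sleep, so it runs while still preemptible. */
	assert(demo_preempt_count == 0);
	demo_step("timer_flush");

	demo_preempt_count++;		/* preempt_disable() */
	demo_step("vgic_flush");	/* pokes the GIC of this CPU */
	demo_step("guest");
	demo_step("vgic_sync");
	demo_preempt_count--;		/* preempt_enable() */

	/* The timer sync happens once preemption is allowed again. */
	assert(demo_preempt_count == 0);
	demo_step("timer_sync");
}
```

Calling demo_enter_and_exit_guest() once leaves the sequence of steps in
demo_trace, with the two vgic steps being the only ones that run while
the counter is non-zero.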

> ---
>  arch/arm/kvm/arm.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 46db690..4986300 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -529,8 +529,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (vcpu->arch.pause)
>  			vcpu_pause(vcpu);
>  
> +		/*
> +		 * Disarming the timer must be done with in a
> +		 * preemptible context, as this call may sleep.
> +		 */
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		/*
> +		 * Preparing the interrupts to be injected also
> +		 * involves poking the GIC, which must be done in a
> +		 * non-preemptible context.
> +		 */
> +		preempt_disable();
>  		kvm_vgic_flush_hwstate(vcpu);
>  
>  		local_irq_disable();
> @@ -546,6 +556,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
>  			kvm_vgic_sync_hwstate(vcpu);
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
> @@ -580,6 +591,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		kvm_vgic_sync_hwstate(vcpu);
>  
> +		preempt_enable();
> +
>  		kvm_timer_sync_hwstate(vcpu);
>  
>  		ret = handle_exit(vcpu, run, ret);

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 03/10] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-09 13:12     ` Alex Bennée
  -1 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 13:12 UTC (permalink / raw)
  To: linux-arm-kernel


Marc Zyngier <marc.zyngier@arm.com> writes:

> As we're about to cram more information in the vgic_lr structure
> (HW interrupt number and additional state information), we switch
> to a layout similar to the HW's:
>
> - use bitfields to save space (we don't need more than 10 bits
>   to represent the irq numbers)
> - source CPU and HW interrupt can share the same field, as
>   a SGI doesn't have a physical line.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/kvm/arm_vgic.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 133ea00..4f9fa1d 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -95,11 +95,15 @@ enum vgic_type {
>  #define LR_STATE_ACTIVE		(1 << 1)
>  #define LR_STATE_MASK		(3 << 0)
>  #define LR_EOI_INT		(1 << 2)
> +#define LR_HW			(1 << 3)
>  
>  struct vgic_lr {
> -	u16	irq;
> -	u8	source;
> -	u8	state;
> +	unsigned irq:10;
> +	union {
> +		unsigned hwirq:10;
> +		unsigned source:8;
> +	};
> +	unsigned state:4;
>  };
>  
>  struct vgic_vmcr {

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 03/10] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
@ 2015-06-09 13:12     ` Alex Bennée
  0 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 13:12 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel


Marc Zyngier <marc.zyngier@arm.com> writes:

> As we're about to cram more information in the vgic_lr structure
> (HW interrupt number and additional state information), we switch
> to a layout similar to the HW's:
>
> - use bitfields to save space (we don't need more than 10 bits
>   to represent the irq numbers)
> - source CPU and HW interrupt can share the same field, as
>   a SGI doesn't have a physical line.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
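
The field widths described above can be tried out in user space. The
sketch below copies the widths from the quoted struct vgic_lr into a
hypothetical struct demo_lr (the demo_* names are illustrative, not part
of the patch) and shows that a 10-bit field holds values up to 1023 and
that the 4-bit state field has room for LR_HW (1 << 3). It relies on
C11/GNU C anonymous unions, and it deliberately says nothing about the
overall size of the struct, which is implementation-defined.

```c
#include <assert.h>

/* Same field widths as the struct vgic_lr in the quoted hunk. */
struct demo_lr {
	unsigned irq:10;
	union {
		/* Only one of the two is meaningful for a given LR. */
		unsigned hwirq:10;
		unsigned source:8;
	};
	unsigned state:4;
};

/* Store a value in the 10-bit irq field and read it back. */
static unsigned demo_store_irq(unsigned value)
{
	struct demo_lr lr = { 0 };

	lr.irq = value;		/* reduced modulo 2^10 on assignment */
	return lr.irq;
}

/* Store a value in the 4-bit state field and read it back. */
static unsigned demo_store_state(unsigned value)
{
	struct demo_lr lr = { 0 };

	lr.state = value;	/* reduced modulo 2^4 on assignment */
	return lr.state;
}
```

Values that fit come back unchanged; anything wider is silently
truncated, which is why the widths have to cover the largest interrupt
number and every state flag in use.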

> ---
>  include/kvm/arm_vgic.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 133ea00..4f9fa1d 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -95,11 +95,15 @@ enum vgic_type {
>  #define LR_STATE_ACTIVE		(1 << 1)
>  #define LR_STATE_MASK		(3 << 0)
>  #define LR_EOI_INT		(1 << 2)
> +#define LR_HW			(1 << 3)
>  
>  struct vgic_lr {
> -	u16	irq;
> -	u8	source;
> -	u8	state;
> +	unsigned irq:10;
> +	union {
> +		unsigned hwirq:10;
> +		unsigned source:8;
> +	};
> +	unsigned state:4;
>  };
>  
>  struct vgic_vmcr {

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-09 13:21     ` Alex Bennée
  -1 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 13:21 UTC (permalink / raw)
  To: linux-arm-kernel


Marc Zyngier <marc.zyngier@arm.com> writes:

> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
> field, we can encode that information into the list registers.
>
> This patch provides implementations for both GICv2 and GICv3.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>  include/linux/irqchip/arm-gic.h    |  3 ++-
>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>  4 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034..cf637d6 100644
<snip>
> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  	 * Eventually we want to make this configurable, so we may revisit
>  	 * this in the future.
>  	 */
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +	switch (vcpu->kvm->arch.vgic.vgic_model) {
> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>  		lr_val |= ICH_LR_GROUP;
> -	else
> -		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	case  KVM_DEV_TYPE_ARM_VGIC_V2:
> +		if (lr_desc.irq < VGIC_NR_SGIS)
> +			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	default:
> +		BUG();
> +	}
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= ICH_LR_PENDING_BIT;
> @@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  		lr_val |= ICH_LR_ACTIVE_BIT;
>  	if (lr_desc.state & LR_EOI_INT)
>  		lr_val |= ICH_LR_EOI;
> +	if (lr_desc.state & LR_HW) {
> +		lr_val |= ICH_LR_HW;
> +		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
> +	}
>

Why is the bracketing different for the casting of lr_desc.hwirq
compared to lr_desc.source. Surely the precedence of up-casting before
the shift is the same in both cases?

>  	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
>  }

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
@ 2015-06-09 13:21     ` Alex Bennée
  0 siblings, 0 replies; 118+ messages in thread
From: Alex Bennée @ 2015-06-09 13:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, linux-arm-kernel, Christoffer Dall, Eric Auger,
	Andre Przywara


Marc Zyngier <marc.zyngier@arm.com> writes:

> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
> field, we can encode that information into the list registers.
>
> This patch provides implementations for both GICv2 and GICv3.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>  include/linux/irqchip/arm-gic.h    |  3 ++-
>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>  4 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034..cf637d6 100644
<snip>
> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  	 * Eventually we want to make this configurable, so we may revisit
>  	 * this in the future.
>  	 */
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +	switch (vcpu->kvm->arch.vgic.vgic_model) {
> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>  		lr_val |= ICH_LR_GROUP;
> -	else
> -		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	case  KVM_DEV_TYPE_ARM_VGIC_V2:
> +		if (lr_desc.irq < VGIC_NR_SGIS)
> +			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	default:
> +		BUG();
> +	}
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= ICH_LR_PENDING_BIT;
> @@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  		lr_val |= ICH_LR_ACTIVE_BIT;
>  	if (lr_desc.state & LR_EOI_INT)
>  		lr_val |= ICH_LR_EOI;
> +	if (lr_desc.state & LR_HW) {
> +		lr_val |= ICH_LR_HW;
> +		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
> +	}
>

Why is the bracketing different for the casting of lr_desc.hwirq
compared to lr_desc.source. Surely the precedence of up-casting before
the shift is the same in both cases?

>  	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
>  }

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-09 13:21     ` Alex Bennée
@ 2015-06-09 14:03       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-09 14:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/06/15 14:21, Alex Bennée wrote:
> 
> Marc Zyngier <marc.zyngier@arm.com> writes:
> 
>> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
>> field, we can encode that information into the list registers.
>>
>> This patch provides implementations for both GICv2 and GICv3.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>>  include/linux/irqchip/arm-gic.h    |  3 ++-
>>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>>  4 files changed, 38 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>> index ffbc034..cf637d6 100644
> <snip>
>> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  	 * Eventually we want to make this configurable, so we may revisit
>>  	 * this in the future.
>>  	 */
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> +	switch (vcpu->kvm->arch.vgic.vgic_model) {
>> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>>  		lr_val |= ICH_LR_GROUP;
>> -	else
>> -		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
>> +		break;
>> +	case  KVM_DEV_TYPE_ARM_VGIC_V2:
>> +		if (lr_desc.irq < VGIC_NR_SGIS)
>> +			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
>> +		break;
>> +	default:
>> +		BUG();
>> +	}
>>  
>>  	if (lr_desc.state & LR_STATE_PENDING)
>>  		lr_val |= ICH_LR_PENDING_BIT;
>> @@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  		lr_val |= ICH_LR_ACTIVE_BIT;
>>  	if (lr_desc.state & LR_EOI_INT)
>>  		lr_val |= ICH_LR_EOI;
>> +	if (lr_desc.state & LR_HW) {
>> +		lr_val |= ICH_LR_HW;
>> +		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
>> +	}
>>
> 
> Why is the bracketing different for the casting of lr_desc.hwirq
> compared to lr_desc.source. Surely the precedence of up-casting before
> the shift is the same in both cases?

Probably a leftover from a previous refactor...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
@ 2015-06-09 14:03       ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-09 14:03 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Christoffer Dall,
	Eric Auger, Andre Przywara

On 09/06/15 14:21, Alex Bennée wrote:
> 
> Marc Zyngier <marc.zyngier@arm.com> writes:
> 
>> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
>> field, we can encode that information into the list registers.
>>
>> This patch provides implementations for both GICv2 and GICv3.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>>  include/linux/irqchip/arm-gic.h    |  3 ++-
>>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>>  4 files changed, 38 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>> index ffbc034..cf637d6 100644
> <snip>
>> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  	 * Eventually we want to make this configurable, so we may revisit
>>  	 * this in the future.
>>  	 */
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> +	switch (vcpu->kvm->arch.vgic.vgic_model) {
>> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>>  		lr_val |= ICH_LR_GROUP;
>> -	else
>> -		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
>> +		break;
>> +	case  KVM_DEV_TYPE_ARM_VGIC_V2:
>> +		if (lr_desc.irq < VGIC_NR_SGIS)
>> +			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
>> +		break;
>> +	default:
>> +		BUG();
>> +	}
>>  
>>  	if (lr_desc.state & LR_STATE_PENDING)
>>  		lr_val |= ICH_LR_PENDING_BIT;
>> @@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  		lr_val |= ICH_LR_ACTIVE_BIT;
>>  	if (lr_desc.state & LR_EOI_INT)
>>  		lr_val |= ICH_LR_EOI;
>> +	if (lr_desc.state & LR_HW) {
>> +		lr_val |= ICH_LR_HW;
>> +		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
>> +	}
>>
> 
> Why is the bracketing different for the casting of lr_desc.hwirq
> compared to lr_desc.source. Surely the precedence of up-casting before
> the shift is the same in both cases?

Probably a leftover from a previous refactor...
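
For the record, the two spellings should be equivalent: a cast binds more
tightly than a shift in C, so the extra parentheses are cosmetic. What
matters is that the widening happens before the shift. A minimal
user-space sketch, assuming a made-up shift value (DEMO_PHYS_ID_SHIFT and
the helper names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift amount, chosen only for illustration. */
#define DEMO_PHYS_ID_SHIFT 32

/* Cast without extra parentheses, as in the lr_desc.source line. */
static uint64_t demo_shift_plain(unsigned int hwirq)
{
	return (uint64_t)hwirq << DEMO_PHYS_ID_SHIFT;
}

/* Cast wrapped in parentheses, as in the lr_desc.hwirq line. */
static uint64_t demo_shift_bracketed(unsigned int hwirq)
{
	return ((uint64_t)hwirq) << DEMO_PHYS_ID_SHIFT;
}
```

Both helpers widen the operand to 64 bits first and then shift, so they
return the same value for any input.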

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-10  8:33   ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-10  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,
On 06/08/2015 07:03 PM, Marc Zyngier wrote:
> From day 1, our timer code has been using a terrible hack: whenever
> the guest is scheduled with a timer interrupt pending (i.e. the HW
> timer has expired), we restore the timer state with the MASK bit set,
> in order to prevent the physical interrupt from firing again. And again. And
> again...
> 
> This is absolutely silly, for at least two reasons:
> 
> - This relies on the device (the timer) having a mask bit that we can
>   play with. Not all devices are built like this.
> 
> - This expects some behaviour of the guest that only works because
>   both the kernel timer code and the KVM counterpart have been written
>   by the same idiot (the idiot being me).
> 
> The One True Way is to set the GIC active bit when injecting the
> interrupt, and to context-switch across the world switch. This is what
> this series implements.
> 
> We introduce a relatively simple infrastructure enabling the mapping
> of a virtual interrupt with its physical counterpart:
> 
> - Whenever a virtual interrupt is injected, we look it up in an
>   rbtree. If we have a match, the interrupt is injected with the HW
>   bit set in the LR, together with the physical interrupt.
> 
> - Across the world switch, we save/restore the active state for these
>   interrupts using the irqchip_state API.
> 
> - On guest EOI, the HW interrupt is automagically deactivated by the
>   GIC, allowing the interrupt to be resampled.

I am lost about the status of the irqchip part, allowing EOImode=1 and
only dropping the prio for physical forwarded IRQs:
http://lkml.iu.edu/hypermail/linux/kernel/1410.3/00913.html
Doesn't this series also depend on those patches or did I miss something
on the ML?

Best Regards

Eric

> 
> The timer code is slightly modified to set the active state at the
> same time as the injection.
> 
> The last patch also allows non-shared devices to have their interrupt
> deactivated the same way (in this case we do not context-switch the
> active state). This is the first step in the long overdue direction of
> the mythical IRQ forwarding thing...
> 
> This series is based on v4.1-rc7, and has been tested on Juno (GICv2)
> and the FVP Base model (GICv3 host, both GICv2 and GICv3 guests). I'd
> appreciate any form of testing, especially in the context of guest
> migration (there is obviously some interesting stuff there...).



> 
> The code is otherwise available at
> git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/active-timer
> 
> Marc Zyngier (10):
>   arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
>   arm/arm64: KVM: Move vgic handling to a non-preemptible section
>   KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
>   KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
>   KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
>   KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual
>     interrupts
>   KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
>   KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
>   KVM: arm/arm64: timer: Allow the timer to control the active state
>   KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
> 
>  arch/arm/kvm/arm.c                 |  21 +++-
>  include/kvm/arm_arch_timer.h       |   3 +
>  include/kvm/arm_vgic.h             |  31 +++++-
>  include/linux/irqchip/arm-gic-v3.h |   3 +
>  include/linux/irqchip/arm-gic.h    |   3 +-
>  virt/kvm/arm/arch_timer.c          |  13 ++-
>  virt/kvm/arm/vgic-v2.c             |  16 ++-
>  virt/kvm/arm/vgic-v3.c             |  21 +++-
>  virt/kvm/arm/vgic.c                | 206 ++++++++++++++++++++++++++++++++++++-
>  9 files changed, 300 insertions(+), 17 deletions(-)
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices
@ 2015-06-10  8:33   ` Eric Auger
  0 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-10  8:33 UTC (permalink / raw)
  To: Marc Zyngier, kvm, kvmarm, linux-arm-kernel
  Cc: Christoffer Dall, Alex Bennée, Andre Przywara

Hi Marc,
On 06/08/2015 07:03 PM, Marc Zyngier wrote:
> From day 1, our timer code has been using a terrible hack: whenever
> the guest is scheduled with a timer interrupt pending (i.e. the HW
> timer has expired), we restore the timer state with the MASK bit set,
> in order to prevent the physical interrupt from firing again. And again. And
> again...
> 
> This is absolutely silly, for at least two reasons:
> 
> - This relies on the device (the timer) having a mask bit that we can
>   play with. Not all devices are built like this.
> 
> - This expects some behaviour of the guest that only works because
>   both the kernel timer code and the KVM counterpart have been written
>   by the same idiot (the idiot being me).
> 
> The One True Way is to set the GIC active bit when injecting the
> interrupt, and to context-switch across the world switch. This is what
> this series implements.
> 
> We introduce a relatively simple infrastructure enabling the mapping
> of a virtual interrupt with its physical counterpart:
> 
> - Whenever a virtual interrupt is injected, we look it up in an
>   rbtree. If we have a match, the interrupt is injected with the HW
>   bit set in the LR, together with the physical interrupt.
> 
> - Across the world switch, we save/restore the active state for these
>   interrupts using the irqchip_state API.
> 
> - On guest EOI, the HW interrupt is automagically deactivated by the
>   GIC, allowing the interrupt to be resampled.

I am lost about the status of the irqchip part, allowing EOImode=1 and
only dropping the prio for physical forwarded IRQs:
http://lkml.iu.edu/hypermail/linux/kernel/1410.3/00913.html
Doesn't this series also depend on those patches or did I miss something
on the ML?

Best Regards

Eric

> 
> The timer code is slightly modified to set the active state at the
> same time as the injection.
> 
> The last patch also allows non-shared devices to have their interrupt
> deactivated the same way (in this case we do not context-switch the
> active state). This is the first step in the long overdue direction of
> the mythical IRQ forwarding thing...
> 
> This series is based on v4.1-rc7, and has been tested on Juno (GICv2)
> and the FVP Base model (GICv3 host, both GICv2 and GICv3 guests). I'd
> appreciate any form of testing, especially in the context of guest
> migration (there is obviously some interesting stuff there...).



> 
> The code is otherwise available at
> git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/active-timer
> 
> Marc Zyngier (10):
>   arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
>   arm/arm64: KVM: Move vgic handling to a non-preemptible section
>   KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
>   KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
>   KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
>   KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual
>     interrupts
>   KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
>   KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
>   KVM: arm/arm64: timer: Allow the timer to control the active state
>   KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
> 
>  arch/arm/kvm/arm.c                 |  21 +++-
>  include/kvm/arm_arch_timer.h       |   3 +
>  include/kvm/arm_vgic.h             |  31 +++++-
>  include/linux/irqchip/arm-gic-v3.h |   3 +
>  include/linux/irqchip/arm-gic.h    |   3 +-
>  virt/kvm/arm/arch_timer.c          |  13 ++-
>  virt/kvm/arm/vgic-v2.c             |  16 ++-
>  virt/kvm/arm/vgic-v3.c             |  21 +++-
>  virt/kvm/arm/vgic.c                | 206 ++++++++++++++++++++++++++++++++++++-
>  9 files changed, 300 insertions(+), 17 deletions(-)
> 


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices
  2015-06-10  8:33   ` Eric Auger
@ 2015-06-10  9:03     ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-10  9:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Eric,

On 10/06/15 09:33, Eric Auger wrote:
> Hi Marc,
> On 06/08/2015 07:03 PM, Marc Zyngier wrote:
>> From day 1, our timer code has been using a terrible hack: whenever
>> the guest is scheduled with a timer interrupt pending (i.e. the HW
>> timer has expired), we restore the timer state with the MASK bit set,
>> in order to prevent the physical interrupt from firing again. And again. And
>> again...
>>
>> This is absolutely silly, for at least two reasons:
>>
>> - This relies on the device (the timer) having a mask bit that we can
>>   play with. Not all devices are built like this.
>>
>> - This expects some behaviour of the guest that only works because
>>   both the kernel timer code and the KVM counterpart have been written
>>   by the same idiot (the idiot being me).
>>
>> The One True Way is to set the GIC active bit when injecting the
>> interrupt, and to context-switch across the world switch. This is what
>> this series implements.
>>
>> We introduce a relatively simple infrastructure enabling the mapping
>> of a virtual interrupt with its physical counterpart:
>>
>> - Whenever a virtual interrupt is injected, we look it up in an
>>   rbtree. If we have a match, the interrupt is injected with the HW
>>   bit set in the LR, together with the physical interrupt.
>>
>> - Across the world switch, we save/restore the active state for these
>>   interrupts using the irqchip_state API.
>>
>> - On guest EOI, the HW interrupt is automagically deactivated by the
>>   GIC, allowing the interrupt to be resampled.
> 
> I am lost about the status of the irqchip part, allowing EOImode=1 and
> only dropping the prio for physical forwarded IRQs:
> http://lkml.iu.edu/hypermail/linux/kernel/1410.3/00913.html
> Doesn't this series also depend on those patches or did I miss something
> on the ML?

No, these patches are self-contained. As long as we only deal with
shared devices, we don't need EOImode=1, as we save/restore the
active state, and the only irqchip change required (the state accessors)
went in with the 4.1 merge window.

The EOImode=1 stuff is still on the cards, and this series contains the
basic infrastructure for that (the last patch in the series is there
only for that purpose).

Hope this helps,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices
  2015-06-10  9:03     ` Marc Zyngier
@ 2015-06-10 11:13       ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-10 11:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,
On 06/10/2015 11:03 AM, Marc Zyngier wrote:
> Hi Eric,
> 
> On 10/06/15 09:33, Eric Auger wrote:
>> Hi Marc,
>> On 06/08/2015 07:03 PM, Marc Zyngier wrote:
>>> From day 1, our timer code has been using a terrible hack: whenever
>>> the guest is scheduled with a timer interrupt pending (i.e. the HW
>>> timer has expired), we restore the timer state with the MASK bit set,
>>> in order to prevent the physical interrupt from firing again. And again. And
>>> again...
>>>
>>> This is absolutely silly, for at least two reasons:
>>>
>>> - This relies on the device (the timer) having a mask bit that we can
>>>   play with. Not all devices are built like this.
>>>
>>> - This expects some behaviour of the guest that only works because
>>>   both the kernel timer code and the KVM counterpart have been written
>>>   by the same idiot (the idiot being me).
>>>
>>> The One True Way is to set the GIC active bit when injecting the
>>> interrupt, and to context-switch across the world switch. This is what
>>> this series implements.
>>>
>>> We introduce a relatively simple infrastructure enabling the mapping
>>> of a virtual interrupt with its physical counterpart:
>>>
>>> - Whenever a virtual interrupt is injected, we look it up in an
>>>   rbtree. If we have a match, the interrupt is injected with the HW
>>>   bit set in the LR, together with the physical interrupt.
>>>
>>> - Across the world switch, we save/restore the active state for these
>>>   interrupts using the irqchip_state API.
>>>
>>> - On guest EOI, the HW interrupt is automagically deactivated by the
>>>   GIC, allowing the interrupt to be resampled.
>>
>> I am lost about the status of the irqchip part, allowing EOImode=1 and
>> only dropping the prio for physical forwarded IRQs:
>> http://lkml.iu.edu/hypermail/linux/kernel/1410.3/00913.html
>> Doesn't this series also depend on those patches or did I miss something
>> on the ML?
> 
> No, these patches are self-contained. As long as we only deal with
> shared devices, we don't need EOImode=1, as we save/restore the
> active state, and the only irqchip change required (the state accessors)
> went in with the 4.1 merge window.
> 
> The EOImode=1 stuff is still on the cards, and this series contains the
> basic infrastructure for that (the last patch in the series is there
> only for that purpose).
> 
> Hope this helps,

OK thanks. I am currently rebasing my kvm-vfio series on yours. I will
let you know the outcome.

Best Regards

Eric
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 03/10] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-10 17:23     ` Andre Przywara
  -1 siblings, 0 replies; 118+ messages in thread
From: Andre Przywara @ 2015-06-10 17:23 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On 06/08/2015 06:03 PM, Marc Zyngier wrote:
> As we're about to cram more information in the vgic_lr structure
> (HW interrupt number and additional state information), we switch
> to a layout similar to the HW's:
> 
> - use bitfields to save space (we don't need more than 10 bits
>   to represent the irq numbers)

But that will not be true for LPIs later, right? Before that I was lucky
with the irq field being 16 bits wide ;-)
So can we increase that to be at least 14 bits (8192 LPI offset + 8192
LPIs) here? The structure would still fit in 32 bits, then.
I guess guests should get away with only supporting 8K of LPIs, but if
we map hardware LPIs to guest IRQs I guess we may exceed 14 bits here.
Not sure if we could extend this further for ARM64 only, as we have more
room there and also need it only here.
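
As a rough illustration of the width arithmetic above, here is a small
userspace sketch (not part of the patch). The struct and function names
are made up for the example, and it assumes LPI numbering starts at 8192
as stated above. It relies on the C rule that a store to an unsigned
bit-field is reduced modulo 2^width, which is why a 10-bit irq field
cannot hold an LPI number while a 14-bit one covers 8192 + 8192 INTIDs.

```c
#include <assert.h>

/* Hypothetical stand-ins for the irq field at the two widths discussed. */
struct lr_irq10 { unsigned irq:10; };
struct lr_irq14 { unsigned irq:14; };

#define LPI_BASE 8192u	/* first LPI INTID, per the discussion above */

/* Store an INTID in a 10-bit field and return what survives. */
unsigned store_irq10(unsigned intid)
{
	struct lr_irq10 lr = { .irq = 0 };

	lr.irq = intid;		/* reduced modulo 1 << 10 */
	return lr.irq;
}

/* Same thing with the 14-bit field suggested in the review. */
unsigned store_irq14(unsigned intid)
{
	struct lr_irq14 lr = { .irq = 0 };

	lr.irq = intid;		/* reduced modulo 1 << 14 */
	return lr.irq;
}

/* Number of bits needed to represent every value in 0..max_intid. */
unsigned bits_needed(unsigned max_intid)
{
	unsigned bits = 0;

	while (max_intid) {
		bits++;
		max_intid >>= 1;
	}
	return bits;
}
```

Calling bits_needed(LPI_BASE + 8191) gives the 14 bits quoted above; the
two store helpers show what happens to an INTID that does not fit.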

Cheers,
Andre.

> - source CPU and HW interrupt can share the same field, as
>   a SGI doesn't have a physical line.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 133ea00..4f9fa1d 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -95,11 +95,15 @@ enum vgic_type {
>  #define LR_STATE_ACTIVE		(1 << 1)
>  #define LR_STATE_MASK		(3 << 0)
>  #define LR_EOI_INT		(1 << 2)
> +#define LR_HW			(1 << 3)
>  
>  struct vgic_lr {
> -	u16	irq;
> -	u8	source;
> -	u8	state;
> +	unsigned irq:10;
> +	union {
> +		unsigned hwirq:10;
> +		unsigned source:8;
> +	};
> +	unsigned state:4;
>  };
>  
>  struct vgic_vmcr {
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 03/10] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
  2015-06-10 17:23     ` Andre Przywara
@ 2015-06-10 18:04       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-10 18:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/06/15 18:23, Andre Przywara wrote:
> Hi Marc,
> 
> On 06/08/2015 06:03 PM, Marc Zyngier wrote:
>> As we're about to cram more information in the vgic_lr structure
>> (HW interrupt number and additional state information), we switch
>> to a layout similar to the HW's:
>>
>> - use bitfields to save space (we don't need more than 10 bits
>>   to represent the irq numbers)
> 
> But that will not be true for LPIs later, right? Before that I was lucky
> with the irq field being 16 bits wide ;-)
> So can we increase that to be at least 14 bits (8192 LPI offset + 8192
> LPIs) here? The structure would still fit in 32 bits, then.

Yes, that's true. An oversight on my part.

> I guess guests should get away with only supporting 8K of LPIs, but if
> we map hardware LPIs to guest IRQs I guess we may exceed 14 bits here.
> Not sure if we could extend this further for ARM64 only, as we have more
> room there and also need it only here.

Mapping LPIs to IRQs is not a problem, as LPIs don't have an active
state, so you never have to deactivate them, and you never put them in an
LR. Problem solved! ;-)

Alternative layout proposal for the ITS emulation case:

#define LR_IS_LPI	(1 << 4)

struct vgic_lr {
	union {
		unsigned lpi:24;
		unsigned spi:10;
		union {
			unsigned hwirq:10;
			unsigned source:8;
		};
	};
	unsigned state:5;
};

That gives you even more space than you had before.
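
A minimal userspace sketch of how such a tagged layout might be read
back, assuming the LR_IS_LPI bit in the state field is what selects
between the lpi and spi members. The struct is renamed and the helper
functions are hypothetical; none of this comes from the series itself.
Because spi shares the outer union with hwirq/source in this layout, the
sketch only ever touches one member per descriptor.

```c
#include <assert.h>

#define LR_STATE_PENDING	(1 << 0)
#define LR_STATE_ACTIVE		(1 << 1)
#define LR_IS_LPI		(1 << 4)

/* Copy of the proposed layout, renamed to mark it as a sketch. */
struct vgic_lr_sketch {
	union {
		unsigned lpi:24;
		unsigned spi:10;
		union {
			unsigned hwirq:10;
			unsigned source:8;
		};
	};
	unsigned state:5;
};

/* Build a pending descriptor for an LPI (hypothetical helper). */
struct vgic_lr_sketch lr_make_lpi(unsigned intid)
{
	struct vgic_lr_sketch lr = { .state = LR_IS_LPI | LR_STATE_PENDING };

	lr.lpi = intid;
	return lr;
}

/* Build a pending descriptor for an SPI (hypothetical helper). */
struct vgic_lr_sketch lr_make_spi(unsigned intid)
{
	struct vgic_lr_sketch lr = { .state = LR_STATE_PENDING };

	lr.spi = intid;
	return lr;
}

/* Pick the union member that the LR_IS_LPI tag says is valid. */
unsigned lr_intid(const struct vgic_lr_sketch *lr)
{
	return (lr->state & LR_IS_LPI) ? lr->lpi : lr->spi;
}
```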

Now, this is absolutely disgusting, of course. This whole intermediate
structure crap is completely getting out of control, and we should get
rid of it before we merge the ITS.

/me goes looking for a chainsaw...

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-11  8:43     ` Andre Przywara
  -1 siblings, 0 replies; 118+ messages in thread
From: Andre Przywara @ 2015-06-11  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 06/08/2015 06:04 PM, Marc Zyngier wrote:
> In order to be able to feed physical interrupts to a guest, we need
> to be able to establish the virtual-physical mapping between the two
> worlds.
>
> The mapping is kept in a rbtree, indexed by virtual interrupts.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h |  18 ++++++++
>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4f9fa1d..33d121a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -159,6 +159,14 @@ struct vgic_io_device {
>       struct kvm_io_device dev;
>  };
>
> +struct irq_phys_map {
> +     struct rb_node          node;
> +     u32                     virt_irq;
> +     u32                     phys_irq;
> +     u32                     irq;

Can you add comments explaining the different IRQ types here?
So I take it that phys_irq is the actual SPI number (hwirq in irqchip
lingo), virt_irq is the guest's virtual IRQ number and irq is Linux'
notion of the IRQ number (the first column in /proc/interrupts)?
Would renaming help? (phys_irq to hwirq? virt_irq to guest_irq?)
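
If the reading above is right, the requested comments could look roughly
like the sketch below. It is a userspace stand-in: the rb_node member is
left out, the struct is renamed, and the constructor is hypothetical.
The comment wording and the meaning given to "active" are assumptions.
The check that virt_irq is not an SGI is also an assumption of the
sketch, based on SGIs (INTIDs 0-15) having no physical line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct irq_phys_map_sketch {
	uint32_t virt_irq;	/* interrupt number as seen by the guest */
	uint32_t phys_irq;	/* hwirq at the GIC (irqchip lingo) */
	uint32_t irq;		/* Linux IRQ number (/proc/interrupts) */
	bool	 active;	/* assumed: active state saved across the world switch */
};

/* Hypothetical constructor; the patch fills these fields inline. */
struct irq_phys_map_sketch irq_phys_map_sketch_make(uint32_t virt_irq,
						    uint32_t phys_irq,
						    uint32_t irq)
{
	struct irq_phys_map_sketch map = {
		.virt_irq = virt_irq,
		.phys_irq = phys_irq,
		.irq	  = irq,
		.active	  = false,
	};

	/* Sketch-only precondition: an SGI has no physical line to map. */
	assert(virt_irq >= 16);
	return map;
}
```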

> +     bool                    active;
> +};
> +
>  struct vgic_dist {
>       spinlock_t              lock;
>       bool                    in_kernel;
> @@ -256,6 +264,10 @@ struct vgic_dist {
>       struct vgic_vm_ops      vm_ops;
>       struct vgic_io_device   dist_iodev;
>       struct vgic_io_device   *redist_iodevs;
> +
> +     /* Virtual irq to hwirq mapping */
> +     spinlock_t              irq_phys_map_lock;
> +     struct rb_root          irq_phys_map;
>  };
>
>  struct vgic_v2_cpu_if {
> @@ -307,6 +319,9 @@ struct vgic_cpu {
>               struct vgic_v2_cpu_if   vgic_v2;
>               struct vgic_v3_cpu_if   vgic_v3;
>       };
> +
> +     /* Protected by the distributor's irq_phys_map_lock */
> +     struct rb_root  irq_phys_map;
>  };
>
>  #define LR_EMPTY     0xff
> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +                                    int virt_irq, int irq);
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>
>  #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 59ed7a3..c6604f2 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
> +#include <linux/rbtree.h>
>  #include <linux/uaccess.h>
>
>  #include <linux/irqchip/arm-gic.h>
> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +                                             int virt_irq);
>
>  static const struct vgic_ops *vgic_ops;
>  static const struct vgic_params *vgic;
> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>       return IRQ_HANDLED;
>  }
>
> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
> +                                          int virt_irq)
> +{
> +     if (virt_irq < VGIC_NR_PRIVATE_IRQS)
> +             return &vcpu->arch.vgic_cpu.irq_phys_map;
> +     else
> +             return &vcpu->kvm->arch.vgic.irq_phys_map;
> +}
> +
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +                                    int virt_irq, int irq)
> +{
> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +     struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +     struct rb_node **new = &root->rb_node, *parent = NULL;
> +     struct irq_phys_map *new_map;
> +     struct irq_desc *desc;
> +     struct irq_data *data;
> +     int phys_irq;
> +
> +     desc = irq_to_desc(irq);
> +     if (!desc) {
> +             kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");

I guess kvm_arch_timer: is a left-over of the original user?

> +             return NULL;
> +     }
> +
> +     data = irq_desc_get_irq_data(desc);
> +     while (data->parent_data)
> +             data = data->parent_data;
> +
> +     phys_irq = data->hwirq;

So if I get this correctly we "cache" hwirq/phys_irq in this map to get
a cheaper access to it, but actually it is redundant since we have
Linux' irq number, isn't it?
Are we sure that the irqdomain mapping of irq and phys_irq will never
change while this map entry is valid? This is probably true for the
timer, but does that still hold in the future with other devices?
(see also the next email for more rationale)
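
To make the caching question concrete, here is a simplified userspace
sketch of the walk that produces phys_irq. The struct is a made-up
stand-in that keeps only the two fields the quoted loop touches, and
root_hwirq() is a hypothetical name. The value is computed once from the
chain as it exists at map time, so it would go stale if the hierarchy
were changed afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's irq_data: hwirq plus a link to the parent chip. */
struct irq_data_sketch {
	unsigned long hwirq;
	struct irq_data_sketch *parent_data;
};

/* Follow parent_data up to the outermost irqchip and return its hwirq. */
unsigned long root_hwirq(const struct irq_data_sketch *data)
{
	assert(data != NULL);

	while (data->parent_data)
		data = data->parent_data;

	return data->hwirq;
}
```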

Cheers,
Andre.

> +
> +     spin_lock(&dist->irq_phys_map_lock);
> +
> +     /* Boilerplate rb_tree code */
> +     while (*new) {
> +             struct irq_phys_map *this;
> +
> +             this = container_of(*new, struct irq_phys_map, node);
> +             parent = *new;
> +             if (this->virt_irq < virt_irq)
> +                     new = &(*new)->rb_left;
> +             else if (this->virt_irq > virt_irq)
> +                     new = &(*new)->rb_right;
> +             else {
> +                     new_map = this;
> +                     goto out;
> +             }
> +     }
> +
> +     new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
> +     if (!new_map)
> +             goto out;
> +
> +     new_map->virt_irq = virt_irq;
> +     new_map->phys_irq = phys_irq;
> +     new_map->irq = irq;
> +
> +     rb_link_node(&new_map->node, parent, new);
> +     rb_insert_color(&new_map->node, root);
> +
> +out:
> +     spin_unlock(&dist->irq_phys_map_lock);
> +     return new_map;
> +}
> +
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +                                             int virt_irq)
> +{
> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +     struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +     struct rb_node *node = root->rb_node;
> +     struct irq_phys_map *this = NULL;
> +
> +     spin_lock(&dist->irq_phys_map_lock);
> +
> +     while (node) {
> +             this = container_of(node, struct irq_phys_map, node);
> +
> +             if (this->virt_irq < virt_irq)
> +                     node = node->rb_left;
> +             else if (this->virt_irq > virt_irq)
> +                     node = node->rb_right;
> +             else
> +                     break;
> +     }
> +
> +     spin_unlock(&dist->irq_phys_map_lock);
> +     return this;
> +}
> +
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
> +{
> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +     if (!map)
> +             return -EINVAL;
> +
> +     spin_lock(&dist->irq_phys_map_lock);
> +     rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
> +     spin_unlock(&dist->irq_phys_map_lock);
> +
> +     kfree(map);
> +     return 0;
> +}
> +
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>       struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>               goto out_unlock;
>
>       spin_lock_init(&kvm->arch.vgic.lock);
> +     spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>       kvm->arch.vgic.in_kernel = true;
>       kvm->arch.vgic.vgic_model = type;
>       kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>
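
One detail in the quoted vgic_irq_map_search() may be worth a second
look: as written, "this" seems to keep pointing at the last node visited
when the loop runs off the tree, so a miss on a non-empty tree would
return a non-NULL entry. Below is a sketch of a lookup that returns NULL
on a miss. It uses a plain, unbalanced binary tree as a stand-in for the
kernel rbtree, keeps the same key ordering as the patch, and all names
are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an rbtree-linked irq_phys_map entry. */
struct map_node {
	unsigned virt_irq;
	struct map_node *left, *right;
};

/* Insert using the ordering of the quoted patch: larger keys go left. */
void map_insert(struct map_node **root, struct map_node *n)
{
	struct map_node **link = root;

	while (*link) {
		if ((*link)->virt_irq < n->virt_irq)
			link = &(*link)->left;
		else if ((*link)->virt_irq > n->virt_irq)
			link = &(*link)->right;
		else
			return;	/* already mapped, keep the existing node */
	}
	n->left = n->right = NULL;
	*link = n;
}

/* Return the matching node, or NULL when virt_irq is not mapped. */
struct map_node *map_search(struct map_node *root, unsigned virt_irq)
{
	struct map_node *node = root;

	while (node) {
		if (node->virt_irq < virt_irq)
			node = node->left;
		else if (node->virt_irq > virt_irq)
			node = node->right;
		else
			return node;	/* exact match */
	}
	return NULL;			/* ran off the tree: no mapping */
}
```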

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
@ 2015-06-11  8:43     ` Andre Przywara
  0 siblings, 0 replies; 118+ messages in thread
From: Andre Przywara @ 2015-06-11  8:43 UTC (permalink / raw)
  To: Marc Zyngier, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org

Hi,

On 06/08/2015 06:04 PM, Marc Zyngier wrote:
> In order to be able to feed physical interrupts to a guest, we need
> to be able to establish the virtual-physical mapping between the two
> worlds.
>
> The mapping is kept in a rbtree, indexed by virtual interrupts.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h |  18 ++++++++
>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4f9fa1d..33d121a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -159,6 +159,14 @@ struct vgic_io_device {
>       struct kvm_io_device dev;
>  };
>
> +struct irq_phys_map {
> +     struct rb_node          node;
> +     u32                     virt_irq;
> +     u32                     phys_irq;
> +     u32                     irq;

Can you add comments explaining the different IRQ types here?
So I take it that phys_irq is the actual SPI number (hwirq in irqchip
lingo), virt_irq is the guest's virtual IRQ number and irq is Linux'
notion of the IRQ number (the first column in /proc/interrupts)?
Would renaming help? (phys_irq to hwirq? virt_irq to guest_irq?)

> +     bool                    active;
> +};
> +
>  struct vgic_dist {
>       spinlock_t              lock;
>       bool                    in_kernel;
> @@ -256,6 +264,10 @@ struct vgic_dist {
>       struct vgic_vm_ops      vm_ops;
>       struct vgic_io_device   dist_iodev;
>       struct vgic_io_device   *redist_iodevs;
> +
> +     /* Virtual irq to hwirq mapping */
> +     spinlock_t              irq_phys_map_lock;
> +     struct rb_root          irq_phys_map;
>  };
>
>  struct vgic_v2_cpu_if {
> @@ -307,6 +319,9 @@ struct vgic_cpu {
>               struct vgic_v2_cpu_if   vgic_v2;
>               struct vgic_v3_cpu_if   vgic_v3;
>       };
> +
> +     /* Protected by the distributor's irq_phys_map_lock */
> +     struct rb_root  irq_phys_map;
>  };
>
>  #define LR_EMPTY     0xff
> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +                                    int virt_irq, int irq);
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>
>  #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 59ed7a3..c6604f2 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
> +#include <linux/rbtree.h>
>  #include <linux/uaccess.h>
>
>  #include <linux/irqchip/arm-gic.h>
> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +                                             int virt_irq);
>
>  static const struct vgic_ops *vgic_ops;
>  static const struct vgic_params *vgic;
> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>       return IRQ_HANDLED;
>  }
>
> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
> +                                          int virt_irq)
> +{
> +     if (virt_irq < VGIC_NR_PRIVATE_IRQS)
> +             return &vcpu->arch.vgic_cpu.irq_phys_map;
> +     else
> +             return &vcpu->kvm->arch.vgic.irq_phys_map;
> +}
> +
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +                                    int virt_irq, int irq)
> +{
> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +     struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +     struct rb_node **new = &root->rb_node, *parent = NULL;
> +     struct irq_phys_map *new_map;
> +     struct irq_desc *desc;
> +     struct irq_data *data;
> +     int phys_irq;
> +
> +     desc = irq_to_desc(irq);
> +     if (!desc) {
> +             kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");

I guess kvm_arch_timer: is a left-over of the original user?

> +             return NULL;
> +     }
> +
> +     data = irq_desc_get_irq_data(desc);
> +     while (data->parent_data)
> +             data = data->parent_data;
> +
> +     phys_irq = data->hwirq;

So if I get this correctly we "cache" hwirq/phys_irq in this map to get
a cheaper access to it, but actually it is redundant since we have
Linux' irq number, isn't it?
Are we sure that the irqdomain mapping of irq and phys_irq never will
change while this map entry is valid? This is probably true for the
timer, but does that still hold in the future with other devices?
(see also the next email for more rationale)

Cheers,
Andre.

> +
> +     spin_lock(&dist->irq_phys_map_lock);
> +
> +     /* Boilerplate rb_tree code */
> +     while (*new) {
> +             struct irq_phys_map *this;
> +
> +             this = container_of(*new, struct irq_phys_map, node);
> +             parent = *new;
> +             if (this->virt_irq < virt_irq)
> +                     new = &(*new)->rb_left;
> +             else if (this->virt_irq > virt_irq)
> +                     new = &(*new)->rb_right;
> +             else {
> +                     new_map = this;
> +                     goto out;
> +             }
> +     }
> +
> +     new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
> +     if (!new_map)
> +             goto out;
> +
> +     new_map->virt_irq = virt_irq;
> +     new_map->phys_irq = phys_irq;
> +     new_map->irq = irq;
> +
> +     rb_link_node(&new_map->node, parent, new);
> +     rb_insert_color(&new_map->node, root);
> +
> +out:
> +     spin_unlock(&dist->irq_phys_map_lock);
> +     return new_map;
> +}
> +
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +                                             int virt_irq)
> +{
> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +     struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +     struct rb_node *node = root->rb_node;
> +     struct irq_phys_map *this = NULL;
> +
> +     spin_lock(&dist->irq_phys_map_lock);
> +
> +     while (node) {
> +             this = container_of(node, struct irq_phys_map, node);
> +
> +             if (this->virt_irq < virt_irq)
> +                     node = node->rb_left;
> +             else if (this->virt_irq > virt_irq)
> +                     node = node->rb_right;
> +             else
> +                     break;
> +     }
> +
> +     spin_unlock(&dist->irq_phys_map_lock);
> +     return this;
> +}
> +
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
> +{
> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +     if (!map)
> +             return -EINVAL;
> +
> +     spin_lock(&dist->irq_phys_map_lock);
> +     rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
> +     spin_unlock(&dist->irq_phys_map_lock);
> +
> +     kfree(map);
> +     return 0;
> +}
> +
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>       struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>               goto out_unlock;
>
>       spin_lock_init(&kvm->arch.vgic.lock);
> +     spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>       kvm->arch.vgic.in_kernel = true;
>       kvm->arch.vgic.vgic_model = type;
>       kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>
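
One detail of the quoted vgic_irq_map_search() is worth noting: when the loop runs off the bottom of the tree, `this` still points at the last node visited, so a lookup for an unmapped virt_irq returns that entry rather than NULL. Below is a minimal userspace sketch of a search that reports a miss explicitly. It assumes a plain unbalanced binary tree in place of the kernel rbtree, and the names map_node, map_insert and map_search are made up for illustration; only the comparison direction is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct irq_phys_map; not the kernel type. */
struct map_node {
	struct map_node *left, *right;
	uint32_t virt_irq;	/* key: interrupt number seen by the guest */
	uint32_t phys_irq;	/* value: hwirq as seen by the GIC */
};

/*
 * Link a caller-provided node into the tree. Returns the existing node
 * if virt_irq is already mapped, otherwise the newly linked node.
 * Ordering mirrors the patch: larger keys go to the left.
 */
static struct map_node *map_insert(struct map_node **root, struct map_node *n)
{
	struct map_node **link = root;

	assert(root != NULL && n != NULL);

	while (*link) {
		struct map_node *this = *link;

		if (this->virt_irq < n->virt_irq)
			link = &this->left;
		else if (this->virt_irq > n->virt_irq)
			link = &this->right;
		else
			return this;
	}

	n->left = n->right = NULL;
	*link = n;
	return n;
}

/* Return the entry for virt_irq, or NULL if there is no mapping. */
static struct map_node *map_search(struct map_node *root, uint32_t virt_irq)
{
	struct map_node *node = root;

	while (node) {
		if (node->virt_irq < virt_irq)
			node = node->left;
		else if (node->virt_irq > virt_irq)
			node = node->right;
		else
			return node;
	}

	return NULL;
}
```

Returning from inside the loop on a match, and NULL after it, keeps the "not found" case distinct from "last node visited".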

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-11  8:44     ` Andre Przywara
  -1 siblings, 0 replies; 118+ messages in thread
From: Andre Przywara @ 2015-06-11  8:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On 06/08/2015 06:04 PM, Marc Zyngier wrote:
> To allow a HW interrupt to be injected into a guest, we look up the
> guest virtual interrupt in the irq_phys_map rbtree, and if we have
> a match, encode both interrupts in the LR.
> 
> We also mark the interrupt as "active" at the host distributor level.
> 
> On guest EOI on the virtual interrupt, the host interrupt will be
> deactivated.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index c6604f2..495ac7d 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>  	if (!vgic_irq_is_edge(vcpu, irq))
>  		vlr.state |= LR_EOI_INT;
>  
> +	if (vlr.irq >= VGIC_NR_SGIS) {
> +		struct irq_phys_map *map;
> +		map = vgic_irq_map_search(vcpu, irq);
> +
> +		if (map) {
> +			int ret;
> +
> +			BUG_ON(!map->active);
> +			vlr.hwirq = map->phys_irq;
> +			vlr.state |= LR_HW;
> +			vlr.state &= ~LR_EOI_INT;
> +
> +			ret = irq_set_irqchip_state(map->irq,
> +						    IRQCHIP_STATE_ACTIVE,
> +						    true);
> +			vgic_irq_set_queued(vcpu, irq);
> +			WARN_ON(ret);
> +		}
> +	}
> +
>  	vgic_set_lr(vcpu, lr_nr, vlr);
>  	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>  }
> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  	return level_pending;
>  }
>  
> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +{
> +	struct irq_phys_map *map;
> +	int ret;
> +
> +	if (!(vlr.state & LR_HW))
> +		return 0;
> +
> +	map = vgic_irq_map_search(vcpu, vlr.irq);

I wonder if it's safe to rely on that mapping here. Are we sure that
this hasn't changed while the VCPU was running? If I got this correctly,
currently only vcpu_reset will actually add a map entry, but I guess in
the future there will be more users.
Also we rely on the irqdomain mapping to be still the same, but that is
probably a safe assumption.

But I'd still find it more natural to use the hwirq number from the LR
at this point. Can't we use irq_find_mapping() here to learn Linux'
(current) irq number from that?

Or am I too paranoid here?

Cheers,
Andre.

> +	BUG_ON(!map || !map->active);
> +
> +	ret = irq_get_irqchip_state(map->irq,
> +				    IRQCHIP_STATE_ACTIVE,
> +				    &map->active);
> +
> +	WARN_ON(ret);
> +
> +	if (map->active) {
> +		ret = irq_set_irqchip_state(map->irq,
> +					    IRQCHIP_STATE_ACTIVE,
> +					    false);
> +		WARN_ON(ret);
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
>  /* Sync back the VGIC state after a guest run */
>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
> @@ -1358,14 +1407,30 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  	elrsr = vgic_get_elrsr(vcpu);
>  	elrsr_ptr = u64_to_bitmask(&elrsr);
>  
> -	/* Clear mappings for empty LRs */
> -	for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
> +	/* Deal with HW interrupts, and clear mappings for empty LRs */
> +	for (lr = 0; lr < vgic->nr_lr; lr++) {
>  		struct vgic_lr vlr;
>  
> -		if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
> +		if (!test_bit(lr, vgic_cpu->lr_used))
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> +		if (vgic_sync_hwirq(vcpu, vlr)) {
> +			/*
> +			 * So this is a HW interrupt that the guest
> +			 * EOI-ed. Clean the LR state and allow the
> +			 * interrupt to be queued again.
> +			 */
> +			vlr.state &= ~LR_HW;
> +			vlr.hwirq = 0;
> +			vgic_set_lr(vcpu, lr, vlr);
> +			vgic_irq_clear_queued(vcpu, vlr.irq);
> +		}
> +
> +		if (!test_bit(lr, elrsr_ptr))
> +			continue;
> +
> +		clear_bit(lr, vgic_cpu->lr_used);
>  
>  		BUG_ON(vlr.irq >= dist->nr_irqs);
>  		vgic_cpu->vgic_irq_lr_map[vlr.irq] = LR_EMPTY;
> 
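
The sync step in the quoted vgic_sync_hwirq() can be modelled in userspace as follows. This is only a sketch: fake_irqchip stands in for the host distributor's active bit (reached through irq_get_irqchip_state() and irq_set_irqchip_state() in the patch), and LR_HW_FLAG is an arbitrary value, not the real LR encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LR_HW_FLAG	(1u << 0)	/* arbitrary bit for this sketch */

/* Hypothetical model of the host-side active bit for one interrupt. */
struct fake_irqchip {
	bool active;
};

/* Hypothetical list register: only the state flags matter here. */
struct fake_sync_lr {
	uint32_t state;
};

/*
 * Return 1 if the HW interrupt went from active to inactive while the
 * guest ran (i.e. the guest EOI-ed it), and 0 otherwise. If it is still
 * active, remember that in *map_active and clear the host active bit.
 */
static int sync_hwirq(struct fake_irqchip *chip, const struct fake_sync_lr *lr,
		      bool *map_active)
{
	assert(chip && lr && map_active);

	if (!(lr->state & LR_HW_FLAG))
		return 0;

	*map_active = chip->active;	/* models irq_get_irqchip_state() */
	if (*map_active) {
		chip->active = false;	/* models irq_set_irqchip_state(false) */
		return 0;
	}

	return 1;
}
```

A return value of 1 is what lets the caller clear the HW bit in the LR and allow the interrupt to be queued again.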

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-11  8:43     ` Andre Przywara
@ 2015-06-11  8:56       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-11  8:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/06/15 09:43, Andre Przywara wrote:
> Hi,
> 
> On 06/08/2015 06:04 PM, Marc Zyngier wrote:
>> In order to be able to feed physical interrupts to a guest, we need
>> to be able to establish the virtual-physical mapping between the two
>> worlds.
>>
>> The mapping is kept in a rbtree, indexed by virtual interrupts.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/kvm/arm_vgic.h |  18 ++++++++
>>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 128 insertions(+)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 4f9fa1d..33d121a 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -159,6 +159,14 @@ struct vgic_io_device {
>>  	struct kvm_io_device dev;
>>  };
>>  
>> +struct irq_phys_map {
>> +	struct rb_node		node;
>> +	u32			virt_irq;
>> +	u32			phys_irq;
>> +	u32			irq;
> 
> Can you add comments explaining the different IRQ types here?
> So I take it that phys_irq is the actual SPI number (hwirq in irqchip
> lingo), virt_irq is the guest's virtual IRQ number and irq is Linux'
> notion of the IRQ number (the first column in /proc/interrupts)?
> Would renaming help? (phys_irq to hwirq? virt_irq to guest_irq?)

Adding comments would probably help. Renaming, I'm not sure. The virt vs
phys was clear enough for me. The ambiguous one would be "irq"... I'll
try to work something out anyway.
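
For reference, a sketch of what such comments could look like, based on Andre's reading of the three fields. The comment wording is a suggestion only, not what any later revision used; the rb_node member is left out and fixed-width types replace the kernel's u32 so that the fragment builds in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copy of struct irq_phys_map without the rb_node member. */
struct irq_phys_map_sketch {
	uint32_t virt_irq;	/* interrupt number the guest sees */
	uint32_t phys_irq;	/* hwirq: number the physical GIC uses */
	uint32_t irq;		/* Linux IRQ number, as in /proc/interrupts */
	bool	 active;	/* last active state read back from the irqchip */
};
```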

> 
>> +	bool			active;
>> +};
>> +
>>  struct vgic_dist {
>>  	spinlock_t		lock;
>>  	bool			in_kernel;
>> @@ -256,6 +264,10 @@ struct vgic_dist {
>>  	struct vgic_vm_ops	vm_ops;
>>  	struct vgic_io_device	dist_iodev;
>>  	struct vgic_io_device	*redist_iodevs;
>> +
>> +	/* Virtual irq to hwirq mapping */
>> +	spinlock_t		irq_phys_map_lock;
>> +	struct rb_root		irq_phys_map;
>>  };
>>  
>>  struct vgic_v2_cpu_if {
>> @@ -307,6 +319,9 @@ struct vgic_cpu {
>>  		struct vgic_v2_cpu_if	vgic_v2;
>>  		struct vgic_v3_cpu_if	vgic_v3;
>>  	};
>> +
>> +	/* Protected by the distributor's irq_phys_map_lock */
>> +	struct rb_root	irq_phys_map;
>>  };
>>  
>>  #define LR_EMPTY	0xff
>> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq);
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>  
>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 59ed7a3..c6604f2 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_address.h>
>>  #include <linux/of_irq.h>
>> +#include <linux/rbtree.h>
>>  #include <linux/uaccess.h>
>>  
>>  #include <linux/irqchip/arm-gic.h>
>> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq);
>>  
>>  static const struct vgic_ops *vgic_ops;
>>  static const struct vgic_params *vgic;
>> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>>  	return IRQ_HANDLED;
>>  }
>>  
>> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>> +					     int virt_irq)
>> +{
>> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
>> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
>> +	else
>> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
>> +}
>> +
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node **new = &root->rb_node, *parent = NULL;
>> +	struct irq_phys_map *new_map;
>> +	struct irq_desc *desc;
>> +	struct irq_data *data;
>> +	int phys_irq;
>> +
>> +	desc = irq_to_desc(irq);
>> +	if (!desc) {
>> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
> 
> I guess kvm_arch_timer: is a left-over of the original user?
> 
>> +		return NULL;
>> +	}
>> +
>> +	data = irq_desc_get_irq_data(desc);
>> +	while (data->parent_data)
>> +		data = data->parent_data;
>> +
>> +	phys_irq = data->hwirq;
> 
> So if I get this correctly we "cache" hwirq/phys_irq in this map to get
> a cheaper access to it, but actually it is redundant since we have
> Linux' irq number, isn't it?

Define what you mean by redundant. Walking the irq_data hierarchy can take
arbitrarily long, and what we're after is the irq vs GIC view of the HW irq.
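
To illustrate the walk being cached: with stacked irqdomains, each level has its own irq_data and its own hwirq, and the patch takes the hwirq at the root of the parent_data chain as the number the GIC knows. A minimal userspace sketch, with fake_irq_data as a made-up stand-in for struct irq_data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in for struct irq_data; only the two fields used here. */
struct fake_irq_data {
	uint32_t hwirq;				/* number local to this level */
	struct fake_irq_data *parent_data;	/* next level up, or NULL */
};

/* Follow parent_data to the root and return the hwirq found there. */
static uint32_t root_hwirq(struct fake_irq_data *data)
{
	assert(data != NULL);

	while (data->parent_data)
		data = data->parent_data;

	return data->hwirq;
}
```

The cost of the walk grows with the depth of the hierarchy, which is the argument for doing it once at map time.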

> Are we sure that the irqdomain mapping of irq and phys_irq never will
> change while this map entry is valid? This is probably true for the
> timer, but does that still hold in the future with other devices?
> (see also the next email for more rationale)

How could such a mapping change? We have done a request_irq on this IRQ
line. If it were to change, we'd get notified when *another device* is
interrupting us. That would simply defeat the whole purpose of having
separate interrupts.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-11  8:44     ` Andre Przywara
@ 2015-06-11  9:15       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-11  9:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/06/15 09:44, Andre Przywara wrote:
> Hi Marc,
> 
> On 06/08/2015 06:04 PM, Marc Zyngier wrote:
>> To allow a HW interrupt to be injected into a guest, we look up the
>> guest virtual interrupt in the irq_phys_map rbtree, and if we have
>> a match, encode both interrupts in the LR.
>>
>> We also mark the interrupt as "active" at the host distributor level.
>>
>> On guest EOI on the virtual interrupt, the host interrupt will be
>> deactivated.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  virt/kvm/arm/vgic.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 68 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index c6604f2..495ac7d 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>  	if (!vgic_irq_is_edge(vcpu, irq))
>>  		vlr.state |= LR_EOI_INT;
>>  
>> +	if (vlr.irq >= VGIC_NR_SGIS) {
>> +		struct irq_phys_map *map;
>> +		map = vgic_irq_map_search(vcpu, irq);
>> +
>> +		if (map) {
>> +			int ret;
>> +
>> +			BUG_ON(!map->active);
>> +			vlr.hwirq = map->phys_irq;
>> +			vlr.state |= LR_HW;
>> +			vlr.state &= ~LR_EOI_INT;
>> +
>> +			ret = irq_set_irqchip_state(map->irq,
>> +						    IRQCHIP_STATE_ACTIVE,
>> +						    true);
>> +			vgic_irq_set_queued(vcpu, irq);
>> +			WARN_ON(ret);
>> +		}
>> +	}
>> +
>>  	vgic_set_lr(vcpu, lr_nr, vlr);
>>  	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>>  }
>> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>  	return level_pending;
>>  }
>>  
>> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>> +{
>> +	struct irq_phys_map *map;
>> +	int ret;
>> +
>> +	if (!(vlr.state & LR_HW))
>> +		return 0;
>> +
>> +	map = vgic_irq_map_search(vcpu, vlr.irq);
> 
> I wonder if it's safe to rely on that mapping here. Are we sure that
> this hasn't changed while the VCPU was running? If I got this correctly,
> currently only vcpu_reset will actually add a map entry, but I guess in
> the future there will be more users.

How can the guest interrupt change? This is HW, as far as the guest is
concerned. An actual interrupt line. We don't reconfigure the HW live.

> Also we rely on the irqdomain mapping to be still the same, but that is
> probably a safe assumption.

Like I said before, this *cannot* change.

> But I'd still find it more natural to use the hwirq number from the LR
> at this point. Can't we use irq_find_mapping() here to learn Linux'
> (current) irq number from that?

I think you're confused.

- The guest irq (vlr.irq) is entirely made up, and has no connection
with reality. It is stable, and cannot change during the lifetime of the
guest (think of it as a HW irq line).

- The host hwirq (vlr.hwirq) is stable as well, for the same reason.

- The Linux IRQ cannot change because we've been given it by the kernel,
and that's what we use for *everything* as far as the kernel is
concerned. Its mapping to hwirq is stable as well because this is how we
talk to the HW.

- irq_find_mapping gives you the *reverse* mapping (from hwirq to Linux
irq), and for that to work, you need the domain on which you want to
apply the translation. This is only useful when actually taking the
interrupt (i.e. in an interrupt controller driver). I can't see how that
could make sense here.

The purpose of this mapping is to find, given the guest irq (because that's
what we inject), what the other values are:
- hwirq: to provide GICH with the interrupt to deactivate
- Linux irq: to control the active state through the irqchip state API.
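
A minimal userspace sketch of that lookup, for illustration only. The
names (irq_map_entry, map_lookup) are made up, and a flat table stands
in for the rbtree used by the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the patch's struct irq_phys_map (rb_node omitted). */
struct irq_map_entry {
	uint32_t virt_irq;	/* guest irq: the lookup key, this is what we inject */
	uint32_t phys_irq;	/* host hwirq: placed in the LR so the GIC can deactivate it */
	uint32_t irq;		/* Linux irq: handle used with the irqchip state API */
};

/* Hypothetical helper: return the entry for virt_irq, or NULL if it is unmapped. */
static const struct irq_map_entry *
map_lookup(const struct irq_map_entry *tbl, size_t n, uint32_t virt_irq)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].virt_irq == virt_irq)
			return &tbl[i];
	return NULL;
}
```

The guest irq is the only key. Both other values are read out of the
entry, so nothing here needs a reverse (hwirq to Linux irq) translation.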

> Or am I too paranoid here?

Hope it makes more sense to you now.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-11  9:15       ` Marc Zyngier
@ 2015-06-11  9:44         ` Andre Przywara
  -1 siblings, 0 replies; 118+ messages in thread
From: Andre Przywara @ 2015-06-11  9:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/11/2015 10:15 AM, Marc Zyngier wrote:
> On 11/06/15 09:44, Andre Przywara wrote:
>> On 06/08/2015 06:04 PM, Marc Zyngier wrote:
...
>>> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>  	return level_pending;
>>>  }
>>>  
>>> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
>>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>> +{
>>> +	struct irq_phys_map *map;
>>> +	int ret;
>>> +
>>> +	if (!(vlr.state & LR_HW))
>>> +		return 0;
>>> +
>>> +	map = vgic_irq_map_search(vcpu, vlr.irq);
>>
>> I wonder if it's safe to rely on that mapping here. Are we sure that
>> this hasn't changed while the VCPU was running? If I got this correctly,
>> currently only vcpu_reset will actually add a map entry, but I guess in
>> the future there will be more users.
> 
> How can the guest interrupt change? This is HW, as far as the guest is
> concerned. An actual interrupt line. We don't reconfigure the HW live.

I was thinking about the rbtree mapping we introduced. There we map a
guest interrupt to a hardware interrupt. Are we sure that no one tears
down that mapping while we have an LR populated with this pair?
I am not talking about the timer here, but more about future users.

>> Also we rely on the irqdomain mapping to be still the same, but that is
>> probably a safe assumption.
> 
> Like I said before, this *cannot* change.

OK, got it.

> 
>> But I'd still find it more natural to use the hwirq number from the LR
>> at this point. Can't we use irq_find_mapping() here to learn Linux'
>> (current) irq number from that?
> 
> I think you're confused.
> 
> - The guest irq (vlr.irq) is entirely made up, and has no connection
> with reality. It is stable, and cannot change during the lifetime of the
> guest (think of it as a HW irq line).
> 
> - The host hwirq (vlr.hwirq) is stable as well, for the same reason.
> 
> - The Linux IRQ cannot change because we've been given it by the kernel,
> and that's what we use for *everything* as far as the kernel is
> concerned. Its mapping to hwirq is stable as well because this is how we
> talk to the HW.

Not disputing any of them, but:

> - irq_find_mapping gives you the *reverse* mapping (from hwirq to Linux
> irq), and for that to work, you need the domain on which you want to
> apply the translation. This is only useful when actually taking the
> interrupt (i.e. in an interrupt controller driver). I can't see how that
> could make sense here.

So if the guest has acked/EOIed its IRQ, the GIC at the same time
acked/EOIed the hardware IRQ it found in the LR. Now we assume that this
is the very same as the HW IRQ we found doing our rbtree traversal.
I just wanted to be sure that this is always true and that this mapping
didn't change while the VCPU was running.
If you are sure of this, fine. I was just concerned that someone might
break this assumption in the future by mapping/unmapping entries more
dynamically (say some irq forwarding user) and that we would not notice.
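
To make the concern concrete, here is a hypothetical consistency check
that is not part of the patch. It is a userspace sketch: lr_sketch,
map_sketch, lr_matches_map and the SKETCH_LR_HW value are all invented
for illustration and only mirror the fields discussed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LR_HW	(1u << 3)	/* illustrative flag value, an assumption */

struct lr_sketch {
	uint32_t irq;		/* guest irq held in the LR */
	uint32_t hwirq;		/* host hwirq written into the LR at injection time */
	uint32_t state;
};

struct map_sketch {
	uint32_t virt_irq;
	uint32_t phys_irq;
};

/* Return true if the LR still agrees with the mapping found at sync time. */
static bool lr_matches_map(const struct lr_sketch *lr,
			   const struct map_sketch *map)
{
	if (!(lr->state & SKETCH_LR_HW))
		return true;	/* purely virtual interrupt, nothing to compare */
	if (!map)
		return false;	/* mapping was torn down while the LR was live */
	return map->virt_irq == lr->irq && map->phys_irq == lr->hwirq;
}
```

A check of this kind would only flag a changed or missing mapping after
the fact. It would not prevent one.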

Cheers,
Andre.

> 
> The purpose of this mapping is to find, given the guest irq (because that's
> what we inject), what the other values are:
> - hwirq: to provide GICH with the interrupt to deactivate
> - Linux irq: to control the active state through the irqchip state API.
> 
>> Or am I too paranoid here?
> 
> Hope it makes more sense to you now.
> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-11  9:44         ` Andre Przywara
@ 2015-06-11 10:02           ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-11 10:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/06/15 10:44, Andre Przywara wrote:
> On 06/11/2015 10:15 AM, Marc Zyngier wrote:
>> On 11/06/15 09:44, Andre Przywara wrote:
>>> On 06/08/2015 06:04 PM, Marc Zyngier wrote:
> ...
>>>> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>  	return level_pending;
>>>>  }
>>>>  
>>>> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
>>>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>> +{
>>>> +	struct irq_phys_map *map;
>>>> +	int ret;
>>>> +
>>>> +	if (!(vlr.state & LR_HW))
>>>> +		return 0;
>>>> +
>>>> +	map = vgic_irq_map_search(vcpu, vlr.irq);
>>>
>>> I wonder if it's safe to rely on that mapping here. Are we sure that
>>> this hasn't changed while the VCPU was running? If I got this correctly,
>>> currently only vcpu_reset will actually add a map entry, but I guess in
>>> the future there will be more users.
>>
>> How can the guest interrupt change? This is HW, as far as the guest is
>> concerned. An actual interrupt line. We don't reconfigure the HW live.
> 
> I was thinking about the rbtree mapping we introduced. There we map a
> guest interrupt to a hardware interrupt. Are we sure that no one tears
> down that mapping while we have an LR populated with this pair?
> I am not talking about the timer here, but more about future users.
> 
>>> Also we rely on the irqdomain mapping to be still the same, but that is
>>> probably a safe assumption.
>>
>> Like I said before, this *cannot* change.
> 
> OK, got it.
> 
>>
>>> But I'd still find it more natural to use the hwirq number from the LR
>>> at this point. Can't we use irq_find_mapping() here to learn Linux'
>>> (current) irq number from that?
>>
>> I think you're confused.
>>
>> - The guest irq (vlr.irq) is entirely made up, and has no connection
>> with reality. It is stable, and cannot change during the lifetime of the
>> guest (think of it as a HW irq line).
>>
>> - The host hwirq (vlr.hwirq) is stable as well, for the same reason.
>>
>> - The Linux IRQ cannot change because we've been given it by the kernel,
>> and that's what we use for *everything* as far as the kernel is
>> concerned. Its mapping to hwirq is stable as well because this is how we
>> talk to the HW.
> 
> Not disputing any of them, but:
> 
>> - irq_find_mapping gives you the *reverse* mapping (from hwirq to Linux
>> irq), and for that to work, you need the domain on which you want to
>> apply the translation. This is only useful when actually taking the
>> interrupt (i.e. in an interrupt controller driver). I can't see how that
>> could make sense here.
> 
> So if the guest has acked/EOIed its IRQ, the GIC at the same time
> acked/EOIed the hardware IRQ it found in the LR. Now we assume that this
> is the very same as the HW IRQ we found doing our rbtree traversal.
> I just wanted to be sure that this is always true and that this mapping
> didn't change while the VCPU was running.
> If you are sure of this, fine. I was just concerned that someone might
> break this assumption in the future by mapping/unmapping entries more
> dynamically (say some irq forwarding user) and that we would not notice.

How can the mapping change? Are you thinking of an unmap/map operation
being done while the guest is running, replacing a HW device with
another? That's not an option, and not only for the interrupts.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-15 15:44     ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-15 15:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> In order to be able to feed physical interrupts to a guest, we need
> to be able to establish the virtual-physical mapping between the two
> worlds.
> 
> The mapping is kept in a rbtree, indexed by virtual interrupts.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h |  18 ++++++++
>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4f9fa1d..33d121a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -159,6 +159,14 @@ struct vgic_io_device {
>  	struct kvm_io_device dev;
>  };
>  
> +struct irq_phys_map {
> +	struct rb_node		node;
> +	u32			virt_irq;
> +	u32			phys_irq;
> +	u32			irq;
> +	bool			active;
> +};
> +
>  struct vgic_dist {
>  	spinlock_t		lock;
>  	bool			in_kernel;
> @@ -256,6 +264,10 @@ struct vgic_dist {
>  	struct vgic_vm_ops	vm_ops;
>  	struct vgic_io_device	dist_iodev;
>  	struct vgic_io_device	*redist_iodevs;
> +
> +	/* Virtual irq to hwirq mapping */
> +	spinlock_t		irq_phys_map_lock;
> +	struct rb_root		irq_phys_map;
>  };
>  
>  struct vgic_v2_cpu_if {
> @@ -307,6 +319,9 @@ struct vgic_cpu {
>  		struct vgic_v2_cpu_if	vgic_v2;
>  		struct vgic_v3_cpu_if	vgic_v3;
>  	};
> +
> +	/* Protected by the distributor's irq_phys_map_lock */
> +	struct rb_root	irq_phys_map;
>  };
>  
>  #define LR_EMPTY	0xff
> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +				       int virt_irq, int irq);
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 59ed7a3..c6604f2 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
> +#include <linux/rbtree.h>
>  #include <linux/uaccess.h>
>  
>  #include <linux/irqchip/arm-gic.h>
> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +						int virt_irq);
>  
>  static const struct vgic_ops *vgic_ops;
>  static const struct vgic_params *vgic;
> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
> +					     int virt_irq)
> +{
> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
> +	else
> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
> +}
> +
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +				       int virt_irq, int irq)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	struct rb_node **new = &root->rb_node, *parent = NULL;
> +	struct irq_phys_map *new_map;
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	int phys_irq;
> +
> +	desc = irq_to_desc(irq);
> +	if (!desc) {
> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
> +		return NULL;
> +	}
> +
> +	data = irq_desc_get_irq_data(desc);
> +	while (data->parent_data)
> +		data = data->parent_data;
> +
> +	phys_irq = data->hwirq;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +
> +	/* Boilerplate rb_tree code */
> +	while (*new) {
> +		struct irq_phys_map *this;
> +
> +		this = container_of(*new, struct irq_phys_map, node);
> +		parent = *new;
> +		if (this->virt_irq < virt_irq)
> +			new = &(*new)->rb_left;
> +		else if (this->virt_irq > virt_irq)
> +			new = &(*new)->rb_right;
> +		else {
> +			new_map = this;
In case the mapping already exists, you neither update the mapping nor
return an error. Is that what you want here?
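
One possible alternative, sketched in userspace with a plain unbalanced
binary tree instead of the kernel rbtree. map_node and map_insert are
hypothetical names, and returning -EEXIST on a conflicting mapping is
only a suggestion, not what the patch does:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct map_node {
	struct map_node *left, *right;
	uint32_t virt_irq;	/* key */
	uint32_t irq;		/* Linux irq the key is mapped to */
};

/*
 * Returns 0 if the mapping was inserted or an identical one already exists,
 * -EEXIST if virt_irq is already mapped to a different Linux irq,
 * -ENOMEM if the allocation fails.
 */
static int map_insert(struct map_node **root, uint32_t virt_irq, uint32_t irq)
{
	struct map_node **link = root;
	struct map_node *n;

	while (*link) {
		if (virt_irq < (*link)->virt_irq)
			link = &(*link)->left;
		else if (virt_irq > (*link)->virt_irq)
			link = &(*link)->right;
		else
			return (*link)->irq == irq ? 0 : -EEXIST;
	}

	n = calloc(1, sizeof(*n));
	if (!n)
		return -ENOMEM;
	n->virt_irq = virt_irq;
	n->irq = irq;
	*link = n;
	return 0;
}
```

With this shape the caller can tell a harmless repeat (same virt_irq and
same irq) apart from an attempt to remap a guest irq that is in use.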

> +			goto out;
> +		}
> +	}
> +
> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
> +	if (!new_map)
> +		goto out;
> +
> +	new_map->virt_irq = virt_irq;
> +	new_map->phys_irq = phys_irq;
> +	new_map->irq = irq;
> +
> +	rb_link_node(&new_map->node, parent, new);
> +	rb_insert_color(&new_map->node, root);
> +
> +out:
> +	spin_unlock(&dist->irq_phys_map_lock);
> +	return new_map;
> +}
> +
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +						int virt_irq)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	struct rb_node *node = root->rb_node;
> +	struct irq_phys_map *this = NULL;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +
> +	while (node) {
> +		this = container_of(node, struct irq_phys_map, node);
> +
> +		if (this->virt_irq < virt_irq)
> +			node = node->rb_left;
> +		else if (this->virt_irq > virt_irq)
> +			node = node->rb_right;
> +		else
> +			break;
> +	}
> +
> +	spin_unlock(&dist->irq_phys_map_lock);
> +	return this;
> +}
> +
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +	if (!map)
> +		return -EINVAL;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
> +	spin_unlock(&dist->irq_phys_map_lock);
> +
> +	kfree(map);
> +	return 0;
> +}
> +
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  		goto out_unlock;
>  
>  	spin_lock_init(&kvm->arch.vgic.lock);
Don't you deallocate the rbtree nodes here? Also in the future with EOI
mode == 1 we will need to complete the physical IRQ in place of the guest.
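
A userspace sketch of the kind of teardown meant here, with hypothetical
names (teardown_node, map_destroy) and a plain binary tree. In the
kernel this would presumably be a post-order walk of the rbtree, for
instance with rbtree_postorder_for_each_entry_safe(), freeing each entry:

```c
#include <stdlib.h>

struct teardown_node {
	struct teardown_node *left, *right;
	unsigned int virt_irq;
};

/*
 * Free every node, children before their parent, and leave the root empty.
 * Returns the number of nodes freed.
 */
static size_t map_destroy(struct teardown_node **root)
{
	struct teardown_node *n = *root;
	size_t freed;

	if (!n)
		return 0;

	freed = map_destroy(&n->left) + map_destroy(&n->right) + 1;
	free(n);
	*root = NULL;
	return freed;
}
```

Resetting the root to NULL makes a second call a no-op, which keeps
repeated destroy paths safe.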

- Eric
> +	spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>  	kvm->arch.vgic.in_kernel = true;
>  	kvm->arch.vgic.vgic_model = type;
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

> 


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-11 10:02           ` Marc Zyngier
@ 2015-06-15 16:11             ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-15 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/11/2015 12:02 PM, Marc Zyngier wrote:
> On 11/06/15 10:44, Andre Przywara wrote:
>> On 06/11/2015 10:15 AM, Marc Zyngier wrote:
>>> On 11/06/15 09:44, Andre Przywara wrote:
>>>> On 06/08/2015 06:04 PM, Marc Zyngier wrote:
>> ...
>>>>> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>  	return level_pending;
>>>>>  }
>>>>>  
>>>>> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
>>>>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>>> +{
>>>>> +	struct irq_phys_map *map;
>>>>> +	int ret;
>>>>> +
>>>>> +	if (!(vlr.state & LR_HW))
>>>>> +		return 0;
>>>>> +
>>>>> +	map = vgic_irq_map_search(vcpu, vlr.irq);
>>>>
>>>> I wonder if it's safe to rely on that mapping here. Are we sure that
>>>> this hasn't changed while the VCPU was running? If I got this correctly,
>>>> currently only vcpu_reset will actually add a map entry, but I guess in
>>>> the future there will be more users.
>>>
>>> How can the guest interrupt change? This is HW, as far as the guest is
>>> concerned. An actual interrupt line. We don't reconfigure the HW live.
>>
>> I was thinking about the rbtree mapping we introduced. There we map a
>> guest interrupt to a hardware interrupt. Are we sure that no one tears
>> down that mapping while we have an LR populated with this pair?
>> I am not talking about the timer here, but more about future users.
>>
>>>> Also we rely on the irqdomain mapping to be still the same, but that is
>>>> probably a safe assumption.
>>>
>>> Like I said before, this *cannot* change.
>>
>> OK, got it.
>>
>>>
>>>> But I'd still find it more natural to use the hwirq number from the LR
>>>> at this point. Can't we use irq_find_mapping() here to learn Linux'
>>>> (current) irq number from that?
>>>
>>> I think you're confused.
>>>
>>> - The guest irq (vlr.irq) is entirely made up, and has no connection
>>> with reality. it is stable, and cannot change during the lifetime of the
>>> guest (think of it as a HW irq line).
>>>
>>> - The host hwirq (vlr.hwirq) is stable as well, for the same reason.
>>>
>>> - The Linux IRQ cannot change because we've been given it by the kernel,
>>> and that's what we use for *everything* as far as the kernel is
>>> concerned. Its mapping to hwirq is stable as well because this is how we
>>> talk to the HW.
>>
>> Not disputing any of them, but:
>>
>>> - irq_find_mapping gives you the *reverse* mapping (from hwirq to Linux
>>> irq), and for that to work, you need the domain on which you want to
>>> apply the translation. This is only useful when actually taking the
>>> interrupt (i.e. in an interrupt controller driver). I can't see how that
>>> could make sense here.
>>
>> So if the guest has acked/EOIed its IRQ, the GIC at the same time
>> acked/EOIed the hardware IRQ it found in the LR. Now we assume that this
>> is the very same as the HW IRQ we found doing our rbtree traversal.
>> I just wanted to be sure that this is always true and that this mapping
>> didn't change while the VCPU was running.
>> If you are sure of this, fine, I was just concerned that someone breaks
>> this assumption in the future by more dynamically mapping/unmapping
>> entries (say some irq forwarding user) and we will not notice.
> 
> How can the mapping change? Are you thinking of an unmap/map operation
> being done while the guest is running, replacing a HW device with
> another? That's not an option, and not only for the interrupts.

Well, I think that's what we achieved with the kvm-vfio integration. The
requirement was: since we allow user-space to turn forwarding on through
the kvm-vfio device, we should offer the inverse operation, and this
should never fail. This was achieved by forcing a guest exit, checking
the HW state of the IRQ, and quite a lot of pain ...

At that time the kvm-vfio integration seemed to be the most appropriate
approach. Now it seems this is being called into question again by the
review of the Intel posted IRQ API series
(https://lkml.org/lkml/2015/6/12/595). I think you will be happy. Not
sure I can say the same ;-)

Best Regards

Eric

> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-15 15:44     ` Eric Auger
@ 2015-06-16  8:28       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-16  8:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Eric,

On 15/06/15 16:44, Eric Auger wrote:
> Hi Marc,
> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>> In order to be able to feed physical interrupts to a guest, we need
>> to be able to establish the virtual-physical mapping between the two
>> worlds.
>>
>> The mapping is kept in a rbtree, indexed by virtual interrupts.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/kvm/arm_vgic.h |  18 ++++++++
>>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 128 insertions(+)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 4f9fa1d..33d121a 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -159,6 +159,14 @@ struct vgic_io_device {
>>  	struct kvm_io_device dev;
>>  };
>>  
>> +struct irq_phys_map {
>> +	struct rb_node		node;
>> +	u32			virt_irq;
>> +	u32			phys_irq;
>> +	u32			irq;
>> +	bool			active;
>> +};
>> +
>>  struct vgic_dist {
>>  	spinlock_t		lock;
>>  	bool			in_kernel;
>> @@ -256,6 +264,10 @@ struct vgic_dist {
>>  	struct vgic_vm_ops	vm_ops;
>>  	struct vgic_io_device	dist_iodev;
>>  	struct vgic_io_device	*redist_iodevs;
>> +
>> +	/* Virtual irq to hwirq mapping */
>> +	spinlock_t		irq_phys_map_lock;
>> +	struct rb_root		irq_phys_map;
>>  };
>>  
>>  struct vgic_v2_cpu_if {
>> @@ -307,6 +319,9 @@ struct vgic_cpu {
>>  		struct vgic_v2_cpu_if	vgic_v2;
>>  		struct vgic_v3_cpu_if	vgic_v3;
>>  	};
>> +
>> +	/* Protected by the distributor's irq_phys_map_lock */
>> +	struct rb_root	irq_phys_map;
>>  };
>>  
>>  #define LR_EMPTY	0xff
>> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq);
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>  
>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 59ed7a3..c6604f2 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_address.h>
>>  #include <linux/of_irq.h>
>> +#include <linux/rbtree.h>
>>  #include <linux/uaccess.h>
>>  
>>  #include <linux/irqchip/arm-gic.h>
>> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq);
>>  
>>  static const struct vgic_ops *vgic_ops;
>>  static const struct vgic_params *vgic;
>> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>>  	return IRQ_HANDLED;
>>  }
>>  
>> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>> +					     int virt_irq)
>> +{
>> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
>> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
>> +	else
>> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
>> +}
>> +
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node **new = &root->rb_node, *parent = NULL;
>> +	struct irq_phys_map *new_map;
>> +	struct irq_desc *desc;
>> +	struct irq_data *data;
>> +	int phys_irq;
>> +
>> +	desc = irq_to_desc(irq);
>> +	if (!desc) {
>> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
>> +		return NULL;
>> +	}
>> +
>> +	data = irq_desc_get_irq_data(desc);
>> +	while (data->parent_data)
>> +		data = data->parent_data;
>> +
>> +	phys_irq = data->hwirq;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +
>> +	/* Boilerplate rb_tree code */
>> +	while (*new) {
>> +		struct irq_phys_map *this;
>> +
>> +		this = container_of(*new, struct irq_phys_map, node);
>> +		parent = *new;
>> +		if (this->virt_irq < virt_irq)
>> +			new = &(*new)->rb_left;
>> +		else if (this->virt_irq > virt_irq)
>> +			new = &(*new)->rb_right;
>> +		else {
>> +			new_map = this;
> in case the mapping already exists you don't update the mapping or
> return an error. Is it what you want here?

Calling the map function several times is not necessarily a bad idea, as
long as they result in the same mapping. Think of a reset function for a
device that would perform the mapping (just like the timer does). It
should be possible to perform that reset several times without seeing
anything failing.

Now, the code doesn't handle the case where you'd end up with a
different mapping for the same IRQ, which would be an error (you'd need
to go through an unmap first).

I'll update the code to take care of this case.
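
To make the intended semantics concrete, here is a minimal userspace
sketch, not the actual patch. It assumes a plain binary search tree in
place of the kernel rbtree, and the names (struct phys_map,
map_phys_irq) and the -EEXIST return value are hypothetical choices made
for illustration only. Mapping the same pair twice succeeds; mapping an
already-mapped virtual IRQ to a different physical IRQ is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for struct irq_phys_map; a plain BST, not an rbtree. */
struct phys_map {
	struct phys_map *left, *right;
	uint32_t virt_irq;
	uint32_t phys_irq;
};

/*
 * Map virt_irq to phys_irq (hypothetical semantics):
 *   0        mapping created, or an identical mapping already existed
 *   -EEXIST  virt_irq is already mapped to a different phys_irq
 *   -ENOMEM  allocation failure
 */
static int map_phys_irq(struct phys_map **root, uint32_t virt_irq,
			uint32_t phys_irq)
{
	struct phys_map **link = root;
	struct phys_map *entry;

	assert(root);

	/* Walk down to either the matching node or the empty insertion slot. */
	while (*link) {
		struct phys_map *this = *link;

		if (virt_irq < this->virt_irq)
			link = &this->left;
		else if (virt_irq > this->virt_irq)
			link = &this->right;
		else	/* Same virtual IRQ: only an identical mapping is fine. */
			return this->phys_irq == phys_irq ? 0 : -EEXIST;
	}

	entry = calloc(1, sizeof(*entry));
	if (!entry)
		return -ENOMEM;

	entry->virt_irq = virt_irq;
	entry->phys_irq = phys_irq;
	*link = entry;
	return 0;
}
```

With this shape, a device reset path can call the map function as often
as it likes, while a conflicting request has to go through an unmap
first.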

>> +			goto out;
>> +		}
>> +	}
>> +
>> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
>> +	if (!new_map)
>> +		goto out;
>> +
>> +	new_map->virt_irq = virt_irq;
>> +	new_map->phys_irq = phys_irq;
>> +	new_map->irq = irq;
>> +
>> +	rb_link_node(&new_map->node, parent, new);
>> +	rb_insert_color(&new_map->node, root);
>> +
>> +out:
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +	return new_map;
>> +}
>> +
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node *node = root->rb_node;
>> +	struct irq_phys_map *this = NULL;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +
>> +	while (node) {
>> +		this = container_of(node, struct irq_phys_map, node);
>> +
>> +		if (this->virt_irq < virt_irq)
>> +			node = node->rb_left;
>> +		else if (this->virt_irq > virt_irq)
>> +			node = node->rb_right;
>> +		else
>> +			break;
>> +	}
>> +
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +	return this;
>> +}
>> +
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +
>> +	if (!map)
>> +		return -EINVAL;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +
>> +	kfree(map);
>> +	return 0;
>> +}
>> +
>>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>>  		goto out_unlock;
>>  
>>  	spin_lock_init(&kvm->arch.vgic.lock);
> Don't you deallocate the rbtree nodes here?

Erm... Yes, indeed. Silly me.
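
As a rough illustration of the missing teardown, here is a userspace
sketch that frees every node of a mapping tree in post-order. It is an
assumption-laden stand-in: struct phys_map and destroy_phys_map are
hypothetical names, and a plain binary tree replaces the rbtree. In the
kernel the equivalent would presumably run under irq_phys_map_lock and
could use rbtree_postorder_for_each_entry_safe() from <linux/rbtree.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for struct irq_phys_map; a plain binary tree. */
struct phys_map {
	struct phys_map *left, *right;
	uint32_t virt_irq;
};

/*
 * Free every node below *root (children before parent) and leave the
 * root pointer empty. Returns the number of nodes freed.
 */
static unsigned int destroy_phys_map(struct phys_map **root)
{
	struct phys_map *node;
	unsigned int freed;

	assert(root);

	node = *root;
	if (!node)
		return 0;

	/* Both subtrees are released before the node that links to them. */
	freed = destroy_phys_map(&node->left);
	freed += destroy_phys_map(&node->right);

	free(node);
	*root = NULL;
	return freed + 1;
}
```

Clearing the root pointer makes a second call harmless, which matters if
both the per-vcpu and the distributor destroy paths end up walking the
same helper.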

> Also in the future with EOI mode == 1 we will need to complete the
> physical IRQ in place of the guest.

That's one of the things I needed your input on. I can perform the
deactivate here (clearing the active state is easy). But what will
guarantee that the interrupt won't be screaming? Will the device be
quiesced at that time?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-16  8:28       ` Marc Zyngier
@ 2015-06-16  9:10         ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-16  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/16/2015 10:28 AM, Marc Zyngier wrote:
> Hi Eric,
> 
> On 15/06/15 16:44, Eric Auger wrote:
>> Hi Marc,
>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>> In order to be able to feed physical interrupts to a guest, we need
>>> to be able to establish the virtual-physical mapping between the two
>>> worlds.
>>>
>>> The mapping is kept in a rbtree, indexed by virtual interrupts.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  include/kvm/arm_vgic.h |  18 ++++++++
>>>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 128 insertions(+)
>>>
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index 4f9fa1d..33d121a 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -159,6 +159,14 @@ struct vgic_io_device {
>>>  	struct kvm_io_device dev;
>>>  };
>>>  
>>> +struct irq_phys_map {
>>> +	struct rb_node		node;
>>> +	u32			virt_irq;
>>> +	u32			phys_irq;
>>> +	u32			irq;
>>> +	bool			active;
>>> +};
>>> +
>>>  struct vgic_dist {
>>>  	spinlock_t		lock;
>>>  	bool			in_kernel;
>>> @@ -256,6 +264,10 @@ struct vgic_dist {
>>>  	struct vgic_vm_ops	vm_ops;
>>>  	struct vgic_io_device	dist_iodev;
>>>  	struct vgic_io_device	*redist_iodevs;
>>> +
>>> +	/* Virtual irq to hwirq mapping */
>>> +	spinlock_t		irq_phys_map_lock;
>>> +	struct rb_root		irq_phys_map;
>>>  };
>>>  
>>>  struct vgic_v2_cpu_if {
>>> @@ -307,6 +319,9 @@ struct vgic_cpu {
>>>  		struct vgic_v2_cpu_if	vgic_v2;
>>>  		struct vgic_v3_cpu_if	vgic_v3;
>>>  	};
>>> +
>>> +	/* Protected by the distributor's irq_phys_map_lock */
>>> +	struct rb_root	irq_phys_map;
>>>  };
>>>  
>>>  #define LR_EMPTY	0xff
>>> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>> +				       int virt_irq, int irq);
>>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>  
>>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 59ed7a3..c6604f2 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -24,6 +24,7 @@
>>>  #include <linux/of.h>
>>>  #include <linux/of_address.h>
>>>  #include <linux/of_irq.h>
>>> +#include <linux/rbtree.h>
>>>  #include <linux/uaccess.h>
>>>  
>>>  #include <linux/irqchip/arm-gic.h>
>>> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>>>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>>>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
>>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>>> +						int virt_irq);
>>>  
>>>  static const struct vgic_ops *vgic_ops;
>>>  static const struct vgic_params *vgic;
>>> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>>>  	return IRQ_HANDLED;
>>>  }
>>>  
>>> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>>> +					     int virt_irq)
>>> +{
>>> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
>>> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
>>> +	else
>>> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
>>> +}
>>> +
>>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>> +				       int virt_irq, int irq)
>>> +{
>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>>> +	struct rb_node **new = &root->rb_node, *parent = NULL;
>>> +	struct irq_phys_map *new_map;
>>> +	struct irq_desc *desc;
>>> +	struct irq_data *data;
>>> +	int phys_irq;
>>> +
>>> +	desc = irq_to_desc(irq);
>>> +	if (!desc) {
>>> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
>>> +		return NULL;
>>> +	}
>>> +
>>> +	data = irq_desc_get_irq_data(desc);
>>> +	while (data->parent_data)
>>> +		data = data->parent_data;
>>> +
>>> +	phys_irq = data->hwirq;
>>> +
>>> +	spin_lock(&dist->irq_phys_map_lock);
>>> +
>>> +	/* Boilerplate rb_tree code */
>>> +	while (*new) {
>>> +		struct irq_phys_map *this;
>>> +
>>> +		this = container_of(*new, struct irq_phys_map, node);
>>> +		parent = *new;
>>> +		if (this->virt_irq < virt_irq)
>>> +			new = &(*new)->rb_left;
>>> +		else if (this->virt_irq > virt_irq)
>>> +			new = &(*new)->rb_right;
>>> +		else {
>>> +			new_map = this;
>> in case the mapping already exists you don't update the mapping or
>> return an error. Is it what you want here?
> 
> Calling the map function several times is not necessarily a bad idea, as
> long as they result in the same mapping. Think of a reset function for a
> device that would perform the mapping (just like the timer does). It
> should be possible to perform that reset several times without seeing
> anything failing.
> 
> Now, the code doesn't handle the case where you'd end up with a
> different mapping for the same IRQ, which would be an error (you'd need
> to go through an unmap first).
> 
> I'll update the code to take care of this case.
> 
>>> +			goto out;
>>> +		}
>>> +	}
>>> +
>>> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
>>> +	if (!new_map)
>>> +		goto out;
>>> +
>>> +	new_map->virt_irq = virt_irq;
>>> +	new_map->phys_irq = phys_irq;
>>> +	new_map->irq = irq;
>>> +
>>> +	rb_link_node(&new_map->node, parent, new);
>>> +	rb_insert_color(&new_map->node, root);
>>> +
>>> +out:
>>> +	spin_unlock(&dist->irq_phys_map_lock);
>>> +	return new_map;
>>> +}
>>> +
>>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>>> +						int virt_irq)
>>> +{
>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>>> +	struct rb_node *node = root->rb_node;
>>> +	struct irq_phys_map *this = NULL;
>>> +
>>> +	spin_lock(&dist->irq_phys_map_lock);
>>> +
>>> +	while (node) {
>>> +		this = container_of(node, struct irq_phys_map, node);
>>> +
>>> +		if (this->virt_irq < virt_irq)
>>> +			node = node->rb_left;
>>> +		else if (this->virt_irq > virt_irq)
>>> +			node = node->rb_right;
>>> +		else
>>> +			break;
>>> +	}
>>> +
>>> +	spin_unlock(&dist->irq_phys_map_lock);
>>> +	return this;
>>> +}
>>> +
>>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
>>> +{
>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>> +
>>> +	if (!map)
>>> +		return -EINVAL;
>>> +
>>> +	spin_lock(&dist->irq_phys_map_lock);
>>> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
>>> +	spin_unlock(&dist->irq_phys_map_lock);
>>> +
>>> +	kfree(map);
>>> +	return 0;
>>> +}
>>> +
>>>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>  {
>>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>>>  		goto out_unlock;
>>>  
>>>  	spin_lock_init(&kvm->arch.vgic.lock);
>> Don't you deallocate the rbtree nodes here?
> 
> Erm... Yes, indeed. Silly me.
> 
>> Also in the future with EOI mode == 1 we will need to complete the
>> physical IRQ in place of the guest.
> 
> That's one of the things I needed your input on. I can perform the
> deactivate here (clearing the active state is easy). But what will
> guarantee that the interrupt won't be screaming? Will the device be
> quiesced at that time?
In my VFIO use case, the guest might be killed before it is able to
handle and deactivate the vIRQ/pIRQ properly. In that case, when you
start a new guest using the same forwarded physical IRQ, the physical
IRQ is still active and you cannot restart properly. This is why the
deactivation must be taken care of at some point, in case the guest
failed to do it.

When the guest is killed, the device is not necessarily quiescent, i.e.
IRQs might still be issued. This depends on the availability of a VFIO
reset module whose job is to stop DMA accesses and IRQs
(https://lkml.org/lkml/2015/6/15/123).

In https://lkml.org/lkml/2014/11/23/120 I cleared the unforwarded state
before doing the deactivate, so if a new IRQ hits, it is completed by
the host and not injected.

Hope it helps

Eric
> 
> Thanks,
> 
> 	M.
> 
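
The remapping rule discussed above (mapping the same pair again is fine, a
different physical IRQ for an already mapped virtual IRQ is an error) can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the code from the series: a simple unbalanced binary tree
stands in for the kernel rbtree, there is no locking, and map_node, map_irq()
and map_search() are hypothetical names. Note also that the quoted
vgic_irq_map_search() returns "this" after the loop, which as written looks
like the last node visited rather than NULL when nothing matches; the sketch
returns NULL on a miss.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for struct irq_phys_map (hypothetical). */
struct map_node {
	struct map_node *left, *right;
	unsigned int virt_irq;
	unsigned int phys_irq;
};

/*
 * Hypothetical map_irq(): record virt_irq -> phys_irq.
 * Returns 0 on a fresh insert or an identical remap, -EEXIST if the
 * virtual IRQ is already mapped to a different physical IRQ, and
 * -ENOMEM if the allocation fails.
 */
static int map_irq(struct map_node **root, unsigned int virt_irq,
		   unsigned int phys_irq)
{
	struct map_node **link = root;

	assert(root);

	while (*link) {
		struct map_node *this = *link;

		if (virt_irq < this->virt_irq)
			link = &this->left;
		else if (virt_irq > this->virt_irq)
			link = &this->right;
		else	/* already mapped: only the same pair is accepted */
			return this->phys_irq == phys_irq ? 0 : -EEXIST;
	}

	*link = calloc(1, sizeof(**link));
	if (!*link)
		return -ENOMEM;

	(*link)->virt_irq = virt_irq;
	(*link)->phys_irq = phys_irq;
	return 0;
}

/* Hypothetical lookup: returns the matching node, or NULL on a miss. */
static struct map_node *map_search(struct map_node *node,
				   unsigned int virt_irq)
{
	while (node) {
		if (virt_irq < node->virt_irq)
			node = node->left;
		else if (virt_irq > node->virt_irq)
			node = node->right;
		else
			return node;
	}

	return NULL;
}
```

With this shape, a device reset path could call map_irq() repeatedly with the
same arguments without failing, while a conflicting request would have to go
through an unmap first.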

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-17 11:51     ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-17 11:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> To allow a HW interrupt to be injected into a guest, we lookup the
> guest virtual interrupt in the irq_phys_map rbtree, and if we have
> a match, encode both interrupts in the LR.
> 
> We also mark the interrupt as "active" at the host distributor level.
> 
> On guest EOI on the virtual interrupt, the host interrupt will be
> deactivated.
A "standard" physical IRQ would first be handled by the host handler,
which would ack and deactivate it a first time. Here, if my
understanding is correct, the virtual counter PPI never hits. Instead we
"emulate" it on world switch by directly setting the distributor state.
Is that correct? If so, this is quite a specific way of handling a "HW" IRQ.

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index c6604f2..495ac7d 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>  	if (!vgic_irq_is_edge(vcpu, irq))
>  		vlr.state |= LR_EOI_INT;
>  
> +	if (vlr.irq >= VGIC_NR_SGIS) {
> +		struct irq_phys_map *map;
> +		map = vgic_irq_map_search(vcpu, irq);
> +
> +		if (map) {
> +			int ret;
> +
> +			BUG_ON(!map->active);
> +			vlr.hwirq = map->phys_irq;
> +			vlr.state |= LR_HW;
> +			vlr.state &= ~LR_EOI_INT;
> +
> +			ret = irq_set_irqchip_state(map->irq,
> +						    IRQCHIP_STATE_ACTIVE,
> +						    true);
> +			vgic_irq_set_queued(vcpu, irq);
The queued state was used for level-sensitive IRQs only. Forwarded or "HW"
IRQs can theoretically be edge or level sensitive, right? If so, it may be
worth justifying the use of the queued state for forwarded IRQs. Also,
vgic_irq_set_queued is currently called in the parent, vgic_queue_hwirq.

> +			WARN_ON(ret);
> +		}
> +	}
> +
>  	vgic_set_lr(vcpu, lr_nr, vlr);
>  	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>  }
> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  	return level_pending;
>  }
>  
> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +{
> +	struct irq_phys_map *map;
> +	int ret;
> +
> +	if (!(vlr.state & LR_HW))
> +		return 0;
> +
> +	map = vgic_irq_map_search(vcpu, vlr.irq);
> +	BUG_ON(!map || !map->active);
> +
> +	ret = irq_get_irqchip_state(map->irq,
> +				    IRQCHIP_STATE_ACTIVE,
> +				    &map->active);
Doesn't this work because the virtual timer was disabled during the world
switch? Is that true of all "shared" devices? It is difficult for me to
understand how much of this is specific to the arch timer integration.
> +
> +	WARN_ON(ret);
> +
> +	if (map->active) {
> +		ret = irq_set_irqchip_state(map->irq,
> +					    IRQCHIP_STATE_ACTIVE,
> +					    false);
> +		WARN_ON(ret);
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
>  /* Sync back the VGIC state after a guest run */
>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
> @@ -1358,14 +1407,30 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  	elrsr = vgic_get_elrsr(vcpu);
>  	elrsr_ptr = u64_to_bitmask(&elrsr);
>  
> -	/* Clear mappings for empty LRs */
> -	for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
> +	/* Deal with HW interrupts, and clear mappings for empty LRs */
> +	for (lr = 0; lr < vgic->nr_lr; lr++) {
>  		struct vgic_lr vlr;
>  
> -		if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
> +		if (!test_bit(lr, vgic_cpu->lr_used))
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> +		if (vgic_sync_hwirq(vcpu, vlr)) {
> +			/*
> +			 * So this is a HW interrupt that the guest
> +			 * EOI-ed. Clean the LR state and allow the
> +			 * interrupt to be queued again.
> +			 */
> +			vlr.state &= ~LR_HW;
> +			vlr.hwirq = 0;
> +			vgic_set_lr(vcpu, lr, vlr);
> +			vgic_irq_clear_queued(vcpu, vlr.irq);
This is not necessarily a level-sensitive IRQ, is it?

- Eric
> +		}
> +
> +		if (!test_bit(lr, elrsr_ptr))
> +			continue;
> +
> +		clear_bit(lr, vgic_cpu->lr_used);
>  
>  		BUG_ON(vlr.irq >= dist->nr_irqs);
>  		vgic_cpu->vgic_irq_lr_map[vlr.irq] = LR_EMPTY;
> 
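
The return contract of the quoted vgic_sync_hwirq() (1 if the HW interrupt
went from active to inactive, 0 otherwise) can be modelled in userspace as
below. This is a sketch under stated assumptions: a plain boolean
phys_active stands in for the distributor active bit that the patch reads
and writes through irq_get_irqchip_state()/irq_set_irqchip_state(), the
LR_HW bit position is made up, and model_map and sync_hwirq() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define LR_HW (1u << 3)		/* hypothetical bit position */

/* Hypothetical model of one mapping plus the physical active bit. */
struct model_map {
	bool active;		/* software copy, like map->active */
	bool phys_active;	/* stand-in for the distributor state */
};

/*
 * Model of the post-run sync: returns 1 if the HW interrupt went from
 * active to inactive (the guest EOI-ed it), and 0 otherwise.
 */
static int sync_hwirq(struct model_map *map, unsigned int lr_state)
{
	assert(map);

	if (!(lr_state & LR_HW))
		return 0;

	/* Read the physical active state back into the software copy. */
	map->active = map->phys_active;

	if (map->active) {
		/* Guest has not EOI-ed yet: clear the bit on the host side. */
		map->phys_active = false;
		return 0;
	}

	return 1;
}
```

In this model the caller would clear the HW bit in the LR and allow the
interrupt to be queued again only when sync_hwirq() returns 1.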

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-17 11:53     ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-17 11:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2015 07:03 PM, Marc Zyngier wrote:
> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
> field, we can encode that information into the list registers.
> 
> This patch provides implementations for both GICv2 and GICv3.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>  include/linux/irqchip/arm-gic.h    |  3 ++-
>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>  4 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034..cf637d6 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -268,9 +268,12 @@
>  
>  #define ICH_LR_EOI			(1UL << 41)
>  #define ICH_LR_GROUP			(1UL << 60)
> +#define ICH_LR_HW			(1UL << 61)
>  #define ICH_LR_STATE			(3UL << 62)
>  #define ICH_LR_PENDING_BIT		(1UL << 62)
>  #define ICH_LR_ACTIVE_BIT		(1UL << 63)
> +#define ICH_LR_PHYS_ID_SHIFT		32
> +#define ICH_LR_PHYS_ID_MASK		(0x3ffUL << ICH_LR_PHYS_ID_SHIFT)
>  
>  #define ICH_MISR_EOI			(1 << 0)
>  #define ICH_MISR_U			(1 << 1)
> diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
> index 9de976b..ca88dad 100644
> --- a/include/linux/irqchip/arm-gic.h
> +++ b/include/linux/irqchip/arm-gic.h
> @@ -71,11 +71,12 @@
>  
>  #define GICH_LR_VIRTUALID		(0x3ff << 0)
>  #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
> -#define GICH_LR_PHYSID_CPUID		(7 << GICH_LR_PHYSID_CPUID_SHIFT)
> +#define GICH_LR_PHYSID_CPUID		(0x3ff << GICH_LR_PHYSID_CPUID_SHIFT)
>  #define GICH_LR_STATE			(3 << 28)
>  #define GICH_LR_PENDING_BIT		(1 << 28)
>  #define GICH_LR_ACTIVE_BIT		(1 << 29)
>  #define GICH_LR_EOI			(1 << 19)
> +#define GICH_LR_HW			(1 << 31)
>  
>  #define GICH_VMCR_CTRL_SHIFT		0
>  #define GICH_VMCR_CTRL_MASK		(0x21f << GICH_VMCR_CTRL_SHIFT)
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index f9b9c7c..8d7b04d 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -48,6 +48,10 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  		lr_desc.state |= LR_STATE_ACTIVE;
>  	if (val & GICH_LR_EOI)
>  		lr_desc.state |= LR_EOI_INT;
> +	if (val & GICH_LR_HW) {
> +		lr_desc.state |= LR_HW;
> +		lr_desc.hwirq = (val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
> +	}
>  
>  	return lr_desc;
>  }
> @@ -55,7 +59,9 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>  			   struct vgic_lr lr_desc)
>  {
> -	u32 lr_val = (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) | lr_desc.irq;
> +	u32 lr_val;
> +
> +	lr_val = lr_desc.irq;
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= GICH_LR_PENDING_BIT;
> @@ -64,6 +70,14 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>  	if (lr_desc.state & LR_EOI_INT)
>  		lr_val |= GICH_LR_EOI;
>  
> +	if (lr_desc.state & LR_HW) {
> +		lr_val |= GICH_LR_HW;
> +		lr_val |= (u32)lr_desc.hwirq << GICH_LR_PHYSID_CPUID_SHIFT;
Shouldn't we test somewhere that the hwirq is between 16 and 1019?
Otherwise the behavior is unpredictable according to the GICv2 spec. When
queuing into the LR, we currently check the Linux IRQ
(vlr.irq >= VGIC_NR_SGIS), if I am not wrong.

besides Reviewed-by: Eric Auger <eric.auger@linaro.org>

Eric
> +	}
> +
> +	if (lr_desc.irq < VGIC_NR_SGIS)
> +		lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);
> +
>  	vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
>  }
>  
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index dff0602..afbf925 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -67,6 +67,10 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  		lr_desc.state |= LR_STATE_ACTIVE;
>  	if (val & ICH_LR_EOI)
>  		lr_desc.state |= LR_EOI_INT;
> +	if (val & ICH_LR_HW) {
> +		lr_desc.state |= LR_HW;
> +		lr_desc.hwirq = (val >> ICH_LR_PHYS_ID_SHIFT) & GENMASK(9, 0);
> +	}
>  
>  	return lr_desc;
>  }
> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  	 * Eventually we want to make this configurable, so we may revisit
>  	 * this in the future.
>  	 */
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +	switch (vcpu->kvm->arch.vgic.vgic_model) {
> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>  		lr_val |= ICH_LR_GROUP;
Not related to this patch, but why does the LR_GROUP setting depend on the
guest's view? Why isn't it always set to 1?
> -	else
> -		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	case  KVM_DEV_TYPE_ARM_VGIC_V2:
> +		if (lr_desc.irq < VGIC_NR_SGIS)
> +			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	default:
> +		BUG();
> +	}
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= ICH_LR_PENDING_BIT;
> @@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  		lr_val |= ICH_LR_ACTIVE_BIT;
>  	if (lr_desc.state & LR_EOI_INT)
>  		lr_val |= ICH_LR_EOI;
> +	if (lr_desc.state & LR_HW) {
> +		lr_val |= ICH_LR_HW;
> +		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
> +	}
>  
>  	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
>  }
> 
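
The GICv2 encoding from the hunks above, combined with the range check
suggested in the review, might look like the following userspace sketch. The
field definitions mirror the quoted arm-gic.h changes (written with unsigned
constants); the 16..1019 bounds come from the review comment, and
lr_encode_hw(), lr_decode_hwirq(), HWIRQ_MIN and HWIRQ_MAX are hypothetical
names, not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout as in the quoted arm-gic.h hunk. */
#define GICH_LR_VIRTUALID		(0x3ffu << 0)
#define GICH_LR_PHYSID_CPUID_SHIFT	10
#define GICH_LR_PHYSID_CPUID		(0x3ffu << GICH_LR_PHYSID_CPUID_SHIFT)
#define GICH_LR_PENDING_BIT		(1u << 28)
#define GICH_LR_HW			(1u << 31)

/* Hypothetical bounds for the check suggested in the review. */
#define HWIRQ_MIN	16u
#define HWIRQ_MAX	1019u

/*
 * Hypothetical helper: build a pending LR value with the HW bit set.
 * Returns false, leaving *lr_val untouched, if hwirq is out of range.
 */
static bool lr_encode_hw(uint32_t virt_irq, uint32_t hwirq, uint32_t *lr_val)
{
	assert(lr_val);

	if (hwirq < HWIRQ_MIN || hwirq > HWIRQ_MAX)
		return false;

	*lr_val = (virt_irq & GICH_LR_VIRTUALID) |
		  GICH_LR_PENDING_BIT |
		  GICH_LR_HW |
		  (hwirq << GICH_LR_PHYSID_CPUID_SHIFT);
	return true;
}

/* Extract the physical ID field, as vgic_v2_get_lr() does in the patch. */
static uint32_t lr_decode_hwirq(uint32_t lr_val)
{
	return (lr_val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
}
```

Whether such a check belongs in the set_lr path or at mapping time is a
design choice the thread leaves open; the sketch only shows the arithmetic.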

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
@ 2015-06-17 11:53     ` Eric Auger
  0 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-17 11:53 UTC (permalink / raw)
  To: Marc Zyngier, kvm, kvmarm, linux-arm-kernel
  Cc: Christoffer Dall, Alex Bennée, Andre Przywara

On 06/08/2015 07:03 PM, Marc Zyngier wrote:
> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
> field, we can encode that information into the list registers.
> 
> This patch provides implementations for both GICv2 and GICv3.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>  include/linux/irqchip/arm-gic.h    |  3 ++-
>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>  4 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034..cf637d6 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -268,9 +268,12 @@
>  
>  #define ICH_LR_EOI			(1UL << 41)
>  #define ICH_LR_GROUP			(1UL << 60)
> +#define ICH_LR_HW			(1UL << 61)
>  #define ICH_LR_STATE			(3UL << 62)
>  #define ICH_LR_PENDING_BIT		(1UL << 62)
>  #define ICH_LR_ACTIVE_BIT		(1UL << 63)
> +#define ICH_LR_PHYS_ID_SHIFT		32
> +#define ICH_LR_PHYS_ID_MASK		(0x3ffUL << ICH_LR_PHYS_ID_SHIFT)
>  
>  #define ICH_MISR_EOI			(1 << 0)
>  #define ICH_MISR_U			(1 << 1)
> diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
> index 9de976b..ca88dad 100644
> --- a/include/linux/irqchip/arm-gic.h
> +++ b/include/linux/irqchip/arm-gic.h
> @@ -71,11 +71,12 @@
>  
>  #define GICH_LR_VIRTUALID		(0x3ff << 0)
>  #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
> -#define GICH_LR_PHYSID_CPUID		(7 << GICH_LR_PHYSID_CPUID_SHIFT)
> +#define GICH_LR_PHYSID_CPUID		(0x3ff << GICH_LR_PHYSID_CPUID_SHIFT)
>  #define GICH_LR_STATE			(3 << 28)
>  #define GICH_LR_PENDING_BIT		(1 << 28)
>  #define GICH_LR_ACTIVE_BIT		(1 << 29)
>  #define GICH_LR_EOI			(1 << 19)
> +#define GICH_LR_HW			(1 << 31)
>  
>  #define GICH_VMCR_CTRL_SHIFT		0
>  #define GICH_VMCR_CTRL_MASK		(0x21f << GICH_VMCR_CTRL_SHIFT)
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index f9b9c7c..8d7b04d 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -48,6 +48,10 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  		lr_desc.state |= LR_STATE_ACTIVE;
>  	if (val & GICH_LR_EOI)
>  		lr_desc.state |= LR_EOI_INT;
> +	if (val & GICH_LR_HW) {
> +		lr_desc.state |= LR_HW;
> +		lr_desc.hwirq = (val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
> +	}
>  
>  	return lr_desc;
>  }
> @@ -55,7 +59,9 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>  			   struct vgic_lr lr_desc)
>  {
> -	u32 lr_val = (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) | lr_desc.irq;
> +	u32 lr_val;
> +
> +	lr_val = lr_desc.irq;
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= GICH_LR_PENDING_BIT;
> @@ -64,6 +70,14 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>  	if (lr_desc.state & LR_EOI_INT)
>  		lr_val |= GICH_LR_EOI;
>  
> +	if (lr_desc.state & LR_HW) {
> +		lr_val |= GICH_LR_HW;
> +		lr_val |= (u32)lr_desc.hwirq << GICH_LR_PHYSID_CPUID_SHIFT;
shouldn't we test somewhere that the hwirq is between 16 and 1019. Else
behavior is unpredictable according to v2 spec. when queuing into the LR
we currently check the linux irq vlr.irq >= VGIC_NR_SGIS if I am not wrong.

besides Reviewed-by: Eric Auger <eric.auger@linaro.org>

Eric
> +	}
> +
> +	if (lr_desc.irq < VGIC_NR_SGIS)
> +		lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);
> +
>  	vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
>  }
>  
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index dff0602..afbf925 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -67,6 +67,10 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  		lr_desc.state |= LR_STATE_ACTIVE;
>  	if (val & ICH_LR_EOI)
>  		lr_desc.state |= LR_EOI_INT;
> +	if (val & ICH_LR_HW) {
> +		lr_desc.state |= LR_HW;
> +		lr_desc.hwirq = (val >> ICH_LR_PHYS_ID_SHIFT) & GENMASK(9, 0);
> +	}
>  
>  	return lr_desc;
>  }
> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  	 * Eventually we want to make this configurable, so we may revisit
>  	 * this in the future.
>  	 */
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +	switch (vcpu->kvm->arch.vgic.vgic_model) {
> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>  		lr_val |= ICH_LR_GROUP;
Not related to this patch, but why does the LR_GROUP setting depend on the
guest view? Why isn't it always set to 1?
> -	else
> -		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	case  KVM_DEV_TYPE_ARM_VGIC_V2:
> +		if (lr_desc.irq < VGIC_NR_SGIS)
> +			lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> +		break;
> +	default:
> +		BUG();
> +	}
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= ICH_LR_PENDING_BIT;
> @@ -95,6 +106,10 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  		lr_val |= ICH_LR_ACTIVE_BIT;
>  	if (lr_desc.state & LR_EOI_INT)
>  		lr_val |= ICH_LR_EOI;
> +	if (lr_desc.state & LR_HW) {
> +		lr_val |= ICH_LR_HW;
> +		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
> +	}
>  
>  	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
>  }
> 


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
  2015-06-17 11:51     ` Eric Auger
@ 2015-06-17 12:23       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-17 12:23 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Eric,

On 17/06/15 12:51, Eric Auger wrote:
> Hi Marc,
> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>> To allow a HW interrupt to be injected into a guest, we lookup the
>> guest virtual interrupt in the irq_phys_map rbtree, and if we have
>> a match, encode both interrupts in the LR.
>>
>> We also mark the interrupt as "active" at the host distributor level.
>>
>> On guest EOI on the virtual interrupt, the host interrupt will be
>> deactivated.
>
> a "standard" physical IRQ would be first handled by the host handler
> which would ack and deactivate it a first time. Here, if my
> understanding is correct, the virtual counter PPI never hits. Instead we
> "emulate" it on world-switch by directly setting the dist state. Is that
> correct? If yes it is quite a specific handling of an "HW" IRQ.

This is (mostly) correct. Because we deal with HW that is shared between
guests, we absolutely need to make that HW quiescent before getting back
to the host. Setting the active bit in the distributor allows us to
restore the HW in a state that shows a pending interrupt at the guest
level, but ensures that the interrupt doesn't fire at the host level.
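
A minimal sketch of the effect described above, assuming a heavily
simplified distributor model: an interrupt that is already active is not
signalled to the CPU again until it has been deactivated. The struct and
function names are hypothetical and are not part of the kernel or of this
series.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one interrupt at the distributor. */
struct dist_irq {
	bool enabled;
	bool pending;
	bool active;
};

/*
 * Would the distributor signal this interrupt to the host CPU?
 * While the active bit is set, a pending interrupt is held back,
 * which is what keeps the shared device quiet on the host side.
 */
static bool dist_would_signal(const struct dist_irq *irq)
{
	return irq->enabled && irq->pending && !irq->active;
}
```

With the active bit set, the expired timer stays pending for the guest
but never reaches the host handler.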

As for the "specificity", this is how the architecture has been
designed, and the way we're expected to deal with this kind of shared
HW. Rest assured I didn't come up with that on my own! ;-)

> 
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  virt/kvm/arm/vgic.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 68 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index c6604f2..495ac7d 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>  	if (!vgic_irq_is_edge(vcpu, irq))
>>  		vlr.state |= LR_EOI_INT;
>>  
>> +	if (vlr.irq >= VGIC_NR_SGIS) {
>> +		struct irq_phys_map *map;
>> +		map = vgic_irq_map_search(vcpu, irq);
>> +
>> +		if (map) {
>> +			int ret;
>> +
>> +			BUG_ON(!map->active);
>> +			vlr.hwirq = map->phys_irq;
>> +			vlr.state |= LR_HW;
>> +			vlr.state &= ~LR_EOI_INT;
>> +
>> +			ret = irq_set_irqchip_state(map->irq,
>> +						    IRQCHIP_STATE_ACTIVE,
>> +						    true);
>> +			vgic_irq_set_queued(vcpu, irq);
>
> queued state was used for level sensitive IRQs only. Forwarded or "HW"
> IRQs theoretically can be edge or sensitive, right? If yes may be worth
> to justify the usage of queued state for forwarded IRQ? Also

That's because it is illegal to set a HW interrupt to be PENDING+ACTIVE,
which means we have to prevent the interrupt from being injected multiple
times. The behaviour is sufficiently close to what we do for a level
interrupt that we use the same state.
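
As a rough illustration of that rule, here is a sketch of how a "queued"
flag can stop a forwarded interrupt from being injected a second time
before the guest has EOI-ed it. This is a simplified userspace model with
hypothetical names, not the vgic code itself.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one forwarded (HW) interrupt. */
struct fwd_irq {
	bool queued;   /* already sitting in a list register */
	int  injected; /* number of times it was placed in an LR */
};

/*
 * Try to queue the interrupt. Refuse if it is already queued, since
 * injecting it again could leave a HW interrupt PENDING+ACTIVE.
 */
static bool fwd_irq_try_queue(struct fwd_irq *irq)
{
	if (irq->queued)
		return false;

	irq->queued = true;
	irq->injected++;
	return true;
}

/* Called once the guest EOI has deactivated the interrupt. */
static void fwd_irq_eoi(struct fwd_irq *irq)
{
	irq->queued = false;
}
```

The same gating applies whether the interrupt is edge or level, which is
why the level-interrupt state is reused here.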

> vgic_irq_set_queued rather was called in parent vgic_queue_hwirq today.

I tried to keep the HW bit madness as localized as possible. Letting it
spread further away seems to make the code more difficult to read IMHO.

> 
>> +			WARN_ON(ret);
>> +		}
>> +	}
>> +
>>  	vgic_set_lr(vcpu, lr_nr, vlr);
>>  	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>>  }
>> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>  	return level_pending;
>>  }
>>  
>> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>> +{
>> +	struct irq_phys_map *map;
>> +	int ret;
>> +
>> +	if (!(vlr.state & LR_HW))
>> +		return 0;
>> +
>> +	map = vgic_irq_map_search(vcpu, vlr.irq);
>> +	BUG_ON(!map || !map->active);
>> +
>> +	ret = irq_get_irqchip_state(map->irq,
>> +				    IRQCHIP_STATE_ACTIVE,
>> +				    &map->active);
>
> Doesn't it work because the virtual timer was disabled during the world
> switch. Does it characterize all "shared" devices? Difficult for me to
> understand how much this is specific to arch timer integration?

Shared devices cannot be left running when the guest is not running
because (a) we have lost the context (the guest), and (b) we need to
give it to another guest. This is a fundamental property of this kind of
resource.

This is by no means specific to the timer, BTW. The VGIC itself is a
shared resource, and we nuke it on each exit, for the same reason. The
only difference is that we don't propagate the VGIC interrupt to a guest.

>> +
>> +	WARN_ON(ret);
>> +
>> +	if (map->active) {
>> +		ret = irq_set_irqchip_state(map->irq,
>> +					    IRQCHIP_STATE_ACTIVE,
>> +					    false);
>> +		WARN_ON(ret);
>> +		return 0;
>> +	}
>> +
>> +	return 1;
>> +}
>> +
>>  /* Sync back the VGIC state after a guest run */
>>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>  {
>> @@ -1358,14 +1407,30 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>  	elrsr = vgic_get_elrsr(vcpu);
>>  	elrsr_ptr = u64_to_bitmask(&elrsr);
>>  
>> -	/* Clear mappings for empty LRs */
>> -	for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
>> +	/* Deal with HW interrupts, and clear mappings for empty LRs */
>> +	for (lr = 0; lr < vgic->nr_lr; lr++) {
>>  		struct vgic_lr vlr;
>>  
>> -		if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
>> +		if (!test_bit(lr, vgic_cpu->lr_used))
>>  			continue;
>>  
>>  		vlr = vgic_get_lr(vcpu, lr);
>> +		if (vgic_sync_hwirq(vcpu, vlr)) {
>> +			/*
>> +			 * So this is a HW interrupt that the guest
>> +			 * EOI-ed. Clean the LR state and allow the
>> +			 * interrupt to be queued again.
>> +			 */
>> +			vlr.state &= ~LR_HW;
>> +			vlr.hwirq = 0;
>> +			vgic_set_lr(vcpu, lr, vlr);
>> +			vgic_irq_clear_queued(vcpu, vlr.irq)
>
> not necessarily a level sensitive IRQ?

As explained above, we have the same requirements when an interrupt is
forwarded to a guest.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-17 11:53     ` Eric Auger
@ 2015-06-17 12:39       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-17 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 17/06/15 12:53, Eric Auger wrote:
> On 06/08/2015 07:03 PM, Marc Zyngier wrote:
>> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
>> field, we can encode that information into the list registers.
>>
>> This patch provides implementations for both GICv2 and GICv3.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>>  include/linux/irqchip/arm-gic.h    |  3 ++-
>>  virt/kvm/arm/vgic-v2.c             | 16 +++++++++++++++-
>>  virt/kvm/arm/vgic-v3.c             | 21 ++++++++++++++++++---
>>  4 files changed, 38 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>> index ffbc034..cf637d6 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -268,9 +268,12 @@
>>  
>>  #define ICH_LR_EOI			(1UL << 41)
>>  #define ICH_LR_GROUP			(1UL << 60)
>> +#define ICH_LR_HW			(1UL << 61)
>>  #define ICH_LR_STATE			(3UL << 62)
>>  #define ICH_LR_PENDING_BIT		(1UL << 62)
>>  #define ICH_LR_ACTIVE_BIT		(1UL << 63)
>> +#define ICH_LR_PHYS_ID_SHIFT		32
>> +#define ICH_LR_PHYS_ID_MASK		(0x3ffUL << ICH_LR_PHYS_ID_SHIFT)
>>  
>>  #define ICH_MISR_EOI			(1 << 0)
>>  #define ICH_MISR_U			(1 << 1)
>> diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
>> index 9de976b..ca88dad 100644
>> --- a/include/linux/irqchip/arm-gic.h
>> +++ b/include/linux/irqchip/arm-gic.h
>> @@ -71,11 +71,12 @@
>>  
>>  #define GICH_LR_VIRTUALID		(0x3ff << 0)
>>  #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
>> -#define GICH_LR_PHYSID_CPUID		(7 << GICH_LR_PHYSID_CPUID_SHIFT)
>> +#define GICH_LR_PHYSID_CPUID		(0x3ff << GICH_LR_PHYSID_CPUID_SHIFT)
>>  #define GICH_LR_STATE			(3 << 28)
>>  #define GICH_LR_PENDING_BIT		(1 << 28)
>>  #define GICH_LR_ACTIVE_BIT		(1 << 29)
>>  #define GICH_LR_EOI			(1 << 19)
>> +#define GICH_LR_HW			(1 << 31)
>>  
>>  #define GICH_VMCR_CTRL_SHIFT		0
>>  #define GICH_VMCR_CTRL_MASK		(0x21f << GICH_VMCR_CTRL_SHIFT)
>> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
>> index f9b9c7c..8d7b04d 100644
>> --- a/virt/kvm/arm/vgic-v2.c
>> +++ b/virt/kvm/arm/vgic-v2.c
>> @@ -48,6 +48,10 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>>  		lr_desc.state |= LR_STATE_ACTIVE;
>>  	if (val & GICH_LR_EOI)
>>  		lr_desc.state |= LR_EOI_INT;
>> +	if (val & GICH_LR_HW) {
>> +		lr_desc.state |= LR_HW;
>> +		lr_desc.hwirq = (val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
>> +	}
>>  
>>  	return lr_desc;
>>  }
>> @@ -55,7 +59,9 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>>  static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  			   struct vgic_lr lr_desc)
>>  {
>> -	u32 lr_val = (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) | lr_desc.irq;
>> +	u32 lr_val;
>> +
>> +	lr_val = lr_desc.irq;
>>  
>>  	if (lr_desc.state & LR_STATE_PENDING)
>>  		lr_val |= GICH_LR_PENDING_BIT;
>> @@ -64,6 +70,14 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  	if (lr_desc.state & LR_EOI_INT)
>>  		lr_val |= GICH_LR_EOI;
>>  
>> +	if (lr_desc.state & LR_HW) {
>> +		lr_val |= GICH_LR_HW;
>> +		lr_val |= (u32)lr_desc.hwirq << GICH_LR_PHYSID_CPUID_SHIFT;
>
> shouldn't we test somewhere that the hwirq is between 16 and 1019. Else
> behavior is unpredictable according to v2 spec. when queuing into the LR
> we currently check the linux irq vlr.irq >= VGIC_NR_SGIS if I am not wrong.

This is actually implicit. vgic_map_phys_irq() takes a parameter (irq)
that is the Linux view of the hwirq we're dealing with (we fetch this
hwirq by traversing the irq_data list associated with irq).

SGIs are not part of the set of interrupts that can be mapped to a Linux
irq (their usage is completely private to the two GIC drivers).

Note that GICv3 does allow SGIs to be set as a physical interrupt in an LR,
but this is not a feature we use so far.
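
For reference, a small standalone sketch of the GICv2 encoding discussed
here. The field macros mirror the quoted arm-gic.h hunk; the helper
functions are hypothetical and only show what an explicit check for the
16..1019 range mentioned by Eric could look like if one were wanted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout as in the quoted include/linux/irqchip/arm-gic.h hunk. */
#define GICH_LR_VIRTUALID		(0x3ffu << 0)
#define GICH_LR_PHYSID_CPUID_SHIFT	10
#define GICH_LR_PHYSID_CPUID		(0x3ffu << GICH_LR_PHYSID_CPUID_SHIFT)
#define GICH_LR_HW			(1u << 31)

/* Hypothetical helper: physical IDs outside 16..1019 are rejected. */
static bool hwirq_valid_for_v2_lr(uint32_t hwirq)
{
	return hwirq >= 16 && hwirq <= 1019;
}

/* Hypothetical helper: pack a virtual and a physical ID with HW set. */
static uint32_t v2_lr_encode_hw(uint32_t virq, uint32_t hwirq)
{
	uint32_t lr = virq & GICH_LR_VIRTUALID;

	lr |= GICH_LR_HW;
	lr |= (hwirq << GICH_LR_PHYSID_CPUID_SHIFT) & GICH_LR_PHYSID_CPUID;
	return lr;
}

/* Hypothetical helper: recover the physical ID from an LR value. */
static uint32_t v2_lr_decode_hwirq(uint32_t lr)
{
	return (lr & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
}
```

Note that the old 3-bit mask (7 << 10) would truncate the physical ID on
decode, which is why the patch widens it to 0x3ff.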

> besides Reviewed-by: Eric Auger <eric.auger@linaro.org>

Thanks!

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-17 11:53     ` Eric Auger
@ 2015-06-17 13:21       ` Peter Maydell
  -1 siblings, 0 replies; 118+ messages in thread
From: Peter Maydell @ 2015-06-17 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 17 June 2015 at 12:53, Eric Auger <eric.auger@linaro.org> wrote:
> shouldn't we test somewhere that the hwirq is between 16 and 1019.

Not directly related, but that reminds me that I noticed the
other day that we have VGIC_MAX_IRQS = 1024 (and use that as a
guard on how many irqs we let userspace configure and ask us
to deliver), but that doesn't account for the couple of magic
numbers at the top of the range. I think that lets userspace
cause us to do UNPREDICTABLE things to the GIC...

-- PMM

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
  2015-06-17 13:21       ` Peter Maydell
@ 2015-06-17 13:34         ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-17 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 17/06/15 14:21, Peter Maydell wrote:
> On 17 June 2015 at 12:53, Eric Auger <eric.auger@linaro.org> wrote:
>> shouldn't we test somewhere that the hwirq is between 16 and 1019.
> 
> Not directly related, but that reminds me that I noticed the
> other day that we have VGIC_MAX_IRQS = 1024 (and use that as a
> guard on how many irqs we let userspace configure and ask us
> to deliver), but that doesn't account for the couple of magic
> numbers at the top of the range. I think that lets userspace
> cause us to do UNPREDICTABLE things to the GIC...

Good point. How about the following:

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 78fb820..950064a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1561,7 +1561,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 			goto out;
 	}
 
-	if (irq_num >= kvm->arch.vgic.nr_irqs)
+	if (irq_num >= min(kvm->arch.vgic.nr_irqs, 1020))
 		return -EINVAL;
 
 	vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
@@ -2161,10 +2161,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id,
 
 	BUG_ON(!vgic_initialized(kvm));
 
-	if (spi > kvm->arch.vgic.nr_irqs)
-		return -EINVAL;
 	return kvm_vgic_inject_irq(kvm, 0, spi, level);
-
 }
 
 /* MSI not implemented yet */
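
The bound in the first hunk can be sketched in isolation as follows. This
is a userspace illustration only; the macro and function names are
hypothetical, and the value 1020 is taken from the diff above (IDs
1020-1023 being the "magic numbers" at the top of the range).

```c
#include <stdbool.h>

/* Hypothetical name: first interrupt ID that must never be injected. */
#define FIRST_RESERVED_INTID	1020u

/*
 * An interrupt number is acceptable only if it is below both the
 * number of interrupts configured for the VM and the reserved range.
 */
static bool irq_num_is_injectable(unsigned int irq_num, unsigned int nr_irqs)
{
	unsigned int limit = nr_irqs < FIRST_RESERVED_INTID ?
			     nr_irqs : FIRST_RESERVED_INTID;

	return irq_num < limit;
}
```

With nr_irqs at 1024, IDs 1020 and above are now refused, while smaller
configurations keep their existing limit.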

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-17 15:11     ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-17 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> So far, the only use of the HW interrupt facility is the timer,
> implying that the active state is context-switched for each vcpu,
> as the device is is shared across all vcpus.
s/is//
> 
> This does not work for a device that has been assigned to a VM,
> as the guest is entierely in control of that device (the HW is
entirely?
> not shared). In that case, it makes sense to bypass the whole
> active state srtwitchint, and only track the deactivation of the
switching
> interrupt.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h    |  5 +++--
>  virt/kvm/arm/arch_timer.c |  2 +-
>  virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
>  3 files changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 1c653c1..5d47d60 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -164,7 +164,8 @@ struct irq_phys_map {
>  	u32			virt_irq;
>  	u32			phys_irq;
>  	u32			irq;
> -	bool			active;
> +	bool			shared;
> +	bool			active; /* Only valid if shared */
>  };
>  
>  struct vgic_dist {
> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> -				       int virt_irq, int irq);
> +				       int virt_irq, int irq, bool shared);
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index b9fff78..9544d79 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * Tell the VGIC that the virtual interrupt is tied to a
>  	 * physical interrupt. We do that once per VCPU.
>  	 */
> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>  	WARN_ON(!timer->map);
>  }
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index f376b56..4223166 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>  		map = vgic_irq_map_search(vcpu, irq);
>  
>  		if (map) {
> -			int ret;
> -
> -			BUG_ON(!map->active);
>  			vlr.hwirq = map->phys_irq;
>  			vlr.state |= LR_HW;
>  			vlr.state &= ~LR_EOI_INT;
>  
> -			ret = irq_set_irqchip_state(map->irq,
> -						    IRQCHIP_STATE_ACTIVE,
> -						    true);
>  			vgic_irq_set_queued(vcpu, irq);
The queued state is set again in vgic_queue_hwirq for level-sensitive
IRQs, although that is not harmful.
> -			WARN_ON(ret);
> +
> +			if (map->shared) {
> +				int ret;
> +
> +				BUG_ON(!map->active);
> +				ret = irq_set_irqchip_state(map->irq,
> +							    IRQCHIP_STATE_ACTIVE,
> +							    true);
> +				WARN_ON(ret);
> +			}
>  		}
>  	}
>  
> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;
>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
> +	BUG_ON(map->shared && !map->active);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &active);
>  
In the non-shared case with EOIMode = 1 (I know this is not your current
interest here though ;-), once the guest EOIs its virtual IRQ and the GIC
deactivates the physical one, a new physical IRQ can hit immediately: the
physical handler can be entered and the state is seen as active here.
The queued state is never reset in such a case and the system gets stuck,
since the can_sample check fails, I think. What I mean is that the state
machine as is does not seem to work for my VFIO case, so some adaptations
are still needed, I think. Do you share my diagnosis?

Eric
>  
> -	if (map->active) {
> +	if (!map->shared)
> +		return !active;
> +
> +	map->active = active;
> +
> +	if (active) {
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
> @@ -1663,7 +1673,7 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>  }
>  
>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> -				       int virt_irq, int irq)
> +				       int virt_irq, int irq, bool shared)
>  {
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>  	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> @@ -1710,6 +1720,7 @@ struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  	new_map->virt_irq = virt_irq;
>  	new_map->phys_irq = phys_irq;
>  	new_map->irq = irq;
> +	new_map->shared = shared;
>  
>  	rb_link_node(&new_map->node, parent, new);
>  	rb_insert_color(&new_map->node, root);
> @@ -1746,13 +1757,13 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>  
>  bool vgic_get_phys_irq_active(struct irq_phys_map *map)
>  {
> -	BUG_ON(!map);
> +	BUG_ON(!map || !map->shared);
>  	return map->active;
>  }
>  
>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
>  {
> -	BUG_ON(!map);
> +	BUG_ON(!map || !map->shared);
>  	map->active = active;
>  }
>  
> 
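
The race described in the review comment above can be replayed outside
the kernel. The following is a minimal sketch, not code from the series:
struct sync_map_model, sync_hwirq_model() and missed_deactivation_model()
are hypothetical names, the hw_active field is a mock of the physical
active state, and the return convention for the shared path (true means
"seen as deactivated") is an assumption inferred from the "return !active"
line in the non-shared path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct irq_phys_map, reduced to what the model needs. */
struct sync_map_model {
	bool shared;    /* device shared across vcpus, e.g. the timer */
	bool active;    /* logical active state, only meaningful if shared */
	bool hw_active; /* mock of the physical active state */
};

/*
 * Model of the sync step after the patch. Returns true when the physical
 * interrupt is seen as deactivated (assumed convention, see lead-in).
 */
static bool sync_hwirq_model(struct sync_map_model *map)
{
	bool active;

	assert(map); /* mirrors BUG_ON(!map) */
	active = map->hw_active; /* stands in for irq_get_irqchip_state() */

	if (!map->shared)
		return !active;

	/* Shared device: save the state and clear it in the mock hardware. */
	map->active = active;
	if (active)
		map->hw_active = false; /* stands in for irq_set_irqchip_state(..., false) */
	return !active;
}

/* Replays the sequence from the review comment for a non-shared device. */
static bool missed_deactivation_model(void)
{
	struct sync_map_model map = { .shared = false, .hw_active = true };

	map.hw_active = false; /* guest EOI: the GIC deactivates the physical IRQ */
	map.hw_active = true;  /* a new physical IRQ fires before the sync step */

	return sync_hwirq_model(&map); /* false: the deactivation went unnoticed */
}
```

In this model the sync step only ever samples the current active state, so
a deactivation immediately followed by a new activation is
indistinguishable from an interrupt that was never deactivated, which is
the situation in which the queued state would not be reset.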

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
  2015-06-08 17:04   ` [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active Marc Zyngier
@ 2015-06-17 15:11     ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-17 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

Reviewed-by: Eric Auger <eric.auger@linaro.org>
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> In order to control the active state of an interrupt, introduce
> a pair of accessors allowing the state to be set/queried.
> 
> This only affects the logical state, and the HW state will only be
> applied at world-switch time.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h |  2 ++
>  virt/kvm/arm/vgic.c    | 12 ++++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 33d121a..1c653c1 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -349,6 +349,8 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  				       int virt_irq, int irq);
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> +bool vgic_get_phys_irq_active(struct irq_phys_map *map);
> +void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 495ac7d..f376b56 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1744,6 +1744,18 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>  	return this;
>  }
>  
> +bool vgic_get_phys_irq_active(struct irq_phys_map *map)
> +{
> +	BUG_ON(!map);
> +	return map->active;
> +}
> +
> +void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> +{
> +	BUG_ON(!map);
> +	map->active = active;
> +}
> +
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
>  {
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> 
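
The split between logical state and HW state mentioned in the commit
message can be illustrated with a small stand-alone model. This is a
sketch under stated assumptions, not the series' implementation:
struct logical_map_model, the *_model() functions and the hw_active mock
are hypothetical, and world_switch_in_model() only stands in for whatever
point in the world switch actually programs the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of struct irq_phys_map plus a mock HW bit. */
struct logical_map_model {
	bool active;    /* logical state, changed by the accessors */
	bool hw_active; /* mock of the physical active state */
};

/* Mirrors vgic_get_phys_irq_active(): reads the logical state only. */
static bool get_phys_irq_active_model(const struct logical_map_model *map)
{
	assert(map); /* mirrors BUG_ON(!map) */
	return map->active;
}

/* Mirrors vgic_set_phys_irq_active(): updates the logical state only. */
static void set_phys_irq_active_model(struct logical_map_model *map, bool active)
{
	assert(map); /* mirrors BUG_ON(!map) */
	map->active = active;
}

/* Assumed hook: the logical state reaches the mock hardware only here. */
static void world_switch_in_model(struct logical_map_model *map)
{
	assert(map);
	map->hw_active = map->active;
}
```

Calling set_phys_irq_active_model() leaves hw_active untouched until
world_switch_in_model() runs, which is the property the commit message
describes.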

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-17 15:11     ` Eric Auger
@ 2015-06-17 15:37       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-17 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 17/06/15 16:11, Eric Auger wrote:
> Hi Marc,
> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>> So far, the only use of the HW interrupt facility is the timer,
>> implying that the active state is context-switched for each vcpu,
>> as the device is is shared across all vcpus.
> s/is//
>>
>> This does not work for a device that has been assigned to a VM,
>> as the guest is entierely in control of that device (the HW is
> entirely?
>> not shared). In that case, it makes sense to bypass the whole
>> active state srtwitchint, and only track the deactivation of the
> switching

Congratulations, I think you're now ready to try deciphering my
handwriting... ;-)

>> interrupt.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/kvm/arm_vgic.h    |  5 +++--
>>  virt/kvm/arm/arch_timer.c |  2 +-
>>  virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 1c653c1..5d47d60 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>  	u32			virt_irq;
>>  	u32			phys_irq;
>>  	u32			irq;
>> -	bool			active;
>> +	bool			shared;
>> +	bool			active; /* Only valid if shared */
>>  };
>>  
>>  struct vgic_dist {
>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> -				       int virt_irq, int irq);
>> +				       int virt_irq, int irq, bool shared);
>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index b9fff78..9544d79 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>  	 * physical interrupt. We do that once per VCPU.
>>  	 */
>> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>  	WARN_ON(!timer->map);
>>  }
>>  
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index f376b56..4223166 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>  		map = vgic_irq_map_search(vcpu, irq);
>>  
>>  		if (map) {
>> -			int ret;
>> -
>> -			BUG_ON(!map->active);
>>  			vlr.hwirq = map->phys_irq;
>>  			vlr.state |= LR_HW;
>>  			vlr.state &= ~LR_EOI_INT;
>>  
>> -			ret = irq_set_irqchip_state(map->irq,
>> -						    IRQCHIP_STATE_ACTIVE,
>> -						    true);
>>  			vgic_irq_set_queued(vcpu, irq);
>
> The queued state is set again in vgic_queue_hwirq for level-sensitive
> IRQs, although this is not harmful.

Indeed. We still need it for edge interrupts though. I'll try to find a
nicer way...

>> -			WARN_ON(ret);
>> +
>> +			if (map->shared) {
>> +				int ret;
>> +
>> +				BUG_ON(!map->active);
>> +				ret = irq_set_irqchip_state(map->irq,
>> +							    IRQCHIP_STATE_ACTIVE,
>> +							    true);
>> +				WARN_ON(ret);
>> +			}
>>  		}
>>  	}
>>  
>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>  {
>>  	struct irq_phys_map *map;
>> +	bool active;
>>  	int ret;
>>  
>>  	if (!(vlr.state & LR_HW))
>>  		return 0;
>>  
>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>> -	BUG_ON(!map || !map->active);
>> +	BUG_ON(!map);
>> +	BUG_ON(map->shared && !map->active);
>>  
>>  	ret = irq_get_irqchip_state(map->irq,
>>  				    IRQCHIP_STATE_ACTIVE,
>> -				    &map->active);
>> +				    &active);
>>  
> In the non-shared case with EOIMode = 1 (I know this is not your current
> interest here, though ;-)), once the guest EOIs its virtual IRQ and the
> GIC deactivates the physical one, a new physical IRQ can hit immediately,
> the physical handler can be entered, and the state is then seen as active
> here. The queued state is never reset in that case, and I think the system
> gets stuck since the can_sample check fails. What I mean is that the state
> machine, as is, does not seem to work for my VFIO case, so I think some
> adaptations are still needed. Do you share my diagnosis?

Yup, there is something that doesn't quite work here.

I think the mistake is to sample the distributor active state. I wonder
if I can simply rely on the LR state. If it is neither pending nor
active, it means that we have done the deactivation, and we can then
reset the queued state.

As a bonus, it would save a read from MMIO, which is often dog slow.

Thoughts?

	M.
> 
> Eric
>>  
>> -	if (map->active) {
>> +	if (!map->shared)
>> +		return !active;
>> +
>> +	map->active = active;
>> +
>> +	if (active) {
>>  		ret = irq_set_irqchip_state(map->irq,
>>  					    IRQCHIP_STATE_ACTIVE,
>>  					    false);
>> @@ -1663,7 +1673,7 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>>  }
>>  
>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> -				       int virt_irq, int irq)
>> +				       int virt_irq, int irq, bool shared)
>>  {
>>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>  	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> @@ -1710,6 +1720,7 @@ struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>  	new_map->virt_irq = virt_irq;
>>  	new_map->phys_irq = phys_irq;
>>  	new_map->irq = irq;
>> +	new_map->shared = shared;
>>  
>>  	rb_link_node(&new_map->node, parent, new);
>>  	rb_insert_color(&new_map->node, root);
>> @@ -1746,13 +1757,13 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>>  
>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map)
>>  {
>> -	BUG_ON(!map);
>> +	BUG_ON(!map || !map->shared);
>>  	return map->active;
>>  }
>>  
>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
>>  {
>> -	BUG_ON(!map);
>> +	BUG_ON(!map || !map->shared);
>>  	map->active = active;
>>  }
>>  
>>
> 
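
The LR-based check suggested in this reply (an LR that is neither pending
nor active means the deactivation has happened) can be written down as a
small predicate. This is only a sketch of the idea as stated in the mail,
not a tested fix: struct lr_model, the MODEL_* bit values and
lr_hwirq_deactivated_model() are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout; the values are assumptions for this model. */
#define MODEL_LR_STATE_PENDING	(1u << 0)
#define MODEL_LR_STATE_ACTIVE	(1u << 1)
#define MODEL_LR_HW		(1u << 3)

/* Hypothetical reduction of struct vgic_lr to its state field. */
struct lr_model {
	uint32_t state;
};

/*
 * Returns true when a HW-backed LR is neither pending nor active, i.e.
 * when the guest is assumed to have deactivated the interrupt and the
 * queued state could be reset.
 */
static bool lr_hwirq_deactivated_model(const struct lr_model *lr)
{
	assert(lr);

	if (!(lr->state & MODEL_LR_HW))
		return false; /* not a HW interrupt, nothing to track */

	return !(lr->state & (MODEL_LR_STATE_PENDING | MODEL_LR_STATE_ACTIVE));
}
```

The predicate needs no distributor access, which is the MMIO read the
reply hopes to save.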


-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-17 15:37       ` Marc Zyngier
@ 2015-06-17 15:50         ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-17 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/17/2015 05:37 PM, Marc Zyngier wrote:
> On 17/06/15 16:11, Eric Auger wrote:
>> Hi Marc,
>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>> So far, the only use of the HW interrupt facility is the timer,
>>> implying that the active state is context-switched for each vcpu,
>>> as the device is is shared across all vcpus.
>> s/is//
>>>
>>> This does not work for a device that has been assigned to a VM,
>>> as the guest is entierely in control of that device (the HW is
>> entirely?
>>> not shared). In that case, it makes sense to bypass the whole
>>> active state srtwitchint, and only track the deactivation of the
>> switching
> 
> Congratulations, I think you're now ready to try deciphering my
> handwriting... ;-)
Good to see you're not a machine, or maybe you do it on purpose
sometimes ;-)
> 
>>> interrupt.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  include/kvm/arm_vgic.h    |  5 +++--
>>>  virt/kvm/arm/arch_timer.c |  2 +-
>>>  virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
>>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index 1c653c1..5d47d60 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>>  	u32			virt_irq;
>>>  	u32			phys_irq;
>>>  	u32			irq;
>>> -	bool			active;
>>> +	bool			shared;
>>> +	bool			active; /* Only valid if shared */
>>>  };
>>>  
>>>  struct vgic_dist {
>>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>> -				       int virt_irq, int irq);
>>> +				       int virt_irq, int irq, bool shared);
>>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index b9fff78..9544d79 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>>  	 * physical interrupt. We do that once per VCPU.
>>>  	 */
>>> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>>> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>>  	WARN_ON(!timer->map);
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index f376b56..4223166 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>>  		map = vgic_irq_map_search(vcpu, irq);
>>>  
>>>  		if (map) {
>>> -			int ret;
>>> -
>>> -			BUG_ON(!map->active);
>>>  			vlr.hwirq = map->phys_irq;
>>>  			vlr.state |= LR_HW;
>>>  			vlr.state &= ~LR_EOI_INT;
>>>  
>>> -			ret = irq_set_irqchip_state(map->irq,
>>> -						    IRQCHIP_STATE_ACTIVE,
>>> -						    true);
>>>  			vgic_irq_set_queued(vcpu, irq);
>>
>> The queued state is set again in vgic_queue_hwirq for level-sensitive
>> IRQs, although this is not harmful.
> 
> Indeed. We still need it for edge interrupts though. I'll try to find a
> nicer way...
> 
>>> -			WARN_ON(ret);
>>> +
>>> +			if (map->shared) {
>>> +				int ret;
>>> +
>>> +				BUG_ON(!map->active);
>>> +				ret = irq_set_irqchip_state(map->irq,
>>> +							    IRQCHIP_STATE_ACTIVE,
>>> +							    true);
>>> +				WARN_ON(ret);
>>> +			}
>>>  		}
>>>  	}
>>>  
>>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>  {
>>>  	struct irq_phys_map *map;
>>> +	bool active;
>>>  	int ret;
>>>  
>>>  	if (!(vlr.state & LR_HW))
>>>  		return 0;
>>>  
>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>> -	BUG_ON(!map || !map->active);
>>> +	BUG_ON(!map);
>>> +	BUG_ON(map->shared && !map->active);
>>>  
>>>  	ret = irq_get_irqchip_state(map->irq,
>>>  				    IRQCHIP_STATE_ACTIVE,
>>> -				    &map->active);
>>> +				    &active);
>>>  
>> In the non-shared case with EOIMode = 1 (I know this is not your current
>> interest here, though ;-)), once the guest EOIs its virtual IRQ and the
>> GIC deactivates the physical one, a new physical IRQ can hit immediately,
>> the physical handler can be entered, and the state is then seen as active
>> here. The queued state is never reset in that case, and I think the system
>> gets stuck since the can_sample check fails. What I mean is that the state
>> machine, as is, does not seem to work for my VFIO case, so I think some
>> adaptations are still needed. Do you share my diagnosis?
> 
> Yup, there is something that doesn't quite work here.
> 
> I think the mistake is to sample the distributor active state. I wonder
> if I can simply rely on the LR state. If it is neither pending nor
> active, it means that we have done the deactivation, and we can then
> reset the queued state.

I tried to use the LR in the past (that was also Christoffer's wish), but
it did not work: I observed injection before seeing the LR voided.
This is why I resorted to using the pending state instead and treated
forwarded IRQs as edge in vgic_queue_hwirq. Sampling could be done only
if the IRQ was pending.

Eric
> 
> As a bonus, it would save a read from MMIO, which is often dog slow.
> 
> Thoughts?
> 
> 	M.
>>
>> Eric
>>>  
>>> -	if (map->active) {
>>> +	if (!map->shared)
>>> +		return !active;
>>> +
>>> +	map->active = active;
>>> +
>>> +	if (active) {
>>>  		ret = irq_set_irqchip_state(map->irq,
>>>  					    IRQCHIP_STATE_ACTIVE,
>>>  					    false);
>>> @@ -1663,7 +1673,7 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>>>  }
>>>  
>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>> -				       int virt_irq, int irq)
>>> +				       int virt_irq, int irq, bool shared)
>>>  {
>>>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>>  	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>>> @@ -1710,6 +1720,7 @@ struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>  	new_map->virt_irq = virt_irq;
>>>  	new_map->phys_irq = phys_irq;
>>>  	new_map->irq = irq;
>>> +	new_map->shared = shared;
>>>  
>>>  	rb_link_node(&new_map->node, parent, new);
>>>  	rb_insert_color(&new_map->node, root);
>>> @@ -1746,13 +1757,13 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>>>  
>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map)
>>>  {
>>> -	BUG_ON(!map);
>>> +	BUG_ON(!map || !map->shared);
>>>  	return map->active;
>>>  }
>>>  
>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
>>>  {
>>> -	BUG_ON(!map);
>>> +	BUG_ON(!map || !map->shared);
>>>  	map->active = active;
>>>  }
>>>  
>>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices
  2015-06-08 17:03 ` Marc Zyngier
@ 2015-06-18  6:51   ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-18  6:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

I tested your series on Calxeda Midway (shared device path):

Tested-by: Eric Auger <eric.auger@linaro.org>

No exotic migration use case, though.

I also exercised the non-shared device path. As already discussed, some
adaptations were needed in the vgic state machine.

If anyone is interested, the rebase with EOIMode == 1 + vgic/irqchip
adaptations + kvm-vfio series can be found at
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.1-rc6-state-switch-forward.
Tested with Calxeda Midway xgmac passthrough.

Eric

On 06/08/2015 07:03 PM, Marc Zyngier wrote:
> From day 1, our timer code has been using a terrible hack: whenever
> the guest is scheduled with a timer interrupt pending (i.e. the HW
> timer has expired), we restore the timer state with the MASK bit set,
> in order to prevent the physical interrupt from firing again. And again.
> And again...
> 
> This is absolutely silly, for at least two reasons:
> 
> - This relies on the device (the timer) having a mask bit that we can
>   play with. Not all devices are built like this.
> 
> - This expects some behaviour of the guest that only works because
>   both the kernel timer code and the KVM counterpart have been written
>   by the same idiot (the idiot being me).
> 
> The One True Way is to set the GIC active bit when injecting the
> interrupt, and to context-switch across the world switch. This is what
> this series implements.
> 
> We introduce a relatively simple infrastructure enabling the mapping
> of a virtual interrupt with its physical counterpart:
> 
> - Whenever a virtual interrupt is injected, we look it up in an
>   rbtree. If we have a match, the interrupt is injected with the HW
>   bit set in the LR, together with the physical interrupt.
> 
> - Across the world switch, we save/restore the active state for these
>   interrupts using the irqchip_state API.
> 
> - On guest EOI, the HW interrupt is automagically deactivated by the
>   GIC, allowing the interrupt to be resampled.
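
As an illustration of the lookup step described in the bullets above, here
is a minimal standalone sketch in plain C. It is not the code from the
series: the series keeps struct irq_phys_map entries in an rbtree, whereas
this sketch swaps in a sorted array searched with bsearch() to stay
self-contained. The names phys_map, list_reg, map_search and build_lr are
hypothetical, and the list register is modelled as a plain struct rather
than the real GIC encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a virtual-to-physical interrupt mapping. */
struct phys_map {
	uint32_t virt_irq;	/* interrupt number seen by the guest */
	uint32_t phys_irq;	/* physical interrupt backing it */
};

/* Hypothetical, simplified model of a list register (LR). */
struct list_reg {
	uint32_t irq;		/* virtual interrupt number */
	uint32_t hwirq;		/* physical interrupt, valid only if hw is set */
	bool hw;		/* models the HW bit */
};

/* bsearch() comparator: key is a virt_irq, elem is a struct phys_map. */
static int cmp_virt(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	uint32_t v = ((const struct phys_map *)elem)->virt_irq;

	return (k > v) - (k < v);
}

/* Look up a mapping; maps[] must be sorted by virt_irq. */
static const struct phys_map *map_search(const struct phys_map *maps,
					 size_t n, uint32_t virt_irq)
{
	return bsearch(&virt_irq, maps, n, sizeof(*maps), cmp_virt);
}

/* Build the LR for an injected virtual interrupt. */
static struct list_reg build_lr(const struct phys_map *maps, size_t n,
				uint32_t virt_irq)
{
	struct list_reg lr = { .irq = virt_irq, .hwirq = 0, .hw = false };
	const struct phys_map *map;

	assert(maps != NULL || n == 0);

	map = map_search(maps, n, virt_irq);
	if (map) {
		/* Mapped: set the HW bit and carry the physical interrupt. */
		lr.hw = true;
		lr.hwirq = map->phys_irq;
	}
	return lr;
}
```

With a table such as { {27, 27}, {40, 72} }, build_lr() for virtual
interrupt 40 yields an LR with the HW bit set and hwirq 72, while an
unmapped interrupt is left as a purely virtual one.
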
> 
> The timer code is slightly modified to set the active state at the
> same time as the injection.
> 
> The last patch also allows non-shared devices to have their interrupt
> deactivated the same way (in this case we do not context-switch the
> active state). This is the first step in the long overdue direction of
> the mythical IRQ forwarding thing...
> 
> This series is based on v4.1-rc7, and has been tested on Juno (GICv2)
> and the FVP Base model (GICv3 host, both GICv2 and GICv3 guests). I'd
> appreciate any form of testing, especially in the context of guest
> migration (there is obviously some interesting stuff there...).
> 
> The code is otherwise available at
> git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/active-timer
> 
> Marc Zyngier (10):
>   arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
>   arm/arm64: KVM: Move vgic handling to a non-preemptible section
>   KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields
>   KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
>   KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
>   KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual
>     interrupts
>   KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
>   KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
>   KVM: arm/arm64: timer: Allow the timer to control the active state
>   KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
> 
>  arch/arm/kvm/arm.c                 |  21 +++-
>  include/kvm/arm_arch_timer.h       |   3 +
>  include/kvm/arm_vgic.h             |  31 +++++-
>  include/linux/irqchip/arm-gic-v3.h |   3 +
>  include/linux/irqchip/arm-gic.h    |   3 +-
>  virt/kvm/arm/arch_timer.c          |  13 ++-
>  virt/kvm/arm/vgic-v2.c             |  16 ++-
>  virt/kvm/arm/vgic-v3.c             |  21 +++-
>  virt/kvm/arm/vgic.c                | 206 ++++++++++++++++++++++++++++++++++++-
>  9 files changed, 300 insertions(+), 17 deletions(-)
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-17 15:50         ` Eric Auger
@ 2015-06-18  8:37           ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-18  8:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 17/06/15 16:50, Eric Auger wrote:
> On 06/17/2015 05:37 PM, Marc Zyngier wrote:
>> On 17/06/15 16:11, Eric Auger wrote:
>>> Hi Marc,
>>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>>> So far, the only use of the HW interrupt facility is the timer,
>>>> implying that the active state is context-switched for each vcpu,
>>>> as the device is is shared across all vcpus.
>>> s/is//
>>>>
>>>> This does not work for a device that has been assigned to a VM,
>>>> as the guest is entierely in control of that device (the HW is
>>> entirely?
>>>> not shared). In that case, it makes sense to bypass the whole
>>>> active state srtwitchint, and only track the deactivation of the
>>> switching
>>
>> Congratulations, I think you're now ready to try deciphering my
>> handwriting... ;-)
> good to see you're not a machine or maybe you do it on purpose some
> times ;-)
>>
>>>> interrupt.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  include/kvm/arm_vgic.h    |  5 +++--
>>>>  virt/kvm/arm/arch_timer.c |  2 +-
>>>>  virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
>>>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>>> index 1c653c1..5d47d60 100644
>>>> --- a/include/kvm/arm_vgic.h
>>>> +++ b/include/kvm/arm_vgic.h
>>>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>>>  	u32			virt_irq;
>>>>  	u32			phys_irq;
>>>>  	u32			irq;
>>>> -	bool			active;
>>>> +	bool			shared;
>>>> +	bool			active; /* Only valid if shared */
>>>>  };
>>>>  
>>>>  struct vgic_dist {
>>>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>> -				       int virt_irq, int irq);
>>>> +				       int virt_irq, int irq, bool shared);
>>>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>> index b9fff78..9544d79 100644
>>>> --- a/virt/kvm/arm/arch_timer.c
>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>>>  	 * physical interrupt. We do that once per VCPU.
>>>>  	 */
>>>> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>>>> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>>>  	WARN_ON(!timer->map);
>>>>  }
>>>>  
>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>> index f376b56..4223166 100644
>>>> --- a/virt/kvm/arm/vgic.c
>>>> +++ b/virt/kvm/arm/vgic.c
>>>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>>>  		map = vgic_irq_map_search(vcpu, irq);
>>>>  
>>>>  		if (map) {
>>>> -			int ret;
>>>> -
>>>> -			BUG_ON(!map->active);
>>>>  			vlr.hwirq = map->phys_irq;
>>>>  			vlr.state |= LR_HW;
>>>>  			vlr.state &= ~LR_EOI_INT;
>>>>  
>>>> -			ret = irq_set_irqchip_state(map->irq,
>>>> -						    IRQCHIP_STATE_ACTIVE,
>>>> -						    true);
>>>>  			vgic_irq_set_queued(vcpu, irq);
>>>
>>> the queued state is set again in vgic_queue_hwirq for level_sensitive
>>> IRQs although not harmful.
>>
>> Indeed. We still need it for edge interrupts though. I'll try to find a
>> nicer way...
>>
>>>> -			WARN_ON(ret);
>>>> +
>>>> +			if (map->shared) {
>>>> +				int ret;
>>>> +
>>>> +				BUG_ON(!map->active);
>>>> +				ret = irq_set_irqchip_state(map->irq,
>>>> +							    IRQCHIP_STATE_ACTIVE,
>>>> +							    true);
>>>> +				WARN_ON(ret);
>>>> +			}
>>>>  		}
>>>>  	}
>>>>  
>>>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>>  {
>>>>  	struct irq_phys_map *map;
>>>> +	bool active;
>>>>  	int ret;
>>>>  
>>>>  	if (!(vlr.state & LR_HW))
>>>>  		return 0;
>>>>  
>>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>>> -	BUG_ON(!map || !map->active);
>>>> +	BUG_ON(!map);
>>>> +	BUG_ON(map->shared && !map->active);
>>>>  
>>>>  	ret = irq_get_irqchip_state(map->irq,
>>>>  				    IRQCHIP_STATE_ACTIVE,
>>>> -				    &map->active);
>>>> +				    &active);
>>>>  
>>> In the non-shared case with EOIMode = 1 (I know this is not your current
>>> interest here, though ;-)), once the guest EOIs its virtual IRQ and the GIC
>>> deactivates the physical one, a new physical IRQ can hit immediately: the
>>> physical handler is entered again and the state is seen as active here.
>>> The queued state is never reset in that case, and I think the system gets
>>> stuck because the can_sample check fails. In other words, the state
>>> machine as it stands does not seem to work for my VFIO case, so I think
>>> some adaptations are still needed. Do you share my diagnosis?
>>
>> Yup, there is something that doesn't quite work here.
>>
>> I think the mistake is to sample the distributor active state. I wonder
>> if I can simply rely on the LR state. If it is neither pending nor
>> active, it means that we have done the deactivation, and we can then
>> reset the queued state.
> 
> I tried to use the LR in the past (it was also Christoffer's wish), but
> it did not work: I observed a new injection before seeing the LR emptied.
> This is why I resorted to using the pending state instead, and treated
> forwarded IRQs as edge in vgic_queue_hwirq. Sampling could then be done
> only if the IRQ was pending.

Of course, you're right. The LR state is not used at all for a physical
interrupt (the HW bit really says "use the distributor").

I gave it more thought last night, and I think we can solve this in
a fairly simple way. In the scenario you outline, we do not observe the
ACTIVE to INACTIVE transition because the interrupt has fired again,
leaving the interrupt flagged as queued.

I think we can clear the "queued" bit on injection, since seeing a new
interrupt is proof that the previous one has been deactivated (how could
we see it otherwise?).

How about the following (untested) patch:

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 6687ac4..a01f821 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1387,8 +1387,17 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)

 	WARN_ON(ret);

+	/*
+	 * For a non-shared interrupt, we have to cater for two
+	 * possible deactivation conditions
+	 *
+	 * - the interrupt is now inactive
+	 * - the interrupt is still active, but is flagged as not
+	 *   queued, indicating another interrupt has fired before we
+	 *   could observe the deactivate.
+	 */
 	if (!map->shared)
-		return !active;
+		return !active || !vgic_irq_is_queued(vcpu, vlr.irq);

 	map->active = active;

@@ -1534,6 +1543,7 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	int edge_triggered, level_triggered;
 	int enabled;
 	bool ret = true, can_inject = true;
+	struct irq_phys_map *map;

 	spin_lock(&dist->lock);

@@ -1580,6 +1590,18 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 		goto out;
 	}

+	map = vgic_irq_map_search(vcpu, irq_num);
+	if (map && !map->shared) {
+		/*
+		 * We are told to inject a HW irq, so we have to trust
+		 * the caller that the previous one has been EOIed,
+		 * and that a new one is now active. Clearing the
+		 * queued state will have the effect of making it
+		 * sample-able again.
+		 */
+		vgic_irq_clear_queued(vcpu, irq_num);
+	}
+
 	if (!vgic_can_sample_irq(vcpu, irq_num)) {
 		/*
 		 * Level interrupt in progress, will be picked up
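
To make the race concrete, here is a small standalone model of a single
non-shared forwarded interrupt. It is only a sketch under simplifying
assumptions, not the vgic code: struct fwd_irq and every function name
below are hypothetical, the distributor active state and the "queued" bit
are reduced to two booleans, and the sync step only models the original
"!active" check. It replays the sequence from the discussion above (guest
EOI, then the physical interrupt firing again before the sync runs) with
and without clearing "queued" on injection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one non-shared forwarded interrupt. */
struct fwd_irq {
	bool phys_active;	/* stands in for the distributor active state */
	bool queued;		/* stands in for the vgic "queued" bit */
};

/* The physical interrupt fires and is acknowledged by the host. */
static void phys_fire(struct fwd_irq *irq)
{
	irq->phys_active = true;
}

/* The guest EOIs; the HW bit makes the GIC deactivate the physical IRQ. */
static void guest_eoi(struct fwd_irq *irq)
{
	irq->phys_active = false;
}

/* Sync after a guest exit: "queued" is reset only if we see INACTIVE. */
static void sync_hwirq_model(struct fwd_irq *irq)
{
	if (!irq->phys_active)
		irq->queued = false;
}

/*
 * Injection path. Returns true if the interrupt could be sampled.
 * With clear_on_inject set, a new interrupt is taken as proof that the
 * previous one was deactivated, so "queued" is cleared first.
 */
static bool inject(struct fwd_irq *irq, bool clear_on_inject)
{
	assert(irq != NULL);

	if (clear_on_inject)
		irq->queued = false;
	if (irq->queued)
		return false;	/* previous instance still looks in flight */
	irq->queued = true;
	return true;
}

/* Replay the racy sequence; report whether the second IRQ gets through. */
static bool second_irq_delivered(bool clear_on_inject)
{
	struct fwd_irq irq = { .phys_active = false, .queued = false };

	phys_fire(&irq);
	if (!inject(&irq, clear_on_inject))
		return false;
	guest_eoi(&irq);
	phys_fire(&irq);	/* fires again before the sync can run */
	sync_hwirq_model(&irq);	/* sees ACTIVE, so "queued" stays set */
	return inject(&irq, clear_on_inject);
}
```

In this model second_irq_delivered(false) returns false (the stuck case),
while second_irq_delivered(true) returns true.
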

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
@ 2015-06-18  8:37           ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-06-18  8:37 UTC (permalink / raw)
  To: Eric Auger, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
  Cc: Christoffer Dall, Alex Bennée, Andre Przywara

On 17/06/15 16:50, Eric Auger wrote:
> On 06/17/2015 05:37 PM, Marc Zyngier wrote:
>> On 17/06/15 16:11, Eric Auger wrote:
>>> Hi Marc,
>>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>>> So far, the only use of the HW interrupt facility is the timer,
>>>> implying that the active state is context-switched for each vcpu,
>>>> as the device is is shared across all vcpus.
>>> s/is//
>>>>
>>>> This does not work for a device that has been assigned to a VM,
>>>> as the guest is entierely in control of that device (the HW is
>>> entirely?
>>>> not shared). In that case, it makes sense to bypass the whole
>>>> active state srtwitchint, and only track the deactivation of the
>>> switching
>>
>> Congratulations, I think you're now ready to try deciphering my
>> handwriting... ;-)
> good to see you're not a machine or maybe you do it on purpose some
> times ;-)
>>
>>>> interrupt.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  include/kvm/arm_vgic.h    |  5 +++--
>>>>  virt/kvm/arm/arch_timer.c |  2 +-
>>>>  virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
>>>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>>> index 1c653c1..5d47d60 100644
>>>> --- a/include/kvm/arm_vgic.h
>>>> +++ b/include/kvm/arm_vgic.h
>>>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>>>  	u32			virt_irq;
>>>>  	u32			phys_irq;
>>>>  	u32			irq;
>>>> -	bool			active;
>>>> +	bool			shared;
>>>> +	bool			active; /* Only valid if shared */
>>>>  };
>>>>  
>>>>  struct vgic_dist {
>>>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>> -				       int virt_irq, int irq);
>>>> +				       int virt_irq, int irq, bool shared);
>>>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>> index b9fff78..9544d79 100644
>>>> --- a/virt/kvm/arm/arch_timer.c
>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>>>  	 * physical interrupt. We do that once per VCPU.
>>>>  	 */
>>>> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>>>> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>>>  	WARN_ON(!timer->map);
>>>>  }
>>>>  
>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>> index f376b56..4223166 100644
>>>> --- a/virt/kvm/arm/vgic.c
>>>> +++ b/virt/kvm/arm/vgic.c
>>>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>>>  		map = vgic_irq_map_search(vcpu, irq);
>>>>  
>>>>  		if (map) {
>>>> -			int ret;
>>>> -
>>>> -			BUG_ON(!map->active);
>>>>  			vlr.hwirq = map->phys_irq;
>>>>  			vlr.state |= LR_HW;
>>>>  			vlr.state &= ~LR_EOI_INT;
>>>>  
>>>> -			ret = irq_set_irqchip_state(map->irq,
>>>> -						    IRQCHIP_STATE_ACTIVE,
>>>> -						    true);
>>>>  			vgic_irq_set_queued(vcpu, irq);
>>>
>>> the queued state is set again in vgic_queue_hwirq for level_sensitive
>>> IRQs although not harmful.
>>
>> Indeed. We still need it for edge interrupts though. I'll try to find a
>> nicer way...
>>
>>>> -			WARN_ON(ret);
>>>> +
>>>> +			if (map->shared) {
>>>> +				int ret;
>>>> +
>>>> +				BUG_ON(!map->active);
>>>> +				ret = irq_set_irqchip_state(map->irq,
>>>> +							    IRQCHIP_STATE_ACTIVE,
>>>> +							    true);
>>>> +				WARN_ON(ret);
>>>> +			}
>>>>  		}
>>>>  	}
>>>>  
>>>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>>  {
>>>>  	struct irq_phys_map *map;
>>>> +	bool active;
>>>>  	int ret;
>>>>  
>>>>  	if (!(vlr.state & LR_HW))
>>>>  		return 0;
>>>>  
>>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>>> -	BUG_ON(!map || !map->active);
>>>> +	BUG_ON(!map);
>>>> +	BUG_ON(map->shared && !map->active);
>>>>  
>>>>  	ret = irq_get_irqchip_state(map->irq,
>>>>  				    IRQCHIP_STATE_ACTIVE,
>>>> -				    &map->active);
>>>> +				    &active);
>>>>  
>>> In case of non shared and EOIMode = 1 - I know this is not your current
>>> interest here though ;-) - , once the guest EOIs its virtual IRQ and GIC
>>> deactivates the physical one, a new phys IRQ can hit immediatly, the
>>> physical handler can be entered and the state is seen as active here.
>>> The queued state is never reset in such a case and the system gets stuck
>>> since the can_sample fails I think. What I mean here is sounds the state
>>> machine as is does not work for my VFIO case. So some adaptations still
>>> are needed I think. Do you share my diagnosis?
>>
>> Yup, there is something that doesn't quite work here.
>>
>> I think the mistake is to sample the distributor active state. I wonder
>> if I can simply rely on the LR state. If it is neither pending nor
>> active, it means that we have done the deactivation, and we can then
>> reset the queued state.
> 
> I tried to use the LR in the past - it was also Christoffer's will - but
> it was not working. I observed injection before seeing the LR voided.
> This is why I resorted to using the pending state instead and treated
> forwarded IRQ as edge in vgic_queue_hwirq.  sampling could be done only
> if the IRQ was pending.

Of course, you're right. The LR state is not used at all for a physical
interrupt (the HW bit really says "use the distributor").

I've given it more thoughts last night, and I think we can solve this is
a fairly simple way. In the scenario you outline, we do not observe the
ACTIVE to INACTIVE transition because the interrupt has fired again,
leaving the interrupt flagged as queued.

I think we can clear the "queued" bit on injection, as we're guaranteed
that seeing a new interrupt is the proof that the previous one has been
deactivated (how could we see it otherwise?).
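As an illustration only (this is not part of the posted patch), the
scenario can be reproduced in a small user-space model. Everything named
model_* below is hypothetical and merely stands in for the vgic queued
flag, the distributor active state and the injection path; the real code
paths are more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one non-shared forwarded HW interrupt. */
struct model_irq {
	bool queued;     /* software "queued" flag kept by the vgic */
	bool hw_active;  /* active state as read back from the distributor */
};

/* Plays the role of vgic_can_sample_irq(): sample only when not queued. */
static bool model_can_sample(const struct model_irq *irq)
{
	return !irq->queued;
}

/* Original sync rule: deactivation is seen only if the HW reads inactive. */
static bool model_sync_old(const struct model_irq *irq)
{
	return !irq->hw_active;
}

/* Proposed sync rule: "still active but no longer queued" also counts. */
static bool model_sync_new(const struct model_irq *irq)
{
	return !irq->hw_active || !irq->queued;
}

/*
 * Injection. With clear_on_inject set, a stale queued flag is dropped
 * first, as proposed. Returns true if the interrupt could be sampled.
 */
static bool model_inject(struct model_irq *irq, bool clear_on_inject)
{
	if (clear_on_inject)
		irq->queued = false;
	if (!model_can_sample(irq))
		return false;
	irq->queued = true;    /* handed to a list register */
	irq->hw_active = true; /* physical interrupt is active */
	return true;
}

/*
 * Guest EOI immediately followed by a new physical interrupt: the HW goes
 * inactive and active again before the host gets a chance to look.
 */
static void model_eoi_then_refire(struct model_irq *irq)
{
	assert(irq->hw_active);
	irq->hw_active = false; /* GIC deactivates on guest EOI */
	irq->hw_active = true;  /* next physical IRQ taken by the host */
}
```

In this model, injecting without clearing the flag fails after an
EOI-then-refire sequence (the stuck case), while injecting with
clear_on_inject succeeds.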

How about the following (untested) patch:

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 6687ac4..a01f821 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1387,8 +1387,17 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)

 	WARN_ON(ret);

+	/*
+	 * For a non-shared interrupt, we have to cater for two
+	 * possible deactivation conditions
+	 *
+	 * - the interrupt is now inactive
+	 * - the interrupt is still active, but is flagged as not
+	 *   queued, indicating another interrupt has fired before we
+	 *   could observe the deactivate.
+	 */
 	if (!map->shared)
-		return !active;
+		return !active || !vgic_irq_is_queued(vcpu, vlr.irq);

 	map->active = active;

@@ -1534,6 +1543,7 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	int edge_triggered, level_triggered;
 	int enabled;
 	bool ret = true, can_inject = true;
+	struct irq_phys_map *map;

 	spin_lock(&dist->lock);

@@ -1580,6 +1590,18 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 		goto out;
 	}

+	map = vgic_irq_map_search(vcpu, irq_num);
+	if (map && !map->shared) {
+		/*
+		 * We are told to inject a HW irq, so we have to trust
+		 * the caller that the previous one has been EOIed,
+		 * and that a new one is now active. Clearing the
+		 * queued state will have the effect of making it
+		 * sample-able again.
+		 */
+		vgic_irq_clear_queued(vcpu, irq_num);
+	}
+
 	if (!vgic_can_sample_irq(vcpu, irq_num)) {
 		/*
 		 * Level interrupt in progress, will be picked up

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-18  8:37           ` Marc Zyngier
@ 2015-06-18 17:51             ` Eric Auger
  -1 siblings, 0 replies; 118+ messages in thread
From: Eric Auger @ 2015-06-18 17:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/18/2015 10:37 AM, Marc Zyngier wrote:
> On 17/06/15 16:50, Eric Auger wrote:
>> On 06/17/2015 05:37 PM, Marc Zyngier wrote:
>>> On 17/06/15 16:11, Eric Auger wrote:
>>>> Hi Marc,
>>>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>>>> So far, the only use of the HW interrupt facility is the timer,
>>>>> implying that the active state is context-switched for each vcpu,
>>>>> as the device is is shared across all vcpus.
>>>> s/is//
>>>>>
>>>>> This does not work for a device that has been assigned to a VM,
>>>>> as the guest is entierely in control of that device (the HW is
>>>> entirely?
>>>>> not shared). In that case, it makes sense to bypass the whole
>>>>> active state srtwitchint, and only track the deactivation of the
>>>> switching
>>>
>>> Congratulations, I think you're now ready to try deciphering my
>>> handwriting... ;-)
>> Good to see you're not a machine, or maybe you do it on purpose
>> sometimes ;-)
>>>
>>>>> interrupt.
>>>>>
>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>> ---
>>>>>  include/kvm/arm_vgic.h    |  5 +++--
>>>>>  virt/kvm/arm/arch_timer.c |  2 +-
>>>>>  virt/kvm/arm/vgic.c       | 37 ++++++++++++++++++++++++-------------
>>>>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>>>> index 1c653c1..5d47d60 100644
>>>>> --- a/include/kvm/arm_vgic.h
>>>>> +++ b/include/kvm/arm_vgic.h
>>>>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>>>>  	u32			virt_irq;
>>>>>  	u32			phys_irq;
>>>>>  	u32			irq;
>>>>> -	bool			active;
>>>>> +	bool			shared;
>>>>> +	bool			active; /* Only valid if shared */
>>>>>  };
>>>>>  
>>>>>  struct vgic_dist {
>>>>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>>> -				       int virt_irq, int irq);
>>>>> +				       int virt_irq, int irq, bool shared);
>>>>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>>> index b9fff78..9544d79 100644
>>>>> --- a/virt/kvm/arm/arch_timer.c
>>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>>>>  	 * physical interrupt. We do that once per VCPU.
>>>>>  	 */
>>>>> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>>>>> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>>>>  	WARN_ON(!timer->map);
>>>>>  }
>>>>>  
>>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>>> index f376b56..4223166 100644
>>>>> --- a/virt/kvm/arm/vgic.c
>>>>> +++ b/virt/kvm/arm/vgic.c
>>>>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>>>>  		map = vgic_irq_map_search(vcpu, irq);
>>>>>  
>>>>>  		if (map) {
>>>>> -			int ret;
>>>>> -
>>>>> -			BUG_ON(!map->active);
>>>>>  			vlr.hwirq = map->phys_irq;
>>>>>  			vlr.state |= LR_HW;
>>>>>  			vlr.state &= ~LR_EOI_INT;
>>>>>  
>>>>> -			ret = irq_set_irqchip_state(map->irq,
>>>>> -						    IRQCHIP_STATE_ACTIVE,
>>>>> -						    true);
>>>>>  			vgic_irq_set_queued(vcpu, irq);
>>>>
>>>> The queued state is set again in vgic_queue_hwirq for level-sensitive
>>>> IRQs, although this is not harmful.
>>>
>>> Indeed. We still need it for edge interrupts though. I'll try to find a
>>> nicer way...
>>>
>>>>> -			WARN_ON(ret);
>>>>> +
>>>>> +			if (map->shared) {
>>>>> +				int ret;
>>>>> +
>>>>> +				BUG_ON(!map->active);
>>>>> +				ret = irq_set_irqchip_state(map->irq,
>>>>> +							    IRQCHIP_STATE_ACTIVE,
>>>>> +							    true);
>>>>> +				WARN_ON(ret);
>>>>> +			}
>>>>>  		}
>>>>>  	}
>>>>>  
>>>>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>>>  {
>>>>>  	struct irq_phys_map *map;
>>>>> +	bool active;
>>>>>  	int ret;
>>>>>  
>>>>>  	if (!(vlr.state & LR_HW))
>>>>>  		return 0;
>>>>>  
>>>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>>>> -	BUG_ON(!map || !map->active);
>>>>> +	BUG_ON(!map);
>>>>> +	BUG_ON(map->shared && !map->active);
>>>>>  
>>>>>  	ret = irq_get_irqchip_state(map->irq,
>>>>>  				    IRQCHIP_STATE_ACTIVE,
>>>>> -				    &map->active);
>>>>> +				    &active);
>>>>>  
>>>> In the non-shared case with EOIMode = 1 (I know this is not your current
>>>> interest here though ;-)), once the guest EOIs its virtual IRQ and the GIC
>>>> deactivates the physical one, a new physical IRQ can hit immediately, the
>>>> physical handler can be entered, and the state is seen as active here.
>>>> The queued state is never reset in such a case and the system gets stuck,
>>>> since can_sample fails, I think. What I mean is that the state machine
>>>> as is does not seem to work for my VFIO case, so some adaptations are
>>>> still needed, I think. Do you share my diagnosis?
>>>
>>> Yup, there is something that doesn't quite work here.
>>>
>>> I think the mistake is to sample the distributor active state. I wonder
>>> if I can simply rely on the LR state. If it is neither pending nor
>>> active, it means that we have done the deactivation, and we can then
>>> reset the queued state.
>>
>> I tried to use the LR in the past (it was also Christoffer's wish), but
>> it was not working: I observed injection before seeing the LR voided.
>> This is why I resorted to using the pending state instead and treated
>> forwarded IRQs as edge in vgic_queue_hwirq. Sampling could be done only
>> if the IRQ was pending.
> 
> Of course, you're right. The LR state is not used at all for a physical
> interrupt (the HW bit really says "use the distributor").
> 
> I've given it more thought last night, and I think we can solve this in
> a fairly simple way. In the scenario you outline, we do not observe the
> ACTIVE to INACTIVE transition because the interrupt has fired again,
> leaving the interrupt flagged as queued.
> 
> I think we can clear the "queued" bit on injection, as we're guaranteed
> that seeing a new interrupt is the proof that the previous one has been
> deactivated (how could we see it otherwise?).
> 
> How about the following (untested) patch:

I think in EOIMode = 1 it is indeed fairly safe. I will do some testing
and let you know...

Best Regards

Eric
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 6687ac4..a01f821 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1387,8 +1387,17 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> 
>  	WARN_ON(ret);
> 
> +	/*
> +	 * For a non-shared interrupt, we have to cater for two
> +	 * possible deactivation conditions
> +	 *
> +	 * - the interrupt is now inactive
> +	 * - the interrupt is still active, but is flagged as not
> +	 *   queued, indicating another interrupt has fired before we
> +	 *   could observe the deactivate.
> +	 */
>  	if (!map->shared)
> -		return !active;
> +		return !active || !vgic_irq_is_queued(vcpu, vlr.irq);
> 
>  	map->active = active;
> 
> @@ -1534,6 +1543,7 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	int edge_triggered, level_triggered;
>  	int enabled;
>  	bool ret = true, can_inject = true;
> +	struct irq_phys_map *map;
> 
>  	spin_lock(&dist->lock);
> 
> @@ -1580,6 +1590,18 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  		goto out;
>  	}
> 
> +	map = vgic_irq_map_search(vcpu, irq_num);
> +	if (map && !map->shared) {
> +		/*
> +		 * We are told to inject a HW irq, so we have to trust
> +		 * the caller that the previous one has been EOIed,
> +		 * and that a new one is now active. Clearing the
> +		 * queued state will have the effect of making it
> +		 * sample-able again.
> +		 */
> +		vgic_irq_clear_queued(vcpu, irq_num);
> +	}
> +
>  	if (!vgic_can_sample_irq(vcpu, irq_num)) {
>  		/*
>  		 * Level interrupt in progress, will be picked up
> 
> Thoughts?
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-30 20:19     ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-06-30 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
> So far, the only use of the HW interrupt facility is the timer,
> implying that the active state is context-switched for each vcpu,
> as the device is shared across all vcpus.
> 
> This does not work for a device that has been assigned to a VM,
> as the guest is entirely in control of that device (the HW is
> not shared). In that case, it makes sense to bypass the whole
> active state switching, and only track the deactivation of the
> interrupt.
> 
The distinction here between shared and non-shared feels a bit arbitrary
(it may not be, but it just feels that way) and I can't easily convince
myself that this is the logical/correct/all-encompassing word to
describe the nature of the two devices.
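For reference, the behavioural difference that the flag selects can be
sketched in a few lines of stand-alone C. This is only a model: the
MODEL_LR_* bit values and the model_* types are made up for illustration,
and phys_active stands in for the distributor active bit that the patch
sets through irq_set_irqchip_state() in the shared case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real LR_HW/LR_EOI_INT values differ. */
#define MODEL_LR_HW      (1u << 0)
#define MODEL_LR_EOI_INT (1u << 1)

struct model_map {
	bool shared;      /* device shared across vcpus (e.g. the timer) */
	bool phys_active; /* stand-in for the physical active bit */
};

struct model_lr {
	uint32_t state;
};

/* Models the mapped-interrupt branch of vgic_queue_irq_to_lr(). */
static void model_queue_to_lr(struct model_map *map, struct model_lr *lr)
{
	assert(map && lr);

	/* Both kinds of device are injected as HW interrupts, without EOI. */
	lr->state |= MODEL_LR_HW;
	lr->state &= ~MODEL_LR_EOI_INT;

	/* Only a shared device has its active state forced on entry. */
	if (map->shared)
		map->phys_active = true;
}
```

In the non-shared case nothing touches the physical active state on the
way in; only the deactivation is tracked afterwards.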

Meh, not the most productive comment, I know...

-Christoffer

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 01/10] arm/arm64: KVM: Fix ordering of timer/GIC on guest entry
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-30 20:19     ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-06-30 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 08, 2015 at 06:03:56PM +0100, Marc Zyngier wrote:
> As we now inject the timer interrupt when we're about to enter
> the guest, it makes a lot more sense to make sure this happens
> before the vgic code queues the pending interrupts.
> 
> Otherwise, we get the interrupt on the following exit, which is
> not great for latency (and leads to all kinds of bizarre issues
> when used with active interrupts at the HW level).
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 02/10] arm/arm64: KVM: Move vgic handling to a non-preemptible section
  2015-06-08 17:03   ` Marc Zyngier
@ 2015-06-30 20:19     ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-06-30 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 08, 2015 at 06:03:57PM +0100, Marc Zyngier wrote:
> As we're about to introduce some serious GIC-poking to the vgic code,
> it is important to make sure that we're going to poke the part of
> the GIC that belongs to the CPU we're about to run on (otherwise,
> we'd end up with some unexpected interrupts firing)...
> 
> Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run
> prevents the problem from occurring.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/kvm/arm.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 46db690..4986300 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -529,8 +529,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (vcpu->arch.pause)
>  			vcpu_pause(vcpu);
>  
> +		/*
> +		 * Disarming the timer must be done with in a

s/with //

> +		 * preemptible context, as this call may sleep.
> +		 */
>  		kvm_timer_flush_hwstate(vcpu);
>  
> +		/*
> +		 * Preparing the interrupts to be injected also
> +		 * involves poking the GIC, which must be done in a
> +		 * non-preemptible context.
> +		 */
> +		preempt_disable();
>  		kvm_vgic_flush_hwstate(vcpu);
>  
>  		local_irq_disable();
> @@ -546,6 +556,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
>  			kvm_vgic_sync_hwstate(vcpu);
> +			preempt_enable();
>  			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
> @@ -580,6 +591,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		kvm_vgic_sync_hwstate(vcpu);
>  
> +		preempt_enable();
> +
>  		kvm_timer_sync_hwstate(vcpu);
>  
>  		ret = handle_exit(vcpu, run, ret);
> -- 
> 2.1.4
> 
This should get simpler when rebased on the CPU time accounting patch,
but otherwise looks good.
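The ordering constraint in the hunks above can be checked with a small
user-space model. This is a sketch under assumptions: model_preempt_count
and the model_* functions are hypothetical stand-ins for the kernel's
preemption counter and for the timer/vgic flush and sync calls, and only
the constraints stated in the patch comments are asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's preemption counter. */
static int model_preempt_count;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

static bool model_preemptible(void)
{
	return model_preempt_count == 0;
}

/* Disarming the timer may sleep, so it needs a preemptible context. */
static void model_timer_flush(void)
{
	assert(model_preemptible());
}

/* Poking the GIC must happen on the CPU we are about to run on. */
static void model_vgic_flush(void)
{
	assert(!model_preemptible());
}

static void model_vgic_sync(void)
{
	assert(!model_preemptible());
}

/* No constraint is stated for the timer sync, so none is asserted. */
static void model_timer_sync(void)
{
}

/*
 * One loop iteration of the run path. early_exit models the
 * "ret <= 0 || need_new_vmid_gen()" bail-out. Returns the preemption
 * count on exit, which should be balanced (0) on both paths.
 */
static int model_run_once(bool early_exit)
{
	model_timer_flush();
	model_preempt_disable();
	model_vgic_flush();

	if (!early_exit) {
		/* the guest would run here */
	}

	model_vgic_sync();
	model_preempt_enable();
	model_timer_sync();
	return model_preempt_count;
}
```

Both the normal and the bail-out path leave the count balanced, which is
the property the two added preempt_enable() calls are there to keep.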

-Christoffer

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-30 20:19     ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-06-30 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> We only set the irq_queued flag for level interrupts, meaning
> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> for all interrupts.
> 
> This will allow us to inject edge HW interrupts, for which the
> state ACTIVE+PENDING is not allowed.

I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
Do you mean that if we set the HW bit in the LR, then we are linking to
an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
GIC side?

Why is this relevant here?  I feel like I'm missing context.

Thanks,
-Christoffer

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 78fb820..59ed7a3 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -377,7 +377,7 @@ void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
>  
>  static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
>  {
> -	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
> +	return !vgic_irq_is_queued(vcpu, irq);
>  }
>  
>  /**
> -- 
> 2.1.4
> 
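For readers following the one-line change above, the old and new
predicates can be compared in isolation. The sketch below is a plain
truth-table model; struct model_irq_state and the can_sample_* names are
hypothetical and only mirror the two return expressions in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two inputs the predicate looks at. */
struct model_irq_state {
	bool edge;   /* interrupt is configured as edge-triggered */
	bool queued; /* irq_queued flag is set */
};

/* Predicate before the patch. */
static bool can_sample_old(struct model_irq_state s)
{
	return s.edge || !s.queued;
}

/* Predicate after the patch. */
static bool can_sample_new(struct model_irq_state s)
{
	return !s.queued;
}

/*
 * While the queued flag is only ever set for level interrupts, the
 * state "edge and queued" is unreachable and both predicates agree.
 */
static void check_equivalent_when_edge_never_queued(void)
{
	const struct model_irq_state reachable[] = {
		{ false, false }, /* level, idle */
		{ false, true },  /* level, queued */
		{ true, false },  /* edge, never queued */
	};
	unsigned int i;

	for (i = 0; i < sizeof(reachable) / sizeof(reachable[0]); i++)
		assert(can_sample_old(reachable[i]) ==
		       can_sample_new(reachable[i]));
}
```

The two versions only disagree for an edge interrupt with the queued flag
set, which is the state that becomes reachable once HW edge interrupts are
marked as queued.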

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-08 17:04   ` Marc Zyngier
@ 2015-06-30 20:19     ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-06-30 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
> In order to be able to feed physical interrupts to a guest, we need
> to be able to establish the virtual-physical mapping between the two
> worlds.
> 
> The mapping is kept in a rbtree, indexed by virtual interrupts.

how many of these do you expect there will be?  Is the extra code and
complexity of an rbtree really warranted?

I would assume that you'll have one PPI for each CPU in the default case
plus potentially a few more for an assigned network adapter, let's say a
couple of handfuls.  Am I missing something obvious or is this
optimization of traversing a list of 10-12 mappings in the typical case
not likely to be measurable?

I would actually be more concerned about the additional locking and
would look at RCU for protecting a list instead.  Can you protect an
rbtree with RCU easily?
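
For a sense of scale, here is a minimal userspace sketch of the list-based
alternative, assuming a dozen or so mappings. The virt_irq/phys_irq fields
are borrowed from the patch; map_entry, map_search and map_add are
hypothetical names, and there is no locking or RCU here, only the data
structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a list-based struct irq_phys_map. */
struct map_entry {
	struct map_entry *next;
	uint32_t virt_irq;
	uint32_t phys_irq;
};

/* Linear search; returns NULL when virt_irq has no mapping. */
static struct map_entry *map_search(struct map_entry *head, uint32_t virt_irq)
{
	struct map_entry *e;

	for (e = head; e; e = e->next)
		if (e->virt_irq == virt_irq)
			return e;

	return NULL;
}

/* Returns the existing mapping if there is one, else prepends a new one. */
static struct map_entry *map_add(struct map_entry **head,
				 uint32_t virt_irq, uint32_t phys_irq)
{
	struct map_entry *e;

	assert(head);

	e = map_search(*head, virt_irq);
	if (e)
		return e;

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;

	e->virt_irq = virt_irq;
	e->phys_irq = phys_irq;
	e->next = *head;
	*head = e;
	return e;
}
```

Note the explicit NULL on a miss: a search loop that assigns its cursor on
every iteration has to reset it when it falls off the end of the structure.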

Thanks,
-Christoffer

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h |  18 ++++++++
>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4f9fa1d..33d121a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -159,6 +159,14 @@ struct vgic_io_device {
>  	struct kvm_io_device dev;
>  };
>  
> +struct irq_phys_map {
> +	struct rb_node		node;
> +	u32			virt_irq;
> +	u32			phys_irq;
> +	u32			irq;
> +	bool			active;
> +};
> +
>  struct vgic_dist {
>  	spinlock_t		lock;
>  	bool			in_kernel;
> @@ -256,6 +264,10 @@ struct vgic_dist {
>  	struct vgic_vm_ops	vm_ops;
>  	struct vgic_io_device	dist_iodev;
>  	struct vgic_io_device	*redist_iodevs;
> +
> +	/* Virtual irq to hwirq mapping */
> +	spinlock_t		irq_phys_map_lock;

why do we need a separate lock here?

> +	struct rb_root		irq_phys_map;
>  };
>  
>  struct vgic_v2_cpu_if {
> @@ -307,6 +319,9 @@ struct vgic_cpu {
>  		struct vgic_v2_cpu_if	vgic_v2;
>  		struct vgic_v3_cpu_if	vgic_v3;
>  	};
> +
> +	/* Protected by the distributor's irq_phys_map_lock */
> +	struct rb_root	irq_phys_map;
>  };
>  
>  #define LR_EMPTY	0xff
> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +				       int virt_irq, int irq);
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 59ed7a3..c6604f2 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
> +#include <linux/rbtree.h>
>  #include <linux/uaccess.h>
>  
>  #include <linux/irqchip/arm-gic.h>
> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +						int virt_irq);
>  
>  static const struct vgic_ops *vgic_ops;
>  static const struct vgic_params *vgic;
> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
> +					     int virt_irq)
> +{
> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
> +	else
> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
> +}
> +
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +				       int virt_irq, int irq)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	struct rb_node **new = &root->rb_node, *parent = NULL;
> +	struct irq_phys_map *new_map;
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	int phys_irq;
> +
> +	desc = irq_to_desc(irq);
> +	if (!desc) {
> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
> +		return NULL;
> +	}
> +
> +	data = irq_desc_get_irq_data(desc);
> +	while (data->parent_data)
> +		data = data->parent_data;
> +
> +	phys_irq = data->hwirq;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +
> +	/* Boilerplate rb_tree code */
> +	while (*new) {
> +		struct irq_phys_map *this;
> +
> +		this = container_of(*new, struct irq_phys_map, node);
> +		parent = *new;
> +		if (this->virt_irq < virt_irq)
> +			new = &(*new)->rb_left;
> +		else if (this->virt_irq > virt_irq)
> +			new = &(*new)->rb_right;
> +		else {
> +			new_map = this;
> +			goto out;
> +		}
> +	}
> +
> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
> +	if (!new_map)
> +		goto out;
> +
> +	new_map->virt_irq = virt_irq;
> +	new_map->phys_irq = phys_irq;
> +	new_map->irq = irq;
> +
> +	rb_link_node(&new_map->node, parent, new);
> +	rb_insert_color(&new_map->node, root);
> +
> +out:
> +	spin_unlock(&dist->irq_phys_map_lock);
> +	return new_map;
> +}
> +
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +						int virt_irq)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	struct rb_node *node = root->rb_node;
> +	struct irq_phys_map *this = NULL;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +
> +	while (node) {
> +		this = container_of(node, struct irq_phys_map, node);
> +
> +		if (this->virt_irq < virt_irq)
> +			node = node->rb_left;
> +		else if (this->virt_irq > virt_irq)
> +			node = node->rb_right;
> +		else
> +			break;
> +	}
> +
> +	spin_unlock(&dist->irq_phys_map_lock);
> +	return this;
> +}
> +
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +	if (!map)
> +		return -EINVAL;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
> +	spin_unlock(&dist->irq_phys_map_lock);
> +
> +	kfree(map);
> +	return 0;
> +}
> +
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  		goto out_unlock;
>  
>  	spin_lock_init(&kvm->arch.vgic.lock);
> +	spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>  	kvm->arch.vgic.in_kernel = true;
>  	kvm->arch.vgic.vgic_model = type;
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
@ 2015-06-30 20:19     ` Christoffer Dall
  0 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-06-30 20:19 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel

On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
> In order to be able to feed physical interrupts to a guest, we need
> to be able to establish the virtual-physical mapping between the two
> worlds.
> 
> The mapping is kept in a rbtree, indexed by virtual interrupts.

how many of these do you expect there will be?  Is the extra code and
complexity of an rbtree really warranted?

I would assume that you'll have one PPI for each CPU in the default case
plus potentially a few more for an assigned network adapter, let's say a
couple of handfuls.  Am I missing something obvious or is this
optimization of traversing a list of 10-12 mappings in the typical case
not likely to be measurable?

I would actually be more concerned about the additional locking and
would look at RCU for protecting a list instead.  Can you protect an
rbtree with RCU easily?

Thanks,
-Christoffer

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/kvm/arm_vgic.h |  18 ++++++++
>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4f9fa1d..33d121a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -159,6 +159,14 @@ struct vgic_io_device {
>  	struct kvm_io_device dev;
>  };
>  
> +struct irq_phys_map {
> +	struct rb_node		node;
> +	u32			virt_irq;
> +	u32			phys_irq;
> +	u32			irq;
> +	bool			active;
> +};
> +
>  struct vgic_dist {
>  	spinlock_t		lock;
>  	bool			in_kernel;
> @@ -256,6 +264,10 @@ struct vgic_dist {
>  	struct vgic_vm_ops	vm_ops;
>  	struct vgic_io_device	dist_iodev;
>  	struct vgic_io_device	*redist_iodevs;
> +
> +	/* Virtual irq to hwirq mapping */
> +	spinlock_t		irq_phys_map_lock;

why do we need a separate lock here?

> +	struct rb_root		irq_phys_map;
>  };
>  
>  struct vgic_v2_cpu_if {
> @@ -307,6 +319,9 @@ struct vgic_cpu {
>  		struct vgic_v2_cpu_if	vgic_v2;
>  		struct vgic_v3_cpu_if	vgic_v3;
>  	};
> +
> +	/* Protected by the distributor's irq_phys_map_lock */
> +	struct rb_root	irq_phys_map;
>  };
>  
>  #define LR_EMPTY	0xff
> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +				       int virt_irq, int irq);
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 59ed7a3..c6604f2 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -24,6 +24,7 @@
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_irq.h>
> +#include <linux/rbtree.h>
>  #include <linux/uaccess.h>
>  
>  #include <linux/irqchip/arm-gic.h>
> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +						int virt_irq);
>  
>  static const struct vgic_ops *vgic_ops;
>  static const struct vgic_params *vgic;
> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
> +					     int virt_irq)
> +{
> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
> +	else
> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
> +}
> +
> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> +				       int virt_irq, int irq)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	struct rb_node **new = &root->rb_node, *parent = NULL;
> +	struct irq_phys_map *new_map;
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	int phys_irq;
> +
> +	desc = irq_to_desc(irq);
> +	if (!desc) {
> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
> +		return NULL;
> +	}
> +
> +	data = irq_desc_get_irq_data(desc);
> +	while (data->parent_data)
> +		data = data->parent_data;
> +
> +	phys_irq = data->hwirq;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +
> +	/* Boilerplate rb_tree code */
> +	while (*new) {
> +		struct irq_phys_map *this;
> +
> +		this = container_of(*new, struct irq_phys_map, node);
> +		parent = *new;
> +		if (this->virt_irq < virt_irq)
> +			new = &(*new)->rb_left;
> +		else if (this->virt_irq > virt_irq)
> +			new = &(*new)->rb_right;
> +		else {
> +			new_map = this;
> +			goto out;
> +		}
> +	}
> +
> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
> +	if (!new_map)
> +		goto out;
> +
> +	new_map->virt_irq = virt_irq;
> +	new_map->phys_irq = phys_irq;
> +	new_map->irq = irq;
> +
> +	rb_link_node(&new_map->node, parent, new);
> +	rb_insert_color(&new_map->node, root);
> +
> +out:
> +	spin_unlock(&dist->irq_phys_map_lock);
> +	return new_map;
> +}
> +
> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> +						int virt_irq)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	struct rb_node *node = root->rb_node;
> +	struct irq_phys_map *this = NULL;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +
> +	while (node) {
> +		this = container_of(node, struct irq_phys_map, node);
> +
> +		if (this->virt_irq < virt_irq)
> +			node = node->rb_left;
> +		else if (this->virt_irq > virt_irq)
> +			node = node->rb_right;
> +		else
> +			break;
> +	}
> +
> +	spin_unlock(&dist->irq_phys_map_lock);
> +	return this;
> +}
> +
> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +	if (!map)
> +		return -EINVAL;
> +
> +	spin_lock(&dist->irq_phys_map_lock);
> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
> +	spin_unlock(&dist->irq_phys_map_lock);
> +
> +	kfree(map);
> +	return 0;
> +}
> +
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  		goto out_unlock;
>  
>  	spin_lock_init(&kvm->arch.vgic.lock);
> +	spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>  	kvm->arch.vgic.in_kernel = true;
>  	kvm->arch.vgic.vgic_model = type;
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-06-30 20:19     ` Christoffer Dall
@ 2015-07-01  8:26       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01  8:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
>> So far, the only use of the HW interrupt facility is the timer,
>> implying that the active state is context-switched for each vcpu,
>> as the device is shared across all vcpus.
>>
>> This does not work for a device that has been assigned to a VM,
>> as the guest is entirely in control of that device (the HW is
>> not shared). In that case, it makes sense to bypass the whole
>> active state switching, and only track the deactivation of the
>> interrupt.
>>
> The distinction here between shared and non-shared feels a bit arbitrary
> (it may not be, but it just feels that way) and I can't easily convince
> myself that this is the logical/correct/all-encompassing word to
> describe the nature of the two devices.

Does the idea of global vs private resource feel more correct?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
@ 2015-07-01  8:26       ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01  8:26 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
>> So far, the only use of the HW interrupt facility is the timer,
>> implying that the active state is context-switched for each vcpu,
>> as the device is shared across all vcpus.
>>
>> This does not work for a device that has been assigned to a VM,
>> as the guest is entirely in control of that device (the HW is
>> not shared). In that case, it makes sense to bypass the whole
>> active state switching, and only track the deactivation of the
>> interrupt.
>>
> The distinction here between shared and non-shared feels a bit arbitrary
> (it may not be, but it just feels that way) and I can't easily convince
> myself that this is the logical/correct/all-encompassing word to
> describe the nature of the two devices.

Does the idea of global vs private resource feel more correct?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
  2015-07-01  8:26       ` Marc Zyngier
@ 2015-07-01  8:57         ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-01  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 01, 2015 at 09:26:59AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
> >> So far, the only use of the HW interrupt facility is the timer,
> >> implying that the active state is context-switched for each vcpu,
> > >> as the device is shared across all vcpus.
> > >>
> > >> This does not work for a device that has been assigned to a VM,
> > >> as the guest is entirely in control of that device (the HW is
> > >> not shared). In that case, it makes sense to bypass the whole
> > >> active state switching, and only track the deactivation of the
> > >> interrupt.
> > >>
> > > The distinction here between shared and non-shared feels a bit arbitrary
> > > (it may not be, but it just feels that way) and I can't easily convince
> > myself that this is the logical/correct/all-encompassing word to
> > describe the nature of the two devices.
> 
> Does the idea of global vs private resource feel more correct?
> 
I think shared covers that equally well.  This feels like one of those
things that just doesn't make intuitive sense on its own but when you
think about the cases we are familiar with, then it fits for now.  So
what you have here is probably as good as it gets and hopefully it does
cover all the cases we care about, i.e. shared and non-shared :)

-Christoffer

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
@ 2015-07-01  8:57         ` Christoffer Dall
  0 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-01  8:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm@vger.kernel.org, Andre Przywara, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org

On Wed, Jul 01, 2015 at 09:26:59AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
> >> So far, the only use of the HW interrupt facility is the timer,
> >> implying that the active state is context-switched for each vcpu,
> > >> as the device is shared across all vcpus.
> > >>
> > >> This does not work for a device that has been assigned to a VM,
> > >> as the guest is entirely in control of that device (the HW is
> > >> not shared). In that case, it makes sense to bypass the whole
> > >> active state switching, and only track the deactivation of the
> > >> interrupt.
> > >>
> > > The distinction here between shared and non-shared feels a bit arbitrary
> > > (it may not be, but it just feels that way) and I can't easily convince
> > myself that this is the logical/correct/all-encompassing word to
> > describe the nature of the two devices.
> 
> Does the idea of global vs private resource feel more correct?
> 
I think shared covers that equally well.  This feels like one of those
things that just doesn't make intuitive sense on its own but when you
think about the cases we are familiar with, then it fits for now.  So
what you have here is probably as good as it gets and hopefully it does
cover all the cases we care about, i.e. shared and non-shared :)

-Christoffer

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-06-30 20:19     ` Christoffer Dall
@ 2015-07-01  9:17       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01  9:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>> We only set the irq_queued flag for level interrupts, meaning
>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>> for all interrupts.
>>
>> This will allow us to inject edge HW interrupts, for which the
>> state ACTIVE+PENDING is not allowed.
> 
> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> Do you mean that if we set the HW bit in the LR, then we are linking to
> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> GIC side?
> 
> Why is this relevant here?  I feel like I'm missing context.

I've probably taken a shortcut here - bear with me while I'm trying to
explain the issue.

For HW interrupts, we shouldn't even try to use the state bits in the
LR, because that state is contained in the physical distributor. Setting
the HW bit really means "there is something going on at the distributor
level, just go there".

If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
basically lose the second interrupt because that state is simply not
considered.

So the trick we're using is to only inject the active interrupt, and
prevent anything else from being injected until we can confirm that the
active state has been cleared at the physical level.
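
A toy model of that gating, as a sketch only: the struct and function names
below are hypothetical and are not the vgic data structures. It assumes a
single HW-mapped interrupt whose active state lives at the physical
distributor, and it counts the injections that are held back while the
previous instance is still active:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one HW-mapped interrupt. */
struct hw_irq_model {
	bool phys_active;	/* active state at the physical distributor */
	unsigned int deferred;	/* injections held back while still active */
};

/*
 * Inject only if the previous instance has been deactivated. The LR never
 * carries ACTIVE+PENDING for this interrupt; a second occurrence waits.
 */
static bool model_inject(struct hw_irq_model *m)
{
	assert(m);

	if (m->phys_active) {
		m->deferred++;
		return false;
	}

	m->phys_active = true;
	return true;
}

/* Guest EOI: with the HW bit set, the physical interrupt is deactivated. */
static void model_guest_eoi(struct hw_irq_model *m)
{
	assert(m);
	m->phys_active = false;
}
```

A second model_inject() before model_guest_eoi() is refused and counted, and
the next one after the EOI goes through again.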

Does it make any sense?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
@ 2015-07-01  9:17       ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01  9:17 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>> We only set the irq_queued flag for level interrupts, meaning
>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>> for all interrupts.
>>
>> This will allow us to inject edge HW interrupts, for which the
>> state ACTIVE+PENDING is not allowed.
> 
> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> Do you mean that if we set the HW bit in the LR, then we are linking to
> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> GIC side?
> 
> Why is this relevant here?  I feel like I'm missing context.

I've probably taken a shortcut here - bear with me while I'm trying to
explain the issue.

For HW interrupts, we shouldn't even try to use the state bits in the
LR, because that state is contained in the physical distributor. Setting
the HW bit really means "there is something going on at the distributor
level, just go there".

If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
basically lose the second interrupt because that state is simply not
considered.

So the trick we're using is to only inject the active interrupt, and
prevent anything else from being injected until we can confirm that the
active state has been cleared at the physical level.

Does it make any sense?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-06-30 20:19     ` Christoffer Dall
@ 2015-07-01 10:20       ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01 10:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
>> In order to be able to feed physical interrupts to a guest, we need
>> to be able to establish the virtual-physical mapping between the two
>> worlds.
>>
>> The mapping is kept in a rbtree, indexed by virtual interrupts.
> 
> how many of these do you expect there will be?  Is the extra code and
> complexity of an rbtree really warranted?
> 
> I would assume that you'll have one PPI for each CPU in the default case
> plus potentially a few more for an assigned network adapter, let's say a
> couple of handfuls.  Am I missing something obvious or is this
> optimization of traversing a list of 10-12 mappings in the typical case
> not likely to be measurable?
> 
> I would actually be more concerned about the additional locking and
> would look at RCU for protecting a list instead.  Can you protect an
> rbtree with RCU easily?

Not very easily. There was some work done a while ago for the dentry
cache IIRC, but I doubt that's reusable directly, and probably overkill.

RCU protected lists are, on the other hand, readily available. Bah. I'll
switch to this. By the time it becomes the bottleneck, the world will
have moved on. Or so I hope.
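
Roughly the shape this would take, as a userspace sketch using C11 atomics
in place of the kernel primitives (in the kernel this would be
list_add_rcu() and list_for_each_entry_rcu() under rcu_read_lock()). The
names pub_entry, pub_list, pub_add and pub_search are hypothetical. Writers
are assumed to be serialised by the caller, and removal, which needs a grace
period before the entry can be freed, is not modelled:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pub_entry {
	struct pub_entry *_Atomic next;
	uint32_t virt_irq;
	uint32_t phys_irq;
};

struct pub_list {
	struct pub_entry *_Atomic head;
};

/* Writer side: fill in the entry completely, then publish it at the head. */
static struct pub_entry *pub_add(struct pub_list *l,
				 uint32_t virt_irq, uint32_t phys_irq)
{
	struct pub_entry *e;

	assert(l);

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;

	e->virt_irq = virt_irq;
	e->phys_irq = phys_irq;
	atomic_init(&e->next,
		    atomic_load_explicit(&l->head, memory_order_relaxed));

	/* Release: the fields above are visible before e becomes reachable. */
	atomic_store_explicit(&l->head, e, memory_order_release);
	return e;
}

/* Reader side: walks the list without taking any lock. */
static struct pub_entry *pub_search(struct pub_list *l, uint32_t virt_irq)
{
	struct pub_entry *e;

	assert(l);

	for (e = atomic_load_explicit(&l->head, memory_order_acquire); e;
	     e = atomic_load_explicit(&e->next, memory_order_acquire))
		if (e->virt_irq == virt_irq)
			return e;

	return NULL;
}
```

The point is that lookups on the injection path stop contending on a
spinlock; only map/unmap, which are rare, still need to serialise.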

	M.

> 
> Thanks,
> -Christoffer
> 
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/kvm/arm_vgic.h |  18 ++++++++
>>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 128 insertions(+)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 4f9fa1d..33d121a 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -159,6 +159,14 @@ struct vgic_io_device {
>>  	struct kvm_io_device dev;
>>  };
>>  
>> +struct irq_phys_map {
>> +	struct rb_node		node;
>> +	u32			virt_irq;
>> +	u32			phys_irq;
>> +	u32			irq;
>> +	bool			active;
>> +};
>> +
>>  struct vgic_dist {
>>  	spinlock_t		lock;
>>  	bool			in_kernel;
>> @@ -256,6 +264,10 @@ struct vgic_dist {
>>  	struct vgic_vm_ops	vm_ops;
>>  	struct vgic_io_device	dist_iodev;
>>  	struct vgic_io_device	*redist_iodevs;
>> +
>> +	/* Virtual irq to hwirq mapping */
>> +	spinlock_t		irq_phys_map_lock;
> 
> why do we need a separate lock here?
> 
>> +	struct rb_root		irq_phys_map;
>>  };
>>  
>>  struct vgic_v2_cpu_if {
>> @@ -307,6 +319,9 @@ struct vgic_cpu {
>>  		struct vgic_v2_cpu_if	vgic_v2;
>>  		struct vgic_v3_cpu_if	vgic_v3;
>>  	};
>> +
>> +	/* Protected by the distributor's irq_phys_map_lock */
>> +	struct rb_root	irq_phys_map;
>>  };
>>  
>>  #define LR_EMPTY	0xff
>> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq);
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>  
>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 59ed7a3..c6604f2 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_address.h>
>>  #include <linux/of_irq.h>
>> +#include <linux/rbtree.h>
>>  #include <linux/uaccess.h>
>>  
>>  #include <linux/irqchip/arm-gic.h>
>> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq);
>>  
>>  static const struct vgic_ops *vgic_ops;
>>  static const struct vgic_params *vgic;
>> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>>  	return IRQ_HANDLED;
>>  }
>>  
>> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>> +					     int virt_irq)
>> +{
>> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
>> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
>> +	else
>> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
>> +}
>> +
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node **new = &root->rb_node, *parent = NULL;
>> +	struct irq_phys_map *new_map;
>> +	struct irq_desc *desc;
>> +	struct irq_data *data;
>> +	int phys_irq;
>> +
>> +	desc = irq_to_desc(irq);
>> +	if (!desc) {
>> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
>> +		return NULL;
>> +	}
>> +
>> +	data = irq_desc_get_irq_data(desc);
>> +	while (data->parent_data)
>> +		data = data->parent_data;
>> +
>> +	phys_irq = data->hwirq;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +
>> +	/* Boilerplate rb_tree code */
>> +	while (*new) {
>> +		struct irq_phys_map *this;
>> +
>> +		this = container_of(*new, struct irq_phys_map, node);
>> +		parent = *new;
>> +		if (this->virt_irq < virt_irq)
>> +			new = &(*new)->rb_left;
>> +		else if (this->virt_irq > virt_irq)
>> +			new = &(*new)->rb_right;
>> +		else {
>> +			new_map = this;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
>> +	if (!new_map)
>> +		goto out;
>> +
>> +	new_map->virt_irq = virt_irq;
>> +	new_map->phys_irq = phys_irq;
>> +	new_map->irq = irq;
>> +
>> +	rb_link_node(&new_map->node, parent, new);
>> +	rb_insert_color(&new_map->node, root);
>> +
>> +out:
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +	return new_map;
>> +}
>> +
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node *node = root->rb_node;
>> +	struct irq_phys_map *this = NULL;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +
>> +	while (node) {
>> +		this = container_of(node, struct irq_phys_map, node);
>> +
>> +		if (this->virt_irq < virt_irq)
>> +			node = node->rb_left;
>> +		else if (this->virt_irq > virt_irq)
>> +			node = node->rb_right;
>> +		else
>> +			break;
>> +	}
>> +
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +	return this;
>> +}
>> +
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +
>> +	if (!map)
>> +		return -EINVAL;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +
>> +	kfree(map);
>> +	return 0;
>> +}
>> +
>>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>>  		goto out_unlock;
>>  
>>  	spin_lock_init(&kvm->arch.vgic.lock);
>> +	spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>>  	kvm->arch.vgic.in_kernel = true;
>>  	kvm->arch.vgic.vgic_model = type;
>>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>> -- 
>> 2.1.4
>>
> 


-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
@ 2015-07-01 10:20       ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01 10:20 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
>> In order to be able to feed physical interrupts to a guest, we need
>> to be able to establish the virtual-physical mapping between the two
>> worlds.
>>
>> The mapping is kept in a rbtree, indexed by virtual interrupts.
> 
> how many of these do you expect there will be?  Is the extra code and
> complexity of an rbtree really warranted?
> 
> I would assume that you'll have one PPI for each CPU in the default case
> plus potentially a few more for an assigned network adapter, let's say a
> couple of handfuls.  Am I missing something obvious or is this
> optimization of traversing a list of 10-12 mappings in the typical case
> not likely to be measurable?
> 
> I would actually be more concerned about the additional locking and
> would look at RCU for protecting a list instead.  Can you protect an
> rbtree with RCU easily?

Not very easily. There was some work done a while ago for the dentry
cache IIRC, but I doubt that's reusable directly, and probably overkill.

RCU protected lists are, on the other hand, readily available. Bah. I'll
switch to this. By the time it becomes the bottleneck, the world will
have moved on. Or so I hope.
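
For illustration, the lock-free reader side that an RCU-protected list
buys can be approximated in userspace with C11 atomics. This is only a
sketch under stated assumptions: struct map_entry, map_list_add() and
map_list_find() are hypothetical names, writers are assumed to be
serialised by the caller, and removal is left out because it needs the
grace-period handling that real RCU provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a virt->phys mapping entry. */
struct map_entry {
	uint32_t virt_irq;
	uint32_t phys_irq;
	_Atomic(struct map_entry *) next;
};

struct map_list {
	_Atomic(struct map_entry *) head;
};

static void map_list_init(struct map_list *list)
{
	atomic_init(&list->head, NULL);
}

/*
 * Writer side. Assumption: callers serialise writers themselves
 * (for example with a lock); only readers run without locking.
 */
static int map_list_add(struct map_list *list, uint32_t virt_irq,
			uint32_t phys_irq)
{
	struct map_entry *e;

	assert(list != NULL);

	e = malloc(sizeof(*e));
	if (!e)
		return -1;

	/* Fully initialise the entry before it becomes visible. */
	e->virt_irq = virt_irq;
	e->phys_irq = phys_irq;
	atomic_init(&e->next,
		    atomic_load_explicit(&list->head, memory_order_relaxed));

	/* Publish: the release store pairs with the readers' acquire loads. */
	atomic_store_explicit(&list->head, e, memory_order_release);
	return 0;
}

/* Reader side: takes no lock, returns NULL if virt_irq is not mapped. */
static const struct map_entry *map_list_find(struct map_list *list,
					     uint32_t virt_irq)
{
	struct map_entry *e;

	e = atomic_load_explicit(&list->head, memory_order_acquire);
	while (e) {
		if (e->virt_irq == virt_irq)
			return e;
		e = atomic_load_explicit(&e->next, memory_order_acquire);
	}
	return NULL;
}
```

In the kernel itself the same shape would presumably use list_add_rcu()
on the writer side and list_for_each_entry_rcu() under rcu_read_lock()
on the reader side, with freeing deferred past a grace period.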

	M.

> 
> Thanks,
> -Christoffer
> 
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  include/kvm/arm_vgic.h |  18 ++++++++
>>  virt/kvm/arm/vgic.c    | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 128 insertions(+)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 4f9fa1d..33d121a 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -159,6 +159,14 @@ struct vgic_io_device {
>>  	struct kvm_io_device dev;
>>  };
>>  
>> +struct irq_phys_map {
>> +	struct rb_node		node;
>> +	u32			virt_irq;
>> +	u32			phys_irq;
>> +	u32			irq;
>> +	bool			active;
>> +};
>> +
>>  struct vgic_dist {
>>  	spinlock_t		lock;
>>  	bool			in_kernel;
>> @@ -256,6 +264,10 @@ struct vgic_dist {
>>  	struct vgic_vm_ops	vm_ops;
>>  	struct vgic_io_device	dist_iodev;
>>  	struct vgic_io_device	*redist_iodevs;
>> +
>> +	/* Virtual irq to hwirq mapping */
>> +	spinlock_t		irq_phys_map_lock;
> 
> why do we need a separate lock here?
> 
>> +	struct rb_root		irq_phys_map;
>>  };
>>  
>>  struct vgic_v2_cpu_if {
>> @@ -307,6 +319,9 @@ struct vgic_cpu {
>>  		struct vgic_v2_cpu_if	vgic_v2;
>>  		struct vgic_v3_cpu_if	vgic_v3;
>>  	};
>> +
>> +	/* Protected by the distributor's irq_phys_map_lock */
>> +	struct rb_root	irq_phys_map;
>>  };
>>  
>>  #define LR_EMPTY	0xff
>> @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq);
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>  
>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 59ed7a3..c6604f2 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_address.h>
>>  #include <linux/of_irq.h>
>> +#include <linux/rbtree.h>
>>  #include <linux/uaccess.h>
>>  
>>  #include <linux/irqchip/arm-gic.h>
>> @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
>>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq);
>>  
>>  static const struct vgic_ops *vgic_ops;
>>  static const struct vgic_params *vgic;
>> @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>>  	return IRQ_HANDLED;
>>  }
>>  
>> +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>> +					     int virt_irq)
>> +{
>> +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
>> +		return &vcpu->arch.vgic_cpu.irq_phys_map;
>> +	else
>> +		return &vcpu->kvm->arch.vgic.irq_phys_map;
>> +}
>> +
>> +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> +				       int virt_irq, int irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node **new = &root->rb_node, *parent = NULL;
>> +	struct irq_phys_map *new_map;
>> +	struct irq_desc *desc;
>> +	struct irq_data *data;
>> +	int phys_irq;
>> +
>> +	desc = irq_to_desc(irq);
>> +	if (!desc) {
>> +		kvm_err("kvm_arch_timer: can't obtain interrupt descriptor\n");
>> +		return NULL;
>> +	}
>> +
>> +	data = irq_desc_get_irq_data(desc);
>> +	while (data->parent_data)
>> +		data = data->parent_data;
>> +
>> +	phys_irq = data->hwirq;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +
>> +	/* Boilerplate rb_tree code */
>> +	while (*new) {
>> +		struct irq_phys_map *this;
>> +
>> +		this = container_of(*new, struct irq_phys_map, node);
>> +		parent = *new;
>> +		if (this->virt_irq < virt_irq)
>> +			new = &(*new)->rb_left;
>> +		else if (this->virt_irq > virt_irq)
>> +			new = &(*new)->rb_right;
>> +		else {
>> +			new_map = this;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
>> +	if (!new_map)
>> +		goto out;
>> +
>> +	new_map->virt_irq = virt_irq;
>> +	new_map->phys_irq = phys_irq;
>> +	new_map->irq = irq;
>> +
>> +	rb_link_node(&new_map->node, parent, new);
>> +	rb_insert_color(&new_map->node, root);
>> +
>> +out:
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +	return new_map;
>> +}
>> +
>> +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>> +						int virt_irq)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	struct rb_node *node = root->rb_node;
>> +	struct irq_phys_map *this = NULL;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +
>> +	while (node) {
>> +		this = container_of(node, struct irq_phys_map, node);
>> +
>> +		if (this->virt_irq < virt_irq)
>> +			node = node->rb_left;
>> +		else if (this->virt_irq > virt_irq)
>> +			node = node->rb_right;
>> +		else
>> +			break;
>> +	}
>> +
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +	return this;
>> +}
>> +
>> +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +
>> +	if (!map)
>> +		return -EINVAL;
>> +
>> +	spin_lock(&dist->irq_phys_map_lock);
>> +	rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, map->virt_irq));
>> +	spin_unlock(&dist->irq_phys_map_lock);
>> +
>> +	kfree(map);
>> +	return 0;
>> +}
>> +
>>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> @@ -1835,6 +1944,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>>  		goto out_unlock;
>>  
>>  	spin_lock_init(&kvm->arch.vgic.lock);
>> +	spin_lock_init(&kvm->arch.vgic.irq_phys_map_lock);
>>  	kvm->arch.vgic.in_kernel = true;
>>  	kvm->arch.vgic.vgic_model = type;
>>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>> -- 
>> 2.1.4
>>
> 


-- 
Jazz is not dead. It just smells funny...

* [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
  2015-07-01 10:20       ` Marc Zyngier
@ 2015-07-01 11:45         ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-01 11:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 01, 2015 at 11:20:45AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
> >> In order to be able to feed physical interrupts to a guest, we need
> >> to be able to establish the virtual-physical mapping between the two
> >> worlds.
> >>
> >> The mapping is kept in a rbtree, indexed by virtual interrupts.
> > 
> > how many of these do you expect there will be?  Is the extra code and
> > complexity of an rbtree really warranted?
> > 
> > I would assume that you'll have one PPI for each CPU in the default case
> > plus potentially a few more for an assigned network adapter, let's say a
> > couple of handfuls.  Am I missing something obvious or is this
> > optimization of traversing a list of 10-12 mappings in the typical case
> > not likely to be measurable?
> > 
> > I would actually be more concerned about the additional locking and
> > would look at RCU for protecting a list instead.  Can you protect an
> > rbtree with RCU easily?
> 
> Not very easily. There was some work done a while ago for the dentry
> cache IIRC, but I doubt that's reusable directly, and probably overkill.
> 
> RCU protected lists are, on the other hand, readily available. Bah. I'll
> switch to this. By the time it becomes the bottleneck, the world will
> have moved on. Or so I hope.
> 
We can also move to RB trees later on if we have some data to show us
it's worth the hassle. But since these structs are fairly small, and
overhead like this mostly shows up on a hot path, I assume a better
optimization would be to allocate a bunch of these structures
contiguously for cache locality. Then again, I feel like this is all
premature and we should measure the beast first.
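
A minimal userspace sketch of the contiguous-allocation idea, assuming
a small fixed bound on the number of mappings. All names here
(MAP_TABLE_CAPACITY, struct map_slot, map_table_*) are hypothetical and
only illustrate a linear scan over one flat array; locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_TABLE_CAPACITY 16	/* assumed upper bound, not from the patch */

/* All slots live in one array, so a scan walks adjacent memory. */
struct map_slot {
	uint32_t virt_irq;
	uint32_t phys_irq;
	bool in_use;
};

struct map_table {
	struct map_slot slots[MAP_TABLE_CAPACITY];
};

/* Linear lookup; returns NULL when virt_irq has no mapping. */
static struct map_slot *map_table_find(struct map_table *t, uint32_t virt_irq)
{
	size_t i;

	assert(t != NULL);

	for (i = 0; i < MAP_TABLE_CAPACITY; i++) {
		if (t->slots[i].in_use && t->slots[i].virt_irq == virt_irq)
			return &t->slots[i];
	}
	return NULL;
}

/*
 * Returns the existing slot if virt_irq is already mapped, a freshly
 * filled slot otherwise, or NULL when the table is full.
 */
static struct map_slot *map_table_add(struct map_table *t, uint32_t virt_irq,
				      uint32_t phys_irq)
{
	struct map_slot *s = map_table_find(t, virt_irq);
	size_t i;

	if (s)
		return s;

	for (i = 0; i < MAP_TABLE_CAPACITY; i++) {
		s = &t->slots[i];
		if (!s->in_use) {
			s->virt_irq = virt_irq;
			s->phys_irq = phys_irq;
			s->in_use = true;
			return s;
		}
	}
	return NULL;
}

/* Returns 0 on success, -1 if virt_irq was not mapped. */
static int map_table_remove(struct map_table *t, uint32_t virt_irq)
{
	struct map_slot *s = map_table_find(t, virt_irq);

	if (!s)
		return -1;
	s->in_use = false;
	return 0;
}
```

With 10-12 entries the whole table spans only a few cache lines, which
is the locality argument; whether that is measurable is exactly what
would need benchmarking.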

Thanks,
-Christoffer

* Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
@ 2015-07-01 11:45         ` Christoffer Dall
  0 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-01 11:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On Wed, Jul 01, 2015 at 11:20:45AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
> >> In order to be able to feed physical interrupts to a guest, we need
> >> to be able to establish the virtual-physical mapping between the two
> >> worlds.
> >>
> >> The mapping is kept in a rbtree, indexed by virtual interrupts.
> > 
> > how many of these do you expect there will be?  Is the extra code and
> > complexity of an rbtree really warranted?
> > 
> > I would assume that you'll have one PPI for each CPU in the default case
> > plus potentially a few more for an assigned network adapter, let's say a
> > couple of handfuls.  Am I missing something obvious or is this
> > optimization of traversing a list of 10-12 mappings in the typical case
> > not likely to be measurable?
> > 
> > I would actually be more concerned about the additional locking and
> > would look at RCU for protecting a list instead.  Can you protect an
> > rbtree with RCU easily?
> 
> Not very easily. There was some work done a while ago for the dentry
> cache IIRC, but I doubt that's reusable directly, and probably overkill.
> 
> RCU protected lists are, on the other hand, readily available. Bah. I'll
> switch to this. By the time it becomes the bottleneck, the world will
> have moved on. Or so I hope.
> 
We can also move to RB trees later on if we have some data to show us
it's worth the hassle. But since these structs are fairly small, and
overhead like this mostly shows up on a hot path, I assume a better
optimization would be to allocate a bunch of these structures
contiguously for cache locality. Then again, I feel like this is all
premature and we should measure the beast first.

Thanks,
-Christoffer

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-07-01  9:17       ` Marc Zyngier
@ 2015-07-01 11:58         ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-01 11:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> >> We only set the irq_queued flag for level interrupts, meaning
> >> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> >> for all interrupts.
> >>
> >> This will allow us to inject edge HW interrupts, for which the
> >> state ACTIVE+PENDING is not allowed.
> > 
> > I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> > Do you mean that if we set the HW bit in the LR, then we are linking to
> > an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> > GIC side?
> > 
> > Why is this relevant here?  I feel like I'm missing context.
> 
> I've probably taken a shortcut here - bear with me while I'm trying to
> explain the issue.
> 
> For HW interrupts, we shouldn't even try to use the state bits in the
> LR, because that state is contained in the physical distributor. Setting
> the HW bit really means "there is something going on at the distributor
> level, just go there".

ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
the LR set, correct?

> 
> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
> basically lose the second interrupt because that state is simply not
> considered.

Huh?  Which second interrupt?  I looked at the spec and it says don't
use the state bits for HW interrupts, so isn't it simply not supported
to set these bits at all and that's it?

> 
> So the trick we're using is to only inject the active interrupt, and
> prevent anything else from being injected until we can confirm that the
> active state has been cleared at the physical level.
> 
> Does it make any sense?
> 
Sort of, but what I don't understand now is how the guest ever sees the
interrupt then.  If we always inject the virtual interrupt by setting
the active state on the physical distributor, and we can't inject this
as active+pending, and the guest doesn't see the state in the LR, then
how does this ever raise a virtual interrupt and how does the guest see
an interrupt which is only PENDING so that it can ack it etc. etc.?

Maybe I don't fully understand how the HW bit works after all...

Thanks,
-Christoffer

* Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
@ 2015-07-01 11:58         ` Christoffer Dall
  0 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-01 11:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> >> We only set the irq_queued flag for level interrupts, meaning
> >> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> >> for all interrupts.
> >>
> >> This will allow us to inject edge HW interrupts, for which the
> >> state ACTIVE+PENDING is not allowed.
> > 
> > I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> > Do you mean that if we set the HW bit in the LR, then we are linking to
> > an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> > GIC side?
> > 
> > Why is this relevant here?  I feel like I'm missing context.
> 
> I've probably taken a shortcut here - bear with me while I'm trying to
> explain the issue.
> 
> For HW interrupts, we shouldn't even try to use the state bits in the
> LR, because that state is contained in the physical distributor. Setting
> the HW bit really means "there is something going on at the distributor
> level, just go there".

ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
the LR set, correct?

> 
> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
> basically lose the second interrupt because that state is simply not
> considered.

Huh?  Which second interrupt?  I looked at the spec and it says don't
use the state bits for HW interrupts, so isn't it simply not supported
to set these bits at all and that's it?

> 
> So the trick we're using is to only inject the active interrupt, and
> prevent anything else from being injected until we can confirm that the
> active state has been cleared at the physical level.
> 
> Does it make any sense?
> 
Sort of, but what I don't understand now is how the guest ever sees the
interrupt then.  If we always inject the virtual interrupt by setting
the active state on the physical distributor, and we can't inject this
as active+pending, and the guest doesn't see the state in the LR, then
how does this ever raise a virtual interrupt and how does the guest see
an interrupt which is only PENDING so that it can ack it etc. etc.?

Maybe I don't fully understand how the HW bit works after all...

Thanks,
-Christoffer

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-07-01 11:58         ` Christoffer Dall
@ 2015-07-01 18:18           ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01 18:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/07/15 12:58, Christoffer Dall wrote:
> On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
>> On 30/06/15 21:19, Christoffer Dall wrote:
>>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>>>> We only set the irq_queued flag for level interrupts, meaning
>>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>>>> for all interrupts.
>>>>
>>>> This will allow us to inject edge HW interrupts, for which the
>>>> state ACTIVE+PENDING is not allowed.
>>>
>>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
>>> Do you mean that if we set the HW bit in the LR, then we are linking to
>>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
>>> GIC side?
>>>
>>> Why is this relevant here?  I feel like I'm missing context.
>>
>> I've probably taken a shortcut here - bear with me while I'm trying to
>> explain the issue.
>>
>> For HW interrupts, we shouldn't even try to use the state bits in the
>> LR, because that state is contained in the physical distributor. Setting
>> the HW bit really means "there is something going on at the distributor
>> level, just go there".
> 
> ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
> the LR set, correct?

Yes, sorry.

>>
>> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
>> basically lose the second interrupt because that state is simply not
>> considered.
> 
> Huh?  Which second interrupt?  I looked at the spec and it says don't
> use the state bits for HW interrupts, so isn't it simply not supported
> to set these bits at all and that's it?

I managed to confuse myself reading the same bit. It says (GICv3 spec):

"A hypervisor must only use the pending and active state for software
originated interrupts, which are typically associated with virtual
devices, or SGIs."

That's the PENDING+ACTIVE state, and not the pending and active bits
like I read it initially.

Now consider the following scenario:

- We inject a virtual edge interrupt
- We mark the corresponding physical interrupt as active.
- Queue interrupt in an LR
- Resume vcpu

Now, we inject another edge interrupt, the vcpu exits for whatever
reason, and the previously injected interrupt is still active.

The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
the LR, and resume the vcpu. But the above states that this is invalid
for HW generated interrupts.

>>
>> So the trick we're using is to only inject the active interrupt, and
>> prevent anything else from being injected until we can confirm that the
>> active state has been cleared at the physical level.
>>
>> Does it make any sense?
>>
> Sort of, but what I don't understand now is how the guest ever sees the
> interrupt then.  If we always inject the virtual interrupt by setting
> the active state on the physical distributor, and we can't inject this
> as active+pending, and the guest doesn't see the state in the LR, then
> how does this ever raise a virtual interrupt and how does the guest see
> an interrupt which is only PENDING so that it can ack it etc. etc.?
> 
> Maybe I don't fully understand how the HW bit works after all...

The way the spec is written is slightly misleading. But the gist of it
is that we still signal the guest using the PENDING bit in the LR, and
switch the LR as usual. It is just that we can't use the PENDING+ACTIVE
state (apparently, this can lead to a double deactivation).
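
The rule above can be modelled with a toy state machine. This is a
sketch only: MODEL_LR_PENDING, MODEL_LR_ACTIVE, struct lr_model and the
lr_model_*() helpers are hypothetical names, not the real LR encoding
or the vgic code, and the model assumes a single interrupt per LR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits for one modelled list register. */
enum {
	MODEL_LR_PENDING = 1 << 0,
	MODEL_LR_ACTIVE  = 1 << 1,
};

struct lr_model {
	bool hw;		/* models the HW bit: linked to a physical irq */
	unsigned int state;	/* combination of MODEL_LR_* bits */
};

/*
 * Try to record a new pending edge. Returns false when the injection
 * has to be deferred: a HW-linked interrupt that is still active must
 * not be turned into PENDING+ACTIVE.
 */
static bool lr_model_inject(struct lr_model *lr)
{
	assert(lr != NULL);

	if (lr->hw && (lr->state & MODEL_LR_ACTIVE))
		return false;

	lr->state |= MODEL_LR_PENDING;
	return true;
}

/* Guest acknowledges: a purely pending interrupt becomes active. */
static void lr_model_ack(struct lr_model *lr)
{
	if (lr->state == MODEL_LR_PENDING)
		lr->state = MODEL_LR_ACTIVE;
}

/* Guest EOI: the active bit is dropped, any pending bit stays. */
static void lr_model_eoi(struct lr_model *lr)
{
	lr->state &= ~(unsigned int)MODEL_LR_ACTIVE;
}
```

For a software interrupt (hw == false) a second edge while active ends
up as PENDING+ACTIVE in this model; for a HW-linked one the caller has
to retry after the EOI, which is the resampling discussed here.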

Not sure the above makes sense. Beer time, I suppose.

	M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
@ 2015-07-01 18:18           ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-01 18:18 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On 01/07/15 12:58, Christoffer Dall wrote:
> On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
>> On 30/06/15 21:19, Christoffer Dall wrote:
>>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>>>> We only set the irq_queued flag for level interrupts, meaning
>>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>>>> for all interrupts.
>>>>
>>>> This will allow us to inject edge HW interrupts, for which the
>>>> state ACTIVE+PENDING is not allowed.
>>>
>>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
>>> Do you mean that if we set the HW bit in the LR, then we are linking to
>>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
>>> GIC side?
>>>
>>> Why is this relevant here?  I feel like I'm missing context.
>>
>> I've probably taken a shortcut here - bear with me while I'm trying to
>> explain the issue.
>>
>> For HW interrupts, we shouldn't even try to use the state bits in the
>> LR, because that state is contained in the physical distributor. Setting
>> the HW bit really means "there is something going on at the distributor
>> level, just go there".
> 
> ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
> the LR set, correct?

Yes, sorry.

>>
>> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
>> basically lose the second interrupt because that state is simply not
>> considered.
> 
> Huh?  Which second interrupt?  I looked at the spec and it says don't
> use the state bits for HW interrupts, so isn't it simply not supported
> to set these bits at all and that's it?

I managed to confuse myself reading the same bit. It says (GICv3 spec):

"A hypervisor must only use the pending and active state for software
originated interrupts, which are typically associated with virtual
devices, or SGIs."

That's the PENDING+ACTIVE state, and not the pending and active bits
like I read it initially.

Now consider the following scenario:

- We inject a virtual edge interrupt
- We mark the corresponding physical interrupt as active.
- Queue interrupt in an LR
- Resume vcpu

Now, we inject another edge interrupt, the vcpu exits for whatever
reason, and the previously injected interrupt is still active.

The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
the LR, and resume the vcpu. But the above states that this is invalid
for HW generated interrupts.

>>
>> So the trick we're using is to only inject the active interrupt, and
>> prevent anything else from being injected until we can confirm that the
>> active state has been cleared at the physical level.
>>
>> Does it make any sense?
>>
> Sort of, but what I don't understand now is how the guest ever sees the
> interrupt then.  If we always inject the virtual interrupt by setting
> the active state on the physical distributor, and we can't inject this
> as active+pending, and the guest doesn't see the state in the LR, then
> how does this ever raise a virtual interrupt and how does the guest see
> an interrupt which is only PENDING so that it can ack it etc. etc.?
> 
> Maybe I don't fully understand how the HW bit works after all...

The way the spec is written is slightly misleading. But the gist of it
is that we still signal the guest using the PENDING bit in the LR, and
switch the LR as usual. It is just that we can't use the PENDING+ACTIVE
state (apparently, this can lead to a double deactivation).

Not sure the above makes sense. Beer time, I suppose.

	M.
-- 
Jazz is not dead. It just smells funny...

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-07-01 18:18           ` Marc Zyngier
@ 2015-07-02 16:23             ` Christoffer Dall
  -1 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-02 16:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 01, 2015 at 07:18:40PM +0100, Marc Zyngier wrote:
> On 01/07/15 12:58, Christoffer Dall wrote:
> > On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
> >> On 30/06/15 21:19, Christoffer Dall wrote:
> >>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> >>>> We only set the irq_queued flag for level interrupts, meaning
> >>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> >>>> for all interrupts.
> >>>>
> >>>> This will allow us to inject edge HW interrupts, for which the
> >>>> state ACTIVE+PENDING is not allowed.
> >>>
> >>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> >>> Do you mean that if we set the HW bit in the LR, then we are linking to
> >>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> >>> GIC side?
> >>>
> >>> Why is this relevant here?  I feel like I'm missing context.
> >>
> >> I've probably taken a shortcut here - bear with me while I'm trying to
> >> explain the issue.
> >>
> >> For HW interrupts, we shouldn't even try to use the state bits in the
> >> LR, because that state is contained in the physical distributor. Setting
> >> the HW bit really means "there is something going on at the distributor
> >> level, just go there".
> > 
> > ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
> > the LR set, correct?
> 
> Yes, sorry.
> 
> >>
> >> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
> >> basically lose the second interrupt because that state is simply not
> >> considered.
> > 
> > Huh?  Which second interrupt?  I looked at the spec and it says don't
> > use the state bits for HW interrupts, so isn't it simply not supported
> > to set these bits at all and that's it?
> 
> I managed to confuse myself reading the same bit. It says (GICv3 spec):
> 
> "A hypervisor must only use the pending and active state for software
> originated interrupts, which are typically associated with virtual
> devices, or SGIs."
> 
> That's the PENDING+ACTIVE state, and not the pending and active bits
> like I read it initially.
> 
> Now consider the following scenario:
> 
> - We inject a virtual edge interrupt
> - We mark the corresponding physical interrupt as active.
> - Queue interrupt in an LR
> - Resume vcpu
> 
> Now, we inject another edge interrupt, the vcpu exits for whatever
> reason, and the previously injected interrupt is still active.
> 
> The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
> the LR, and resume the vcpu. But the above states that this is invalid
> for HW generated interrupts.

Right, ok, so we must resample the pending state even for an
edge-triggered interrupt once it's EOIed, because we cannot put it in
the LR despite it being pending on the physical distributor?

Incidentally, we do not need to set the EOI_INT bit, because when the
guest EOIs the interrupt, it will also deactivate it on the physical
distributor and the hardware will then take the pending physical
interrupt, we will handle it in the host, etc. etc.

If we had a *shared* device other than the timer which is
edge-triggered, wouldn't we then also need to capture the physical
distributor's pending state along with the state of the device? Unless,
that is, we assume that after restoring the device state we can count
on the device to produce another rising/falling edge and trigger the
interrupt again. (I assume the line would always go high for a
level-triggered interrupt in this case.)

> 
> >>
> >> So the trick we're using is to only inject the active interrupt, and
> >> prevent anything else from being injected until we can confirm that the
> >> active state has been cleared at the physical level.
> >>
> >> Does it make any sense?
> >>
> > Sort of, but what I don't understand now is how the guest ever sees the
> > interrupt then.  If we always inject the virtual interrupt by setting
> > the active state on the physical distributor, and we can't inject this
> > as active+pending, and the guest doesn't see the state in the LR, then
> > how does this ever raise a virtual interrupt and how does the guest see
> > an interrupt which is only PENDING so that it can ack it etc. etc.?
> > 
> > Maybe I don't fully understand how the HW bit works after all...
> 
> The way the spec is written is slightly misleading. But the gist of it
> is that we still signal the guest using the PENDING bit in the LR, and
> switch the LR as usual. It is just that we can't use the PENDING+ACTIVE
> state (apparently, this can lead to a double deactivation).
> 
> Not sure the above makes sense. Beer time, I suppose.
> 
It does make sense, I just had to sleep on it and see the code as a
whole instead of trying to understand it by just looking at this patch
individually.

Thanks,
-Christoffer

* Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
@ 2015-07-02 16:23             ` Christoffer Dall
  0 siblings, 0 replies; 118+ messages in thread
From: Christoffer Dall @ 2015-07-02 16:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Eric Auger,
	Alex Bennée, Andre Przywara

On Wed, Jul 01, 2015 at 07:18:40PM +0100, Marc Zyngier wrote:
> On 01/07/15 12:58, Christoffer Dall wrote:
> > On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
> >> On 30/06/15 21:19, Christoffer Dall wrote:
> >>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> >>>> We only set the irq_queued flag for level interrupts, meaning
> >>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> >>>> for all interrupts.
> >>>>
> >>>> This will allow us to inject edge HW interrupts, for which the
> >>>> state ACTIVE+PENDING is not allowed.
> >>>
> >>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> >>> Do you mean that if we set the HW bit in the LR, then we are linking to
> >>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> >>> GIC side?
> >>>
> >>> Why is this relevant here?  I feel like I'm missing context.
> >>
> >> I've probably taken a shortcut here - bear with me while I'm trying to
> >> explain the issue.
> >>
> >> For HW interrupts, we shouldn't even try to use the state bits in the
> >> LR, because that state is contained in the physical distributor. Setting
> >> the HW bit really means "there is something going on at the distributor
> >> level, just go there".
> > 
> > ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
> > the LR set, correct?
> 
> Yes, sorry.
> 
> >>
> >> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
> >> basically lose the second interrupt because that state is simply not
> >> considered.
> > 
> > Huh?  Which second interrupt?  I looked at the spec and it says don't
> > use the state bits for HW interrupts, so isn't it simply not supported
> > to set these bits at all and that's it?
> 
> I managed to confuse myself reading the same bit. It says (GICv3 spec):
> 
> "A hypervisor must only use the pending and active state for software
> originated interrupts, which are typically associated with virtual
> devices, or SGIs."
> 
> That's the PENDING+ACTIVE state, and not the pending and active bits
> like I read it initially.
> 
> Now consider the following scenario:
> 
> - We inject a virtual edge interrupt
> - We mark the corresponding physical interrupt as active.
> - Queue interrupt in an LR
> - Resume vcpu
> 
> Now, we inject another edge interrupt, the vcpu exits for whatever
> reason, and the previously injected interrupt is still active.
> 
> The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
> the LR, and resume the vcpu. But the above states that this is invalid
> for HW generated interrupts.

Right, ok, so we must resample the pending state even for an
edge-triggered interrupt once it's EOIed, because we cannot put it in
the LR despite it being pending on the physical distributor?

Incidentally, we do not need to set the EOI_INT bit, because when the
guest EOIs the interrupt, it will also deactivate it on the physical
distributor and the hardware will then take the pending physical
interrupt, we will handle it in the host, etc. etc.
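
As a rough illustration of the flow described in this exchange, here is a
minimal toy model in C. It is only a sketch under stated assumptions: the
names (toy_hw_irq, toy_edge_arrives, toy_guest_ack, toy_guest_eoi) are
hypothetical and do not come from the vgic code, and the physical
distributor is reduced to two booleans. It shows a HW-mapped interrupt
being signalled with PENDING only, a second edge being held back rather
than queued as PENDING+ACTIVE, and the guest EOI deactivating the physical
interrupt so that the latched edge can be resampled.

```c
#include <stdbool.h>

/* Illustrative LR states; these names are not the kernel's. */
enum toy_lr_state { TOY_LR_EMPTY, TOY_LR_PENDING, TOY_LR_ACTIVE };

struct toy_hw_irq {
	bool phys_active;        /* active bit on the physical distributor */
	bool phys_pending;       /* edge latched while the interrupt is in flight */
	enum toy_lr_state lr;    /* state the guest sees through the LR */
};

/*
 * A new edge arrives for a HW-mapped interrupt. Returns true if it was
 * queued in the LR, false if it had to stay latched at the physical level.
 */
static bool toy_edge_arrives(struct toy_hw_irq *irq)
{
	if (irq->phys_active || irq->lr != TOY_LR_EMPTY) {
		/* PENDING+ACTIVE is off limits in the LR: do not queue. */
		irq->phys_pending = true;
		return false;
	}
	irq->phys_active = true;      /* mark the physical interrupt active */
	irq->lr = TOY_LR_PENDING;     /* signal the guest with PENDING only */
	return true;
}

/* Guest acknowledges the interrupt: PENDING becomes ACTIVE in the LR. */
static void toy_guest_ack(struct toy_hw_irq *irq)
{
	if (irq->lr == TOY_LR_PENDING)
		irq->lr = TOY_LR_ACTIVE;
}

/*
 * Guest EOI: with the HW bit set, the physical interrupt is deactivated
 * as well. Returns true if a latched edge must now be taken by the host.
 */
static bool toy_guest_eoi(struct toy_hw_irq *irq)
{
	bool resample;

	if (irq->lr != TOY_LR_ACTIVE)
		return false;
	irq->lr = TOY_LR_EMPTY;
	irq->phys_active = false;
	resample = irq->phys_pending;
	irq->phys_pending = false;
	return resample;
}
```

In this model a second call to toy_edge_arrives() before the guest EOI
returns false and leaves the LR untouched; only after toy_guest_eoi()
reports a latched edge can the next instance be queued.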

If we had a different *shared* device than the timer which is
edge-triggered, don't we then also need to capture the physical
distributor's pending state along with the state of the device, unless
we assume that, upon restoring the state for the device, we can count on
the device to have another rising/falling edge to trigger the interrupt
again? (I assume the line would always go high for a level-triggered
interrupt in this case).

> 
> >>
> >> So the trick we're using is to only inject the active interrupt, and
> >> prevent anything else from being injected until we can confirm that the
> >> active state has been cleared at the physical level.
> >>
> >> Does it make any sense?
> >>
> > Sort of, but what I don't understand now is how the guest ever sees the
> > interrupt then.  If we always inject the virtual interrupt by setting
> > the active state on the physical distributor, and we can't inject this
> > as active+pending, and the guest doesn't see the state in the LR, then
> > how does this ever raise a virtual interrupt and how does the guest see
> > an interrupt which is only PENDING so that it can ack it etc. etc.?
> > 
> > Maybe I don't fully understand how the HW bit works after all...
> 
> The way the spec is written is slightly misleading. But the gist of it
> is that we still signal the guest using the PENDING bit in the LR, and
> switch the LR as usual. It is just that we can't use the PENDING+ACTIVE
> state (apparently, this can lead to a double deactivation).
> 
> Not sure the above makes sense. Beer time, I suppose.
> 
It does make sense, I just had to sleep on it and see the code as a
whole instead of trying to understand it by just looking at this patch
individually.

Thanks,
-Christoffer

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-07-02 16:23             ` Christoffer Dall
@ 2015-07-03  9:50               ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2015-07-03  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/07/15 17:23, Christoffer Dall wrote:
> On Wed, Jul 01, 2015 at 07:18:40PM +0100, Marc Zyngier wrote:
>> On 01/07/15 12:58, Christoffer Dall wrote:
>>> On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
>>>> On 30/06/15 21:19, Christoffer Dall wrote:
>>>>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>>>>>> We only set the irq_queued flag for level interrupts, meaning
>>>>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>>>>>> for all interrupts.
>>>>>>
>>>>>> This will allow us to inject edge HW interrupts, for which the
>>>>>> state ACTIVE+PENDING is not allowed.
>>>>>
>>>>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
>>>>> Do you mean that if we set the HW bit in the LR, then we are linking to
>>>>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
>>>>> GIC side?
>>>>>
>>>>> Why is this relevant here?  I feel like I'm missing context.
>>>>
>>>> I've probably taken a shortcut here - bear with me while I'm trying to
>>>> explain the issue.
>>>>
>>>> For HW interrupts, we shouldn't even try to use the state bits in the
>>>> LR, because that state is contained in the physical distributor. Setting
>>>> the HW bit really means "there is something going on at the distributor
>>>> level, just go there".
>>>
>>> ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
>>> the LR set, correct?
>>
>> Yes, sorry.
>>
>>>>
>>>> If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
>>>> basically lose the second interrupt because that state is simply not
>>>> considered.
>>>
>>> Huh?  Which second interrupt?  I looked at the spec and it says don't
>>> use the state bits for HW interrupts, so isn't it simply not supported
>>> to set these bits at all and that's it?
>>
>> I managed to confuse myself reading the same bit. It says (GICv3 spec):
>>
>> "A hypervisor must only use the pending and active state for software
>> originated interrupts, which are typically associated with virtual
>> devices, or SGIs."
>>
>> That's the PENDING+ACTIVE state, and not the pending and active bits
>> like I read it initially.
>>
>> Now consider the following scenario:
>>
>> - We inject a virtual edge interrupt
>> - We mark the corresponding physical interrupt as active.
>> - Queue interrupt in an LR
>> - Resume vcpu
>>
>> Now, we inject another edge interrupt, the vcpu exits for whatever
>> reason, and the previously injected interrupt is still active.
>>
>> The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
>> the LR, and resume the vcpu. But the above states that this is invalid
>> for HW generated interrupts.
> 
> Right, ok, so we must resample the pending state even for an
> edge-triggered interrupt once it's EOIed, because we cannot put it in
> the LR despite it being pending on the physical distributor?
> 
> Incidentally, we do not need to set the EOI_INT bit, because when the
> guest EOIs the interrupt, it will also deactivate it on the physical
> distributor and the hardware will then take the pending physical
> interrupt, we will handle it in the host, etc. etc.
> 
> If we had a different *shared* device than the timer which is
> edge-triggered, don't we then also need to capture the physical
> distributor's pending state along with the state of the device, unless
> we assume that, upon restoring the state for the device, we can count on
> the device to have another rising/falling edge to trigger the interrupt
> again? (I assume the line would always go high for a level-triggered
> interrupt in this case).

I'd definitely assume that restoring the state of the device would make
it generate an interrupt. This has to be a property of the device,
otherwise it is not really shareable between vcpus.

Time will tell - we still have to see one of these.

>>
>>>>
>>>> So the trick we're using is to only inject the active interrupt, and
>>>> prevent anything else from being injected until we can confirm that the
>>>> active state has been cleared at the physical level.
>>>>
>>>> Does it make any sense?
>>>>
>>> Sort of, but what I don't understand now is how the guest ever sees the
>>> interrupt then.  If we always inject the virtual interrupt by setting
>>> the active state on the physical distributor, and we can't inject this
>>> as active+pending, and the guest doesn't see the state in the LR, then
>>> how does this ever raise a virtual interrupt and how does the guest see
>>> an interrupt which is only PENDING so that it can ack it etc. etc.?
>>>
>>> Maybe I don't fully understand how the HW bit works after all...
>>
>> The way the spec is written is slightly misleading. But the gist of it
>> is that we still signal the guest using the PENDING bit in the LR, and
>> switch the LR as usual. It is just that we can't use the PENDING+ACTIVE
>> state (apparently, this can lead to a double deactivation).
>>
>> Not sure the above makes sense. Beer time, I suppose.
>>
> It does make sense, I just had to sleep on it and see the code as a
> whole instead of trying to understand it by just looking at this patch
> individually.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

* [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
  2015-07-03  9:50               ` Marc Zyngier
@ 2015-07-03  9:57                 ` Peter Maydell
  -1 siblings, 0 replies; 118+ messages in thread
From: Peter Maydell @ 2015-07-03  9:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 3 July 2015 at 10:50, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 02/07/15 17:23, Christoffer Dall wrote:
>> If we had a different *shared* device than the timer which is
>> edge-triggered, don't we then also need to capture the physical
>> distributor's pending state along with the state of the device, unless
>> we assume that, upon restoring the state for the device, we can count on
>> the device to have another rising/falling edge to trigger the interrupt
>> again? (I assume the line would always go high for a level-triggered
>> interrupt in this case).
>
> I'd definitely assume that restoring the state of the device would make
> it generate an interrupt. This has to be a property of the device,
> otherwise it is not really shareable between vcpus.

FWIW, QEMU's modelling approach to this is to say that devices
do *not* generate interrupts on restore. If the device had
previously generated an interrupt then this should be captured
by the state of the interrupt controller (or whatever else
it is connected to) and dealt with when the GIC state is restored.

If you say that restoring the device state is supposed to
generate an interrupt, you introduce an ordering requirement
that the state of the interrupt controller is restored
first and the device second (otherwise the incoming GIC
state will overwrite the interrupt that the device just
generated), which isn't ideal (especially since QEMU
makes no guarantees about restore order between devices).
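
The ordering problem described above can be sketched with a small toy
model in C. This is an illustration only, under loud assumptions: the
types and helpers (toy_gic, toy_gic_restore, toy_dev_restore_raising,
toy_restore_raising, toy_restore_silent) are hypothetical and correspond
to neither QEMU's nor KVM's actual save/restore code, and the whole
interrupt controller is reduced to one pending bit.

```c
#include <stdbool.h>

struct toy_gic { bool pending; };

/* Restoring the GIC overwrites its pending state with the saved value. */
static void toy_gic_restore(struct toy_gic *gic, bool saved_pending)
{
	gic->pending = saved_pending;
}

/* Device model that raises its interrupt again when its state is restored. */
static void toy_dev_restore_raising(struct toy_gic *gic)
{
	gic->pending = true;
}

/*
 * Scheme 1: the device regenerates the interrupt on restore, so the saved
 * GIC state carries no pending bit. Returns the final pending state.
 */
static bool toy_restore_raising(bool gic_first)
{
	struct toy_gic gic = { .pending = false };

	if (gic_first) {
		toy_gic_restore(&gic, false);
		toy_dev_restore_raising(&gic);
	} else {
		toy_dev_restore_raising(&gic);
		toy_gic_restore(&gic, false);   /* wipes the fresh interrupt */
	}
	return gic.pending;
}

/*
 * Scheme 2: device restore is silent and the pending bit travels with the
 * saved GIC state. Returns the final pending state.
 */
static bool toy_restore_silent(bool gic_first)
{
	struct toy_gic gic = { .pending = false };

	/* The device restore never touches the GIC, so order is irrelevant. */
	(void)gic_first;
	toy_gic_restore(&gic, true);
	return gic.pending;
}
```

In this model the first scheme only keeps the interrupt when the GIC is
restored before the device, while the second scheme gives the same result
in either order.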

thanks
-- PMM

Thread overview: 118+ messages
2015-06-08 17:03 [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices Marc Zyngier
2015-06-08 17:03 ` Marc Zyngier
2015-06-08 17:03 ` [PATCH 01/10] arm/arm64: KVM: Fix ordering of timer/GIC on guest entry Marc Zyngier
2015-06-08 17:03   ` Marc Zyngier
2015-06-09 11:29   ` Alex Bennée
2015-06-09 11:29     ` Alex Bennée
2015-06-30 20:19   ` Christoffer Dall
2015-06-30 20:19     ` Christoffer Dall
2015-06-08 17:03 ` [PATCH 02/10] arm/arm64: KVM: Move vgic handling to a non-preemptible section Marc Zyngier
2015-06-08 17:03   ` Marc Zyngier
2015-06-09 11:38   ` Alex Bennée
2015-06-09 11:38     ` Alex Bennée
2015-06-30 20:19   ` Christoffer Dall
2015-06-30 20:19     ` Christoffer Dall
2015-06-08 17:03 ` [PATCH 03/10] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields Marc Zyngier
2015-06-08 17:03   ` Marc Zyngier
2015-06-09 13:12   ` Alex Bennée
2015-06-09 13:12     ` Alex Bennée
2015-06-10 17:23   ` Andre Przywara
2015-06-10 17:23     ` Andre Przywara
2015-06-10 18:04     ` Marc Zyngier
2015-06-10 18:04       ` Marc Zyngier
2015-06-08 17:03 ` [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR Marc Zyngier
2015-06-08 17:03   ` Marc Zyngier
2015-06-09 13:21   ` Alex Bennée
2015-06-09 13:21     ` Alex Bennée
2015-06-09 14:03     ` Marc Zyngier
2015-06-09 14:03       ` Marc Zyngier
2015-06-17 11:53   ` Eric Auger
2015-06-17 11:53     ` Eric Auger
2015-06-17 12:39     ` Marc Zyngier
2015-06-17 12:39       ` Marc Zyngier
2015-06-17 13:21     ` Peter Maydell
2015-06-17 13:21       ` Peter Maydell
2015-06-17 13:34       ` Marc Zyngier
2015-06-17 13:34         ` Marc Zyngier
2015-06-08 17:04 ` [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs Marc Zyngier
2015-06-08 17:04   ` Marc Zyngier
2015-06-30 20:19   ` Christoffer Dall
2015-06-30 20:19     ` Christoffer Dall
2015-07-01  9:17     ` Marc Zyngier
2015-07-01  9:17       ` Marc Zyngier
2015-07-01 11:58       ` Christoffer Dall
2015-07-01 11:58         ` Christoffer Dall
2015-07-01 18:18         ` Marc Zyngier
2015-07-01 18:18           ` Marc Zyngier
2015-07-02 16:23           ` Christoffer Dall
2015-07-02 16:23             ` Christoffer Dall
2015-07-03  9:50             ` Marc Zyngier
2015-07-03  9:50               ` Marc Zyngier
2015-07-03  9:57               ` Peter Maydell
2015-07-03  9:57                 ` Peter Maydell
2015-06-08 17:04 ` [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts Marc Zyngier
2015-06-08 17:04   ` Marc Zyngier
2015-06-11  8:43   ` Andre Przywara
2015-06-11  8:43     ` Andre Przywara
2015-06-11  8:56     ` Marc Zyngier
2015-06-11  8:56       ` Marc Zyngier
2015-06-15 15:44   ` Eric Auger
2015-06-15 15:44     ` Eric Auger
2015-06-16  8:28     ` Marc Zyngier
2015-06-16  8:28       ` Marc Zyngier
2015-06-16  9:10       ` Eric Auger
2015-06-16  9:10         ` Eric Auger
2015-06-30 20:19   ` Christoffer Dall
2015-06-30 20:19     ` Christoffer Dall
2015-07-01 10:20     ` Marc Zyngier
2015-07-01 10:20       ` Marc Zyngier
2015-07-01 11:45       ` Christoffer Dall
2015-07-01 11:45         ` Christoffer Dall
2015-06-08 17:04 ` [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest Marc Zyngier
2015-06-08 17:04   ` Marc Zyngier
2015-06-11  8:44   ` Andre Przywara
2015-06-11  8:44     ` Andre Przywara
2015-06-11  9:15     ` Marc Zyngier
2015-06-11  9:15       ` Marc Zyngier
2015-06-11  9:44       ` Andre Przywara
2015-06-11  9:44         ` Andre Przywara
2015-06-11 10:02         ` Marc Zyngier
2015-06-11 10:02           ` Marc Zyngier
2015-06-15 16:11           ` Eric Auger
2015-06-15 16:11             ` Eric Auger
2015-06-17 11:51   ` Eric Auger
2015-06-17 11:51     ` Eric Auger
2015-06-17 12:23     ` Marc Zyngier
2015-06-17 12:23       ` Marc Zyngier
2015-06-08 17:04 ` [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get, set}_phys_irq_active Marc Zyngier
2015-06-08 17:04   ` [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active Marc Zyngier
2015-06-17 15:11   ` [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get, set}_phys_irq_active Eric Auger
2015-06-17 15:11     ` [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active Eric Auger
2015-06-08 17:04 ` [PATCH 09/10] KVM: arm/arm64: timer: Allow the timer to control the active state Marc Zyngier
2015-06-08 17:04   ` Marc Zyngier
2015-06-08 17:04 ` [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts Marc Zyngier
2015-06-08 17:04   ` Marc Zyngier
2015-06-17 15:11   ` Eric Auger
2015-06-17 15:11     ` Eric Auger
2015-06-17 15:37     ` Marc Zyngier
2015-06-17 15:37       ` Marc Zyngier
2015-06-17 15:50       ` Eric Auger
2015-06-17 15:50         ` Eric Auger
2015-06-18  8:37         ` Marc Zyngier
2015-06-18  8:37           ` Marc Zyngier
2015-06-18 17:51           ` Eric Auger
2015-06-18 17:51             ` Eric Auger
2015-06-30 20:19   ` Christoffer Dall
2015-06-30 20:19     ` Christoffer Dall
2015-07-01  8:26     ` Marc Zyngier
2015-07-01  8:26       ` Marc Zyngier
2015-07-01  8:57       ` Christoffer Dall
2015-07-01  8:57         ` Christoffer Dall
2015-06-10  8:33 ` [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for shared devices Eric Auger
2015-06-10  8:33   ` Eric Auger
2015-06-10  9:03   ` Marc Zyngier
2015-06-10  9:03     ` Marc Zyngier
2015-06-10 11:13     ` Eric Auger
2015-06-10 11:13       ` Eric Auger
2015-06-18  6:51 ` Eric Auger
2015-06-18  6:51   ` Eric Auger
