From: Andrew Cooper
Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
Date: Mon, 29 Jun 2015 18:07:01 +0100
Message-ID: <55917B35.2070706@citrix.com>
References: <1435123109-10481-1-git-send-email-feng.wu@intel.com>
 <1435123109-10481-13-git-send-email-feng.wu@intel.com>
In-Reply-To: <1435123109-10481-13-git-send-email-feng.wu@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Feng Wu, xen-devel@lists.xen.org
Cc: yang.z.zhang@intel.com, george.dunlap@eu.citrix.com, kevin.tian@intel.com,
 keir@xen.org, jbeulich@suse.com
List-Id: xen-devel@lists.xenproject.org

On 24/06/15 06:18, Feng Wu wrote:
> This patch includes the following aspects:
> - Add a global vector to wake up the blocked vCPU
>   when an interrupt is being posted to it (this
>   part was suggested by Yang Zhang).
> - Add a new per-vCPU tasklet to wake up the blocked
>   vCPU. It can be used in the case where vcpu_unblock
>   cannot be called directly.
> - Define two per-cpu variables:
>   * pi_blocked_vcpu:
>     A list storing the vCPUs which were blocked on this pCPU.
>
>   * pi_blocked_vcpu_lock:
>     The spinlock to protect pi_blocked_vcpu.
>
> Signed-off-by: Feng Wu
> ---
> v3:
> - This patch is generated by merging the following three patches in v2:
>   [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>   [RFC v2 10/15] vmx: Define two per-cpu variables
>   [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - Rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - Rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - Move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
>
>  xen/arch/x86/hvm/vmx/vmcs.c        |  3 +++
>  xen/arch/x86/hvm/vmx/vmx.c         | 54 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/hvm.h      |  1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 ++++
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 ++++
>  5 files changed, 68 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 11dc1b5..0c5ce3f 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
>      if ( cpu_has_vmx_vpid )
>          vpid_sync_all();
>
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
>      return 0;
>  }
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b94ef6a..7db6009 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
>  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>
> +/*
> + * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> +{
> +    vcpu_unblock((struct vcpu *)arg);
> +}
>
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>      if ( v->vcpu_id == 0 )
>          v->arch.user_regs.eax = 1;
>
> +    tasklet_init(
> +        &v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> +        pi_vcpu_wakeup_tasklet_handler,
> +        (unsigned long)v);

c/s f6dd295 indicates that the global tasklet lock caused a bottleneck when
injecting interrupts, and replaced a tasklet with a softirq to fix the
scalability issue.  I would expect exactly the same bottleneck to exist here.

> +
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
>      return 0;
>  }
>
>  static void vmx_vcpu_destroy(struct vcpu *v)
>  {
> +    tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
>      /*
>       * There are cases that domain still remains in log-dirty mode when it is
>       * about to be destroyed (ex, user types 'xl destroy '), in which case
> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>  };
>
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx;
> +    unsigned int cpu = smp_processor_id();
> +
> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));

this_cpu($foo) should be used in preference to per_cpu($foo, $myself).

However, always hoist repeated uses of this/per_cpu into local variables, as
the compiler is unable to elide the repeated accesses (because of a deliberate
anti-optimisation behind the scenes):

    spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock);
    struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu);

~Andrew
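
P.S. To make the this_cpu() point concrete, the handler would end up looking
something along these lines.  This is only a sketch: the loop body is my guess
at what the remainder of the patch does after taking the lock (the quote above
is cut short), and interrupt acknowledgement etc. is omitted.

    static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
    {
        struct arch_vmx_struct *vmx;
        /* Hoist both per-cpu accesses into locals, evaluated exactly once. */
        spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock);
        struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu);

        spin_lock(lock);

        /*
         * Kick each vCPU parked on this pCPU.  (Illustrative only: the real
         * handler will presumably check for a pending posted interrupt
         * before scheduling the wakeup tasklet.)
         */
        list_for_each_entry ( vmx, blocked_vcpus, pi_blocked_vcpu_list )
            tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

        spin_unlock(lock);
    }

The same hoisting applies to any other function which touches these per-cpu
variables more than once.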