From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Wu, Feng"
Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
Date: Fri, 10 Jul 2015 08:57:21 +0000
References: <1435123109-10481-1-git-send-email-feng.wu@intel.com>
 <1435123109-10481-13-git-send-email-feng.wu@intel.com>
 <559D37C1020000780008E373@mail.emea.novell.com>
 <559D20EE.90407@citrix.com>
 <559E3E0F020000780008E93D@mail.emea.novell.com>
 <559F831C020000780008F360@mail.emea.novell.com>
 <559FA33C020000780008F44D@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <559FA33C020000780008F44D@mail.emea.novell.com>
Content-Language: en-US
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: "Tian, Kevin", "keir@xen.org", "george.dunlap@eu.citrix.com",
 Andrew Cooper, "xen-devel@lists.xen.org", "Zhang, Yang Z", "Wu, Feng"
List-Id: xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, July 10, 2015 4:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dunlap@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
> xen-devel@lists.xen.org; keir@xen.org
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
>
> >>> On 10.07.15 at 09:29, wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, July 10, 2015 2:32 PM
> >> >>> On 10.07.15 at 08:21, wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: Thursday, July 09, 2015 3:26 PM
> >> >> >>> On 09.07.15 at 00:49, wrote:
> >> >> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> >> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> >> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >> >> >> >>>> On 08.07.15 at 13:00, wrote:
> >> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> >> >> >>> vmx_function_table = {
> >> >> >> >>>      .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >> >> >> >>>  };
> >> >> >> >>>
> >> >> >> >>> +/*
> >> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >> >> >>> + */
> >> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >> >> >>> +{
> >> >> >> >>> +    struct arch_vmx_struct *vmx;
> >> >> >> >>> +    unsigned int cpu = smp_processor_id();
> >> >> >> >>> +
> >> >> >> >>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >> >> >>> +
> >> >> >> >>> +    /*
> >> >> >> >>> +     * FIXME: The length of the list depends on how many
> >> >> >> >>> +     * vCPUs are currently blocked on this specific pCPU.
> >> >> >> >>> +     * This may hurt the interrupt latency if the list
> >> >> >> >>> +     * grows to too many entries.
> >> >> >> >>> +     */
> >> >> >> >> let's go with this linked list first until a real issue is identified.
> >> >> >> > This is exactly the way of thinking I dislike when it comes to code
> >> >> >> > that isn't intended to be experimental only: We shouldn't wait
> >> >> >> > for problems to surface when we already can see them. I.e. if
> >> >> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> >> >> > off by default and be properly marked experimental in the
> >> >> >> > command line option documentation (so people know to stay
> >> >> >> > away from it).
> >> >> >>
> >> >> >> And in this specific case, there is no balancing of vcpus across the
> >> >> >> pcpus lists.
> >> >> >>
> >> >> >> One can construct a pathological case using pinning and pausing to get
> >> >> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> >> >> interrupts will exacerbate the problem by staying on the list for
> >> >> >> longer periods of time.
> >> >> >
> >> >> > In that extreme case I believe contention in many other code paths will
> >> >> > be much larger than the overhead caused by this structure limitation.
> >> >>
> >> >> Examples?
> >> >>
> >> >> >> IMO, the PI feature cannot be declared as done/supported with this bug
> >> >> >> remaining. OTOH, it is fine to be experimental, and disabled by default
> >> >> >> for people who wish to experiment.
> >> >> >>
> >> >> >
> >> >> > Again, I don't expect to see it disabled as experimental. For a good
> >> >> > production environment where vCPUs are well balanced and interrupt
> >> >> > latency is sensitive, a linked list should be efficient here. For a bad
> >> >> > environment like the extreme case you raised, I don't know whether it
> >> >> > really matters to just tune the interrupt path.
> >> >>
> >> >> Can you _guarantee_ that everything potentially leading to such a
> >> >> pathological situation is covered by XSA-77? And even if it is now,
> >> >> removing elements from the waiver list would become significantly
> >> >> more difficult if disconnected behavior like this one would need to
> >> >> be taken into account.
> >> >>
> >> >> Please understand that history has told us to be rather more careful
> >> >> than might seem necessary with this: ATS originally having been
> >> >> enabled by default is one bold example, and the recent flood of MSI
> >> >> related XSAs is another; I suppose I could find more. All affecting
> >> >> code originating from Intel, apparently written with only functionality
> >> >> in mind, while having left out (other than basic) security considerations.
> >> >>
> >> >> IOW, with my committer role hat on, the feature is going to be
> >> >> experimental (and hence default off) unless the issue here gets
> >> >> addressed. And no, I cannot immediately suggest a good approach,
> >> >> and with all of the rush before the feature freeze I also can't justify
> >> >> taking a lot of time to think of options.
> >> >
> >> > Is it acceptable to you if I only add the blocked vcpus that have
> >> > assigned devices to the list? I think that should shorten the
> >> > length of the list.
> >>
> >> I actually implied this to be the case already, i.e.
> >> - if it's not, this needs to be fixed anyway,
> >> - it's not going to eliminate the concern (just think of a couple of
> >>   many-vCPU guests all having devices assigned).
> >
> > So how about allocating multiple wakeup vectors (say, 16; maybe
> > we can make this configurable) and multiplexing them amongst all the
> > blocked vCPUs?
>
> For such an approach to be effective, you'd need to know up front
> how many vCPU-s you may need to deal with, or allocate vectors
> on demand. Plus you'd need to convince us that spending additional
> vectors (which we're already short of on certain big systems) is the
> only viable solution to the issue (which I don't think it is).

Well, in that case, maybe you need to provide a better solution!

Thanks,
Feng

>
> Jan
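
For illustration only, here is a rough, self-contained user-space model of the
multiplexing idea discussed above: blocked vCPUs are spread across several
wakeup vectors so that a single wakeup interrupt only walks its own, shorter
list. This is not Xen code and not what the posted patch does; the names, the
modulo hashing, the single modelled pCPU and the vector count of 16 are all
assumptions made for this sketch, and it deliberately ignores locking and the
vector-scarcity cost Jan points out.

/* Standalone model, not hypervisor code: one blocked-vCPU list per vector. */
#include <stdio.h>
#include <stdlib.h>

#define NR_WAKEUP_VECTORS 16   /* assumed count; the thread suggests "say, 16" */

struct blocked_vcpu {
    unsigned int vcpu_id;
    struct blocked_vcpu *next;
};

/* One list head per wakeup vector on a single modelled pCPU. */
static struct blocked_vcpu *pi_blocked_list[NR_WAKEUP_VECTORS];

/* Choose which vector (and hence which list) a blocking vCPU is parked on. */
static unsigned int pick_vector(unsigned int vcpu_id)
{
    return vcpu_id % NR_WAKEUP_VECTORS;
}

static void block_vcpu(unsigned int vcpu_id)
{
    unsigned int vec = pick_vector(vcpu_id);
    struct blocked_vcpu *b = malloc(sizeof(*b));

    if ( !b )
        exit(EXIT_FAILURE);

    b->vcpu_id = vcpu_id;
    b->next = pi_blocked_list[vec];
    pi_blocked_list[vec] = b;
}

/* Model of the wakeup handler: only the list owned by the firing vector is walked. */
static void wakeup_interrupt(unsigned int vector)
{
    const struct blocked_vcpu *b;
    unsigned int walked = 0;

    for ( b = pi_blocked_list[vector]; b; b = b->next )
        walked++;

    printf("vector %u walked %u entries\n", vector, walked);
}

int main(void)
{
    unsigned int i;

    /* 64 blocked vCPUs spread over 16 vectors -> roughly 4 entries per list. */
    for ( i = 0; i < 64; i++ )
        block_vcpu(i);

    wakeup_interrupt(3);

    return 0;
}

As the reply above notes, the scheme only helps if the number of blocked
vCPUs per pCPU is large relative to the number of vectors one is willing to
spend, which is exactly the trade-off being questioned.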