From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
Date: Fri, 10 Jul 2015 07:32:28 +0100
Message-ID: <559F831C020000780008F360@mail.emea.novell.com>
References: <1435123109-10481-1-git-send-email-feng.wu@intel.com>
 <1435123109-10481-13-git-send-email-feng.wu@intel.com>
 <559D37C1020000780008E373@mail.emea.novell.com>
 <559D20EE.90407@citrix.com>
 <559E3E0F020000780008E93D@mail.emea.novell.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Feng Wu
Cc: Kevin Tian, keir@xen.org, george.dunlap@eu.citrix.com, Andrew Cooper,
 xen-devel@lists.xen.org, Yang Z Zhang
List-Id: xen-devel@lists.xenproject.org

>>> On 10.07.15 at 08:21, wrote:
>
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, July 09, 2015 3:26 PM
>> To: Wu, Feng; Tian, Kevin
>> Cc: Andrew Cooper; george.dunlap@eu.citrix.com; Zhang, Yang Z;
>> xen-devel@lists.xen.org; keir@xen.org
>> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when
>> vCPU is blocked
>>
>> >>> On 09.07.15 at 00:49, wrote:
>> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> >> Sent: Wednesday, July 08, 2015 9:09 PM
>> >> On 08/07/2015 13:46, Jan Beulich wrote:
>> >> >>>> On 08.07.15 at 13:00, wrote:
>> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> >> >>> vmx_function_table = {
>> >> >>>     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>> >> >>> };
>> >> >>>
>> >> >>> +/*
>> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> >> >>> + */
>> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> >> >>> +{
>> >> >>> +    struct arch_vmx_struct *vmx;
>> >> >>> +    unsigned int cpu = smp_processor_id();
>> >> >>> +
>> >> >>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> >> >>> +
>> >> >>> +    /*
>> >> >>> +     * FIXME: The length of the list depends on how many
>> >> >>> +     * vCPUs are currently blocked on this specific pCPU.
>> >> >>> +     * This may hurt the interrupt latency if the list
>> >> >>> +     * grows to too many entries.
>> >> >>> +     */
>> >> >> let's go with this linked list first until a real issue is identified.
>> >> > This is exactly the way of thinking I dislike when it comes to code
>> >> > that isn't intended to be experimental only: We shouldn't wait
>> >> > for problems to surface when we already can see them. I.e. if
>> >> > there are no plans to deal with this, I'd ask for the feature to be
>> >> > off by default and be properly marked experimental in the
>> >> > command line option documentation (so people know to stay
>> >> > away from it).
>> >>
>> >> And in this specific case, there is no balancing of vcpus across the
>> >> pcpus' lists.
>> >>
>> >> One can construct a pathological case using pinning and pausing to get
>> >> almost every vcpu on a single pcpu's list, and vcpus receiving fewer
>> >> interrupts will exacerbate the problem by staying on the list for longer
>> >> periods of time.
>> >
>> > In that extreme case I believe many contentions in other code paths will
>> > be much larger than the overhead caused by this structure limitation.
>>
>> Examples?
>>
>> >> IMO, the PI feature cannot be declared as done/supported with this bug
>> >> remaining. OTOH, it is fine to be experimental, and disabled by default
>> >> for people who wish to experiment.
>> >
>> > Again, I don't expect to see it disabled as experimental.
>> > For good production environments where vcpus are well balanced and
>> > interrupt latency is sensitive, a linked list should be efficient here.
>> > For a bad environment like the extreme case you raised, I don't know
>> > whether it really matters to just tune the interrupt path.
>>
>> Can you _guarantee_ that everything potentially leading to such a
>> pathological situation is covered by XSA-77? And even if it is now,
>> removing elements from the waiver list would become significantly
>> more difficult if disconnected behavior like this one would need to
>> be taken into account.
>>
>> Please understand that history has told us to be rather more careful
>> than might seem necessary with this: ATS originally having been
>> enabled by default is one bold example, and the recent flood of MSI
>> related XSAs is another; I suppose I could find more. All affecting
>> code originating from Intel, apparently written with only functionality
>> in mind, while having left out (other than basic) security considerations.
>>
>> IOW, with my committer role hat on, the feature is going to be
>> experimental (and hence default off) unless the issue here gets
>> addressed. And no, I cannot immediately suggest a good approach,
>> and with all of the rush before the feature freeze I also can't justify
>> taking a lot of time to think of options.
>
> Is it acceptable to you if I only add the blocked vcpus that have
> assigned devices to the list? I think that should shorten the
> length of the list.

I actually implied this to be the case already, i.e.
- if it's not, this needs to be fixed anyway,
- it's not going to eliminate the concern (just think of a couple of
many-vCPU guests all having devices assigned).

Jan