From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Wu, Feng"
Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
Date: Fri, 10 Jul 2015 08:57:21 +0000
References: <1435123109-10481-1-git-send-email-feng.wu@intel.com>
 <1435123109-10481-13-git-send-email-feng.wu@intel.com>
 <559D37C1020000780008E373@mail.emea.novell.com>
 <559D20EE.90407@citrix.com>
 <559E3E0F020000780008E93D@mail.emea.novell.com>
 <559F831C020000780008F360@mail.emea.novell.com>
 <559FA33C020000780008F44D@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <559FA33C020000780008F44D@mail.emea.novell.com>
Content-Language: en-US
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: "Tian, Kevin", "keir@xen.org", "george.dunlap@eu.citrix.com",
 Andrew Cooper, "xen-devel@lists.xen.org", "Zhang, Yang Z", "Wu, Feng"
List-Id: xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, July 10, 2015 4:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dunlap@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
> xen-devel@lists.xen.org; keir@xen.org
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
>
> >>> On 10.07.15 at 09:29, wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, July 10, 2015 2:32 PM
> >> >>> On 10.07.15 at 08:21, wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: Thursday, July 09, 2015 3:26 PM
> >> >> >>> On 09.07.15 at 00:49, wrote:
> >> >> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> >> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> >> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >> >> >> >>>> On 08.07.15 at 13:00, wrote:
> >> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> >> >> >>> vmx_function_table = {
> >> >> >> >>>      .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >> >> >> >>>  };
> >> >> >> >>>
> >> >> >> >>> +/*
> >> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >> >> >>> + */
> >> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >> >> >>> +{
> >> >> >> >>> +    struct arch_vmx_struct *vmx;
> >> >> >> >>> +    unsigned int cpu = smp_processor_id();
> >> >> >> >>> +
> >> >> >> >>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >> >> >>> +
> >> >> >> >>> +    /*
> >> >> >> >>> +     * FIXME: The length of the list depends on how many
> >> >> >> >>> +     * vCPUs are currently blocked on this specific pCPU.
> >> >> >> >>> +     * This may hurt the interrupt latency if the list
> >> >> >> >>> +     * grows to too many entries.
> >> >> >> >>> +     */
> >> >> >> >> let's go with this linked list first until a real issue is identified.
> >> >> >> > This is exactly the way of thinking I dislike when it comes to code
> >> >> >> > that isn't intended to be experimental only: We shouldn't wait
> >> >> >> > for problems to surface when we already can see them. I.e. if
> >> >> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> >> >> > off by default and be properly marked experimental in the
> >> >> >> > command line option documentation (so people know to stay
> >> >> >> > away from it).
> >> >> >>
> >> >> >> And in this specific case, there is no balancing of vcpus across the
> >> >> >> pcpus lists.
> >> >> >>
> >> >> >> One can construct a pathological case using pinning and pausing to get
> >> >> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> >> >> interrupts will exacerbate the problem by staying on the list for
> >> >> >> longer periods of time.
> >> >> >
> >> >> > In that extreme case I believe contention in many other code paths will
> >> >> > be much larger than the overhead caused by this structure limitation.
> >> >>
> >> >> Examples?
> >> >>
> >> >> >> IMO, the PI feature cannot be declared as done/supported with this bug
> >> >> >> remaining. OTOH, it is fine to be experimental, and disabled by default
> >> >> >> for people who wish to experiment.
> >> >> >>
> >> >> >
> >> >> > Again, I don't expect to see it disabled as experimental. For a good
> >> >> > production environment where vCPUs are well balanced and interrupt
> >> >> > latency is sensitive, a linked list should be efficient here. For a bad
> >> >> > environment like the extreme case you raised, I don't know whether it
> >> >> > really matters to just tune the interrupt path.
> >> >>
> >> >> Can you _guarantee_ that everything potentially leading to such a
> >> >> pathological situation is covered by XSA-77? And even if it is now,
> >> >> removing elements from the waiver list would become significantly
> >> >> more difficult if disconnected behavior like this one would need to
> >> >> be taken into account.
> >> >>
> >> >> Please understand that history has told us to be rather more careful
> >> >> than might seem necessary with this: ATS originally having been
> >> >> enabled by default is one bold example, and the recent flood of MSI
> >> >> related XSAs is another; I suppose I could find more. All affecting
> >> >> code originating from Intel, apparently written with only functionality
> >> >> in mind, while having left out (other than basic) security considerations.
> >> >>
> >> >> IOW, with my committer role hat on, the feature is going to be
> >> >> experimental (and hence default off) unless the issue here gets
> >> >> addressed. And no, I cannot immediately suggest a good approach,
> >> >> and with all of the rush before the feature freeze I also can't justify
> >> >> taking a lot of time to think of options.
> >> >
> >> > Is it acceptable to you if I only add the blocked vcpus that have
> >> > assigned devices to the list? I think that should shorten the
> >> > length of the list.
> >>
> >> I actually implied this to be the case already, i.e.
> >> - if it's not, this needs to be fixed anyway,
> >> - it's not going to eliminate the concern (just think of a couple of
> >>   many-vCPU guests all having devices assigned).
> >
> > So how about allocating multiple wakeup vectors (say, 16; maybe
> > we can make this configurable) and multiplexing them amongst all the
> > blocked vCPUs?
>
> For such an approach to be effective, you'd need to know up front
> how many vCPU-s you may need to deal with, or allocate vectors
> on demand. Plus you'd need to convince us that spending additional
> vectors (which we're already short of on certain big systems) is the
> only viable solution to the issue (which I don't think it is).

Well, in that case, maybe you need to provide a better solution!

Thanks,
Feng

>
> Jan
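
For illustration only, here is a rough, self-contained user-space model of the
multiplexing idea discussed above: blocked vCPUs are spread across several
wakeup vectors so that a single wakeup interrupt only walks its own, shorter
list. This is not Xen code and not what the posted patch does; the names, the
modulo hashing, the single modelled pCPU and the vector count of 16 are all
assumptions made for this sketch, and it deliberately ignores locking and the
vector-scarcity cost Jan points out.

/* Standalone model, not hypervisor code: one blocked-vCPU list per vector. */
#include <stdio.h>
#include <stdlib.h>

#define NR_WAKEUP_VECTORS 16   /* assumed count; the thread suggests "say, 16" */

struct blocked_vcpu {
    unsigned int vcpu_id;
    struct blocked_vcpu *next;
};

/* One list head per wakeup vector on a single modelled pCPU. */
static struct blocked_vcpu *pi_blocked_list[NR_WAKEUP_VECTORS];

/* Choose which vector (and hence which list) a blocking vCPU is parked on. */
static unsigned int pick_vector(unsigned int vcpu_id)
{
    return vcpu_id % NR_WAKEUP_VECTORS;
}

static void block_vcpu(unsigned int vcpu_id)
{
    unsigned int vec = pick_vector(vcpu_id);
    struct blocked_vcpu *b = malloc(sizeof(*b));

    if ( !b )
        exit(EXIT_FAILURE);

    b->vcpu_id = vcpu_id;
    b->next = pi_blocked_list[vec];
    pi_blocked_list[vec] = b;
}

/* Model of the wakeup handler: only the list owned by the firing vector is walked. */
static void wakeup_interrupt(unsigned int vector)
{
    const struct blocked_vcpu *b;
    unsigned int walked = 0;

    for ( b = pi_blocked_list[vector]; b; b = b->next )
        walked++;

    printf("vector %u walked %u entries\n", vector, walked);
}

int main(void)
{
    unsigned int i;

    /* 64 blocked vCPUs spread over 16 vectors -> roughly 4 entries per list. */
    for ( i = 0; i < 64; i++ )
        block_vcpu(i);

    wakeup_interrupt(3);

    return 0;
}

As the reply above notes, the scheme only helps if the number of blocked
vCPUs per pCPU is large relative to the number of vectors one is willing to
spend, which is exactly the trade-off being questioned.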