From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
Date: Fri, 10 Jul 2015 07:32:28 +0100
Message-ID: <559F831C020000780008F360@mail.emea.novell.com>
References: <1435123109-10481-1-git-send-email-feng.wu@intel.com>
 <1435123109-10481-13-git-send-email-feng.wu@intel.com>
 <559D37C1020000780008E373@mail.emea.novell.com>
 <559D20EE.90407@citrix.com>
 <559E3E0F020000780008E93D@mail.emea.novell.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Feng Wu
Cc: Kevin Tian, keir@xen.org, george.dunlap@eu.citrix.com, Andrew Cooper,
 xen-devel@lists.xen.org, Yang Z Zhang
List-Id: xen-devel@lists.xenproject.org

>>> On 10.07.15 at 08:21, wrote:
>
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, July 09, 2015 3:26 PM
>> To: Wu, Feng; Tian, Kevin
>> Cc: Andrew Cooper; george.dunlap@eu.citrix.com; Zhang, Yang Z;
>> xen-devel@lists.xen.org; keir@xen.org
>> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when
>> vCPU is blocked
>>
>> >>> On 09.07.15 at 00:49, wrote:
>> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> >> Sent: Wednesday, July 08, 2015 9:09 PM
>> >> On 08/07/2015 13:46, Jan Beulich wrote:
>> >> >>>> On 08.07.15 at 13:00, wrote:
>> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> >> >>> vmx_function_table = {
>> >> >>>     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>> >> >>> };
>> >> >>>
>> >> >>> +/*
>> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> >> >>> + */
>> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> >> >>> +{
>> >> >>> +    struct arch_vmx_struct *vmx;
>> >> >>> +    unsigned int cpu = smp_processor_id();
>> >> >>> +
>> >> >>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> >> >>> +
>> >> >>> +    /*
>> >> >>> +     * FIXME: The length of the list depends on how many
>> >> >>> +     * vCPUs are currently blocked on this specific pCPU.
>> >> >>> +     * This may hurt the interrupt latency if the list
>> >> >>> +     * grows to too many entries.
>> >> >>> +     */
>> >> >> let's go with this linked list first until a real issue is identified.
>> >> > This is exactly the way of thinking I dislike when it comes to code
>> >> > that isn't intended to be experimental only: We shouldn't wait
>> >> > for problems to surface when we already can see them. I.e. if
>> >> > there are no plans to deal with this, I'd ask for the feature to be
>> >> > off by default and be properly marked experimental in the
>> >> > command line option documentation (so people know to stay
>> >> > away from it).
>> >>
>> >> And in this specific case, there is no balancing of vcpus across the
>> >> pcpus' lists.
>> >>
>> >> One can construct a pathological case using pinning and pausing to get
>> >> almost every vcpu on a single pcpu's list, and vcpus receiving fewer
>> >> interrupts will exacerbate the problem by staying on the list for longer
>> >> periods of time.
>> >
>> > In that extreme case I believe many contentions in other code paths will
>> > be much larger than the overhead caused by this structure limitation.
>>
>> Examples?
>>
>> >> IMO, the PI feature cannot be declared as done/supported with this bug
>> >> remaining. OTOH, it is fine to be experimental, and disabled by default
>> >> for people who wish to experiment.
>> >
>> > Again, I don't expect to see it disabled as experimental.
>> > For good production environments where vcpus are well balanced and
>> > interrupt latency is sensitive, a linked list should be efficient here.
>> > For a bad environment like the extreme case you raised, I don't know
>> > whether it really matters to just tune the interrupt path.
>>
>> Can you _guarantee_ that everything potentially leading to such a
>> pathological situation is covered by XSA-77? And even if it is now,
>> removing elements from the waiver list would become significantly
>> more difficult if disconnected behavior like this one would need to
>> be taken into account.
>>
>> Please understand that history has told us to be rather more careful
>> than might seem necessary with this: ATS originally having been
>> enabled by default is one bold example, and the recent flood of MSI
>> related XSAs is another; I suppose I could find more. All affecting
>> code originating from Intel, apparently written with only functionality
>> in mind, while having left out (other than basic) security considerations.
>>
>> IOW, with my committer role hat on, the feature is going to be
>> experimental (and hence default off) unless the issue here gets
>> addressed. And no, I cannot immediately suggest a good approach,
>> and with all of the rush before the feature freeze I also can't justify
>> taking a lot of time to think of options.
>
> Is it acceptable to you if I only add the blocked vcpus that have
> assigned devices to the list? I think that should shorten the
> length of the list.

I actually implied this to be the case already, i.e.
- if it's not, this needs to be fixed anyway,
- it's not going to eliminate the concern (just think of a couple of
many-vCPU guests all having devices assigned).

Jan