From: Ian Campbell
Subject: Re: Dom0 crash with apache bench (ab)
Date: Mon, 14 Sep 2015 16:20:24 +0100
Message-ID: <1442244024.3549.300.camel@citrix.com>
References: <20150728145022.GE26623@x230.dumpdata.com> <1438095319.11600.165.camel@citrix.com> <55BB4DBA.2040909@citrix.com> <20150914124008.GA17195@cbox>
In-Reply-To: <20150914124008.GA17195@cbox>
To: Christoffer Dall, Christoffer Dall
Cc: xen-devel@lists.xen.org, Wei Liu, David Vrabel, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote:
> On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote:
> > On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel
> > wrote:
> >
> > > On 31/07/15 11:24, Stefano Stabellini wrote:
> > > > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> > > > CC'ing relevant people. As you can see from the links below, the crash
> > > > is:
> > > >
> > > > [ 253.619326] Call Trace:
> > > > [ 253.619330]
> > > > [ 253.619332] [] ? skb_copy_ubufs+0xa5/0x230
> > > > [ 253.619347] [] __netif_receive_skb_core+0x6f5/0x940
> > > > [ 253.619353] [] __netif_receive_skb+0x18/0x60
> > > > [ 253.619360] [] netif_receive_skb_internal+0x28/0x90
> > > > [ 253.619366] [] napi_gro_frags+0x125/0x1a0
> > > > [ 253.619378] [] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> > > > [ 253.619387] [] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
> > >
> > > What makes you think this is Xen specific? I suggest raising this
> > > with the mlx4 maintainers.
> >
> > Linux native and KVM guests (same hw, same kernel version+config) run just
> > fine under the same workload.
>
> Ping?
>
> From the fact that bare-metal and KVM works fine with this hardware I
> still think it's reasonable to assume that it's a Xen issue and not a
> mlx4 issue.
>
> Is this completely flawed?

My (somewhat educated) guess is that this has to do with the difference
between (pseudo-)physical addresses and machine (AKA real-physical)
addresses when running under Xen.

The way this often shows up is in drivers which do not make correct use of
the kernel's DMA APIs but which happen to work on native x86 because
physical address == bus address on x86.

Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
sorts of issues.

You are running 64-bit, so I don't think the recent "config: Enable
NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
relevant (it's already unconditionally on for 64-bit).

The trace appears to be on rx from a physical NIC; there shouldn't be any
magic Xen stuff (granted pages, etc.) getting into that path at all. If it
were tx, then it might be an issue with foreign pages.

In any case, I think you are able to repro with just dom0, i.e. without
ever having started a domU. Is that right?

Ian.
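To make the (pseudo-)physical vs machine address point concrete, here is a small user-space toy model; it is not kernel code. It assumes a hypothetical p2m table mapping guest page frames to machine frames. Natively that table is the identity, so a driver which hands the CPU-side physical address straight to the device still works; with a scrambled table, roughly what a PV dom0 sees under Xen, only the translated address is correct. In a real driver that translation is what the kernel's DMA API (e.g. dma_map_single(), which returns a dma_addr_t) does on the driver's behalf. The names p2m, phys_to_bus(), buggy_dma_addr(), correct_dma_addr() and the frame values are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: 4 KiB pages and a 4-page "guest". */
#define PAGE_SHIFT 12
#define PAGE_MASK_LOW (((uint64_t)1 << PAGE_SHIFT) - 1)
#define NR_PAGES 4

/* Hypothetical pseudo-physical frame -> machine frame table. */
static uint64_t p2m[NR_PAGES];

/* Native case: physical address == bus address for every page. */
void setup_identity_p2m(void)
{
    for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
        p2m[pfn] = pfn;
}

/* Xen-like case: guest pages backed by arbitrary machine frames (made-up values). */
void setup_scrambled_p2m(void)
{
    static const uint64_t frames[NR_PAGES] = {7, 3, 12, 5};
    for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
        p2m[pfn] = frames[pfn];
}

/* Translate a pseudo-physical address into the address the device must see. */
uint64_t phys_to_bus(uint64_t paddr)
{
    uint64_t pfn = paddr >> PAGE_SHIFT;
    assert(pfn < NR_PAGES); /* toy model: only NR_PAGES frames exist */
    return (p2m[pfn] << PAGE_SHIFT) | (paddr & PAGE_MASK_LOW);
}

/* Buggy driver: programs the device with the CPU-side physical address. */
uint64_t buggy_dma_addr(uint64_t paddr)
{
    return paddr;
}

/* Correct driver: always goes through the translation (the DMA API's job). */
uint64_t correct_dma_addr(uint64_t paddr)
{
    return phys_to_bus(paddr);
}
```

For example, after setup_scrambled_p2m() the guest address 0x1234 (frame 1) translates to 0x3234, while the buggy path still hands 0x1234 to the device; after setup_identity_p2m() both paths agree, which is why such a bug can stay hidden on bare metal.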