From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoffer Dall
Subject: Re: Dom0 crash with apache bench (ab)
Date: Mon, 14 Sep 2015 18:16:18 +0200
Message-ID:
References: <20150728145022.GE26623@x230.dumpdata.com>
 <1438095319.11600.165.camel@citrix.com>
 <55BB4DBA.2040909@citrix.com>
 <20150914124008.GA17195@cbox>
 <1442244024.3549.300.camel@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2073014019314485857=="
Return-path:
In-Reply-To: <1442244024.3549.300.camel@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: xen-devel@lists.xen.org, Wei Liu, David Vrabel, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

--===============2073014019314485857==
Content-Type: multipart/alternative; boundary=001a1135b85eb4d309051fb762f6

--001a1135b85eb4d309051fb762f6
Content-Type: text/plain; charset=UTF-8

On Mon, Sep 14, 2015 at 5:20 PM, Ian Campbell <ian.campbell@citrix.com> wrote:

> On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote:
> > On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote:
> > > On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> > >
> > > > On 31/07/15 11:24, Stefano Stabellini wrote:
> > > > > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> > > > > CC'ing relevant people. As you can see from the links below the crash is:
> > > > >
> > > > > [ 253.619326] Call Trace:
> > > > > [ 253.619330] <IRQ>
> > > > > [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
> > > > > [ 253.619347] [<ffffffff815e8525>] __netif_receive_skb_core+0x6f5/0x940
> > > > > [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
> > > > > [ 253.619360] [<ffffffff815e87f8>] netif_receive_skb_internal+0x28/0x90
> > > > > [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
> > > > > [ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> > > > > [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
> > > >
> > > > What makes you think this is Xen specific?  I suggest raising this with
> > > > the mlx4 maintainers.
> > > >
> > > Linux native and KVM guests (same hw, same kernel version+config) run just
> > > fine under the same workload.
> > >
> > Ping?
> >
> > From the fact that bare-metal and KVM work fine with this hardware I
> > still think it's reasonable to assume that it's a Xen issue and not a
> > mlx4 issue.
> >
> > Is this completely flawed?
>
> My (somewhat educated) guess is that this is to do with the difference
> between (pseudo-)physical addresses and machine (AKA real-physical)
> addresses when running under Xen.
>
> The way this often shows up is in drivers which do not make correct use of
> the kernel's DMA APIs but which happen to work on native x86 because
> physical==bus address on x86.
>
> Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
> sorts of issues.
>

I'll give this a try.

> You are running 64-bit so I don't think the recent "config: Enable
> NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
> relevant (it's already unconditionally on for 64-bit).
>
> The trace appears to be on rx from a physical nic, there shouldn't be any
> magic Xen stuff (granted pages etc) getting themselves into that path at
> all. If it were tx then maybe it might be an issue with foreign pages. In
> any case I think you are able to repro with just dom0, i.e. never having
> started a domU, is that right?
>

As far as I remember and as far as I can interpret my own e-mail, yes.

Thanks for the feedback, I'll try the suggested approaches and also try
using v4.3-rc1 and take it up with the mlx4 maintainers if I still see the
issue.

-Christoffer

--001a1135b85eb4d309051fb762f6
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--001a1135b85eb4d309051fb762f6--

--===============2073014019314485857==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============2073014019314485857==--