From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Vrabel
Subject: Re: Dom0 crash with apache bench (ab)
Date: Fri, 31 Jul 2015 11:28:10 +0100
Message-ID: <55BB4DBA.2040909@citrix.com>
References: <20150728145022.GE26623@x230.dumpdata.com> <1438095319.11600.165.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefano Stabellini , Christoffer Dall
Cc: Wei Liu , Ian Campbell , xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 31/07/15 11:24, Stefano Stabellini wrote:
> This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> CC'ing relevant people. As you can see from the links below the crash
> is:
>
> [ 253.619326] Call Trace:
> [ 253.619330]
> [ 253.619332] [] ? skb_copy_ubufs+0xa5/0x230
> [ 253.619347] [] __netif_receive_skb_core+0x6f5/0x940
> [ 253.619353] [] __netif_receive_skb+0x18/0x60
> [ 253.619360] [] netif_receive_skb_internal+0x28/0x90
> [ 253.619366] [] napi_gro_frags+0x125/0x1a0
> [ 253.619378] [] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> [ 253.619387] [] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]

What makes you think this is Xen specific? I suggest raising this with the
mlx4 maintainers.

David

> [ 253.619393] [] net_rx_action+0x13d/0x2f0
> [ 253.619400] [] __do_softirq+0xda/0x1f0
> [ 253.619406] [] irq_exit+0x9d/0xb0
> [ 253.619412] [] xen_evtchn_do_upcall+0x35/0x50
> [ 253.619420] [] xen_do_hypervisor_callback+0x1e/0x40
> [ 253.619423]
> [ 253.619426] [] ? shrink_dcache_for_umount+0x90/0x90
> [ 253.619437] [] ? d_alloc_pseudo+0x9/0x10
> [ 253.619443] [] ? sock_alloc_file+0x4d/0x120
> [ 253.619448] [] ? SYSC_accept4+0xb8/0x200
> [ 253.619454] [] ? SyS_epoll_wait+0x87/0xe0
> [ 253.619459] [] ? SyS_accept4+0x9/0x10
> [ 253.619465] [] ?
system_call_fastpath+0x16/0x1b
> [ 253.619469] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc
> ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83
> e2 07
> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
> [ 253.619513] RIP [] __memcpy+0xd/0x110
> [ 253.619520] RSP
> [ 253.619524] ---[ end trace ba5d35a466b03856 ]---
>
> On Tue, 28 Jul 2015, Christoffer Dall wrote:
>> On Tue, Jul 28, 2015 at 4:55 PM, Ian Campbell wrote:
>> On Tue, 2015-07-28 at 10:50 -0400, Konrad Rzeszutek Wilk wrote:
>> > On Tue, Jul 28, 2015 at 03:09:31PM +0200, Christoffer Dall wrote:
>> > > Hi,
>> > >
>> > > I've been doing some performance comparisons lately, and wanted to
>> > > compare
>> > > the performance overhead of using Xen with apache bench, but
>> > > unfortunately
>> > > the Dom0 kernel crashes when hitting it with ab from a remote machine.
>> > > Most other workloads seem to be stable, however, I do see similar
>> > > crashes
>> > > if hitting Dom0 mysql with a mysql benchmark with a high level of
>> > > parallelism.
>> > >
>> > > I use a 10G Mellanox MX354A Dual port FDR CX3 adapter for networking on
>> > > a
>> > > Dell PowerEdge R320 system with a Xeon E5-2450 and 16 GB of RAM.
>> > >
>> > > Interestingly, we had a similarly looking issue on arm64 recently, but
>> > > that
>> > > was fixed with an APM-specific fix to the hypervisor, so I am guessing
>> > > this
>> > > is unrelated, see:
>> > > http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02731.html
>> > > and the fix:
>> > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=50dcb3de603927db2fd87ba09e29c817415aaa44
>> > >
>> > > I have tried with several Linux versions, v3.13, v3.18, v4.0-rc4, and
>> > > v4.1,
>> > > same issue. I have tried with Xen 4.5-0 release, and the Ubuntu
>> > > packaged
>> > > Xen 4.4 release, same issue.
>> > >
>> > > Examples of crash:
>> > > http://pastebin.ubuntu.com/11953498/
>> > > http://pastebin.ubuntu.com/11953443/
>> >
>> > 4.0-rc4?
>> >
>> > Have you tried 4.1?
>>
>> According to the previous paragraph, yes he has.
>>
>> yes, I have. Just to clarify, I used 4.0-rc4 because that's a branch which contained arm64 PCI support and has
>> been used for other measurements, so this was simply my 'working tree'.