Date: Fri, 5 Feb 2016 09:11:46 +0100
From: Daniel Bilik
To: Mike Galbraith
Cc: Jan Kara, Thomas Gleixner, Tejun Heo, Michal Hocko, Jiri Slaby, Petr Mladek, Sasha Levin, Shaohua Li, LKML, stable@vger.kernel.org
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-Id: <20160205091146.d25db60f5c68229056aad82f@neosystem.cz>
In-Reply-To: <1454640046.3545.8.camel@gmail.com>
Organization: neosystem.cz

On Fri, 05 Feb 2016 03:40:46 +0100
Mike Galbraith wrote:

> On Thu, 2016-02-04 at 17:39 +0100, Daniel Bilik wrote:
> > On Thu, 4 Feb 2016 12:20:44 +0100
> > Jan Kara wrote:
> >
> > > Thanks for backport Thomas and to Mike for persistence :). I've
> > > asked my friend seeing crashes with 3.18.25 to try whether this
> > > patch fixes the issues. It may take some time so stay tuned...
> >
> > Patch tested and it really fixes the crash we were experiencing on
> > 3.18.25 with commit 874bbfe+.
> > But it seems to introduce a (rather scary)
> > regression. The tested host shows abnormal CPU usage in both kernel and
> > userland under the same load and traffic pattern. One picture is worth
> > a thousand words, so I've taken snapshots of our graphs, see here:
> > http://neosystem.cz/test/linux-3.18.25/
> >
> > The host was running 3.18.25 with commit 874bbfe+ (1e7af29+ on
> > 3.18-stable) reverted. With this commit included, it crashed within
> > minutes. Around 13:30 we booted 3.18.25 with commit 874bbfe+ included
> > and with the patch from Thomas. And around 15:40 we booted the host
> > with the previous kernel, just to ensure this abnormal behaviour was
> > really caused by the test kernel.
> >
> > Also interesting: in addition to high CPU usage, there is an abnormally
> > high number of zombie processes reported by the system.
>
> IMHO you should restore the CC list and re-post. (If I were the
> maintainer of either the workqueue code or 3.18-stable, I'd be highly
> interested in this finding).

Sorry, I hadn't realized that the patch proposed by Thomas is already on
its way to stable. CC list restored, re-posting.

-- 
Daniel Bilik
neosystem.cz