From: Mike Galbraith
To: Henrique de Moraes Holschuh, Tejun Heo
Cc: Michal Hocko, Jiri Slaby, Thomas Gleixner, Petr Mladek, Jan Kara, Ben Hutchings, Sasha Levin, Shaohua Li, LKML, stable@vger.kernel.org, Daniel Bilik
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Date: Sun, 07 Feb 2016 06:59:49 +0100
Message-ID: <1454824789.3616.26.camel@gmail.com>
In-Reply-To: <1454822397.3616.12.camel@gmail.com>

On Sun, 2016-02-07 at 06:19 +0100, Mike Galbraith wrote:
> On Sat, 2016-02-06 at 11:07 -0200, Henrique de Moraes Holschuh wrote:
> > On Fri, 05 Feb 2016, Tejun Heo wrote:
> > > On Fri, Feb 05, 2016 at 09:59:49PM +0100, Mike Galbraith wrote:
> > > > On Fri, 2016-02-05 at 15:54 -0500, Tejun Heo wrote:
> > > >
> > > > > What are you suggesting?
> > > >
> > > > That 874bbfe6 should die.
> > >
> > > Yeah, it's gonna be killed. The commit is there because the behavior
> > > change broke things. We don't want to guarantee it but have been and
> > > can't change it right away just because we don't like it when things
> > > may break from it. The plan is to implement a debug option to force
> > > workqueue to always execute these work items on a foreign cpu to weed
> > > out breakages.
> >
> > Is there a path to filter down sane behavior (whichever one it might be) to
> > the affected stable/LTS kernels?
>
> What Michal said, replace 874bbfe6 with 176bed1d. Without 22b886dd,
> 874bbfe6 is a landmine, uses add_timer_on() as if it were mod_timer(),
> which it is not, or rather was not until 22b886dd came along, and still
> does not look like the mod_timer() alias that add_timer() is.

BTW, with the 874bbfe6 22b886dd pair, mundane workqueue timers are no
longer deflected to housekeeper CPUs, so NO_HZ_FULL regresses.

	-Mike