From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1949323AbcBSSeY (ORCPT );
        Fri, 19 Feb 2016 13:34:24 -0500
Received: from mail-wm0-f50.google.com ([74.125.82.50]:37992 "EHLO
        mail-wm0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1948656AbcBSSeW (ORCPT );
        Fri, 19 Feb 2016 13:34:22 -0500
Date: Fri, 19 Feb 2016 19:34:19 +0100
From: Michal Hocko
To: Andrew Morton
Cc: David Rientjes , Mel Gorman , Tetsuo Handa , Oleg Nesterov ,
        Linus Torvalds , Hugh Dickins , Andrea Argangeli , Rik van Riel ,
        linux-mm@kvack.org, LKML
Subject: Re: [PATCH 6/5] oom, oom_reaper: disable oom_reaper for
Message-ID: <20160219183419.GA30059@dhcp22.suse.cz>
References: <1454505240-23446-1-git-send-email-mhocko@kernel.org>
        <1454505240-23446-6-git-send-email-mhocko@kernel.org>
        <20160217094855.GC29196@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160217094855.GC29196@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 17-02-16 10:48:55, Michal Hocko wrote:
> Hi Andrew,
> although this can be folded into patch 5
> (mm-oom_reaper-implement-oom-victims-queuing.patch) I think it would be
> better to have it separate and revert after we sort out the proper
> oom_kill_allocating_task behavior or handle exclusion at oom_reaper
> level.

An alternative would be something like the following. It is definitely
less hackish but it steals one bit in mm->flags. We do not seem to be
in shortage there now but who knows. Does this sound better?

Later changes might even consider the flag for the victim selection and
ignore those which already have the flag set. But I didn't think about it
more to form a patch yet.
---
>>From 8b17e66a70edac65ecd6df411a675cf3d840a9fe Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Wed, 17 Feb 2016 10:40:41 +0100
Subject: [PATCH] oom, oom_reaper: disable oom_reaper for
 oom_kill_allocating_task

Tetsuo has reported that oom_kill_allocating_task=1 will cause
oom_reaper_list corruption because oom_kill_process doesn't follow
standard OOM exclusion (aka ignores TIF_MEMDIE) and allows to enqueue
the same task multiple times - e.g. by sacrificing the same child
multiple times.

This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
which is set in oom_kill_process atomically and oom reaper is disabled
if the flag was already set.

Reported-by: Tetsuo Handa
Signed-off-by: Michal Hocko
---
 include/linux/sched.h | 2 ++
 mm/oom_kill.c         | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c25996c336de..0552cd5696c2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -509,6 +509,8 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
 
+#define MMF_OOM_KILLED		21	/* OOM killer has chosen this mm */
+
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
 struct sighand_struct {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7e9953a64489..32ce05b1aa10 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -678,7 +678,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	unsigned int victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
-	bool can_oom_reap = true;
+	bool can_oom_reap;
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -740,6 +740,10 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	/* Get a reference to safely compare mm after task_unlock(victim) */
 	mm = victim->mm;
 	atomic_inc(&mm->mm_count);
+
+	/* Make sure we do not try to oom reap the mm multiple times */
+	can_oom_reap = !test_and_set_bit(MMF_OOM_KILLED, &mm->flags);
+
 	/*
 	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
-- 
2.7.0

-- 
Michal Hocko
SUSE Labs
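
For readers who want to try the one-shot idea outside the kernel, here is
a minimal userspace sketch of the test-and-set pattern the patch relies on.
It is not kernel code: struct fake_mm, fake_test_and_set_bit(),
fake_oom_kill() and the reap_requests counter are hypothetical stand-ins
invented for illustration, and C11 atomic_fetch_or() is used in place of
the kernel's test_and_set_bit(). Only the MMF_OOM_KILLED bit number is
taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MMF_OOM_KILLED 21	/* bit number taken from the patch */

/* Hypothetical stand-in for struct mm_struct; only flags matters here. */
struct fake_mm {
	atomic_ulong flags;
	int reap_requests;	/* illustration only: hand-offs to a "reaper" */
};

/*
 * Userspace analogue (an assumption, not the kernel helper): atomically
 * set bit nr and report whether it was already set beforehand.
 */
static bool fake_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Mirrors the patched assignment: only the first kill may request a reap. */
static bool fake_oom_kill(struct fake_mm *mm)
{
	bool can_oom_reap = !fake_test_and_set_bit(MMF_OOM_KILLED, &mm->flags);

	if (can_oom_reap)
		mm->reap_requests++;
	return can_oom_reap;
}

/* Kill the same mm twice and return how many reap requests were made. */
static int demo_kill_same_mm_twice(void)
{
	struct fake_mm mm;
	bool first, second;

	atomic_init(&mm.flags, 0);
	mm.reap_requests = 0;

	first = fake_oom_kill(&mm);
	second = fake_oom_kill(&mm);
	assert(first && !second);

	return mm.reap_requests;
}
```

Because setting the bit and reading its old value happen in one atomic
step, two racing callers cannot both see the bit as clear, so at most one
of them would go on to queue the mm.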