From: Jesper Dangaard Brouer
Cc: brouer@redhat.com
Subject: Re: [patch 031/147] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
To: Andrew Morton, bigeasy@linutronix.de, cl@linux.com, efault@gmx.de,
 iamjoonsoo.kim@lge.com, jannh@google.com, linux-mm@kvack.org,
 mgorman@techsingularity.net, mm-commits@vger.kernel.org, penberg@kernel.org,
 quic_qiancai@quicinc.com, rientjes@google.com, tglx@linutronix.de,
 torvalds@linux-foundation.org, vbabka@suse.cz
References: <20210908025436.dvsgeCXAh%akpm@linux-foundation.org>
Date: Wed, 8 Sep 2021 15:05:16 +0200
In-Reply-To: <20210908025436.dvsgeCXAh%akpm@linux-foundation.org>

On 08/09/2021 04.54, Andrew Morton wrote:
> From: Vlastimil Babka
> Subject: mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
>
> Jann Horn reported [1] the following theoretically possible race:
>
>   task A: put_cpu_partial() calls preempt_disable()
>   task A: oldpage = this_cpu_read(s->cpu_slab->partial)
>   interrupt: kfree() reaches unfreeze_partials() and discards the page
>   task B (on another CPU): reallocates page as page cache
>   task A: reads page->pages and page->pobjects, which are actually
>           halves of the pointer page->lru.prev
>   task B (on another CPU): frees page
>   interrupt: allocates page as SLUB page and places it on the percpu partial list
>   task A: this_cpu_cmpxchg() succeeds
>
> which would cause page->pages and page->pobjects to end up containing
> halves of pointers that would then influence when put_cpu_partial()
> happens and show up in root-only sysfs files.  Maybe that's acceptable,
> I don't know.
> But there should probably at least be a comment for now
> to point out that we're reading union fields of a page that might be
> in a completely different state.
>
> Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only
> safe against s->cpu_slab->partial manipulation in ___slab_alloc() if the
> latter disables irqs, otherwise a __slab_free() in an irq handler could
> call put_cpu_partial() in the middle of ___slab_alloc() manipulating
> ->partial and corrupt it.  This becomes an issue on RT after a local_lock
> is introduced in a later patch.  The fix means taking the local_lock also
> in put_cpu_partial() on RT.
>
> After debugging this issue, Mike Galbraith suggested [2] that to avoid
> different locking schemes on RT and !RT, we can just protect
> put_cpu_partial() with disabled irqs (to be converted to
> local_lock_irqsave() later) everywhere.  This should be acceptable as it's
> not a fast path, and moving the actual partial unfreezing outside of the
> irq disabled section makes it short, and with the retry loop gone the code
> can also be simplified.  In addition, the race reported by Jann should no
> longer be possible.

Based on my microbench[0] measurements, changing preempt_disable to
local_irq_save will cost us an extra 11 cycles (TSC).  I'm not against the
change; I just want people to keep this in mind.

On my E5-1650 v4 @ 3.60GHz:
 - preempt_disable(+enable)   cost: 11 cycles(tsc)  3.161 ns
 - local_irq_save (+restore)  cost: 22 cycles(tsc)  6.331 ns

Notice the non-save/restore variant is superfast:
 - local_irq_disable(+enable) cost:  6 cycles(tsc)  1.844 ns

[0] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

(A stand-alone sketch of such a measurement loop is appended at the end of
this mail.)

> [1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
> [2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/
>
> Link: https://lkml.kernel.org/r/20210904105003.11688-32-vbabka@suse.cz
> Reported-by: Jann Horn
> Suggested-by: Mike Galbraith
> Signed-off-by: Vlastimil Babka
> Cc: Christoph Lameter
> Cc: David Rientjes
> Cc: Jesper Dangaard Brouer
> Cc: Joonsoo Kim
> Cc: Mel Gorman
> Cc: Pekka Enberg
> Cc: Qian Cai
> Cc: Sebastian Andrzej Siewior
> Cc: Thomas Gleixner
> Signed-off-by: Andrew Morton
> ---
>
>  mm/slub.c |   83 ++++++++++++++++++++++++++++------------------------
>  1 file changed, 45 insertions(+), 38 deletions(-)
>
> --- a/mm/slub.c~mm-slub-protect-put_cpu_partial-with-disabled-irqs-instead-of-cmpxchg
> +++ a/mm/slub.c
> @@ -2025,7 +2025,12 @@ static inline void *acquire_slab(struct
>  	return freelist;
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#else
> +static inline void put_cpu_partial(struct kmem_cache *s, struct page *page,
> +				   int drain) { }
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
>
>  /*
> @@ -2459,14 +2464,6 @@ static void unfreeze_partials_cpu(struct
>  		__unfreeze_partials(s, partial_page);
>  }
>
> -#else	/* CONFIG_SLUB_CPU_PARTIAL */
> -
> -static inline void unfreeze_partials(struct kmem_cache *s) { }
> -static inline void unfreeze_partials_cpu(struct kmem_cache *s,
> -					 struct kmem_cache_cpu *c) { }
> -
> -#endif	/* CONFIG_SLUB_CPU_PARTIAL */
> -
>  /*
>   * Put a page that was just frozen (in __slab_free|get_partial_node) into a
>   * partial page slot if available.
> @@ -2476,46 +2473,56 @@ static inline void unfreeze_partials_cpu
>   */
>  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  {
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *oldpage;
> -	int pages;
> -	int pobjects;
> +	struct page *page_to_unfreeze = NULL;
> +	unsigned long flags;
> +	int pages = 0;
> +	int pobjects = 0;
>
> -	preempt_disable();
> -	do {
> -		pages = 0;
> -		pobjects = 0;
> -		oldpage = this_cpu_read(s->cpu_slab->partial);
> +	local_irq_save(flags);
> +
> +	oldpage = this_cpu_read(s->cpu_slab->partial);
>
> -		if (oldpage) {
> +	if (oldpage) {
> +		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
> +			/*
> +			 * Partial array is full. Move the existing set to the
> +			 * per node partial list. Postpone the actual unfreezing
> +			 * outside of the critical section.
> +			 */
> +			page_to_unfreeze = oldpage;
> +			oldpage = NULL;
> +		} else {
>  			pobjects = oldpage->pobjects;
>  			pages = oldpage->pages;
> -			if (drain && pobjects > slub_cpu_partial(s)) {
> -				/*
> -				 * partial array is full. Move the existing
> -				 * set to the per node partial list.
> -				 */
> -				unfreeze_partials(s);
> -				oldpage = NULL;
> -				pobjects = 0;
> -				pages = 0;
> -				stat(s, CPU_PARTIAL_DRAIN);
> -			}
>  		}
> +	}
>
> -		pages++;
> -		pobjects += page->objects - page->inuse;
> +	pages++;
> +	pobjects += page->objects - page->inuse;
>
> -		page->pages = pages;
> -		page->pobjects = pobjects;
> -		page->next = oldpage;
> -
> -	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
> -								!= oldpage);
> -	preempt_enable();
> -#endif	/* CONFIG_SLUB_CPU_PARTIAL */
> +	page->pages = pages;
> +	page->pobjects = pobjects;
> +	page->next = oldpage;
> +
> +	this_cpu_write(s->cpu_slab->partial, page);
> +
> +	local_irq_restore(flags);
> +
> +	if (page_to_unfreeze) {
> +		__unfreeze_partials(s, page_to_unfreeze);
> +		stat(s, CPU_PARTIAL_DRAIN);
> +	}
>  }
>
> +#else	/* CONFIG_SLUB_CPU_PARTIAL */
> +
> +static inline void unfreeze_partials(struct kmem_cache *s) { }
> +static inline void unfreeze_partials_cpu(struct kmem_cache *s,
> +					 struct kmem_cache_cpu *c) { }
> +
> +#endif	/* CONFIG_SLUB_CPU_PARTIAL */
> +
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
>  	unsigned long flags;
> _

$ uname -a
Linux broadwell 5.14.0-net-next+ #612 SMP PREEMPT Wed Sep 8 10:10:04 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux

My config:
$ zcat /proc/config.gz | grep PREE
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
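
For reference, here is a minimal stand-alone sketch of the measurement idea.
It is not the actual time_bench_sample.c code from [0]; the module name, the
loop count and the -ECANCELED one-shot trick are arbitrary choices for
illustration. It simply averages the TSC cost of a preempt_disable()/
preempt_enable() pair versus a local_irq_save()/local_irq_restore() pair
using get_cycles():

/* irqcost_bench.c - hypothetical throwaway module, not time_bench_sample.c */
#include <linux/module.h>
#include <linux/preempt.h>
#include <linux/irqflags.h>
#include <linux/math64.h>
#include <linux/timex.h>

#define LOOPS 10000000U		/* enough iterations to amortize timing overhead */

static u64 bench_preempt_pair(void)
{
	cycles_t start, stop;
	u32 i;

	start = get_cycles();
	for (i = 0; i < LOOPS; i++) {
		preempt_disable();
		barrier();	/* keep the empty critical section from collapsing */
		preempt_enable();
	}
	stop = get_cycles();
	return div_u64(stop - start, LOOPS);	/* average cycles per pair */
}

static u64 bench_irqsave_pair(void)
{
	cycles_t start, stop;
	unsigned long flags;
	u32 i;

	start = get_cycles();
	for (i = 0; i < LOOPS; i++) {
		local_irq_save(flags);
		barrier();
		local_irq_restore(flags);
	}
	stop = get_cycles();
	return div_u64(stop - start, LOOPS);
}

static int __init irqcost_bench_init(void)
{
	pr_info("preempt_disable(+enable)  cost: %llu cycles(tsc)\n",
		(unsigned long long)bench_preempt_pair());
	pr_info("local_irq_save(+restore)  cost: %llu cycles(tsc)\n",
		(unsigned long long)bench_irqsave_pair());
	return -ECANCELED;	/* one-shot: print the numbers, don't stay loaded */
}
module_init(irqcost_bench_init);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch: TSC cost of preempt vs irq save/restore pairs");

Build it as an out-of-tree module and insmod it; the averages land in dmesg
and the init routine intentionally returns an error so nothing stays
resident.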