Regressions List Tracking
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christian Heusel <christian@heusel.eu>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	 Nhat Pham <nphamcs@gmail.com>,
	Seth Jennings <sjenning@redhat.com>,
	 Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	David Runge <dave@sleepmap.de>,
	 "Richard W.M. Jones" <rjones@redhat.com>,
	Mark W <instruform@gmail.com>,
	regressions@lists.linux.dev
Subject: Re: [REGRESSION] Null pointer dereference while shrinking zswap
Date: Fri, 19 Apr 2024 12:10:01 -0700	[thread overview]
Message-ID: <CAJD7tkYOwO-LLrboSfz-wLJC+FD+ctaJ2KuZoUfiL6S3t0y7ow@mail.gmail.com> (raw)
In-Reply-To: <20240419142231.GD1055428@cmpxchg.org>

On Fri, Apr 19, 2024 at 7:22 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Apr 18, 2024 at 01:09:22PM -0700, Yosry Ahmed wrote:
> > On Thu, Apr 18, 2024 at 5:40 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote:
> > > > On 24/04/17 10:33AM, Johannes Weiner wrote:
> > > > >
> > > > > Christian, can you please test the below patch on top of current
> > > > > upstream?
> > > > >
> > > >
> > > > Hey Johannes,
> > > >
> > > > I have applied your patch on top of 6.9-rc4 and it did solve the crash for
> > > > me, thanks for hacking together a fix so quickly! 🤗
> > > >
> > > > Tested-By: Christian Heusel <christian@heusel.eu>
> > >
> > > Thanks for confirming it, and sorry about the breakage.
> > >
> > > Andrew, can you please use the updated changelog below?
> > >
> > > ---
> > >
> > > From 52f67f5fab6a743c2aedfc8e04a582a9d1025c28 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > Date: Thu, 18 Apr 2024 08:26:28 -0400
> > > Subject: [PATCH] mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
> > >
> > > Christian reports a NULL deref in zswap that he bisected down to the
> > > zswap shrinker. The issue also cropped up in the bug trackers of
> > > libguestfs [1] and the Red Hat bugzilla [2].
> > >
> > > The problem is that when memcg is disabled with the boot time flag,
> > > the zswap shrinker might get called with sc->memcg == NULL. This is
> > > okay in many places, like the lruvec operations. But it crashes in
> > > memcg_page_state() - which is only used due to the non-node accounting
> > > of cgroup's the zswap memory to begin with.
> > >
> > > Nhat spotted that the memcg can be NULL in the memcg-disabled case,
> > > and I was then able to reproduce the crash locally as well.
> > >
> > > [1] https://github.com/libguestfs/libguestfs/issues/139
> > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252
> > >
> > > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > > Cc: stable@vger.kernel.org      [v6.8]
> > > Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org
> > > Reported-by: Christian Heusel <christian@heusel.eu>
> > > Debugged-by: Nhat Pham <nphamcs@gmail.com>
> > > Suggested-by: Nhat Pham <nphamcs@gmail.com>
> > > Tested-By: Christian Heusel <christian@heusel.eu>
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > Thanks for fixing this. A couple of comments/questions below, but anyway LGTM:
> >
> > Acked-by: Yosry Ahmed <yosryahmed@google.com>
> >
> > > ---
> > >  mm/zswap.c | 25 ++++++++++++++++---------
> > >  1 file changed, 16 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index caed028945b0..6f8850c44b61 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
> > >         if (!gfp_has_io_fs(sc->gfp_mask))
> > >                 return 0;
> > >
> > > -#ifdef CONFIG_MEMCG_KMEM
> > > -       mem_cgroup_flush_stats(memcg);
> > > -       nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > > -       nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> > > -#else
> > > -       /* use pool stats instead of memcg stats */
> > > -       nr_backing = zswap_pool_total_size >> PAGE_SHIFT;
> > > -       nr_stored = atomic_read(&zswap_nr_stored);
> > > -#endif
> > > +       /*
> > > +        * For memcg, use the cgroup-wide ZSWAP stats since we don't
> > > +        * have them per-node and thus per-lruvec. Careful if memcg is
> > > +        * runtime-disabled: we can get sc->memcg == NULL, which is ok
> > > +        * for the lruvec, but not for memcg_page_state().
> > > +        *
> > > +        * Without memcg, use the zswap pool-wide metrics.
> > > +        */
> > > +       if (!mem_cgroup_disabled()) {
> >
> > With the current shrinker code, it seems like we cannot get sc->memcg
> > == NULL unless mem_cgroup_disabled() is true indeed. However, maybe
> > it's better to check if sc->memcg is not NULL directly instead?
> >
> > This would be more resilient in case the shrinker code changes and
> > passing sc->memcg == NULL becomes possible in other cases (e.g. during
> > global shrinking). It seems like other shrinkers do this, for example
> > see count_shadow_nodes() and deferred_split_count().
>
> Eh, I'm not sure it's better or worse, so it's a bit hard to care. We
> shouldn't get NULL here when memcg is enabled, and if somebody
> introduces that bug it's better to catch it early than run into subtle
> priority inversions when the kernel is deployed to millions of hosts.

No strong opinion here, I just thought consistency with other
shrinkers would be nice.

>
> > I am also wondering if we should also check !mem_cgroup_is_root()
> > here? We can avoid the expensive global flush and use the global stats
> > directly in this case. I could also send a follow up patch for this if
> > that's preferred.
>
> I'd rather not proliferate more memcg internals in this code. If this
> is a concern, optimizing it in the flush and stat functions would make
> more sense. Reclaim already flushes the subtree before getting here,
> so odds are good this is a no-op in most cases.

I agree that with per-memcg thresholding, it is very unlikely that
this flush does anything.

FWIW, if it turns out to be a problem, optimizing it in the flush
functions would not be as straightforward, because the memcg code is
not aware that zswap has up-to-date global versions of the stats.

Anyway, let's table this until someone actually complains, just seemed
to me like a low hanging fruit.

  reply	other threads:[~2024-04-19 19:10 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel
2024-04-16 19:18 ` Andrew Morton
2024-04-16 19:57   ` Christian Heusel
2024-04-16 22:14 ` Nhat Pham
2024-04-16 22:29   ` Christian Heusel
2024-04-16 23:29   ` Nhat Pham
2024-04-17  0:22     ` Nhat Pham
2024-04-17  3:44       ` Chengming Zhou
2024-04-17 14:33         ` Johannes Weiner
2024-04-17 15:08           ` Richard W.M. Jones
2024-04-17 17:18           ` Christian Heusel
2024-04-18 12:40             ` Johannes Weiner
2024-04-18 14:25               ` Linux regression tracking (Thorsten Leemhuis)
2024-04-18 20:09               ` Yosry Ahmed
2024-04-19 14:22                 ` Johannes Weiner
2024-04-19 19:10                   ` Yosry Ahmed [this message]
2024-04-17  0:33 ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkYOwO-LLrboSfz-wLJC+FD+ctaJ2KuZoUfiL6S3t0y7ow@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=christian@heusel.eu \
    --cc=dave@sleepmap.de \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=instruform@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=regressions@lists.linux.dev \
    --cc=rjones@redhat.com \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).