All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Marco Elver <elver@google.com>
To: Imran Khan <imran.f.khan@oracle.com>
Cc: vbabka@suse.cz, glider@google.com, dvyukov@google.com,
	cl@linux.com, penberg@kernel.org, rientjes@google.com,
	iamjoonsoo.kim@lge.com, akpm@linux-foundation.org,
	roman.gushchin@linux.dev, 42.hyeyoo@gmail.com,
	linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com,
	linux-mm@kvack.org
Subject: Re: [PATCH v2] Introduce sysfs interface to disable kfence for selected slabs.
Date: Fri, 12 Aug 2022 11:11:43 +0200	[thread overview]
Message-ID: <CANpmjNOCFqHdNUQQJ_zzug06Miwqg6kQpCqM0ckhy6jXzX-bLQ@mail.gmail.com> (raw)
In-Reply-To: <26acafb0-9528-9b29-0b5d-738890853fca@oracle.com>

On Thu, 11 Aug 2022 at 17:10, Imran Khan <imran.f.khan@oracle.com> wrote:
>
> Hello Marco,
>
> On 11/8/22 11:21 pm, Marco Elver wrote:
> > On Thu, 11 Aug 2022 at 12:07, <vbabka@suse.cz> wrote:
> > [...]
> >>> new flag SLAB_SKIP_KFENCE, it also can serve a dual purpose, where
> >>> someone might want to explicitly opt out by default and pass it to
> >>> kmem_cache_create() (for whatever reason; not that we'd encourage
> >>> that).
> >>
> >> Right, not be able to do that would be a downside (although it should be
> >> possible even with opt-in to add an opt-out cache flag that would just make
> >> sure the opt-in flag is not set even if eligible by global defaults).
> >
> > True, but I'd avoid all this unnecessary complexity if possible.
> >
> >>> I feel that the real use cases for selectively enabling caches for
> >>> KFENCE are very narrow, and a design that introduces lots of
> >>> complexity elsewhere, just to support this feature cannot be justified
> >>> (which is why I suggested the simpler design here back in
> >>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/CANpmjNNmD9z7oRqSaP72m90kWL7jYH*cxNAZEGpJP8oLrDV-vw@mail.gmail.com/__;Kw!!ACWV5N9M2RV99hQ!Oh4PBJ1NoN9mEgqGqdaNcWuKtJiC6TS_rIbALuqZadQoo93jpVJaFFmXUpOTuzRUdCwcRJWE6uJ4pe0$
> >>> )
> >>
> >> I don't mind strongly either way, just a suggestion to consider.
> >
> > While switching the semantics of the flag from opt-out to opt-in is
> > just as valid, I'm more comfortable with the opt-out flag: the rest of
> > the logic can stay the same, and we're aware of the fact that changing
> > cache coverage by KFENCE shouldn't be something that needs to be done
> > manually.
> >
> > My main point is that opting out or in to only a few select caches
> > should be a rarely used feature, and accordingly it should be as
> > simple as possible. Honestly, I still don't quite see the point of it,
> > and my solution would be to just increase the KFENCE pool, increase
> > sample rate, or decrease the "skip covered threshold%". But in the
> > case described by Imran, perhaps a running machine is having trouble
> > and limiting the caches to be analyzed by KFENCE might be worthwhile
> > if a more aggressive configuration doesn't yield anything (and then
> > there's of course KASAN, but I recognize it's not always possible to
> > switch kernel and run the same workload with it).
> >
> > The use case for the proposed change is definitely when an admin or
> > kernel dev is starting to debug a problem. KFENCE wasn't designed for
> > that (vs. deployment at scale, discovery of bugs). As such I'm having
> > a hard time admitting how useful this feature will really be, but
> > given the current implementation is simple, having it might actually
> > help a few people.
> >
> > Imran, just to make sure my assumptions here are right, have you had
> > success debugging an issue in this way? Can you elaborate on what
> > "certain debugging scenarios" you mean (admin debugging something, or
> > a kernel dev, production fleet, or test machine)?
> >
>
> I have not used kfence in this way because as of now we don't have such newer
> kernels in production fleet but I can cite a couple of instances where using
> slub_debug for few selected slabs helped me in locating the issue on a
> production system where KASAN or even full slub_debug were not feasible.
> Apologies in advance if I am elaborating more than you asked for :).

This is very useful to understand the use case.

> In one case a freed struct mutex was being used later on and by that time same
> address had been given to a kmalloc-32 object. The issue was appearing more
> frequently if one would enforce some cgroup memory limitation resulting in fork
> of a task exiting prematurely. From the vmcore we could see that mutex or more
> specifically task_struct.futex_exit_mutex was in bad shape and eventually using
> slub_debug for kmalloc-32 pointed to issue.
>
> Another case involved a mem_cgroup corruption which was causing system crash but
> was giving list corruption warnings beforehand. Since list corruption warnings
> were coming from cgroup subsystem, corresponding objects were in doubt.
> Enabling slub_debug for kmalloc-4k helped in locating the actual corruption.
>
> Admittedly both of the above issues were result of backporting mistakes but
> nonetheless they happened in production systems where very few debugging options
> were available.
>
> By "certain debugging scenarios" I meant such cases where some initial data
> (from production fleet) like vmcore or kernel debug messages can give some
> pointer towards which slab objects could be wrong and then we would use this
> feature (along with further tuning like increasing sampling frequency, pool size
> if needed/possible) to pinpoint the actual issue. The idea is that limiting
> KFENCE to few slabs will increase the probablity of catching the issue even if
> we are not able to tweak pool size.
>
> Please let me know if it sounds reasonable or if I missed something from your
> query.

Thanks for the elaboration on use cases - agreed that in few scenarios
this feature can help increase the probability of debugging an issue.

Reviewed-by: Marco Elver <elver@google.com>

With minor suggestions:

> +SLAB_ATTR(skip_kfence);
> +

^ Unnecessary space between SLAB_ATTR and #endif.

> +#endif
> +

And the patch title should be something like "kfence: add sysfs
interface to disable kfence for selected slabs" (to follow format
"<subsys>: <thing changed>").

  reply	other threads:[~2022-08-12  9:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-11  8:59 [PATCH v2] Introduce sysfs interface to disable kfence for selected slabs Imran Khan
2022-08-11  9:31 ` vbabka
2022-08-11  9:52   ` Marco Elver
2022-08-11 10:07     ` vbabka
2022-08-11 13:21       ` Marco Elver
2022-08-11 15:10         ` Imran Khan
2022-08-12  9:11           ` Marco Elver [this message]
2022-08-11 10:40 ` Hyeonggon Yoo
2022-08-12 10:28 ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANpmjNOCFqHdNUQQJ_zzug06Miwqg6kQpCqM0ckhy6jXzX-bLQ@mail.gmail.com \
    --to=elver@google.com \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=imran.f.khan@oracle.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.