Linux-mm Archive mirror
* [PATCH 0/3] mm: fix nested allocation context filtering
@ 2024-04-30  5:28 Dave Chinner
  2024-04-30  5:28 ` [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Dave Chinner @ 2024-04-30  5:28 UTC (permalink / raw)
  To: linux-mm, linux-xfs; +Cc: akpm, hch, osalvador, elver, vbabka, andreyknvl

This patchset is the followup to the comment I made earlier today:

https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/

Tl;dr: Memory allocations that are done inside the public memory
allocation API need to obey the reclaim recursion constraints placed
on the allocation by the original caller, including the "don't track
recursion for this allocation" case defined by __GFP_NOLOCKDEP.

These nested allocations are generally in debug code that is
tracking something about the allocation (kmemleak, KASAN, etc) and
so are allocating private kernel objects that only that debug system
will use.

Neither the page-owner code nor the stack depot code get this right.
They also clear GFP_ZONEMASK as a separate operation, which is
completely redundant because the constraint filter applied
immediately after guarantees that GFP_ZONEMASK bits are cleared.

kmemleak gets this filtering right. It preserves the allocation
constraints for deadlock prevention and clears all other context
flags whilst also ensuring that the nested allocation will fail
quickly, silently and without depleting emergency kernel reserves if
there is no memory available.

This can be made much more robust, immune to whack-a-mole games and
the code greatly simplified by lifting gfp_kmemleak_mask() to
include/linux/gfp.h and using that everywhere. Also document it so
that there is no excuse for not knowing about it when writing new
debug code that nests allocations.

Tested with lockdep, KASAN + page_owner=on and kmemleak=on over
multiple fstests runs with XFS.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
@ 2024-04-30  5:28 ` Dave Chinner
  2024-04-30 12:39   ` Marco Elver
  2024-04-30  5:28 ` [PATCH 2/3] stackdepot: use gfp_nested_mask() instead of open coded masking Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2024-04-30  5:28 UTC (permalink / raw)
  To: linux-mm, linux-xfs; +Cc: akpm, hch, osalvador, elver, vbabka, andreyknvl

From: Dave Chinner <dchinner@redhat.com>

Any "internal" nested allocation done from within an allocation
context needs to obey the high level allocation gfp_mask
constraints. This is necessary for debug code like KASAN, kmemleak,
lockdep, etc that allocate memory for saving stack traces and other
information during memory allocation. If they don't obey things like
__GFP_NOLOCKDEP or __GFP_NOWARN, they produce false positive failure
detections.

kmemleak gets this right by using gfp_kmemleak_mask() to pass
through the relevant context flags to the nested allocation
to ensure that the allocation follows the constraints of the caller
context.

KASAN was recently found to be missing __GFP_NOLOCKDEP due to stack
depot allocations, and even more recently the page owner tracking
code was also found to be missing __GFP_NOLOCKDEP support.

We also don't want KASAN or lockdep to drive the system into
OOM kill territory by exhausting emergency reserves. This is
something that kmemleak also gets right by adding (__GFP_NORETRY |
__GFP_NOMEMALLOC | __GFP_NOWARN) to the allocation mask.

Hence it is clear that we need to define a common nested allocation
filter mask for these sorts of third party nested allocations used
in debug code. So to start this process, lift gfp_kmemleak_mask() to
gfp.h and rename it to gfp_nested_mask(), and convert the kmemleak
callers to use it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/linux/gfp.h | 25 +++++++++++++++++++++++++
 mm/kmemleak.c       | 10 ++--------
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c775ea3c6015..a4ca004f3b8e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -154,6 +154,31 @@ static inline int gfp_zonelist(gfp_t flags)
 	return ZONELIST_FALLBACK;
 }
 
+/*
+ * gfp flag masking for nested internal allocations.
+ *
+ * For code that needs to do allocations inside the public allocation API (e.g.
+ * memory allocation tracking code) the allocations need to obey the caller
+ * allocation context constraints to prevent allocation context mismatches (e.g.
+ * GFP_KERNEL allocations in GFP_NOFS contexts) from potential deadlock
+ * situations.
+ *
+ * It is also assumed that these nested allocations are for internal kernel
+ * object storage purposes only and are not going to be used for DMA, etc. Hence
+ * we strip out all the zone information and leave just the context information
+ * intact.
+ *
+ * Further, internal allocations must fail before the higher level allocation
+ * can fail, so we must make them fail faster and fail silently. We also don't
+ * want them to deplete emergency reserves.  Hence nested allocations must be
+ * prepared for these allocations to fail.
+ */
+static inline gfp_t gfp_nested_mask(gfp_t flags)
+{
+	return ((flags & (GFP_KERNEL | GFP_ATOMIC | __GFP_NOLOCKDEP)) |
+		(__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN));
+}
+
 /*
  * We get the zone list from the current node and the gfp_mask.
  * This zone list contains a maximum of MAX_NUMNODES*MAX_NR_ZONES zones.
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 6a540c2b27c5..b723f937e513 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -114,12 +114,6 @@
 
 #define BYTES_PER_POINTER	sizeof(void *)
 
-/* GFP bitmask for kmemleak internal allocations */
-#define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC | \
-					   __GFP_NOLOCKDEP)) | \
-				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
-				 __GFP_NOWARN)
-
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
 	struct hlist_node node;
@@ -463,7 +457,7 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 
 	/* try the slab allocator first */
 	if (object_cache) {
-		object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+		object = kmem_cache_alloc(object_cache, gfp_nested_mask(gfp));
 		if (object)
 			return object;
 	}
@@ -947,7 +941,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 	untagged_objp = (unsigned long)kasan_reset_tag((void *)object->pointer);
 
 	if (scan_area_cache)
-		area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
+		area = kmem_cache_alloc(scan_area_cache, gfp_nested_mask(gfp));
 
 	raw_spin_lock_irqsave(&object->lock, flags);
 	if (!area) {
-- 
2.43.0




* [PATCH 2/3] stackdepot: use gfp_nested_mask() instead of open coded masking
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
  2024-04-30  5:28 ` [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h Dave Chinner
@ 2024-04-30  5:28 ` Dave Chinner
  2024-04-30 12:39   ` Marco Elver
  2024-04-30  5:28 ` [PATCH 3/3] mm/page-owner: " Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2024-04-30  5:28 UTC (permalink / raw)
  To: linux-mm, linux-xfs; +Cc: akpm, hch, osalvador, elver, vbabka, andreyknvl

From: Dave Chinner <dchinner@redhat.com>

The stackdepot code is used by KASAN and lockdep for recording stack
traces. Both of these track allocation context information, and so
their internal allocations must obey the caller allocation contexts
to avoid generating their own false positive warnings that have
nothing to do with the code they are instrumenting/tracking.

We also don't want recording stack traces to deplete emergency
memory reserves - debug code is useless if it creates new issues
that can't be replicated when the debug code is disabled.

Switch the stackdepot allocation masking to use gfp_nested_mask()
to address these issues. gfp_nested_mask() also strips GFP_ZONEMASK
naturally, so that greatly simplifies this code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 lib/stackdepot.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 68c97387aa54..0bbae49e6177 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -624,15 +624,8 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 	 * we won't be able to do that under the lock.
 	 */
 	if (unlikely(can_alloc && !READ_ONCE(new_pool))) {
-		/*
-		 * Zero out zone modifiers, as we don't have specific zone
-		 * requirements. Keep the flags related to allocation in atomic
-		 * contexts and I/O.
-		 */
-		alloc_flags &= ~GFP_ZONEMASK;
-		alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
-		alloc_flags |= __GFP_NOWARN;
-		page = alloc_pages(alloc_flags, DEPOT_POOL_ORDER);
+		page = alloc_pages(gfp_nested_mask(alloc_flags),
+				DEPOT_POOL_ORDER);
 		if (page)
 			prealloc = page_address(page);
 	}
-- 
2.43.0




* [PATCH 3/3] mm/page-owner: use gfp_nested_mask() instead of open coded masking
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
  2024-04-30  5:28 ` [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h Dave Chinner
  2024-04-30  5:28 ` [PATCH 2/3] stackdepot: use gfp_nested_mask() instead of open coded masking Dave Chinner
@ 2024-04-30  5:28 ` Dave Chinner
  2024-04-30  9:35 ` [PATCH 0/3] mm: fix nested allocation context filtering Christoph Hellwig
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2024-04-30  5:28 UTC (permalink / raw)
  To: linux-mm, linux-xfs; +Cc: akpm, hch, osalvador, elver, vbabka, andreyknvl

The page-owner tracking code records stack traces during page
allocation. To do this, it must do a memory allocation for the stack
information from inside an existing memory allocation context. This
internal allocation must obey the high level caller allocation
constraints to avoid generating false positive warnings that have
nothing to do with the code they are instrumenting/tracking (e.g.
through lockdep reclaim state tracking).

We also don't want recording stack traces to deplete emergency
memory reserves - debug code is useless if it creates new issues
that can't be replicated when the debug code is disabled.

Switch the stack tracking allocation masking to use
gfp_nested_mask() to address these issues. gfp_nested_mask()
naturally strips GFP_ZONEMASK, too, which greatly simplifies this
code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 mm/page_owner.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 742f432e5bf0..55e89c34f0cd 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -168,13 +168,8 @@ static void add_stack_record_to_list(struct stack_record *stack_record,
 	unsigned long flags;
 	struct stack *stack;
 
-	/* Filter gfp_mask the same way stackdepot does, for consistency */
-	gfp_mask &= ~GFP_ZONEMASK;
-	gfp_mask &= (GFP_ATOMIC | GFP_KERNEL);
-	gfp_mask |= __GFP_NOWARN;
-
 	set_current_in_page_owner();
-	stack = kmalloc(sizeof(*stack), gfp_mask);
+	stack = kmalloc(sizeof(*stack), gfp_nested_mask(gfp_mask));
 	if (!stack) {
 		unset_current_in_page_owner();
 		return;
-- 
2.43.0




* Re: [PATCH 0/3] mm: fix nested allocation context filtering
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
                   ` (2 preceding siblings ...)
  2024-04-30  5:28 ` [PATCH 3/3] mm/page-owner: " Dave Chinner
@ 2024-04-30  9:35 ` Christoph Hellwig
  2024-04-30 10:06 ` Vlastimil Babka
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2024-04-30  9:35 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-mm, linux-xfs, akpm, hch, osalvador, elver, vbabka,
	andreyknvl

The changes look fine to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Btw, I wonder why NOLOCKDEP isn't a context flag to start with,
but maybe this isn't the right time to start that discussion.




* Re: [PATCH 0/3] mm: fix nested allocation context filtering
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
                   ` (3 preceding siblings ...)
  2024-04-30  9:35 ` [PATCH 0/3] mm: fix nested allocation context filtering Christoph Hellwig
@ 2024-04-30 10:06 ` Vlastimil Babka
  2024-05-01  8:06 ` Oscar Salvador
  2024-05-02 17:05 ` Andrew Morton
  6 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2024-04-30 10:06 UTC (permalink / raw)
  To: Dave Chinner, linux-mm, linux-xfs; +Cc: akpm, hch, osalvador, elver, andreyknvl

On 4/30/24 7:28 AM, Dave Chinner wrote:
> This patchset is the followup to the comment I made earlier today:
> 
> https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
> 
> Tl;dr: Memory allocations that are done inside the public memory
> allocation API need to obey the reclaim recursion constraints placed
> on the allocation by the original caller, including the "don't track
> recursion for this allocation" case defined by __GFP_NOLOCKDEP.
> 
> These nested allocations are generally in debug code that is
> tracking something about the allocation (kmemleak, KASAN, etc) and
> so are allocating private kernel objects that only that debug system
> will use.
> 
> Neither the page-owner code nor the stack depot code get this right.
> They also clear GFP_ZONEMASK as a separate operation, which is
> completely redundant because the constraint filter applied
> immediately after guarantees that GFP_ZONEMASK bits are cleared.
> 
> kmemleak gets this filtering right. It preserves the allocation
> constraints for deadlock prevention and clears all other context
> flags whilst also ensuring that the nested allocation will fail
> quickly, silently and without depleting emergency kernel reserves if
> there is no memory available.
> 
> This can be made much more robust, immune to whack-a-mole games and
> the code greatly simplified by lifting gfp_kmemleak_mask() to
> include/linux/gfp.h and using that everywhere. Also document it so
> that there is no excuse for not knowing about it when writing new
> debug code that nests allocations.
> 
> Tested with lockdep, KASAN + page_owner=on and kmemleak=on over
> multiple fstests runs with XFS.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.



* Re: [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h
  2024-04-30  5:28 ` [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h Dave Chinner
@ 2024-04-30 12:39   ` Marco Elver
  0 siblings, 0 replies; 10+ messages in thread
From: Marco Elver @ 2024-04-30 12:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-mm, linux-xfs, akpm, hch, osalvador, vbabka, andreyknvl

On Tue, 30 Apr 2024 at 07:46, Dave Chinner <david@fromorbit.com> wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> Any "internal" nested allocation done from within an allocation
> context needs to obey the high level allocation gfp_mask
> constraints. This is necessary for debug code like KASAN, kmemleak,
> lockdep, etc that allocate memory for saving stack traces and other
> information during memory allocation. If they don't obey things like
> __GFP_NOLOCKDEP or __GFP_NOWARN, they produce false positive failure
> detections.
>
> kmemleak gets this right by using gfp_kmemleak_mask() to pass
> through the relevant context flags to the nested allocation
> to ensure that the allocation follows the constraints of the caller
> context.
>
> KASAN was recently found to be missing __GFP_NOLOCKDEP due to stack
> depot allocations, and even more recently the page owner tracking
> code was also found to be missing __GFP_NOLOCKDEP support.
>
> We also don't want KASAN or lockdep to drive the system into
> OOM kill territory by exhausting emergency reserves. This is
> something that kmemleak also gets right by adding (__GFP_NORETRY |
> __GFP_NOMEMALLOC | __GFP_NOWARN) to the allocation mask.
>
> Hence it is clear that we need to define a common nested allocation
> filter mask for these sorts of third party nested allocations used
> in debug code. So to start this process, lift gfp_kmemleak_mask() to
> gfp.h and rename it to gfp_nested_mask(), and convert the kmemleak
> callers to use it.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Reviewed-by: Marco Elver <elver@google.com>

Looks very reasonable, thanks.

> ---
>  include/linux/gfp.h | 25 +++++++++++++++++++++++++
>  mm/kmemleak.c       | 10 ++--------
>  2 files changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index c775ea3c6015..a4ca004f3b8e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -154,6 +154,31 @@ static inline int gfp_zonelist(gfp_t flags)
>         return ZONELIST_FALLBACK;
>  }
>
> +/*
> + * gfp flag masking for nested internal allocations.
> + *
> + * For code that needs to do allocations inside the public allocation API (e.g.
> + * memory allocation tracking code) the allocations need to obey the caller
> + * allocation context constraints to prevent allocation context mismatches (e.g.
> + * GFP_KERNEL allocations in GFP_NOFS contexts) from potential deadlock
> + * situations.
> + *
> + * It is also assumed that these nested allocations are for internal kernel
> + * object storage purposes only and are not going to be used for DMA, etc. Hence
> + * we strip out all the zone information and leave just the context information
> + * intact.
> + *
> + * Further, internal allocations must fail before the higher level allocation
> + * can fail, so we must make them fail faster and fail silently. We also don't
> + * want them to deplete emergency reserves.  Hence nested allocations must be
> + * prepared for these allocations to fail.
> + */
> +static inline gfp_t gfp_nested_mask(gfp_t flags)
> +{
> +       return ((flags & (GFP_KERNEL | GFP_ATOMIC | __GFP_NOLOCKDEP)) |
> +               (__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN));
> +}
> +
>  /*
>   * We get the zone list from the current node and the gfp_mask.
>   * This zone list contains a maximum of MAX_NUMNODES*MAX_NR_ZONES zones.
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 6a540c2b27c5..b723f937e513 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -114,12 +114,6 @@
>
>  #define BYTES_PER_POINTER      sizeof(void *)
>
> -/* GFP bitmask for kmemleak internal allocations */
> -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC | \
> -                                          __GFP_NOLOCKDEP)) | \
> -                                __GFP_NORETRY | __GFP_NOMEMALLOC | \
> -                                __GFP_NOWARN)
> -
>  /* scanning area inside a memory block */
>  struct kmemleak_scan_area {
>         struct hlist_node node;
> @@ -463,7 +457,7 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
>
>         /* try the slab allocator first */
>         if (object_cache) {
> -               object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> +               object = kmem_cache_alloc(object_cache, gfp_nested_mask(gfp));
>                 if (object)
>                         return object;
>         }
> @@ -947,7 +941,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>         untagged_objp = (unsigned long)kasan_reset_tag((void *)object->pointer);
>
>         if (scan_area_cache)
> -               area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
> +               area = kmem_cache_alloc(scan_area_cache, gfp_nested_mask(gfp));
>
>         raw_spin_lock_irqsave(&object->lock, flags);
>         if (!area) {
> --
> 2.43.0
>



* Re: [PATCH 2/3] stackdepot: use gfp_nested_mask() instead of open coded masking
  2024-04-30  5:28 ` [PATCH 2/3] stackdepot: use gfp_nested_mask() instead of open coded masking Dave Chinner
@ 2024-04-30 12:39   ` Marco Elver
  0 siblings, 0 replies; 10+ messages in thread
From: Marco Elver @ 2024-04-30 12:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-mm, linux-xfs, akpm, hch, osalvador, vbabka, andreyknvl

On Tue, 30 Apr 2024 at 07:46, Dave Chinner <david@fromorbit.com> wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> The stackdepot code is used by KASAN and lockdep for recording stack
> traces. Both of these track allocation context information, and so
> their internal allocations must obey the caller allocation contexts
> to avoid generating their own false positive warnings that have
> nothing to do with the code they are instrumenting/tracking.
>
> We also don't want recording stack traces to deplete emergency
> memory reserves - debug code is useless if it creates new issues
> that can't be replicated when the debug code is disabled.
>
> Switch the stackdepot allocation masking to use gfp_nested_mask()
> to address these issues. gfp_nested_mask() also strips GFP_ZONEMASK
> naturally, so that greatly simplifies this code.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Reviewed-by: Marco Elver <elver@google.com>

> ---
>  lib/stackdepot.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 68c97387aa54..0bbae49e6177 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -624,15 +624,8 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
>          * we won't be able to do that under the lock.
>          */
>         if (unlikely(can_alloc && !READ_ONCE(new_pool))) {
> -               /*
> -                * Zero out zone modifiers, as we don't have specific zone
> -                * requirements. Keep the flags related to allocation in atomic
> -                * contexts and I/O.
> -                */
> -               alloc_flags &= ~GFP_ZONEMASK;
> -               alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
> -               alloc_flags |= __GFP_NOWARN;
> -               page = alloc_pages(alloc_flags, DEPOT_POOL_ORDER);
> +               page = alloc_pages(gfp_nested_mask(alloc_flags),
> +                               DEPOT_POOL_ORDER);
>                 if (page)
>                         prealloc = page_address(page);
>         }
> --
> 2.43.0
>



* Re: [PATCH 0/3] mm: fix nested allocation context filtering
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
                   ` (4 preceding siblings ...)
  2024-04-30 10:06 ` Vlastimil Babka
@ 2024-05-01  8:06 ` Oscar Salvador
  2024-05-02 17:05 ` Andrew Morton
  6 siblings, 0 replies; 10+ messages in thread
From: Oscar Salvador @ 2024-05-01  8:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-mm, linux-xfs, akpm, hch, elver, vbabka, andreyknvl

On Tue, Apr 30, 2024 at 03:28:22PM +1000, Dave Chinner wrote:
> This patchset is the followup to the comment I made earlier today:
> 
> https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
> 
> Tl;dr: Memory allocations that are done inside the public memory
> allocation API need to obey the reclaim recursion constraints placed
> on the allocation by the original caller, including the "don't track
> recursion for this allocation" case defined by __GFP_NOLOCKDEP.
> 
> These nested allocations are generally in debug code that is
> tracking something about the allocation (kmemleak, KASAN, etc) and
> so are allocating private kernel objects that only that debug system
> will use.
> 
> Neither the page-owner code nor the stack depot code get this right.
> They also clear GFP_ZONEMASK as a separate operation, which is
> completely redundant because the constraint filter applied
> immediately after guarantees that GFP_ZONEMASK bits are cleared.
> 
> kmemleak gets this filtering right. It preserves the allocation
> constraints for deadlock prevention and clears all other context
> flags whilst also ensuring that the nested allocation will fail
> quickly, silently and without depleting emergency kernel reserves if
> there is no memory available.
> 
> This can be made much more robust, immune to whack-a-mole games and
> the code greatly simplified by lifting gfp_kmemleak_mask() to
> include/linux/gfp.h and using that everywhere. Also document it so
> that there is no excuse for not knowing about it when writing new
> debug code that nests allocations.
> 
> Tested with lockdep, KASAN + page_owner=on and kmemleak=on over
> multiple fstests runs with XFS.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks!


-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH 0/3] mm: fix nested allocation context filtering
  2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
                   ` (5 preceding siblings ...)
  2024-05-01  8:06 ` Oscar Salvador
@ 2024-05-02 17:05 ` Andrew Morton
  6 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2024-05-02 17:05 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-mm, linux-xfs, hch, osalvador, elver, vbabka, andreyknvl

On Tue, 30 Apr 2024 15:28:22 +1000 Dave Chinner <david@fromorbit.com> wrote:

> This patchset is the followup to the comment I made earlier today:
> 
> https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
> 
> Tl;dr: Memory allocations that are done inside the public memory
> allocation API need to obey the reclaim recursion constraints placed
> on the allocation by the original caller, including the "don't track
> recursion for this allocation" case defined by __GFP_NOLOCKDEP.

I resolved a number of conflicts here, mainly due to the addition
of GFP_NOLOCKDEP in stackdepot and page-owner.

https://lore.kernel.org/all/20240418141133.22950-1-ryabinin.a.a@gmail.com/T/#u
https://lkml.kernel.org/r/20240429082828.1615986-1-hch@lst.de

Please check the resulting code?



end of thread, other threads:[~2024-05-02 17:05 UTC | newest]

Thread overview: 10+ messages
2024-04-30  5:28 [PATCH 0/3] mm: fix nested allocation context filtering Dave Chinner
2024-04-30  5:28 ` [PATCH 1/3] mm: lift gfp_kmemleak_mask() to gfp.h Dave Chinner
2024-04-30 12:39   ` Marco Elver
2024-04-30  5:28 ` [PATCH 2/3] stackdepot: use gfp_nested_mask() instead of open coded masking Dave Chinner
2024-04-30 12:39   ` Marco Elver
2024-04-30  5:28 ` [PATCH 3/3] mm/page-owner: " Dave Chinner
2024-04-30  9:35 ` [PATCH 0/3] mm: fix nested allocation context filtering Christoph Hellwig
2024-04-30 10:06 ` Vlastimil Babka
2024-05-01  8:06 ` Oscar Salvador
2024-05-02 17:05 ` Andrew Morton
