LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/khugepaged: fix iteration in collapse_file
@ 2023-06-07  5:31 David Stevens
  2023-06-07 19:13 ` Peter Xu
  2023-06-09 20:02 ` Hugh Dickins
  0 siblings, 2 replies; 9+ messages in thread
From: David Stevens @ 2023-06-07  5:31 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Peter Xu, Matthew Wilcox, Kirill A . Shutemov, Yang Shi,
	David Hildenbrand, Hugh Dickins, Jiaqi Yan, linux-kernel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Remove an unnecessary call to xas_set(index) when iterating over the
target range in collapse_file. The extra call to xas_set reset the xas
cursor to the top of the tree, causing the xas_next call on the next
iteration to walk the tree to index instead of advancing to index+1.
This returned the same page again, which would cause collapse_file to
fail because the page is already locked.

This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
the top of the tree to index, causing the xas_next call on the next loop
iteration to advance the cursor as expected.

Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag")
Signed-off-by: David Stevens <stevensd@chromium.org>
---
 mm/khugepaged.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6b9d39d65b73..2d0d58fb4e7f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 					TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);
 
 		xas_lock_irq(&xas);
-		xas_set(&xas, index);
 
 		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
 
-- 
2.41.0.rc2.161.g9c6817b8e7-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/khugepaged: fix iteration in collapse_file
  2023-06-07  5:31 [PATCH] mm/khugepaged: fix iteration in collapse_file David Stevens
@ 2023-06-07 19:13 ` Peter Xu
  2023-06-09 20:02 ` Hugh Dickins
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Xu @ 2023-06-07 19:13 UTC (permalink / raw)
  To: David Stevens
  Cc: linux-mm, Andrew Morton, Matthew Wilcox, Kirill A . Shutemov,
	Yang Shi, David Hildenbrand, Hugh Dickins, Jiaqi Yan,
	linux-kernel

On Wed, Jun 07, 2023 at 02:31:35PM +0900, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> Remove an unnecessary call to xas_set(index) when iterating over the
> target range in collapse_file. The extra call to xas_set reset the xas
> cursor to the top of the tree, causing the xas_next call on the next
> iteration to walk the tree to index instead of advancing to index+1.
> This returned the same page again, which would cause collapse_file to
> fail because the page is already locked.
> 
> This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
> used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
> the top of the tree to index, causing the xas_next call on the next loop
> iteration to advance the cursor as expected.
> 
> Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag")
> Signed-off-by: David Stevens <stevensd@chromium.org>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/khugepaged: fix iteration in collapse_file
  2023-06-07  5:31 [PATCH] mm/khugepaged: fix iteration in collapse_file David Stevens
  2023-06-07 19:13 ` Peter Xu
@ 2023-06-09 20:02 ` Hugh Dickins
  2023-06-10  1:45   ` David Stevens
  1 sibling, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-06-09 20:02 UTC (permalink / raw)
  To: David Stevens
  Cc: linux-mm, Andrew Morton, Peter Xu, Matthew Wilcox,
	Kirill A . Shutemov, Yang Shi, David Hildenbrand, Hugh Dickins,
	Jiaqi Yan, linux-kernel

On Wed, 7 Jun 2023, David Stevens wrote:
> 
> Remove an unnecessary call to xas_set(index) when iterating over the
> target range in collapse_file. The extra call to xas_set reset the xas
> cursor to the top of the tree, causing the xas_next call on the next
> iteration to walk the tree to index instead of advancing to index+1.
> This returned the same page again, which would cause collapse_file to
> fail because the page is already locked.
> 
> This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
> used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
> the top of the tree to index, causing the xas_next call on the next loop
> iteration to advance the cursor as expected.
> 
> Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag")
> Signed-off-by: David Stevens <stevensd@chromium.org>

This patch seems to be wrong, but I have not investigated why.

It's certainly an interesting and worrying observation,
if a CONFIG_DEBUG_VM=y kernel goes a significantly different way.

I almost always do have CONFIG_DEBUG_VM=y, so you won't be surprised that
I never saw the issue.  But once I ran an mm-everything with this patch in,
I hit that VM_BUG_ON_PAGE(page != xas_load(&xas), page) for the first time
(after about 2 hours of huge tmpfs swapping load).

As if you have just transferred the problem from DEBUG_VM=n to DEBUG_VM=y.
But I then tried a CONFIG_DEBUG_VM off 6.4-rc1 kernel (including the fixee
but not this fixer) under similar load, and saw no problem in 14 hours.
So I can't even reproduce the bug that is being fixed here: only hit a
bug that it introduces.

Hugh

> ---
>  mm/khugepaged.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6b9d39d65b73..2d0d58fb4e7f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  					TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);
>  
>  		xas_lock_irq(&xas);
> -		xas_set(&xas, index);
>  
>  		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
>  
> -- 
> 2.41.0.rc2.161.g9c6817b8e7-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/khugepaged: fix iteration in collapse_file
  2023-06-09 20:02 ` Hugh Dickins
@ 2023-06-10  1:45   ` David Stevens
  2023-06-11 18:26     ` Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: David Stevens @ 2023-06-10  1:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Andrew Morton, Peter Xu, Matthew Wilcox,
	Kirill A . Shutemov, Yang Shi, David Hildenbrand, Jiaqi Yan,
	linux-kernel

On Sat, Jun 10, 2023 at 5:03 AM Hugh Dickins <hughd@google.com> wrote:
>
> On Wed, 7 Jun 2023, David Stevens wrote:
> >
> > Remove an unnecessary call to xas_set(index) when iterating over the
> > target range in collapse_file. The extra call to xas_set reset the xas
> > cursor to the top of the tree, causing the xas_next call on the next
> > iteration to walk the tree to index instead of advancing to index+1.
> > This returned the same page again, which would cause collapse_file to
> > fail because the page is already locked.
> >
> > This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
> > used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
> > the top of the tree to index, causing the xas_next call on the next loop
> > iteration to advance the cursor as expected.
> >
> > Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag")
> > Signed-off-by: David Stevens <stevensd@chromium.org>
>
> This patch seems to be wrong, but I have not investigated why.
>
> It's certainly an interesting and worrying observation,
> if a CONFIG_DEBUG_VM=y kernel goes a significantly different way.
>
> I almost always do have CONFIG_DEBUG_VM=y, so you won't be surprised that
> I never saw the issue.  But once I ran an mm-everything with this patch in,
> I hit that VM_BUG_ON_PAGE(page != xas_load(&xas), page) for the first time
> (after about 2 hours of huge tmpfs swapping load).

Is the particular workload one you can share? I haven't hit that
assert so far with my tests.

Also, I'm a little surprised that this is the assert which is being
hit. My patch series didn't make that many changes to the first loop
of the function, and the changes it did make were mostly about missing
pages, not present pages.

> As if you have just transferred the problem from DEBUG_VM=n to DEBUG_VM=y.
> But I then tried a CONFIG_DEBUG_VM off 6.4-rc1 kernel (including the fixee
> but not this fixer) under similar load, and saw no problem in 14 hours.
> So I can't even reproduce the bug that is being fixed here: only hit a
> bug that it introduces.

The bug this fixes isn't a crash - it's the fact that khugepage
becomes nearly unable to collapse pages. Specifically, it can only
collapse if there is exactly one present page, and that page is at
index 511. Since MADV_COLLAPSE uses the same code path, it's easy to
reproduce the bug I'm trying to fix with this program:

#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE (2 * 1024 * 1024)

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

int main() {
  int memfd = memfd_create("memfd", MFD_CLOEXEC);
  assert(memfd >= 0);

  int ret = ftruncate(memfd, THP_SIZE);
  assert(ret >= 0);

  char *addr = mmap(NULL, THP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, 0);
  assert(addr != MAP_FAILED);

  addr[0] = 0xff;

  ret = madvise(addr, THP_SIZE, MADV_COLLAPSE);
  assert(ret == 0);
}

If DEBUG_VM isn't set, then the test will trigger the assert. If
DEBUG_VM is set or if this fix is included, then the test will pass.

-David

> Hugh
>
> > ---
> >  mm/khugepaged.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 6b9d39d65b73..2d0d58fb4e7f 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> >                                       TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);
> >
> >               xas_lock_irq(&xas);
> > -             xas_set(&xas, index);
> >
> >               VM_BUG_ON_PAGE(page != xas_load(&xas), page);
> >
> > --
> > 2.41.0.rc2.161.g9c6817b8e7-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/khugepaged: fix iteration in collapse_file
  2023-06-10  1:45   ` David Stevens
@ 2023-06-11 18:26     ` Hugh Dickins
  2023-06-25 19:06       ` [PATCH] mm/khugepaged: fix regression in collapse_file() Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-06-11 18:26 UTC (permalink / raw)
  To: David Stevens
  Cc: Hugh Dickins, linux-mm, Andrew Morton, Peter Xu, Matthew Wilcox,
	Kirill A . Shutemov, Yang Shi, David Hildenbrand, Jiaqi Yan,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5774 bytes --]

On Sat, 10 Jun 2023, David Stevens wrote:
> On Sat, Jun 10, 2023 at 5:03 AM Hugh Dickins <hughd@google.com> wrote:
> > On Wed, 7 Jun 2023, David Stevens wrote:
> > >
> > > Remove an unnecessary call to xas_set(index) when iterating over the
> > > target range in collapse_file. The extra call to xas_set reset the xas
> > > cursor to the top of the tree, causing the xas_next call on the next
> > > iteration to walk the tree to index instead of advancing to index+1.
> > > This returned the same page again, which would cause collapse_file to
> > > fail because the page is already locked.
> > >
> > > This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
> > > used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
> > > the top of the tree to index, causing the xas_next call on the next loop
> > > iteration to advance the cursor as expected.
> > >
> > > Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag")
> > > Signed-off-by: David Stevens <stevensd@chromium.org>
> >
> > This patch seems to be wrong,

I withdraw the accusation!  I haven't thought about the patch itself:
I'd have to learn a bit more about the xarray rules, so cannot quickly
ack it myself; but withdraw my implicit nack.

> > but I have not investigated why.
> >
> > It's certainly an interesting and worrying observation,
> > if a CONFIG_DEBUG_VM=y kernel goes a significantly different way.
> >
> > I almost always do have CONFIG_DEBUG_VM=y, so you won't be surprised that
> > I never saw the issue.  But once I ran an mm-everything with this patch in,
> > I hit that VM_BUG_ON_PAGE(page != xas_load(&xas), page) for the first time
> > (after about 2 hours of huge tmpfs swapping load).
> 
> Is the particular workload one you can share?

It's a workload familiar to me, and no secret, but which would take much
more time to share than I have to spare; with no great likelihood that
you could easily reproduce all the conditions at the end of it.

> I haven't hit that assert so far with my tests.

And nor have I since.  I've been trying 6.4-rc5 with and without your
patch, mm-everything (of a few days ago) with and without your patch,
with and without my pte_offset_map changes.  Not once have I seen that
VM_BUG_ON_PAGE(page != xas_load(&xas), page) again, even with the exact
same kernel.

I've no doubt that I did hit it, but I'm now having to assume that it's
something very rare (or very rare in my particular testing).  And because
the first time I saw it coincided with the first time I had in your patch
to the last non-blank line above it, I jumped to the conclusion that it
was your patch causing it.  Not unreasonable at the time, but not
justified by later attempts.  Perhaps there was something unrelated
in that kernel, corrupting an xarray node.

If I ever see it again, I hope I'll be able to spend more time
investigating; or if I study that code with more leisure, maybe a
hypothesis for a rare race will arise.  But for now, we'd better
just forget about it.

> 
> Also, I'm a little surprised that this is the assert which is being
> hit. My patch series didn't make that many changes to the first loop
> of the function, and the changes it did make were mostly about missing
> pages, not present pages.

I was suspicious of that patch series, as I said at the time; and might
one day want to revert it; but it has not given me any actual problem -
that only came with this fix to it.

> 
> > As if you have just transferred the problem from DEBUG_VM=n to DEBUG_VM=y.
> > But I then tried a CONFIG_DEBUG_VM off 6.4-rc1 kernel (including the fixee
> > but not this fixer) under similar load, and saw no problem in 14 hours.
> > So I can't even reproduce the bug that is being fixed here: only hit a
> > bug that it introduces.
> 
> The bug this fixes isn't a crash - it's the fact that khugepage

Sorry, yes, I should have re-read your commit message before writing
all that.

Thanks,
Hugh

> becomes nearly unable to collapse pages. Specifically, it can only
> collapse if there is exactly one present page, and that page is at
> index 511. Since MADV_COLLAPSE uses the same code path, it's easy to
> reproduce the bug I'm trying to fix with this program:
> 
> #define _GNU_SOURCE
> #include <assert.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> 
> #define THP_SIZE (2 * 1024 * 1024)
> 
> #ifndef MADV_COLLAPSE
> #define MADV_COLLAPSE 25
> #endif
> 
> int main() {
>   int memfd = memfd_create("memfd", MFD_CLOEXEC);
>   assert(memfd >= 0);
> 
>   int ret = ftruncate(memfd, THP_SIZE);
>   assert(ret >= 0);
> 
>   char *addr = mmap(NULL, THP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, 0);
>   assert(addr != MAP_FAILED);
> 
>   addr[0] = 0xff;
> 
>   ret = madvise(addr, THP_SIZE, MADV_COLLAPSE);
>   assert(ret == 0);
> }
> 
> If DEBUG_VM isn't set, then the test will trigger the assert. If
> DEBUG_VM is set or if this fix is included, then the test will pass.
> 
> -David
> 
> > Hugh
> >
> > > ---
> > >  mm/khugepaged.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 6b9d39d65b73..2d0d58fb4e7f 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >                                       TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);
> > >
> > >               xas_lock_irq(&xas);
> > > -             xas_set(&xas, index);
> > >
> > >               VM_BUG_ON_PAGE(page != xas_load(&xas), page);
> > >
> > > --
> > > 2.41.0.rc2.161.g9c6817b8e7-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] mm/khugepaged: fix regression in collapse_file()
  2023-06-11 18:26     ` Hugh Dickins
@ 2023-06-25 19:06       ` Hugh Dickins
  2023-06-25 19:42         ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-06-25 19:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Stevens, Andrew Morton, Peter Xu, Matthew Wilcox,
	Kirill A . Shutemov, Yang Shi, David Hildenbrand, Jiaqi Yan,
	linux-kernel, linux-mm

There is no xas_pause(&xas) in collapse_file()'s main loop, at the points
where it does xas_unlock_irq(&xas) and then continues.

That would explain why, once two weeks ago and twice yesterday, I have
hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since "mm/khugepaged:
fix iteration in collapse_file" removed the xas_set(&xas, index) just
before it: xas.xa_node could be left pointing to a stale node, if there
was concurrent activity on the file which transformed its xarray.

I tried inserting xas_pause()s, but then even bootup crashed on that
VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in
xas_pause().

xas_next() and xas_pause() are good for use in simple loops, but not in
this one: xas_set() worked well until now, so use xas_set(&xas, index)
explicitly at the head of the loop; and change that VM_BUG_ON_PAGE() not
to need its own xas_set(), and not to interfere with the xa_state (which
would probably stop the crashes from xas_pause(), but I trust that less).

Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/
Fixes: c8a8f3b4a95a ("mm/khugepaged: fix iteration in collapse_file")
Signed-off-by: Hugh Dickins <hughd@google.com>
---
Linus, I'm rushing this directly to you, but not really expecting you
to put it in at this stage, unless you're very comfortable with it,
or perhaps it catches Matthew's eye and gets a quick Ack from him.

The commit being fixed only got in after -rc7, after being held up by my
initial report of this crash; but I had to rescind that when I couldn't
reproduce it at all. Then yesterday morning it hit again on two machines,
and reading XArray Doc reminded me of xas_pause() - seems obvious now.
Patch ran for 14 hours overnight on those two machines without a problem.

 mm/khugepaged.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2d0d58fb4e7f..47b59f2843f6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1918,9 +1918,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
-		page = xas_next(&xas);
+		xas_set(&xas, index);
+		page = xas_load(&xas);
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
@@ -1935,7 +1935,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 						result = SCAN_TRUNCATED;
 						goto xa_locked;
 					}
-					xas_set(&xas, index + 1);
 				}
 				if (!shmem_charge(mapping->host, 1)) {
 					result = SCAN_FAIL;
@@ -2071,7 +2070,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 		xas_lock_irq(&xas);
 
-		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
+		VM_BUG_ON_PAGE(page != xa_load(xas.xa, index), page);
 
 		/*
 		 * We control three references to the page:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/khugepaged: fix regression in collapse_file()
  2023-06-25 19:06       ` [PATCH] mm/khugepaged: fix regression in collapse_file() Hugh Dickins
@ 2023-06-25 19:42         ` Linus Torvalds
  2023-06-29  4:31           ` [PATCH hotfix] " Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2023-06-25 19:42 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: David Stevens, Andrew Morton, Peter Xu, Matthew Wilcox,
	Kirill A . Shutemov, Yang Shi, David Hildenbrand, Jiaqi Yan,
	linux-kernel, linux-mm

On Sun, 25 Jun 2023 at 12:06, Hugh Dickins <hughd@google.com> wrote:
>
> Linus, I'm rushing this directly to you, but not really expecting you
> to put it in at this stage, unless you're very comfortable with it,
> or perhaps it catches Matthew's eye and gets a quick Ack from him.

Yeah, the fix and explanation sound fine to me, but the maple tree
code confuses me enough and has been subtle enough (as exemplified by
this bug and many others) that there's no way I'll apply this just
before the final 6.4 without some "Ack, that's obviously correct" from
Willy or Liam or somebody.

Willy?

                    Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH hotfix] mm/khugepaged: fix regression in collapse_file()
  2023-06-25 19:42         ` Linus Torvalds
@ 2023-06-29  4:31           ` Hugh Dickins
  2023-06-29 16:42             ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-06-29  4:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, Linus Torvalds, Hugh Dickins, David Stevens,
	Peter Xu, Kirill A . Shutemov, Yang Shi, David Hildenbrand,
	Jiaqi Yan, linux-kernel, linux-mm

There is no xas_pause(&xas) in collapse_file()'s main loop, at the points
where it does xas_unlock_irq(&xas) and then continues.

That would explain why, once two weeks ago and twice yesterday, I have
hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since "mm/khugepaged:
fix iteration in collapse_file" removed the xas_set(&xas, index) just
before it: xas.xa_node could be left pointing to a stale node, if there
was concurrent activity on the file which transformed its xarray.

I tried inserting xas_pause()s, but then even bootup crashed on that
VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in
xas_pause().

xas_next() and xas_pause() are good for use in simple loops, but not in
this one: xas_set() worked well until now, so use xas_set(&xas, index)
explicitly at the head of the loop; and change that VM_BUG_ON_PAGE() not
to need its own xas_set(), and not to interfere with the xa_state (which
would probably stop the crashes from xas_pause(), but I trust that less).

The user-visible effects of this bug (if VM_BUG_ONs are configured out)
would be data loss and data leak - potentially - though in practice I
expect it is more likely that a subsequent check (e.g. on mapping or on
nr_none) would notice an inconsistency, and just abandon the collapse.

Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/
Fixes: c8a8f3b4a95a ("mm/khugepaged: fix iteration in collapse_file")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
 mm/khugepaged.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3beb4ad2ee5e..78c8d5d8b628 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1937,9 +1937,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
-		page = xas_next(&xas);
+		xas_set(&xas, index);
+		page = xas_load(&xas);
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
@@ -1954,7 +1954,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 						result = SCAN_TRUNCATED;
 						goto xa_locked;
 					}
-					xas_set(&xas, index + 1);
 				}
 				if (!shmem_charge(mapping->host, 1)) {
 					result = SCAN_FAIL;
@@ -2090,7 +2089,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 		xas_lock_irq(&xas);
 
-		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
+		VM_BUG_ON_PAGE(page != xa_load(xas.xa, index), page);
 
 		/*
 		 * We control three references to the page:
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH hotfix] mm/khugepaged: fix regression in collapse_file()
  2023-06-29  4:31           ` [PATCH hotfix] " Hugh Dickins
@ 2023-06-29 16:42             ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2023-06-29 16:42 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Matthew Wilcox, David Stevens, Peter Xu,
	Kirill A . Shutemov, Yang Shi, David Hildenbrand, Jiaqi Yan,
	linux-kernel, linux-mm

On Wed, 28 Jun 2023 at 21:31, Hugh Dickins <hughd@google.com> wrote:
>
> There is no xas_pause(&xas) in collapse_file()'s main loop, at the points
> where it does xas_unlock_irq(&xas) and then continues.

Thanks for reminding me. Now applied,

                   Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-29 16:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-07  5:31 [PATCH] mm/khugepaged: fix iteration in collapse_file David Stevens
2023-06-07 19:13 ` Peter Xu
2023-06-09 20:02 ` Hugh Dickins
2023-06-10  1:45   ` David Stevens
2023-06-11 18:26     ` Hugh Dickins
2023-06-25 19:06       ` [PATCH] mm/khugepaged: fix regression in collapse_file() Hugh Dickins
2023-06-25 19:42         ` Linus Torvalds
2023-06-29  4:31           ` [PATCH hotfix] " Hugh Dickins
2023-06-29 16:42             ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).