From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Yang Shi <shy828301@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Sun, 10 Mar 2024 11:08:49 +0000
Message-ID: <Ze2UwZDhuxatwHgM@casper.infradead.org>
In-Reply-To: <59098a73-636a-497d-bc20-0abc90b4868c@arm.com>

On Sun, Mar 10, 2024 at 08:23:12AM +0000, Ryan Roberts wrote:
> It doesn't sound completely impossible to me that there is a rare error path that accidentally folio_put()s an extra time...

Your debug below seems to prove that it's an extra folio_put()
somewhere.

> >         list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
> >                                                         _deferred_list) {
> > +		VM_BUG_ON_FOLIO(folio_nid(folio) != sc->nid, folio);
> > +		VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
> >                 list_del_init(&folio->_deferred_list);
> > 
> > (also testing the hypothesis that somehow a split folio has ended up
> > on the deferred split list)
> 
> OK, ran with these checks, and get the following oops:
> 
> [  411.719461] page:0000000059c1826b refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x8c6a40
> [  411.720807] page:0000000059c1826b refcount:0 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0x8c6a40
> [  411.721792] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
> [  411.722453] page_type: 0xffffff7f(buddy)
> [  411.722870] raw: 0bfffc0000000000 fffffc001227e808 fffffc002a857408 0000000000000000
> [  411.723672] raw: 0000000000000001 0000000000000004 00000000ffffff7f 0000000000000000
> [  411.724470] page dumped because: VM_BUG_ON_FOLIO(!folio_test_large(folio))
> [  411.725176] ------------[ cut here ]------------
> [  411.725642] kernel BUG at include/linux/mm.h:1191!
> [  411.726341] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> [  411.727021] Modules linked in:
> [  411.727329] CPU: 40 PID: 2704 Comm: usemem Not tainted 6.8.0-rc5-00391-g44b0dc848590-dirty #45
> [  411.728179] Hardware name: linux,dummy-virt (DT)
> [  411.728657] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  411.729381] pc : __dump_page+0x450/0x4a8
> [  411.729789] lr : __dump_page+0x450/0x4a8
> [  411.730187] sp : ffff80008b97b6f0
> [  411.730525] x29: ffff80008b97b6f0 x28: 00000000000000e2 x27: ffff80008b97b988
> [  411.731227] x26: ffff80008b97b988 x25: ffff800082105000 x24: 0000000000000001
> [  411.731926] x23: 0000000000000000 x22: 0000000000000001 x21: fffffc00221a9000
> [  411.732630] x20: fffffc00221a9000 x19: fffffc00221a9000 x18: ffffffffffffffff
> [  411.733331] x17: 3030303030303030 x16: 2066376666666666 x15: 076c076f07660721
> [  411.734035] x14: 0728074f0749074c x13: 076c076f07660721 x12: 0000000000000000
> [  411.734757] x11: 0720072007200729 x10: ffff0013f5e756c0 x9 : ffff80008014b604
> [  411.735473] x8 : 00000000ffffbfff x7 : ffff0013f5e756c0 x6 : 0000000000000000
> [  411.736198] x5 : ffff0013a5a24d88 x4 : 0000000000000000 x3 : 0000000000000000
> [  411.736923] x2 : 0000000000000000 x1 : ffff0000c2849b80 x0 : 000000000000003e
> [  411.737621] Call trace:
> [  411.737870]  __dump_page+0x450/0x4a8
> [  411.738229]  dump_page+0x2c/0x70
> [  411.738551]  deferred_split_scan+0x258/0x368
> [  411.738973]  do_shrink_slab+0x184/0x750
> 
> The new VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio); is firing, but then when dump_page() does this:
> 
> 	if (compound) {
> 		pr_warn("head:%p order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
> 				head, compound_order(head),
> 				folio_entire_mapcount(folio),
> 				folio_nr_pages_mapped(folio),
> 				atomic_read(&folio->_pincount));
> 	}
> 
> The VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); inside folio_entire_mapcount() then fires, so we have a nested oops.

Ah.  I'm not sure what 44b0dc848590 is -- probably a local commit --
but I guess you don't have fae7d834c43c in your tree, which would have
prevented the nested oops.  Nevertheless, the nested oops does tell us
something interesting.
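
For reference, the check that recurses is in folio_entire_mapcount();
roughly, as it reads in include/linux/mm.h around this era:

	static inline int folio_entire_mapcount(struct folio *folio)
	{
		VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
		return atomic_read(&folio->_entire_mapcount) + 1;
	}

__dump_page() calls it for anything it believes is compound, so a
page that went from "large folio" to "freed" between the two dump
lines trips the assertion.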

> So the very first line is from the first oops and the rest is from the second. I guess we are racing with the page being freed? I find the change in mapcount interesting: 0 -> -128. Not sure why that would happen?

That's PG_buddy being set in PageType.  _mapcount and page_type share
the same word in struct page, so once the buddy allocator claims the
page, reading that word back as a mapcount gives you nonsense.
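
The arithmetic is easy to check from userspace.  A standalone sketch
(taking PG_buddy as bit 7 of the inverted page_type word, which is
what the 0xffffff7f in your dump implies):

	#include <stdio.h>

	int main(void)
	{
		/* page_type starts as all-ones; setting a type means
		 * clearing its bit, here PG_buddy (bit 7). */
		unsigned int page_type = 0xffffffffu & ~0x00000080u;

		/* the dump prints _mapcount + 1 */
		int mapcount = (int)page_type + 1;

		printf("page_type=%#x mapcount=%d\n", page_type, mapcount);
		return 0;
	}

which prints page_type=0xffffff7f mapcount=-128 -- exactly the second
dump line.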

> Given the NID check didn't fire, I wonder if this points more towards extra folio_put than corrupt folio nid?

Must be if PG_buddy got set.  But we're still left with the question
of how the page gets freed while still on the deferred list without
triggering bad_page(page, "still on deferred list") ...
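
(That check lives in the tail-page validation on the free path; from
memory, something like this in free_tail_page_prepare(), since
_deferred_list overlaps the second tail page's ->mapping:

	case 2:
		/* the second tail page: _deferred_list overlaps
		 * ->mapping, so a page being freed must already
		 * be off the list. */
		if (unlikely(!list_empty(&folio->_deferred_list))) {
			bad_page(page, "on deferred list");
			goto out;
		}
		break;

so a freed folio that was still queued should have screamed there.)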

Anyway, we've made some progress.  We now understand how a freed page
gets its deferred list overwritten: we found a split page on the
deferred list with refcount 0, _assumed_ it was still intact, and
overwrote a different page's ->mapping.  And it makes sense that my
patch opened the window wider to hit this problem.
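
The window is the take-a-reference dance in deferred_split_scan();
abridged from mm/huge_memory.c:

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
							_deferred_list) {
		if (folio_try_get(folio)) {
			/* take a ref; split outside the lock */
			list_move(&folio->_deferred_list, &list);
		} else {
			/* We lost race with folio_put() */
			list_del_init(&folio->_deferred_list);
			ds_queue->split_queue_len--;
		}
		if (!--sc->nr_to_scan)
			break;
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

If the "folio" we're looking at has in fact been split and its pages
reused, that list_del_init() is the stray write into somebody else's
->mapping.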

I just checked that free_unref_folios() still does the right thing, and
that also relies on the page not yet being split:

                unsigned int order = folio_order(folio);

                if (order > 0 && folio_test_large_rmappable(folio))
                        folio_undo_large_rmappable(folio);
                if (!free_unref_page_prepare(&folio->page, pfn, order))
                        continue;

so there shouldn't be a point in the page freeing process where the
folio is split before we take it off the deferred list.
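
(On that path the folio leaves the deferred list in
folio_undo_large_rmappable(); the core of it takes the lock before
unqueueing, roughly:

	ds_queue = get_deferred_split_queue(folio);
	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	if (!list_empty(&folio->_deferred_list)) {
		ds_queue->split_queue_len--;
		list_del_init(&folio->_deferred_list);
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

so the unqueueing itself looks properly serialised.)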

split_huge_page_to_list_to_order() is also very careful to take the
ds_queue->split_queue_lock before freezing the folio ref, so it's
not a race with that.  I don't see what it is yet.
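
For reference, the ordering there (abridged):

	/* Prevent deferred_split_scan() touching ->_refcount */
	spin_lock(&ds_queue->split_queue_lock);
	if (folio_ref_freeze(folio, 1 + extra_pins)) {
		if (!list_empty(&folio->_deferred_list)) {
			ds_queue->split_queue_len--;
			list_del_init(&folio->_deferred_list);
		}
		spin_unlock(&ds_queue->split_queue_lock);
		/* ... carry on with the actual split ... */
	}

Both sides agree on the lock, and the folio is frozen before it is
unqueued, so the scan can't see a half-split folio.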

