NVDIMM Device and Persistent Memory development
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org, david@fromorbit.com, jhubbard@nvidia.com,
	rcampbell@nvidia.com, willy@infradead.org, jgg@nvidia.com,
	linux-fsdevel@vger.kernel.org, jack@suse.cz, djwong@kernel.org,
	hch@lst.de, david@redhat.com, ruansy.fnst@fujitsu.com,
	nvdimm@lists.linux.dev, linux-xfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, jglisse@redhat.com
Subject: Re: [RFC 00/10] fs/dax: Fix FS DAX page reference counts
Date: Fri, 12 Apr 2024 13:54:24 +1000	[thread overview]
Message-ID: <87frvr5has.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <66181dd83f74e_15786294e8@dwillia2-mobl3.amr.corp.intel.com.notmuch>


Dan Williams <dan.j.williams@intel.com> writes:

> Alistair Popple wrote:
>> FS DAX pages have always maintained their own page reference counts
>> without following the normal rules for page reference counting. In
>> particular pages are considered free when the refcount hits one rather
>> than zero and refcounts are not added when mapping the page.
>
>> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
>> mechanism for allowing GUP to hold references on the page (see
>> get_dev_pagemap). However there doesn't seem to be any reason why FS
>> DAX pages need their own reference counting scheme.
>
> This is fair. However, for anyone coming in fresh to this situation
> maybe some more "how we get here" history helps. That longer story is
> here:
>
> http://lore.kernel.org/all/166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com/

Good idea.

>> This RFC is an initial attempt at removing the special reference
>> counting and instead refcount FS DAX pages the same as normal pages.
>> 
>> There are still a couple of rough edges - in particular I haven't
>> completely removed the devmap PTE bit references from arch specific
>> code and there is probably some more cleanup of dev_pagemap reference
>> counting that could be done, particular in mm/gup.c. I also haven't
>> yet compiled on anything other than x86_64.
>> 
>> Before continuing further with this clean-up though I would appreciate
>> some feedback on the viability of this approach and any issues I may
>> have overlooked, as I am not intimately familiar with FS DAX code (or
>> for that matter the FS layer in general).
>> 
>> I have of course run some basic testing which didn't reveal any
>> problems.
>
> FWIW I see the following with the ndctl/dax test-suite (double-checked
> that vanilla v6.6 passes). I will take a look at the patches, but in the
> meantime...

Hmmm...

> # meson test -C build --suite ndctl:dax
> ninja: no work to do.
> ninja: Entering directory `/root/git/ndctl/build'
> [1/70] Generating version.h with a custom command
>  1/13 ndctl:dax / daxdev-errors.sh          OK              14.46s
>  2/13 ndctl:dax / multi-dax.sh              OK               2.70s
>  3/13 ndctl:dax / sub-section.sh            OK               7.21s
>  4/13 ndctl:dax / dax-dev                   OK               0.08s
> [5/13] 🌖 ndctl:dax / dax-ext4.sh                            0/600s

...thanks for pasting that output. Turns out I didn't have destructive
testing enabled during the build so hadn't noticed these tests were not
running. It would be nice if these were reported as skipped when not
enabled rather than hidden.

With that fixed I'm seeing a couple of kernel warnings (and I think I
know why), so it might be worth holding off looking at this too closely
until I've fixed these.

> ...that last test crashed with:
>
>  EXT4-fs (pmem0): mounted filesystem 2adea02a-a791-4714-be40-125afd16634b r/w with ordered
> ota mode: none.
>  page:ffffea0005f00000 refcount:0 mapcount:0 mapping:ffff8882a8a6be10 index:0x5800 pfn:0x1
>
>  head:ffffea0005f00000 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  aops:ext4_dax_aops ino:c dentry name:"image"
>  flags: 0x4ffff800004040(reserved|head|node=0|zone=4|lastcpupid=0x1ffff)
>  page_type: 0xffffffff()
>  raw: 004ffff800004040 ffff888202681520 0000000000000000 ffff8882a8a6be10
>  raw: 0000000000005800 0000000000000000 00000000ffffffff 0000000000000000
>  page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127
>
>  ------------[ cut here ]------------
>  kernel BUG at include/linux/mm.h:1419!
>  invalid opcode: 0000 [#1] PREEMPT SMP PTI
>  CPU: 0 PID: 1415 Comm: dax-pmd Tainted: G           OE    N 6.6.0+ #209
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
>  RIP: 0010:dax_insert_pfn_pmd+0x41c/0x430
>  Code: 89 c1 41 b8 01 00 00 00 48 89 ea 4c 89 e6 4c 89 f7 e8 18 8a c7 ff e9 e0 fc ff ff 48
> c b3 48 89 c7 e8 a4 53 f7 ff <0f> 0b e8 0d ba a8 00 48 8b 15 86 8a 62 01 e9 89 fc ff ff 90
>
>  RSP: 0000:ffffc90001d57b68 EFLAGS: 00010246
>  RAX: 000000000000005c RBX: ffffea0005f00000 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: ffffffffb3749a15 RDI: 00000000ffffffff
>  RBP: ffff8882982c07e0 R08: 00000000ffffdfff R09: 0000000000000001
>  R10: 00000000ffffdfff R11: ffffffffb3a771c0 R12: 800000017c0008e7
>  R13: 8000000000000025 R14: ffff888202a395f8 R15: ffffea0005f00000
>  FS:  00007fdaa00e3d80(0000) GS:ffff888477000000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fda9f800000 CR3: 0000000296224000 CR4: 00000000000006f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   <TASK>
>   ? die+0x32/0x80
>   ? do_trap+0xd6/0x100
>   ? dax_insert_pfn_pmd+0x41c/0x430
>   ? dax_insert_pfn_pmd+0x41c/0x430
>   ? do_error_trap+0x81/0x110
>   ? dax_insert_pfn_pmd+0x41c/0x430
>   ? exc_invalid_op+0x4c/0x60
>   ? dax_insert_pfn_pmd+0x41c/0x430
>   ? asm_exc_invalid_op+0x16/0x20
>   ? dax_insert_pfn_pmd+0x41c/0x430
>   ? dax_insert_pfn_pmd+0x41c/0x430
>   dax_fault_iter+0x5d0/0x700
>   dax_iomap_pmd_fault+0x212/0x450
>   ext4_dax_huge_fault+0x1dc/0x470
>   __handle_mm_fault+0x808/0x13e0
>   handle_mm_fault+0x178/0x3e0
>   do_user_addr_fault+0x186/0x830
>   exc_page_fault+0x6f/0x1d0
>   asm_exc_page_fault+0x22/0x30
>  RIP: 0033:0x7fdaa072d009


  parent reply	other threads:[~2024-04-12  4:36 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-11  0:57 [RFC 00/10] fs/dax: Fix FS DAX page reference counts Alistair Popple
2024-04-11  0:57 ` [RFC 01/10] mm/gup.c: Remove redundant check for PCI P2PDMA page Alistair Popple
2024-04-11 12:59   ` Jason Gunthorpe
2024-04-11 13:47   ` David Hildenbrand
2024-04-12  1:37     ` Alistair Popple
2024-04-11  0:57 ` [RFC 02/10] mm/hmm: Remove dead check for HugeTLB and FS DAX Alistair Popple
2024-04-11 12:25   ` Jason Gunthorpe
2024-04-11 13:37     ` Peter Xu
2024-04-12  1:28       ` Alistair Popple
2024-04-11  0:57 ` [RFC 03/10] pci/p2pdma: Don't initialise page refcount to one Alistair Popple
2024-04-11 12:29   ` Jason Gunthorpe
2024-04-12  5:40     ` Alistair Popple
2024-04-12 17:20   ` Dan Williams
2024-05-09 21:59   ` Logan Gunthorpe
2024-05-09 23:14     ` Alistair Popple
2024-04-11  0:57 ` [RFC 04/10] fs/dax: Don't track page mapping/index Alistair Popple
2024-04-12 15:22   ` Jan Kara
2024-04-12 17:31     ` Dan Williams
2024-04-15  7:03       ` Alistair Popple
2024-04-15 20:51         ` Dan Williams
2024-04-16  0:07           ` Alistair Popple
2024-04-16  0:36             ` Dan Williams
2024-04-12 17:21   ` Dan Williams
2024-04-11  0:57 ` [RFC 05/10] fs/dax: Refactor wait for dax idle page Alistair Popple
2024-04-12 14:37   ` Jan Kara
2024-04-13 20:19   ` John Hubbard
2024-04-15  8:41     ` Alistair Popple
2024-04-11  0:57 ` [RFC 06/10] fs/dax: Add dax_page_free callback Alistair Popple
2024-04-11  0:57 ` [RFC 07/10] mm: Allow compound zone device pages Alistair Popple
2024-04-11 12:32   ` Jason Gunthorpe
2024-04-11 14:10   ` Matthew Wilcox
2024-04-12  1:38     ` Alistair Popple
2024-04-11  0:57 ` [RFC 08/10] fs/dax: Properly refcount fs dax pages Alistair Popple
2024-04-11  0:57 ` [RFC 09/10] mm/khugepage.c: Warn if trying to scan devmap pmd Alistair Popple
2024-04-11 13:45   ` David Hildenbrand
2024-04-12  1:34     ` Alistair Popple
2024-04-11  0:57 ` [RFC 10/10] mm: Remove pXX_devmap Alistair Popple
2024-04-11 12:57   ` Jason Gunthorpe
2024-04-11 17:28 ` [RFC 00/10] fs/dax: Fix FS DAX page reference counts Dan Williams
2024-04-11 17:35   ` Jason Gunthorpe
2024-04-11 17:56     ` Dan Williams
2024-04-12  3:54   ` Alistair Popple [this message]
2024-04-12  6:55     ` Alistair Popple
2024-04-12 11:53       ` Jason Gunthorpe
2024-04-12 17:32         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87frvr5has.fsf@nvdebian.thelocal \
    --to=apopple@nvidia.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=david@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=rcampbell@nvidia.com \
    --cc=ruansy.fnst@fujitsu.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).