From: Jason Gunthorpe <jgg@nvidia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Matthew Wilcox <willy@infradead.org>, Guo Ren <guoren@kernel.org>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
"David S . Miller" <davem@davemloft.net>,
Andreas Larsson <andreas@gaisler.com>,
Arnd Bergmann <arnd@arndb.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Nicolas Pitre <nico@fluxnic.net>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
David Hildenbrand <david@redhat.com>,
Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
Baoquan He <bhe@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
Dave Young <dyoung@redhat.com>, Tony Luck <tony.luck@intel.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Dave Martin <Dave.Martin@arm.com>,
James Morse <james.morse@arm.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Uladzislau Rezki <urezki@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Jann Horn <jannh@google.com>, Pedro Falcato <pfalcato@suse.de>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-csky@vger.kernel.org,
linux-mips@vger.kernel.org, linux-s390@vger.kernel.org,
sparclinux@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-mm@kvack.org,
ntfs3@lists.linux.dev, kexec@lists.infradead.org,
kasan-dev@googlegroups.com, iommu@lists.linux.dev,
Kevin Tian <kevin.tian@intel.com>, Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH v3 06/13] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
Date: Tue, 16 Sep 2025 14:07:23 -0300 [thread overview]
Message-ID: <20250916170723.GO1086830@nvidia.com> (raw)
In-Reply-To: <7c050219963aade148332365f8d2223f267dd89a.1758031792.git.lorenzo.stoakes@oracle.com>
On Tue, Sep 16, 2025 at 03:11:52PM +0100, Lorenzo Stoakes wrote:
> We need the ability to split PFN remap between updating the VMA and
> performing the actual remap, in order to do away with the legacy
> f_op->mmap hook.
>
> To do so, update the PFN remap code to provide shared logic, and also make
> remap_pfn_range_notrack() static, as its one user, io_mapping_map_user()
> was removed in commit 9a4f90e24661 ("mm: remove mm/io-mapping.c").
>
> Then, introduce remap_pfn_range_prepare(), which accepts VMA descriptor
> and PFN parameters, and remap_pfn_range_complete() which accepts the same
> parameters as remap_pfn_rangte().
>
> remap_pfn_range_prepare() will set the cow vma->vm_pgoff if necessary, so
> it must be supplied with a correct PFN to do so. If the caller must hold
> locks to be able to do this, those locks should be held across the
> operation, and mmap_abort() should be provided to revoke the lock should
> an error arise.
It looks like the patches have changed to remove mmap_abort so this
paragraph can probably be dropped.
> static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
> - unsigned long pfn, unsigned long size, pgprot_t prot)
> + unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
> {
> pgd_t *pgd;
> unsigned long next;
> @@ -2912,32 +2931,17 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
> if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
> return -EINVAL;
>
> - /*
> - * Physically remapped pages are special. Tell the
> - * rest of the world about it:
> - * VM_IO tells people not to look at these pages
> - * (accesses can have side effects).
> - * VM_PFNMAP tells the core MM that the base pages are just
> - * raw PFN mappings, and do not have a "struct page" associated
> - * with them.
> - * VM_DONTEXPAND
> - * Disable vma merging and expanding with mremap().
> - * VM_DONTDUMP
> - * Omit vma from core dump, even when VM_IO turned off.
> - *
> - * There's a horrible special case to handle copy-on-write
> - * behaviour that some programs depend on. We mark the "original"
> - * un-COW'ed pages by matching them up with "vma->vm_pgoff".
> - * See vm_normal_page() for details.
> - */
> - if (is_cow_mapping(vma->vm_flags)) {
> - if (addr != vma->vm_start || end != vma->vm_end)
> - return -EINVAL;
> - vma->vm_pgoff = pfn;
> + if (set_vma) {
> + err = get_remap_pgoff(vma->vm_flags, addr, end,
> + vma->vm_start, vma->vm_end,
> + pfn, &vma->vm_pgoff);
> + if (err)
> + return err;
> + vm_flags_set(vma, VM_REMAP_FLAGS);
> + } else {
> + VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
> }
It looks like you can avoid the changes to add set_vma by making
remap_pfn_range_internal() only do the complete portion and giving
the legacy calls a helper to do prepare in line:
int remap_pfn_range_prepare_vma(..)
{
int err;
err = get_remap_pgoff(vma->vm_flags, addr, end,
vma->vm_start, vma->vm_end,
pfn, &vma->vm_pgoff);
if (err)
return err;
vm_flags_set(vma, VM_REMAP_FLAGS);
return 0;
}
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot)
{
int err;
err = remap_pfn_range_prepare_vma(vma, addr, pfn, size)
if (err)
return err;
if (IS_ENABLED(__HAVE_PFNMAP_TRACKING))
return remap_pfn_range_track(vma, addr, pfn, size, prot);
return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
}
(fix pgtable_Types.h to #define to 1 so IS_ENABLED works)
But the logic here is all fine
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
next prev parent reply other threads:[~2025-09-16 17:07 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-16 14:11 [PATCH v3 00/13] expand mmap_prepare functionality, port more users Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 01/13] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
2025-09-16 16:42 ` Jason Gunthorpe
2025-09-17 10:30 ` Pedro Falcato
2025-09-16 14:11 ` [PATCH v3 02/13] device/dax: update devdax " Lorenzo Stoakes
2025-09-16 16:43 ` Jason Gunthorpe
2025-09-17 10:37 ` Pedro Falcato
2025-09-17 13:54 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 03/13] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
2025-09-16 16:46 ` Jason Gunthorpe
2025-09-17 10:39 ` Pedro Falcato
2025-09-16 14:11 ` [PATCH v3 04/13] relay: update relay to use mmap_prepare Lorenzo Stoakes
2025-09-16 16:48 ` Jason Gunthorpe
2025-09-17 10:41 ` Pedro Falcato
2025-09-16 14:11 ` [PATCH v3 05/13] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
2025-09-16 16:48 ` Jason Gunthorpe
2025-09-17 10:49 ` Pedro Falcato
2025-09-17 13:24 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 06/13] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
2025-09-16 17:07 ` Jason Gunthorpe [this message]
2025-09-16 17:37 ` Lorenzo Stoakes
2025-09-17 11:07 ` Pedro Falcato
2025-09-17 11:16 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 07/13] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
2025-09-16 17:19 ` Jason Gunthorpe
2025-09-16 17:34 ` Lorenzo Stoakes
2025-09-17 11:12 ` Pedro Falcato
2025-09-17 11:15 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
2025-09-16 17:28 ` Jason Gunthorpe
2025-09-16 17:57 ` Lorenzo Stoakes
2025-09-16 18:08 ` Jason Gunthorpe
2025-09-16 19:31 ` Lorenzo Stoakes
2025-09-17 11:32 ` Pedro Falcato
2025-09-17 15:34 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 09/13] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 10/13] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
2025-09-16 17:30 ` Jason Gunthorpe
2025-09-16 14:11 ` [PATCH v3 11/13] mm: update mem char driver " Lorenzo Stoakes
2025-09-16 17:40 ` Jason Gunthorpe
2025-09-16 18:02 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 12/13] mm: update resctl " Lorenzo Stoakes
2025-09-16 17:40 ` Jason Gunthorpe
2025-09-16 18:02 ` Lorenzo Stoakes
2025-09-16 14:11 ` [PATCH v3 13/13] iommufd: update " Lorenzo Stoakes
2025-09-16 15:40 ` Jason Gunthorpe
2025-09-16 16:23 ` Lorenzo Stoakes
2025-09-17 1:32 ` Andrew Morton
2025-09-17 10:13 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250916170723.GO1086830@nvidia.com \
--to=jgg@nvidia.com \
--cc=Dave.Martin@arm.com \
--cc=Liam.Howlett@oracle.com \
--cc=agordeev@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=almaz.alexandrovich@paragon-software.com \
--cc=andreas@gaisler.com \
--cc=andreyknvl@gmail.com \
--cc=arnd@arndb.de \
--cc=baolin.wang@linux.alibaba.com \
--cc=bhe@redhat.com \
--cc=borntraeger@linux.ibm.com \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=dvyukov@google.com \
--cc=dyoung@redhat.com \
--cc=gor@linux.ibm.com \
--cc=gregkh@linuxfoundation.org \
--cc=guoren@kernel.org \
--cc=hca@linux.ibm.com \
--cc=hughd@google.com \
--cc=iommu@lists.linux.dev \
--cc=jack@suse.cz \
--cc=james.morse@arm.com \
--cc=jannh@google.com \
--cc=kasan-dev@googlegroups.com \
--cc=kevin.tian@intel.com \
--cc=kexec@lists.infradead.org \
--cc=linux-csky@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=nico@fluxnic.net \
--cc=ntfs3@lists.linux.dev \
--cc=nvdimm@lists.linux.dev \
--cc=osalvador@suse.de \
--cc=pfalcato@suse.de \
--cc=reinette.chatre@intel.com \
--cc=robin.murphy@arm.com \
--cc=rppt@kernel.org \
--cc=sparclinux@vger.kernel.org \
--cc=surenb@google.com \
--cc=svens@linux.ibm.com \
--cc=tony.luck@intel.com \
--cc=tsbogend@alpha.franken.de \
--cc=urezki@gmail.com \
--cc=vbabka@suse.cz \
--cc=vgoyal@redhat.com \
--cc=viro@zeniv.linux.org.uk \
--cc=vishal.l.verma@intel.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).