* [PATCH v4 00/14] expand mmap_prepare functionality, port more users
@ 2025-09-17 19:11 Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 01/14] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
` (14 more replies)
0 siblings, 15 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), The f_op->mmap hook has been deprecated in favour of
f_op->mmap_prepare.
This was introduced in order to make it possible for us to eventually
eliminate the f_op->mmap hook, which is highly problematic as it gives
drivers and filesystems raw access to a VMA that is not yet correctly
initialised.
This hook also introduces complexity for the memory mapping operation, as
we must correctly unwind what we do should an error arise.
Overall, this interface being so open has caused significant problems for
us, including security issues, so it is important that we simply eliminate
it as a source of problems.
Therefore this series continues what was established by extending the
functionality further to permit more drivers and filesystems to use
mmap_prepare.
We start by updating some existing users who can use the mmap_prepare
functionality as-is.
We then introduce the concept of an mmap 'action', which a user, on
mmap_prepare, can request to be performed upon the VMA:
* Nothing - default, we're done
* Remap PFN - perform PFN remap with specified parameters
* I/O remap PFN - perform I/O PFN remap with specified parameters
Setting the action in mmap_prepare allows us to decide dynamically what to
do next, so if a driver/filesystem needs to determine whether to
e.g. remap or use a mixed map, it can do so and then select which action
is performed.
This significantly expands the capabilities of the mmap_prepare hook, while
maintaining as much control as possible in the mm logic.
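As a rough sketch of what this looks like from a driver's side (the
driver, its vm_ops and its PFN/size fields are invented purely for
illustration; the helpers are the ones introduced later in this series):

static int foo_mmap_prepare(struct vm_area_desc *desc)
{
	struct foo_device *foo = desc->file->private_data;

	/* Refuse mappings larger than the backing region. */
	if (vma_desc_size(desc) > foo->region_size)
		return -EINVAL;

	desc->vm_ops = &foo_vm_ops;
	/* Ask the mm to PFN-remap the whole VMA once it is established. */
	mmap_action_remap_full(desc, foo->base_pfn);
	return 0;
}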
We split the [io_]remap_pfn_range*() functions, which perform PFN
remapping (a typical mapping prepopulation operation), into
prepare/complete steps, and add io_remap_pfn_range_[prepare, complete]()
for the same purpose.
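Roughly speaking (a simplified sketch of the flow rather than the exact mm
code), the remap is now performed in two phases:

	/*
	 * At .mmap_prepare time, operating on the VMA descriptor only:
	 * set the CoW vm_pgoff if required and OR in
	 * VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP.
	 */
	remap_pfn_range_prepare(desc, pfn);

	/*
	 * Later, once the VMA is fully established: perform the actual
	 * page table remapping.
	 */
	err = remap_pfn_range_complete(vma, vma->vm_start, pfn, size,
				       vma->vm_page_prot);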
From there we update various mm-adjacent logic to use this functionality as
a first set of changes.
We also add success and error hooks for post-action processing, for
e.g. outputting a debug log on success or filtering error codes.
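For instance (a hypothetical driver hook; the shape is inferred from the
changelog below - the success hook receives the now-established VMA as
const and may return an error), a driver could attach something like the
following via its action's success_hook from mmap_prepare:

static int foo_mmap_success(const struct vm_area_struct *vma)
{
	pr_debug("foo: mapped %lu pages\n", vma_pages(vma));
	return 0;
}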
v4:
* Dropped accidentally still-included reference to mmap_abort() in the
commit message for the patch in which remap_pfn_range_[prepare,
complete]() are introduced as per Jason.
* Avoided set_vma boolean parameter in remap_pfn_range_internal() as per
Jason.
* Further refactored remap_pfn_range() et al. as per Jason - couldn't make
IS_ENABLED() work nicely, as we'd otherwise have to declare
remap_pfn_range_track(), so did the least-nasty thing.
* Abstracted the I/O remap PFN calculation as suggested by Jason, however we
do this more generally across io_remap_pfn_range() as a whole, before
introducing prepare/complete variants.
* Made [io_]remap_pfn_range_[prepare, complete]() internal-only as per
Pedro.
* Renamed [__]compat_vma_prepare to [__]compat_vma as per Jason.
* Dropped duplicated debug check in mmap_action_complete() as per Jason.
* Added MMAP_IO_REMAP_PFN action type as per Jason.
* Various small refactorings as suggested by Jason.
* Shared code between mmu and nommu mmap_action_complete() as per Jason.
* Added missing return in kdoc for shmem_zero_setup().
* Separate out introduction of shmem_zero_setup_desc() into another patch
as per Jason.
* Looked into Jason's request re: using shmem_zero_setup_desc() in vma.c -
it isn't really worthwhile for now, as we'd have to set VMA fields from
the desc after the fields were already set from the map, though once we
convert all callers to mmap_prepare we can look at this again.
* Fixed bug with the char mem driver not correctly making MAP_PRIVATE
/dev/zero mappings anonymous (with vma->vm_file still set) - use the
success hook instead.
* Renamed mmap_prepare_zero to mmap_zero_prepare to be consistent with
mmap_mem_prepare.
v3:
* Squashed fix patches.
* Propagated tags (thanks everyone!)
* Dropped kcov as per Jason.
* Dropped vmcore as per Jason.
* Dropped procfs patch as per Jason.
* Dropped cramfs patch as per Jason.
* Dropped mmap_action_mixedmap() as per Jason.
* Dropped mmap_action_mixedmap_pages() as per Jason.
* Dropped all remaining mixedmap logic as per Jason.
* Dropped custom action as per Jason.
* Parameterise helpers by vm_area_desc * rather than mmap_action * as per
discussion with Jason.
* Renamed addr to start for remap action as per discussion with Jason.
* Added kernel documentation tags for mmap_action_remap() as per Jason.
* Added mmap_action_remap_full() as per Jason.
* Removed pgprot parameter from mmap_action_remap() to tighten up the
interface as per discussion with Jason.
* Added a warning if the caller tries to remap past the end or before the
start of a VMA.
* const-ified vma_desc_size() and vma_desc_pages() as per David.
* Added a comment describing mmap_action.
* Updated char mm driver patch to utilise mmap_action_remap_full().
* Updated resctl patch to utilise mmap_action_remap_full().
* Fixed typo in mmap_action->success_hook comment as per Reinette.
* Const-ify VMA in success_hook so drivers which do odd things with the VMA
at this point stand out.
* Fixed mistake in mmap_action_complete() not returning error on success
hook failure.
* Fixed up comments for mmap_action_type enum values.
* Added ability to invoke I/O remap.
* Added mmap_action_ioremap() and mmap_action_ioremap_full() helpers for
this.
* Added iommufd I/O remap implementation.
https://lore.kernel.org/all/cover.1758031792.git.lorenzo.stoakes@oracle.com
v2:
* Propagated tags, thanks everyone! :)
* Refactored resctl patch to avoid assigned-but-not-used variable.
* Updated resctl change to not use .mmap_abort as discussed with Jason.
* Removed .mmap_abort as discussed with Jason.
* Removed references to .mmap_abort from documentation.
* Fixed silly VM_WARN_ON_ONCE() mistake (asserting opposite of what we mean
to) as per report from Alexander.
* Fixed relay kerneldoc error.
* Renamed __mmap_prelude to __mmap_setup, keep __mmap_complete the same as
per David.
* Fixed docs typo in mmap_complete description + formatted bold rather than
capitalised as per Randy.
* Eliminated mmap_complete and rework into actions specified in
mmap_prepare (via vm_area_desc) which therefore eliminates the driver's
ability to do anything crazy and allows us to control generic logic.
* Added helper functions for these - vma_desc_set_remap(),
vma_desc_set_mixedmap().
* However, we unfortunately had to add post-action hooks to vm_area_desc, as
hugetlbfs, for instance, already needs to access the VMA to function
correctly. It is at least the smallest possible means of doing this.
* Updated VMA test logic, the stacked filesystem compatibility layer and
documentation to reflect this.
* Updated hugetlbfs implementation to use new approach, and refactored to
accept desc where at all possible and to do as much as possible in
.mmap_prepare, and the minimum required in the new post_hook callback.
* Updated /dev/mem and /dev/zero mmap logic to use the new mechanism.
* Updated cramfs, resctl to use the new mechanism.
* Updated proc_mmap hooks to only have proc_mmap_prepare.
* Updated the vmcore implementation to use the new hooks.
* Updated kcov to use the new hooks.
* Added hooks for success/failure for post-action handling.
* Added custom action hook for truly custom cases.
* Abstracted actions to separate type so we can use generic custom actions
in custom handlers when necessary.
* Added callout re: lock issue raised in
https://lore.kernel.org/linux-mm/20250801162930.GB184255@nvidia.com/ as
per discussion with Jason.
https://lore.kernel.org/all/cover.1757534913.git.lorenzo.stoakes@oracle.com/
v1:
https://lore.kernel.org/all/cover.1757329751.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (14):
mm/shmem: update shmem to use mmap_prepare
device/dax: update devdax to use mmap_prepare
mm: add vma_desc_size(), vma_desc_pages() helpers
relay: update relay to use mmap_prepare
mm/vma: rename __mmap_prepare() function to avoid confusion
mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
mm: abstract io_remap_pfn_range() based on PFN
mm: introduce io_remap_pfn_range_[prepare, complete]()
mm: add ability to take further action in vm_area_desc
doc: update porting, vfs documentation for mmap_prepare actions
mm/hugetlbfs: update hugetlbfs to use mmap_prepare
mm: add shmem_zero_setup_desc()
mm: update mem char driver to use mmap_prepare
mm: update resctl to use mmap_prepare
Documentation/filesystems/porting.rst | 5 +
Documentation/filesystems/vfs.rst | 4 +
arch/csky/include/asm/pgtable.h | 3 +-
arch/mips/alchemy/common/setup.c | 9 +-
arch/mips/include/asm/pgtable.h | 5 +-
arch/sparc/include/asm/pgtable_32.h | 12 +--
arch/sparc/include/asm/pgtable_64.h | 12 +--
drivers/char/mem.c | 84 +++++++++------
drivers/dax/device.c | 32 ++++--
fs/hugetlbfs/inode.c | 36 ++++---
fs/ntfs3/file.c | 2 +-
fs/resctrl/pseudo_lock.c | 20 ++--
include/linux/fs.h | 6 +-
include/linux/hugetlb.h | 9 +-
include/linux/hugetlb_inline.h | 15 ++-
include/linux/mm.h | 136 ++++++++++++++++++++++--
include/linux/mm_types.h | 46 +++++++++
include/linux/shmem_fs.h | 3 +-
kernel/relay.c | 33 +++---
mm/hugetlb.c | 77 ++++++++------
mm/internal.h | 22 ++++
mm/memory.c | 133 ++++++++++++++++--------
mm/secretmem.c | 2 +-
mm/shmem.c | 50 ++++++---
mm/util.c | 143 ++++++++++++++++++++++++--
mm/vma.c | 74 ++++++++-----
tools/testing/vma/vma_internal.h | 90 ++++++++++++++--
27 files changed, 799 insertions(+), 264 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v4 01/14] mm/shmem: update shmem to use mmap_prepare
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 02/14] device/dax: update devdax " Lorenzo Stoakes
` (13 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
This hook simply assigns the vm_ops, so it is easily updated - do so.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
---
mm/shmem.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 87005c086d5a..df02a2e0ebbb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2938,16 +2938,17 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
return retval;
}
-static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
+static int shmem_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *file = desc->file;
struct inode *inode = file_inode(file);
file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
if (inode->i_nlink)
- vma->vm_ops = &shmem_vm_ops;
+ desc->vm_ops = &shmem_vm_ops;
else
- vma->vm_ops = &shmem_anon_vm_ops;
+ desc->vm_ops = &shmem_anon_vm_ops;
return 0;
}
@@ -5217,7 +5218,7 @@ static const struct address_space_operations shmem_aops = {
};
static const struct file_operations shmem_file_operations = {
- .mmap = shmem_mmap,
+ .mmap_prepare = shmem_mmap_prepare,
.open = shmem_file_open,
.get_unmapped_area = shmem_get_unmapped_area,
#ifdef CONFIG_TMPFS
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 02/14] device/dax: update devdax to use mmap_prepare
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 01/14] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 03/14] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
` (12 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
The devdax driver does nothing special in its f_op->mmap hook, so
straightforwardly update it to use the mmap_prepare hook instead.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
---
drivers/dax/device.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 2bb40a6060af..c2181439f925 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -13,8 +13,9 @@
#include "dax-private.h"
#include "bus.h"
-static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
- const char *func)
+static int __check_vma(struct dev_dax *dev_dax, vm_flags_t vm_flags,
+ unsigned long start, unsigned long end, struct file *file,
+ const char *func)
{
struct device *dev = &dev_dax->dev;
unsigned long mask;
@@ -23,7 +24,7 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
return -ENXIO;
/* prevent private mappings from being established */
- if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
+ if ((vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
dev_info_ratelimited(dev,
"%s: %s: fail, attempted private mapping\n",
current->comm, func);
@@ -31,15 +32,15 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
}
mask = dev_dax->align - 1;
- if (vma->vm_start & mask || vma->vm_end & mask) {
+ if (start & mask || end & mask) {
dev_info_ratelimited(dev,
"%s: %s: fail, unaligned vma (%#lx - %#lx, %#lx)\n",
- current->comm, func, vma->vm_start, vma->vm_end,
+ current->comm, func, start, end,
mask);
return -EINVAL;
}
- if (!vma_is_dax(vma)) {
+ if (!file_is_dax(file)) {
dev_info_ratelimited(dev,
"%s: %s: fail, vma is not DAX capable\n",
current->comm, func);
@@ -49,6 +50,13 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
return 0;
}
+static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
+ const char *func)
+{
+ return __check_vma(dev_dax, vma->vm_flags, vma->vm_start, vma->vm_end,
+ vma->vm_file, func);
+}
+
/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
unsigned long size)
@@ -285,8 +293,9 @@ static const struct vm_operations_struct dax_vm_ops = {
.pagesize = dev_dax_pagesize,
};
-static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
+static int dax_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *filp = desc->file;
struct dev_dax *dev_dax = filp->private_data;
int rc, id;
@@ -297,13 +306,14 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
* fault time.
*/
id = dax_read_lock();
- rc = check_vma(dev_dax, vma, __func__);
+ rc = __check_vma(dev_dax, desc->vm_flags, desc->start, desc->end, filp,
+ __func__);
dax_read_unlock(id);
if (rc)
return rc;
- vma->vm_ops = &dax_vm_ops;
- vm_flags_set(vma, VM_HUGEPAGE);
+ desc->vm_ops = &dax_vm_ops;
+ desc->vm_flags |= VM_HUGEPAGE;
return 0;
}
@@ -377,7 +387,7 @@ static const struct file_operations dax_fops = {
.open = dax_open,
.release = dax_release,
.get_unmapped_area = dax_get_unmapped_area,
- .mmap = dax_mmap,
+ .mmap_prepare = dax_mmap_prepare,
.fop_flags = FOP_MMAP_SYNC,
};
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 03/14] mm: add vma_desc_size(), vma_desc_pages() helpers
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 01/14] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 02/14] device/dax: update devdax " Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 04/14] relay: update relay to use mmap_prepare Lorenzo Stoakes
` (11 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
It's useful to be able to determine the size of the VMA descriptor range
used in f_op->mmap_prepare, expressed both in bytes and pages, so add
helpers for both and update code that can make use of them to do so.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
---
fs/ntfs3/file.c | 2 +-
include/linux/mm.h | 10 ++++++++++
mm/secretmem.c | 2 +-
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index c1ece707b195..86eb88f62714 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -304,7 +304,7 @@ static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
if (rw) {
u64 to = min_t(loff_t, i_size_read(inode),
- from + desc->end - desc->start);
+ from + vma_desc_size(desc));
if (is_sparsed(ni)) {
/* Allocate clusters for rw map. */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index da6e0abad2cb..dd1fec5f028a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3571,6 +3571,16 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}
+static inline unsigned long vma_desc_size(const struct vm_area_desc *desc)
+{
+ return desc->end - desc->start;
+}
+
+static inline unsigned long vma_desc_pages(const struct vm_area_desc *desc)
+{
+ return vma_desc_size(desc) >> PAGE_SHIFT;
+}
+
/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
unsigned long vm_start, unsigned long vm_end)
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 60137305bc20..62066ddb1e9c 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -120,7 +120,7 @@ static int secretmem_release(struct inode *inode, struct file *file)
static int secretmem_mmap_prepare(struct vm_area_desc *desc)
{
- const unsigned long len = desc->end - desc->start;
+ const unsigned long len = vma_desc_size(desc);
if ((desc->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
return -EINVAL;
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 04/14] relay: update relay to use mmap_prepare
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (2 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 03/14] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 05/14] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
` (10 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
It is relatively trivial to update this code to use the f_op->mmap_prepare
hook in place of the deprecated f_op->mmap hook, so do so.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
---
kernel/relay.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/kernel/relay.c b/kernel/relay.c
index 8d915fe98198..e36f6b926f7f 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -72,17 +72,18 @@ static void relay_free_page_array(struct page **array)
}
/**
- * relay_mmap_buf: - mmap channel buffer to process address space
- * @buf: relay channel buffer
- * @vma: vm_area_struct describing memory to be mapped
+ * relay_mmap_prepare_buf: - mmap channel buffer to process address space
+ * @buf: the relay channel buffer
+ * @desc: describing what to map
*
* Returns 0 if ok, negative on error
*
* Caller should already have grabbed mmap_lock.
*/
-static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
+static int relay_mmap_prepare_buf(struct rchan_buf *buf,
+ struct vm_area_desc *desc)
{
- unsigned long length = vma->vm_end - vma->vm_start;
+ unsigned long length = vma_desc_size(desc);
if (!buf)
return -EBADF;
@@ -90,9 +91,9 @@ static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
if (length != (unsigned long)buf->chan->alloc_size)
return -EINVAL;
- vma->vm_ops = &relay_file_mmap_ops;
- vm_flags_set(vma, VM_DONTEXPAND);
- vma->vm_private_data = buf;
+ desc->vm_ops = &relay_file_mmap_ops;
+ desc->vm_flags |= VM_DONTEXPAND;
+ desc->private_data = buf;
return 0;
}
@@ -749,16 +750,16 @@ static int relay_file_open(struct inode *inode, struct file *filp)
}
/**
- * relay_file_mmap - mmap file op for relay files
- * @filp: the file
- * @vma: the vma describing what to map
+ * relay_file_mmap_prepare - mmap file op for relay files
+ * @desc: describing what to map
*
- * Calls upon relay_mmap_buf() to map the file into user space.
+ * Calls upon relay_mmap_prepare_buf() to map the file into user space.
*/
-static int relay_file_mmap(struct file *filp, struct vm_area_struct *vma)
+static int relay_file_mmap_prepare(struct vm_area_desc *desc)
{
- struct rchan_buf *buf = filp->private_data;
- return relay_mmap_buf(buf, vma);
+ struct rchan_buf *buf = desc->file->private_data;
+
+ return relay_mmap_prepare_buf(buf, desc);
}
/**
@@ -1006,7 +1007,7 @@ static ssize_t relay_file_read(struct file *filp,
const struct file_operations relay_file_operations = {
.open = relay_file_open,
.poll = relay_file_poll,
- .mmap = relay_file_mmap,
+ .mmap_prepare = relay_file_mmap_prepare,
.read = relay_file_read,
.release = relay_file_release,
};
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 05/14] mm/vma: rename __mmap_prepare() function to avoid confusion
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (3 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 04/14] relay: update relay to use mmap_prepare Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
` (9 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Now that we have the f_op->mmap_prepare() hook, having a static function
called __mmap_prepare() that has nothing to do with it is confusing, so
rename the function to __mmap_setup().
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
---
mm/vma.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index ac791ed8c92f..bdb070a62a2e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2329,7 +2329,7 @@ static void update_ksm_flags(struct mmap_state *map)
}
/*
- * __mmap_prepare() - Prepare to gather any overlapping VMAs that need to be
+ * __mmap_setup() - Prepare to gather any overlapping VMAs that need to be
* unmapped once the map operation is completed, check limits, account mapping
* and clean up any pre-existing VMAs.
*
@@ -2338,7 +2338,7 @@ static void update_ksm_flags(struct mmap_state *map)
*
* Returns: 0 on success, error code otherwise.
*/
-static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
+static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
{
int error;
struct vma_iterator *vmi = map->vmi;
@@ -2649,7 +2649,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
map.check_ksm_early = can_set_ksm_flags_early(&map);
- error = __mmap_prepare(&map, uf);
+ error = __mmap_setup(&map, uf);
if (!error && have_mmap_prepare)
error = call_mmap_prepare(&map);
if (error)
@@ -2679,7 +2679,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
return addr;
- /* Accounting was done by __mmap_prepare(). */
+ /* Accounting was done by __mmap_setup(). */
unacct_error:
if (map.charged)
vm_unacct_memory(map.charged);
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (4 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 05/14] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 21:32 ` Jason Gunthorpe
2025-09-17 19:11 ` [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN Lorenzo Stoakes
` (8 subsequent siblings)
14 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
We need the ability to split PFN remap between updating the VMA and
performing the actual remap, in order to do away with the legacy f_op->mmap
hook.
To do so, update the PFN remap code to provide shared logic, and also make
remap_pfn_range_notrack() static, as its one user, io_mapping_map_user(),
was removed in commit 9a4f90e24661 ("mm: remove mm/io-mapping.c").
Then, introduce remap_pfn_range_prepare(), which accepts VMA descriptor
and PFN parameters, and remap_pfn_range_complete(), which accepts the same
parameters as remap_pfn_range().
remap_pfn_range_prepare() will set the CoW vma->vm_pgoff if necessary, so
it must be supplied with a correct PFN to do so.
While we're here, also clean up the duplicated #ifdef
__HAVE_PFNMAP_TRACKING check and put it into a single #ifdef/#else block.
We keep these internal to mm as they should only be used by internal
helpers.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
---
include/linux/mm.h | 22 ++++++--
mm/internal.h | 4 ++
mm/memory.c | 133 ++++++++++++++++++++++++++++++---------------
3 files changed, 110 insertions(+), 49 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dd1fec5f028a..8e4006eaf4dd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -489,6 +489,21 @@ extern unsigned int kobjsize(const void *objp);
*/
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
+/*
+ * Physically remapped pages are special. Tell the
+ * rest of the world about it:
+ * VM_IO tells people not to look at these pages
+ * (accesses can have side effects).
+ * VM_PFNMAP tells the core MM that the base pages are just
+ * raw PFN mappings, and do not have a "struct page" associated
+ * with them.
+ * VM_DONTEXPAND
+ * Disable vma merging and expanding with mremap().
+ * VM_DONTDUMP
+ * Omit vma from core dump, even when VM_IO turned off.
+ */
+#define VM_REMAP_FLAGS (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP)
+
/* This mask prevents VMA from being scanned with khugepaged */
#define VM_NO_KHUGEPAGED (VM_SPECIAL | VM_HUGETLB)
@@ -3622,10 +3637,9 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
struct vm_area_struct *find_extend_vma_locked(struct mm_struct *,
unsigned long addr);
-int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t);
-int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot);
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
+
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
struct page **pages, unsigned long *num);
diff --git a/mm/internal.h b/mm/internal.h
index 63e3ec8d63be..c6655f76cf69 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1653,4 +1653,8 @@ static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned long end,
void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/memory.c b/mm/memory.c
index 41e641823558..daa7124d371d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2900,6 +2900,25 @@ static inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,
return 0;
}
+static int get_remap_pgoff(vm_flags_t vm_flags, unsigned long addr,
+ unsigned long end, unsigned long vm_start, unsigned long vm_end,
+ unsigned long pfn, pgoff_t *vm_pgoff_p)
+{
+ /*
+ * There's a horrible special case to handle copy-on-write
+ * behaviour that some programs depend on. We mark the "original"
+ * un-COW'ed pages by matching them up with "vma->vm_pgoff".
+ * See vm_normal_page() for details.
+ */
+ if (is_cow_mapping(vm_flags)) {
+ if (addr != vm_start || end != vm_end)
+ return -EINVAL;
+ *vm_pgoff_p = pfn;
+ }
+
+ return 0;
+}
+
static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot)
{
@@ -2912,31 +2931,7 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
return -EINVAL;
- /*
- * Physically remapped pages are special. Tell the
- * rest of the world about it:
- * VM_IO tells people not to look at these pages
- * (accesses can have side effects).
- * VM_PFNMAP tells the core MM that the base pages are just
- * raw PFN mappings, and do not have a "struct page" associated
- * with them.
- * VM_DONTEXPAND
- * Disable vma merging and expanding with mremap().
- * VM_DONTDUMP
- * Omit vma from core dump, even when VM_IO turned off.
- *
- * There's a horrible special case to handle copy-on-write
- * behaviour that some programs depend on. We mark the "original"
- * un-COW'ed pages by matching them up with "vma->vm_pgoff".
- * See vm_normal_page() for details.
- */
- if (is_cow_mapping(vma->vm_flags)) {
- if (addr != vma->vm_start || end != vma->vm_end)
- return -EINVAL;
- vma->vm_pgoff = pfn;
- }
-
- vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+ VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
@@ -2957,11 +2952,10 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
* Variant of remap_pfn_range that does not call track_pfn_remap. The caller
* must have pre-validated the caching bits of the pgprot_t.
*/
-int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
+static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot)
{
int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
-
if (!error)
return 0;
@@ -3002,23 +2996,9 @@ void pfnmap_track_ctx_release(struct kref *ref)
pfnmap_untrack(ctx->pfn, ctx->size);
kfree(ctx);
}
-#endif /* __HAVE_PFNMAP_TRACKING */
-/**
- * remap_pfn_range - remap kernel memory to userspace
- * @vma: user vma to map to
- * @addr: target page aligned user address to start at
- * @pfn: page frame number of kernel physical memory address
- * @size: size of mapping area
- * @prot: page protection flags for this mapping
- *
- * Note: this is only safe if the mm semaphore is held when called.
- *
- * Return: %0 on success, negative error code otherwise.
- */
-#ifdef __HAVE_PFNMAP_TRACKING
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static int remap_pfn_range_track(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
struct pfnmap_track_ctx *ctx = NULL;
int err;
@@ -3054,15 +3034,78 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
return err;
}
+static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range_track(vma, addr, pfn, size, prot);
+}
#else
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
}
#endif
+
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+ /*
+ * We set addr=VMA start, end=VMA end here, so this won't fail, but we
+ * check it again on complete and will fail there if specified addr is
+ * invalid.
+ */
+ get_remap_pgoff(desc->vm_flags, desc->start, desc->end,
+ desc->start, desc->end, pfn, &desc->pgoff);
+ desc->vm_flags |= VM_REMAP_FLAGS;
+}
+
+static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size)
+{
+ unsigned long end = addr + PAGE_ALIGN(size);
+ int err;
+
+ err = get_remap_pgoff(vma->vm_flags, addr, end,
+ vma->vm_start, vma->vm_end,
+ pfn, &vma->vm_pgoff);
+ if (err)
+ return err;
+
+ vm_flags_set(vma, VM_REMAP_FLAGS);
+ return 0;
+}
+
+/**
+ * remap_pfn_range - remap kernel memory to userspace
+ * @vma: user vma to map to
+ * @addr: target page aligned user address to start at
+ * @pfn: page frame number of kernel physical memory address
+ * @size: size of mapping area
+ * @prot: page protection flags for this mapping
+ *
+ * Note: this is only safe if the mm semaphore is held when called.
+ *
+ * Return: %0 on success, negative error code otherwise.
+ */
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ int err;
+
+ err = remap_pfn_range_prepare_vma(vma, addr, pfn, size);
+ if (err)
+ return err;
+
+ return do_remap_pfn_range(vma, addr, pfn, size, prot);
+}
EXPORT_SYMBOL(remap_pfn_range);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return do_remap_pfn_range(vma, addr, pfn, size, prot);
+}
+
/**
* vm_iomap_memory - remap memory to userspace
* @vma: user vma to map to
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (5 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 21:19 ` Jason Gunthorpe
2025-09-18 9:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 08/14] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
` (7 subsequent siblings)
14 siblings, 2 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
The only instances in which we customise this function are ones in which
we customise the PFN used; the only other difference is that, when a
custom io_remap_pfn_range() function is provided, the prot value passed is
not filtered through pgprot_decrypted().
Use this fact to simplify the use of io_remap_pfn_range(), by abstracting
the PFN calculation as io_remap_pfn_range_pfn(), and simply adopting the
convention that, should a custom handler be specified, we do not utilise
pgprot_decrypted().
If we require prot customisation in future, we can make
io_remap_pfn_range_prot() available for override.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
arch/csky/include/asm/pgtable.h | 3 +--
arch/mips/alchemy/common/setup.c | 9 +++++----
arch/mips/include/asm/pgtable.h | 5 ++---
arch/sparc/include/asm/pgtable_32.h | 12 ++++--------
arch/sparc/include/asm/pgtable_64.h | 12 ++++--------
include/linux/mm.h | 30 ++++++++++++++++++++++++-----
6 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 5a394be09c35..967c86b38f11 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -263,7 +263,6 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
#define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
-#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
- remap_pfn_range(vma, vaddr, pfn, size, prot)
+#define io_remap_pfn_range_pfn(pfn, size) (pfn)
#endif /* __ASM_CSKY_PGTABLE_H */
diff --git a/arch/mips/alchemy/common/setup.c b/arch/mips/alchemy/common/setup.c
index a7a6d31a7a41..c35b4f809d51 100644
--- a/arch/mips/alchemy/common/setup.c
+++ b/arch/mips/alchemy/common/setup.c
@@ -94,12 +94,13 @@ phys_addr_t fixup_bigphys_addr(phys_addr_t phys_addr, phys_addr_t size)
return phys_addr;
}
-int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
{
phys_addr_t phys_addr = fixup_bigphys_addr(pfn << PAGE_SHIFT, size);
- return remap_pfn_range(vma, vaddr, phys_addr >> PAGE_SHIFT, size, prot);
+ return phys_addr >> PAGE_SHIFT;
}
-EXPORT_SYMBOL(io_remap_pfn_range);
+EXPORT_SYMBOL(io_remap_pfn_range_pfn);
+
#endif /* CONFIG_MIPS_FIXUP_BIGPHYS_ADDR */
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index ae73ecf4c41a..9c06a612d33a 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -604,9 +604,8 @@ static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
*/
#ifdef CONFIG_MIPS_FIXUP_BIGPHYS_ADDR
phys_addr_t fixup_bigphys_addr(phys_addr_t addr, phys_addr_t size);
-int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
- unsigned long pfn, unsigned long size, pgprot_t prot);
-#define io_remap_pfn_range io_remap_pfn_range
+unsigned long io_remap_pfn_range_pfn(unsigned long pfn, unsigned long size);
+#define io_remap_pfn_range_pfn io_remap_pfn_range_pfn
#else
#define fixup_bigphys_addr(addr, size) (addr)
#endif /* CONFIG_MIPS_FIXUP_BIGPHYS_ADDR */
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 7c199c003ffe..fd7be02dd46c 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -395,12 +395,8 @@ __get_iospace (unsigned long addr)
#define GET_IOSPACE(pfn) (pfn >> (BITS_PER_LONG - 4))
#define GET_PFN(pfn) (pfn & 0x0fffffffUL)
-int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
- unsigned long, pgprot_t);
-
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long from, unsigned long pfn,
- unsigned long size, pgprot_t prot)
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
{
unsigned long long offset, space, phys_base;
@@ -408,9 +404,9 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
space = GET_IOSPACE(pfn);
phys_base = offset | (space << 32ULL);
- return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
+ return phys_base >> PAGE_SHIFT;
}
-#define io_remap_pfn_range io_remap_pfn_range
+#define io_remap_pfn_range_pfn io_remap_pfn_range_pfn
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 669cd02469a1..f54f385a92c6 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1048,9 +1048,6 @@ int page_in_phys_avail(unsigned long paddr);
#define GET_IOSPACE(pfn) (pfn >> (BITS_PER_LONG - 4))
#define GET_PFN(pfn) (pfn & 0x0fffffffffffffffUL)
-int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
- unsigned long, pgprot_t);
-
void adi_restore_tags(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pte_t pte);
@@ -1084,9 +1081,8 @@ static inline int arch_unmap_one(struct mm_struct *mm,
return 0;
}
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long from, unsigned long pfn,
- unsigned long size, pgprot_t prot)
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
{
unsigned long offset = GET_PFN(pfn) << PAGE_SHIFT;
int space = GET_IOSPACE(pfn);
@@ -1094,9 +1090,9 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
phys_base = offset | (((unsigned long) space) << 32UL);
- return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
+ return phys_base >> PAGE_SHIFT;
}
-#define io_remap_pfn_range io_remap_pfn_range
+#define io_remap_pfn_range_pfn io_remap_pfn_range_pfn
static inline unsigned long __untagged_addr(unsigned long start)
{
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8e4006eaf4dd..9b65c33bb31a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3672,15 +3672,35 @@ static inline vm_fault_t vmf_insert_page(struct vm_area_struct *vma,
return VM_FAULT_NOPAGE;
}
-#ifndef io_remap_pfn_range
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn,
- unsigned long size, pgprot_t prot)
+#ifdef io_remap_pfn_range_pfn
+static inline pgprot_t io_remap_pfn_range_prot(pgprot_t prot)
+{
+ /* We do not decrypt if arch customises PFN. */
+ return prot;
+}
+#else
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
+{
+ return pfn;
+}
+
+static inline pgprot_t io_remap_pfn_range_prot(pgprot_t prot)
{
- return remap_pfn_range(vma, addr, pfn, size, pgprot_decrypted(prot));
+ return pgprot_decrypted(prot);
}
#endif
+static inline int io_remap_pfn_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long orig_pfn,
+ unsigned long size, pgprot_t orig_prot)
+{
+ const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+ const pgprot_t prot = io_remap_pfn_range_prot(orig_prot);
+
+ return remap_pfn_range(vma, addr, pfn, size, prot);
+}
+
static inline vm_fault_t vmf_error(int err)
{
if (err == -ENOMEM)
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 08/14] mm: introduce io_remap_pfn_range_[prepare, complete]()
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (6 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-18 9:12 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
` (6 subsequent siblings)
14 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
We introduce the io_remap*() equivalents of remap_pfn_range_prepare() and
remap_pfn_range_complete() to allow for I/O remapping via mmap_prepare.
Make these internal to mm, as they should only be used by internal helpers.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
mm/internal.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/mm/internal.h b/mm/internal.h
index c6655f76cf69..085e34f84bae 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1657,4 +1657,22 @@ void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t pgprot);
+static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
+ unsigned long orig_pfn, unsigned long size)
+{
+ const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+
+ return remap_pfn_range_prepare(desc, pfn);
+}
+
+static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long orig_pfn, unsigned long size,
+ pgprot_t orig_prot)
+{
+ const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+ const pgprot_t prot = io_remap_pfn_range_prot(orig_prot);
+
+ return remap_pfn_range_complete(vma, addr, pfn, size, prot);
+}
+
#endif /* __MM_INTERNAL_H */
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (7 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 08/14] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 21:37 ` Jason Gunthorpe
2025-09-18 9:14 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 10/14] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
` (5 subsequent siblings)
14 siblings, 2 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Some drivers/filesystems need to perform additional tasks after the VMA is
set up. This is typically in the form of pre-population.
The forms of pre-population most likely to be performed are a PFN remap
or the insertion of normal folios and PFNs into a mixed map.
We start by implementing the PFN remap functionality, ensuring that we
perform the appropriate actions at the appropriate time - that is setting
flags at the point of .mmap_prepare, and performing the actual remap at the
point at which the VMA is fully established.
This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.
Unfortunately callers do still often require some kind of custom action,
so we add optional success/error hooks to allow the caller to do something
after the action has succeeded or failed.
This is done at the point when the VMA has already been established, so
the harm that can be done is limited.
The error hook can be used to filter errors if necessary.
If any error arises on these final actions, we simply unmap the VMA
altogether.
Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.
While we're here, rename __compat_vma_mmap_prepare() to __compat_vma_mmap(),
as we now perform actions requested by mmap_prepare in addition to invoking
the mmap_prepare hook itself.
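Schematically (a simplified sketch of the resulting flow, not the exact
mm/vma.c code, with desc a local vm_area_desc):

	err = file->f_op->mmap_prepare(&desc);	/* may set desc.action */
	if (!err)
		mmap_action_prepare(&desc.action, &desc);
	/* ... the VMA is then created and inserted ... */
	err = mmap_action_complete(&desc.action, vma);	/* remap + hooks */
	/* on failure, the VMA is unmapped again */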
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/fs.h | 6 +-
include/linux/mm.h | 74 ++++++++++++++++
include/linux/mm_types.h | 46 ++++++++++
mm/util.c | 143 ++++++++++++++++++++++++++++---
mm/vma.c | 70 ++++++++++-----
tools/testing/vma/vma_internal.h | 90 +++++++++++++++++--
6 files changed, 385 insertions(+), 44 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 594bd4d0521e..680910611ba1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2279,14 +2279,14 @@ static inline bool can_mmap_file(struct file *file)
return true;
}
-int __compat_vma_mmap_prepare(const struct file_operations *f_op,
+int __compat_vma_mmap(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma);
-int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma);
+int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
{
if (file->f_op->mmap_prepare)
- return compat_vma_mmap_prepare(file, vma);
+ return compat_vma_mmap(file, vma);
return file->f_op->mmap(file, vma);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9b65c33bb31a..7ab6bc9e6659 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3596,6 +3596,80 @@ static inline unsigned long vma_desc_pages(const struct vm_area_desc *desc)
return vma_desc_size(desc) >> PAGE_SHIFT;
}
+/**
+ * mmap_action_remap - helper for mmap_prepare hook to specify that a pure PFN
+ * remap is required.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start: The virtual address to start the remap from, must be within the VMA.
+ * @start_pfn: The first PFN in the range to remap.
+ * @size: The size of the range to remap, in bytes, at most spanning to the end
+ * of the VMA.
+ */
+static inline void mmap_action_remap(struct vm_area_desc *desc,
+ unsigned long start,
+ unsigned long start_pfn,
+ unsigned long size)
+{
+ struct mmap_action *action = &desc->action;
+
+ /* [start, start + size) must be within the VMA. */
+ WARN_ON_ONCE(start < desc->start || start >= desc->end);
+ WARN_ON_ONCE(start + size > desc->end);
+
+ action->type = MMAP_REMAP_PFN;
+ action->remap.start = start;
+ action->remap.start_pfn = start_pfn;
+ action->remap.size = size;
+ action->remap.pgprot = desc->page_prot;
+}
+
+/**
+ * mmap_action_remap_full - helper for mmap_prepare hook to specify that the
+ * entirety of a VMA should be PFN remapped.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start_pfn: The first PFN in the range to remap.
+ */
+static inline void mmap_action_remap_full(struct vm_area_desc *desc,
+ unsigned long start_pfn)
+{
+ mmap_action_remap(desc, desc->start, start_pfn, vma_desc_size(desc));
+}
+
+/**
+ * mmap_action_ioremap - helper for mmap_prepare hook to specify that a pure PFN
+ * I/O remap is required.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start: The virtual address to start the remap from, must be within the VMA.
+ * @start_pfn: The first PFN in the range to remap.
+ * @size: The size of the range to remap, in bytes, at most spanning to the end
+ * of the VMA.
+ */
+static inline void mmap_action_ioremap(struct vm_area_desc *desc,
+ unsigned long start,
+ unsigned long start_pfn,
+ unsigned long size)
+{
+ mmap_action_remap(desc, start, start_pfn, size);
+ desc->action.type = MMAP_IO_REMAP_PFN;
+}
+
+/**
+ * mmap_action_ioremap_full - helper for mmap_prepare hook to specify that the
+ * entirety of a VMA should be PFN I/O remapped.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start_pfn: The first PFN in the range to remap.
+ */
+static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
+ unsigned long start_pfn)
+{
+ mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
+}
+
+void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc);
+int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma);
+
/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
unsigned long vm_start, unsigned long vm_end)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 31b27086586d..abaea35c2bb3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -775,6 +775,49 @@ struct pfnmap_track_ctx {
};
#endif
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+ MMAP_NOTHING, /* Mapping is complete, no further action. */
+ MMAP_REMAP_PFN, /* Remap PFN range. */
+ MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
+};
+
+/*
+ * Describes an action an mmap_prepare hook can instruct to be taken to complete
+ * the mapping of a VMA. Specified in vm_area_desc.
+ */
+struct mmap_action {
+ union {
+ /* Remap range. */
+ struct {
+ unsigned long start;
+ unsigned long start_pfn;
+ unsigned long size;
+ pgprot_t pgprot;
+ } remap;
+ };
+ enum mmap_action_type type;
+
+ /*
+ * If specified, this hook is invoked after the selected action has been
+ * successfully completed. Note that the VMA write lock is still held.
+ *
+ * The absolute minimum ought to be done here.
+ *
+ * Returns 0 on success, or an error code.
+ */
+ int (*success_hook)(const struct vm_area_struct *vma);
+
+ /*
+ * If specified, this hook is invoked if an error occurs while attempting
+ * the selected action.
+ *
+ * The hook can return an error code in order to filter the error, but
+ * it is not valid to clear the error here.
+ */
+ int (*error_hook)(int err);
+};
+
/*
* Describes a VMA that is about to be mmap()'ed. Drivers may choose to
* manipulate mutable fields which will cause those fields to be updated in the
@@ -798,6 +841,9 @@ struct vm_area_desc {
/* Write-only fields. */
const struct vm_operations_struct *vm_ops;
void *private_data;
+
+ /* Take further action? */
+ struct mmap_action action;
};
/*
diff --git a/mm/util.c b/mm/util.c
index 6c1d64ed0221..0c1c68285675 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1134,7 +1134,7 @@ EXPORT_SYMBOL(flush_dcache_folio);
#endif
/**
- * __compat_vma_mmap_prepare() - See description for compat_vma_mmap_prepare()
+ * __compat_vma_mmap() - See description for compat_vma_mmap()
* for details. This is the same operation, only with a specific file operations
* struct which may or may not be the same as vma->vm_file->f_op.
* @f_op: The file operations whose .mmap_prepare() hook is specified.
@@ -1142,7 +1142,7 @@ EXPORT_SYMBOL(flush_dcache_folio);
* @vma: The VMA to apply the .mmap_prepare() hook to.
* Returns: 0 on success or error.
*/
-int __compat_vma_mmap_prepare(const struct file_operations *f_op,
+int __compat_vma_mmap(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma)
{
struct vm_area_desc desc = {
@@ -1155,21 +1155,24 @@ int __compat_vma_mmap_prepare(const struct file_operations *f_op,
.vm_file = vma->vm_file,
.vm_flags = vma->vm_flags,
.page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
};
int err;
err = f_op->mmap_prepare(&desc);
if (err)
return err;
- set_vma_from_desc(vma, &desc);
- return 0;
+ mmap_action_prepare(&desc.action, &desc);
+ set_vma_from_desc(vma, &desc);
+ return mmap_action_complete(&desc.action, vma);
}
-EXPORT_SYMBOL(__compat_vma_mmap_prepare);
+EXPORT_SYMBOL(__compat_vma_mmap);
/**
- * compat_vma_mmap_prepare() - Apply the file's .mmap_prepare() hook to an
- * existing VMA.
+ * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
+ * existing VMA and execute any requested actions.
* @file: The file which possesss an f_op->mmap_prepare() hook.
* @vma: The VMA to apply the .mmap_prepare() hook to.
*
@@ -1184,7 +1187,7 @@ EXPORT_SYMBOL(__compat_vma_mmap_prepare);
* .mmap_prepare() hook, as we are in a different context when we invoke the
* .mmap() hook, already having a VMA to deal with.
*
- * compat_vma_mmap_prepare() is a compatibility function that takes VMA state,
+ * compat_vma_mmap() is a compatibility function that takes VMA state,
* establishes a struct vm_area_desc descriptor, passes to the underlying
* .mmap_prepare() hook and applies any changes performed by it.
*
@@ -1193,11 +1196,11 @@ EXPORT_SYMBOL(__compat_vma_mmap_prepare);
*
* Returns: 0 on success or error.
*/
-int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma)
+int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
{
- return __compat_vma_mmap_prepare(file->f_op, file, vma);
+ return __compat_vma_mmap(file->f_op, file, vma);
}
-EXPORT_SYMBOL(compat_vma_mmap_prepare);
+EXPORT_SYMBOL(compat_vma_mmap);
static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
const struct page *page)
@@ -1279,6 +1282,124 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
}
}
+static int mmap_action_finish(struct mmap_action *action,
+ const struct vm_area_struct *vma, int err)
+{
+ /*
+ * If an error occurs, unmap the VMA altogether and return an error. We
+ * only clear the newly allocated VMA, since this function is only
+ * invoked if we do NOT merge, so we only clean up the VMA we created.
+ */
+ if (err) {
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ do_munmap(current->mm, vma->vm_start, len, NULL);
+
+ if (action->error_hook) {
+ /* We may want to filter the error. */
+ err = action->error_hook(err);
+
+ /* The caller should not clear the error. */
+ VM_WARN_ON_ONCE(!err);
+ }
+ return err;
+ }
+
+ if (action->success_hook)
+ return action->success_hook(vma);
+
+ return 0;
+}
+
+#ifdef CONFIG_MMU
+/**
+ * mmap_action_prepare - Perform preparatory setup for a VMA descriptor
+ * action which needs to be performed.
+ * @action: The action to perform.
+ * @desc: The VMA descriptor to prepare for @action.
+ */
+void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc)
+{
+ switch (action->type) {
+ case MMAP_NOTHING:
+ break;
+ case MMAP_REMAP_PFN:
+ remap_pfn_range_prepare(desc, action->remap.start_pfn);
+ break;
+ case MMAP_IO_REMAP_PFN:
+ io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
+ action->remap.size);
+ break;
+ }
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+/**
+ * mmap_action_complete - Execute VMA descriptor action.
+ * @action: The action to perform.
+ * @vma: The VMA to perform the action upon.
+ *
+ * Similar to mmap_action_prepare().
+ *
+ * Return: 0 on success, or error, at which point the VMA will be unmapped.
+ */
+int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma)
+{
+ int err = 0;
+
+ switch (action->type) {
+ case MMAP_NOTHING:
+ break;
+ case MMAP_REMAP_PFN:
+ err = remap_pfn_range_complete(vma, action->remap.start,
+ action->remap.start_pfn, action->remap.size,
+ action->remap.pgprot);
+ break;
+ case MMAP_IO_REMAP_PFN:
+ err = io_remap_pfn_range_complete(vma, action->remap.start,
+ action->remap.start_pfn, action->remap.size,
+ action->remap.pgprot);
+ break;
+ }
+
+ return mmap_action_finish(action, vma, err);
+}
+EXPORT_SYMBOL(mmap_action_complete);
+#else
+void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc)
+{
+ switch (action->type) {
+ case MMAP_NOTHING:
+ break;
+ case MMAP_REMAP_PFN:
+ case MMAP_IO_REMAP_PFN:
+ WARN_ON_ONCE(1); /* nommu cannot handle these. */
+ break;
+ }
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma)
+{
+ switch (action->type) {
+ case MMAP_NOTHING:
+ break;
+ case MMAP_REMAP_PFN:
+ case MMAP_IO_REMAP_PFN:
+ WARN_ON_ONCE(1); /* nommu cannot handle this. */
+
+ break;
+ }
+
+ return mmap_action_finish(action, vma, /* err = */0);
+}
+EXPORT_SYMBOL(mmap_action_complete);
+#endif
+
#ifdef CONFIG_MMU
/**
* folio_pte_batch - detect a PTE batch for a large folio
diff --git a/mm/vma.c b/mm/vma.c
index bdb070a62a2e..1be297f7bb00 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2328,17 +2328,33 @@ static void update_ksm_flags(struct mmap_state *map)
map->vm_flags = ksm_vma_flags(map->mm, map->file, map->vm_flags);
}
+static void set_desc_from_map(struct vm_area_desc *desc,
+ const struct mmap_state *map)
+{
+ desc->start = map->addr;
+ desc->end = map->end;
+
+ desc->pgoff = map->pgoff;
+ desc->vm_file = map->file;
+ desc->vm_flags = map->vm_flags;
+ desc->page_prot = map->page_prot;
+}
+
/*
* __mmap_setup() - Prepare to gather any overlapping VMAs that need to be
* unmapped once the map operation is completed, check limits, account mapping
* and clean up any pre-existing VMAs.
*
+ * As a result it sets up the @map and @desc objects.
+ *
* @map: Mapping state.
+ * @desc: VMA descriptor
* @uf: Userfaultfd context list.
*
* Returns: 0 on success, error code otherwise.
*/
-static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
+static int __mmap_setup(struct mmap_state *map, struct vm_area_desc *desc,
+ struct list_head *uf)
{
int error;
struct vma_iterator *vmi = map->vmi;
@@ -2395,6 +2411,7 @@ static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
*/
vms_clean_up_area(vms, &map->mas_detach);
+ set_desc_from_map(desc, map);
return 0;
}
@@ -2567,34 +2584,26 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
*
* Returns 0 on success, or an error code otherwise.
*/
-static int call_mmap_prepare(struct mmap_state *map)
+static int call_mmap_prepare(struct mmap_state *map,
+ struct vm_area_desc *desc)
{
int err;
- struct vm_area_desc desc = {
- .mm = map->mm,
- .file = map->file,
- .start = map->addr,
- .end = map->end,
-
- .pgoff = map->pgoff,
- .vm_file = map->file,
- .vm_flags = map->vm_flags,
- .page_prot = map->page_prot,
- };
/* Invoke the hook. */
- err = vfs_mmap_prepare(map->file, &desc);
+ err = vfs_mmap_prepare(map->file, desc);
if (err)
return err;
+ mmap_action_prepare(&desc->action, desc);
+
/* Update fields permitted to be changed. */
- map->pgoff = desc.pgoff;
- map->file = desc.vm_file;
- map->vm_flags = desc.vm_flags;
- map->page_prot = desc.page_prot;
+ map->pgoff = desc->pgoff;
+ map->file = desc->vm_file;
+ map->vm_flags = desc->vm_flags;
+ map->page_prot = desc->page_prot;
/* User-defined fields. */
- map->vm_ops = desc.vm_ops;
- map->vm_private_data = desc.private_data;
+ map->vm_ops = desc->vm_ops;
+ map->vm_private_data = desc->private_data;
return 0;
}
@@ -2642,16 +2651,24 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
- int error;
bool have_mmap_prepare = file && file->f_op->mmap_prepare;
VMA_ITERATOR(vmi, mm, addr);
MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
+ struct vm_area_desc desc = {
+ .mm = mm,
+ .file = file,
+ .action = {
+ .type = MMAP_NOTHING, /* Default to no further action. */
+ },
+ };
+ bool allocated_new = false;
+ int error;
map.check_ksm_early = can_set_ksm_flags_early(&map);
- error = __mmap_setup(&map, uf);
+ error = __mmap_setup(&map, &desc, uf);
if (!error && have_mmap_prepare)
- error = call_mmap_prepare(&map);
+ error = call_mmap_prepare(&map, &desc);
if (error)
goto abort_munmap;
@@ -2670,6 +2687,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
error = __mmap_new_vma(&map, &vma);
if (error)
goto unacct_error;
+ allocated_new = true;
}
if (have_mmap_prepare)
@@ -2677,6 +2695,12 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
__mmap_complete(&map, vma);
+ if (have_mmap_prepare && allocated_new) {
+ error = mmap_action_complete(&desc.action, vma);
+ if (error)
+ return error;
+ }
+
return addr;
/* Accounting was done by __mmap_setup(). */
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 07167446dcf4..22ed38e8714e 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -274,6 +274,49 @@ struct mm_struct {
struct vm_area_struct;
+
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+ MMAP_NOTHING, /* Mapping is complete, no further action. */
+ MMAP_REMAP_PFN, /* Remap PFN range. */
+};
+
+/*
+ * Describes an action an mmap_prepare hook can instruct to be taken to complete
+ * the mapping of a VMA. Specified in vm_area_desc.
+ */
+struct mmap_action {
+ union {
+ /* Remap range. */
+ struct {
+ unsigned long start;
+ unsigned long start_pfn;
+ unsigned long size;
+ pgprot_t pgprot;
+ } remap;
+ };
+ enum mmap_action_type type;
+
+ /*
+ * If specified, this hook is invoked after the selected action has been
+ * successfully completed. Note that the VMA write lock is still held.
+ *
+ * The absolute minimum ought to be done here.
+ *
+ * Returns 0 on success, or an error code.
+ */
+ int (*success_hook)(const struct vm_area_struct *vma);
+
+ /*
+ * If specified, this hook is invoked if an error occurs while attempting
+ * the selected action.
+ *
+ * The hook can return an error code in order to filter the error, but
+ * it is not valid to clear the error here.
+ */
+ int (*error_hook)(int err);
+};
+
/*
* Describes a VMA that is about to be mmap()'ed. Drivers may choose to
* manipulate mutable fields which will cause those fields to be updated in the
@@ -297,6 +340,9 @@ struct vm_area_desc {
/* Write-only fields. */
const struct vm_operations_struct *vm_ops;
void *private_data;
+
+ /* Take further action? */
+ struct mmap_action action;
};
struct file_operations {
@@ -1466,12 +1512,23 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
static inline void set_vma_from_desc(struct vm_area_struct *vma,
struct vm_area_desc *desc);
-static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
+static inline void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc)
+{
+}
+
+static inline int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma)
+{
+ return 0;
+}
+
+static inline int __compat_vma_mmap(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma)
{
struct vm_area_desc desc = {
.mm = vma->vm_mm,
- .file = vma->vm_file,
+ .file = file,
.start = vma->vm_start,
.end = vma->vm_end,
@@ -1479,21 +1536,24 @@ static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
.vm_file = vma->vm_file,
.vm_flags = vma->vm_flags,
.page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
};
int err;
err = f_op->mmap_prepare(&desc);
if (err)
return err;
- set_vma_from_desc(vma, &desc);
- return 0;
+ mmap_action_prepare(&desc.action, &desc);
+ set_vma_from_desc(vma, &desc);
+ return mmap_action_complete(&desc.action, vma);
}
-static inline int compat_vma_mmap_prepare(struct file *file,
+static inline int compat_vma_mmap(struct file *file,
struct vm_area_struct *vma)
{
- return __compat_vma_mmap_prepare(file->f_op, file, vma);
+ return __compat_vma_mmap(file->f_op, file, vma);
}
/* Did the driver provide valid mmap hook configuration? */
@@ -1514,7 +1574,7 @@ static inline bool can_mmap_file(struct file *file)
static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
{
if (file->f_op->mmap_prepare)
- return compat_vma_mmap_prepare(file, vma);
+ return compat_vma_mmap(file, vma);
return file->f_op->mmap(file, vma);
}
@@ -1548,4 +1608,20 @@ static inline vm_flags_t ksm_vma_flags(const struct mm_struct *, const struct fi
return vm_flags;
}
+static inline void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+}
+
+static inline int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot)
+{
+ return 0;
+}
+
+static inline int do_munmap(struct mm_struct *, unsigned long, size_t,
+ struct list_head *uf)
+{
+ return 0;
+}
+
#endif /* __MM_VMA_INTERNAL_H */
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 10/14] doc: update porting, vfs documentation for mmap_prepare actions
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (8 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
` (4 subsequent siblings)
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Now that we have introduced the ability to specify actions to be taken
after a VMA is established, via the vm_area_desc->action field set in
mmap_prepare, update both the VFS documentation and the porting guide
to describe this.
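For illustration, under the updated guidance a filesystem might request a
partial remap in its .mmap_prepare hook roughly as follows (bar_vm_ops and
bar_region_pfn() are hypothetical, this is a sketch only):

#include <linux/fs.h>
#include <linux/mm.h>

static int bar_mmap_prepare(struct vm_area_desc *desc)
{
	desc->vm_ops = &bar_vm_ops;

	/*
	 * Remap only the first page of the mapping. The start address must
	 * lie within [desc->start, desc->end) and the size must not extend
	 * past desc->end.
	 */
	mmap_action_remap(desc, desc->start, bar_region_pfn(desc), PAGE_SIZE);

	return 0;
}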
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
Documentation/filesystems/porting.rst | 5 +++++
Documentation/filesystems/vfs.rst | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 85f590254f07..6743ed0b9112 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1285,3 +1285,8 @@ rather than a VMA, as the VMA at this stage is not yet valid.
The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
+
+In nearly all cases, this is all that is required for a filesystem. However, if
+a filesystem needs to perform an operation such as pre-population of page tables,
+then that action can be specified in the vm_area_desc->action field, which can
+be configured using the mmap_action_*() helpers.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 486a91633474..9e96c46ee10e 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1236,6 +1236,10 @@ otherwise noted.
file-backed memory mapping, most notably establishing relevant
private state and VMA callbacks.
+ If further action such as pre-population of page tables is required,
+ this can be specified by the vm_area_desc->action field and related
+ parameters.
+
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (9 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 10/14] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-23 11:52 ` Sumanth Korikkar
2025-09-17 19:11 ` [PATCH v4 12/14] mm: add shmem_zero_setup_desc() Lorenzo Stoakes
` (3 subsequent siblings)
14 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Since we can now perform actions after the VMA is established via
mmap_prepare, use desc->action.success_hook to set up the hugetlb lock
once the VMA is set up.
We also make changes throughout hugetlbfs to make this possible.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
fs/hugetlbfs/inode.c | 36 ++++++++++------
include/linux/hugetlb.h | 9 +++-
include/linux/hugetlb_inline.h | 15 ++++---
mm/hugetlb.c | 77 ++++++++++++++++++++--------------
4 files changed, 85 insertions(+), 52 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index f42548ee9083..9e0625167517 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -96,8 +96,15 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
#define PGOFF_LOFFT_MAX \
(((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
-static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int hugetlb_file_mmap_prepare_success(const struct vm_area_struct *vma)
{
+ /* Unfortunately we have to reassign vma->vm_private_data. */
+ return hugetlb_vma_lock_alloc((struct vm_area_struct *)vma);
+}
+
+static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
+{
+ struct file *file = desc->file;
struct inode *inode = file_inode(file);
loff_t len, vma_len;
int ret;
@@ -112,8 +119,8 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap unwinds (may be important on powerpc
* and ia64).
*/
- vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
- vma->vm_ops = &hugetlb_vm_ops;
+ desc->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
+ desc->vm_ops = &hugetlb_vm_ops;
/*
* page based offset in vm_pgoff could be sufficiently large to
@@ -122,16 +129,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* sizeof(unsigned long). So, only check in those instances.
*/
if (sizeof(unsigned long) == sizeof(loff_t)) {
- if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
+ if (desc->pgoff & PGOFF_LOFFT_MAX)
return -EINVAL;
}
/* must be huge page aligned */
- if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
+ if (desc->pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
return -EINVAL;
- vma_len = (loff_t)(vma->vm_end - vma->vm_start);
- len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ vma_len = (loff_t)vma_desc_size(desc);
+ len = vma_len + ((loff_t)desc->pgoff << PAGE_SHIFT);
/* check for overflow */
if (len < vma_len)
return -EINVAL;
@@ -141,7 +148,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
ret = -ENOMEM;
- vm_flags = vma->vm_flags;
+ vm_flags = desc->vm_flags;
/*
* for SHM_HUGETLB, the pages are reserved in the shmget() call so skip
* reserving here. Note: only for SHM hugetlbfs file, the inode
@@ -151,17 +158,20 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
vm_flags |= VM_NORESERVE;
if (hugetlb_reserve_pages(inode,
- vma->vm_pgoff >> huge_page_order(h),
- len >> huge_page_shift(h), vma,
- vm_flags) < 0)
+ desc->pgoff >> huge_page_order(h),
+ len >> huge_page_shift(h), desc,
+ vm_flags) < 0)
goto out;
ret = 0;
- if (vma->vm_flags & VM_WRITE && inode->i_size < len)
+ if ((desc->vm_flags & VM_WRITE) && inode->i_size < len)
i_size_write(inode, len);
out:
inode_unlock(inode);
+ /* Allocate the VMA lock after we set it up. */
+ if (!ret)
+ desc->action.success_hook = hugetlb_file_mmap_prepare_success;
return ret;
}
@@ -1221,7 +1231,7 @@ static void init_once(void *foo)
static const struct file_operations hugetlbfs_file_operations = {
.read_iter = hugetlbfs_read_iter,
- .mmap = hugetlbfs_file_mmap,
+ .mmap_prepare = hugetlbfs_file_mmap_prepare,
.fsync = noop_fsync,
.get_unmapped_area = hugetlb_get_unmapped_area,
.llseek = default_llseek,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8e63e46b8e1f..2387513d6ae5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -150,8 +150,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct folio **foliop);
#endif /* CONFIG_USERFAULTFD */
long hugetlb_reserve_pages(struct inode *inode, long from, long to,
- struct vm_area_struct *vma,
- vm_flags_t vm_flags);
+ struct vm_area_desc *desc, vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long freed);
bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
@@ -280,6 +279,7 @@ bool is_hugetlb_entry_hwpoisoned(pte_t pte);
void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
void fixup_hugetlb_reservations(struct vm_area_struct *vma);
void hugetlb_split(struct vm_area_struct *vma, unsigned long addr);
+int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
#else /* !CONFIG_HUGETLB_PAGE */
@@ -466,6 +466,11 @@ static inline void fixup_hugetlb_reservations(struct vm_area_struct *vma)
static inline void hugetlb_split(struct vm_area_struct *vma, unsigned long addr) {}
+static inline int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+{
+ return 0;
+}
+
#endif /* !CONFIG_HUGETLB_PAGE */
#ifndef pgd_write
diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index 0660a03d37d9..a27aa0162918 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -2,22 +2,27 @@
#ifndef _LINUX_HUGETLB_INLINE_H
#define _LINUX_HUGETLB_INLINE_H
-#ifdef CONFIG_HUGETLB_PAGE
-
#include <linux/mm.h>
-static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
+#ifdef CONFIG_HUGETLB_PAGE
+
+static inline bool is_vm_hugetlb_flags(vm_flags_t vm_flags)
{
- return !!(vma->vm_flags & VM_HUGETLB);
+ return !!(vm_flags & VM_HUGETLB);
}
#else
-static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
+static inline bool is_vm_hugetlb_flags(vm_flags_t vm_flags)
{
return false;
}
#endif
+static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
+{
+ return is_vm_hugetlb_flags(vma->vm_flags);
+}
+
#endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1806685ea326..af28f7fbabb8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -119,7 +119,6 @@ struct mutex *hugetlb_fault_mutex_table __ro_after_init;
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
unsigned long start, unsigned long end, bool take_locks);
@@ -427,17 +426,21 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
}
}
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+/*
+ * vma specific semaphore used for pmd sharing and fault/truncation
+ * synchronization
+ */
+int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
{
struct hugetlb_vma_lock *vma_lock;
/* Only establish in (flags) sharable vmas */
if (!vma || !(vma->vm_flags & VM_MAYSHARE))
- return;
+ return 0;
/* Should never get here with non-NULL vm_private_data */
if (vma->vm_private_data)
- return;
+ return -EINVAL;
vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
if (!vma_lock) {
@@ -452,13 +455,15 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
* allocation failure.
*/
pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
- return;
+ return -EINVAL;
}
kref_init(&vma_lock->refs);
init_rwsem(&vma_lock->rw_sema);
vma_lock->vma = vma;
vma->vm_private_data = vma_lock;
+
+ return 0;
}
/* Helper that removes a struct file_region from the resv_map cache and returns
@@ -1190,20 +1195,28 @@ static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
}
}
-static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
+static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
{
- VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
- VM_BUG_ON_VMA(vma->vm_flags & VM_MAYSHARE, vma);
+ VM_WARN_ON_ONCE_VMA(!is_vm_hugetlb_page(vma), vma);
+ VM_WARN_ON_ONCE_VMA(vma->vm_flags & VM_MAYSHARE, vma);
- set_vma_private_data(vma, (unsigned long)map);
+ set_vma_private_data(vma, get_vma_private_data(vma) | flags);
}
-static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
+static void set_vma_desc_resv_map(struct vm_area_desc *desc, struct resv_map *map)
{
- VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
- VM_BUG_ON_VMA(vma->vm_flags & VM_MAYSHARE, vma);
+ VM_WARN_ON_ONCE(!is_vm_hugetlb_flags(desc->vm_flags));
+ VM_WARN_ON_ONCE(desc->vm_flags & VM_MAYSHARE);
- set_vma_private_data(vma, get_vma_private_data(vma) | flags);
+ desc->private_data = map;
+}
+
+static void set_vma_desc_resv_flags(struct vm_area_desc *desc, unsigned long flags)
+{
+ VM_WARN_ON_ONCE(!is_vm_hugetlb_flags(desc->vm_flags));
+ VM_WARN_ON_ONCE(desc->vm_flags & VM_MAYSHARE);
+
+ desc->private_data = (void *)((unsigned long)desc->private_data | flags);
}
static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
@@ -1213,6 +1226,13 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
return (get_vma_private_data(vma) & flag) != 0;
}
+static bool is_vma_desc_resv_set(struct vm_area_desc *desc, unsigned long flag)
+{
+ VM_WARN_ON_ONCE(!is_vm_hugetlb_flags(desc->vm_flags));
+
+ return ((unsigned long)desc->private_data) & flag;
+}
+
bool __vma_private_lock(struct vm_area_struct *vma)
{
return !(vma->vm_flags & VM_MAYSHARE) &&
@@ -7250,9 +7270,9 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
*/
long hugetlb_reserve_pages(struct inode *inode,
- long from, long to,
- struct vm_area_struct *vma,
- vm_flags_t vm_flags)
+ long from, long to,
+ struct vm_area_desc *desc,
+ vm_flags_t vm_flags)
{
long chg = -1, add = -1, spool_resv, gbl_resv;
struct hstate *h = hstate_inode(inode);
@@ -7267,12 +7287,6 @@ long hugetlb_reserve_pages(struct inode *inode,
return -EINVAL;
}
- /*
- * vma specific semaphore used for pmd sharing and fault/truncation
- * synchronization
- */
- hugetlb_vma_lock_alloc(vma);
-
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
@@ -7285,9 +7299,9 @@ long hugetlb_reserve_pages(struct inode *inode,
* Shared mappings base their reservation on the number of pages that
* are already allocated on behalf of the file. Private mappings need
* to reserve the full area even if read-only as mprotect() may be
- * called to make the mapping read-write. Assume !vma is a shm mapping
+ * called to make the mapping read-write. Assume !desc is a shm mapping
*/
- if (!vma || vma->vm_flags & VM_MAYSHARE) {
+ if (!desc || desc->vm_flags & VM_MAYSHARE) {
/*
* resv_map can not be NULL as hugetlb_reserve_pages is only
* called for inodes for which resv_maps were created (see
@@ -7304,8 +7318,8 @@ long hugetlb_reserve_pages(struct inode *inode,
chg = to - from;
- set_vma_resv_map(vma, resv_map);
- set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
+ set_vma_desc_resv_map(desc, resv_map);
+ set_vma_desc_resv_flags(desc, HPAGE_RESV_OWNER);
}
if (chg < 0)
@@ -7315,7 +7329,7 @@ long hugetlb_reserve_pages(struct inode *inode,
chg * pages_per_huge_page(h), &h_cg) < 0)
goto out_err;
- if (vma && !(vma->vm_flags & VM_MAYSHARE) && h_cg) {
+ if (desc && !(desc->vm_flags & VM_MAYSHARE) && h_cg) {
/* For private mappings, the hugetlb_cgroup uncharge info hangs
* of the resv_map.
*/
@@ -7349,7 +7363,7 @@ long hugetlb_reserve_pages(struct inode *inode,
* consumed reservations are stored in the map. Hence, nothing
* else has to be done for private mappings here
*/
- if (!vma || vma->vm_flags & VM_MAYSHARE) {
+ if (!desc || desc->vm_flags & VM_MAYSHARE) {
add = region_add(resv_map, from, to, regions_needed, h, h_cg);
if (unlikely(add < 0)) {
@@ -7403,16 +7417,15 @@ long hugetlb_reserve_pages(struct inode *inode,
hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
chg * pages_per_huge_page(h), h_cg);
out_err:
- hugetlb_vma_lock_free(vma);
- if (!vma || vma->vm_flags & VM_MAYSHARE)
+ if (!desc || desc->vm_flags & VM_MAYSHARE)
/* Only call region_abort if the region_chg succeeded but the
* region_add failed or didn't run.
*/
if (chg >= 0 && add < 0)
region_abort(resv_map, from, to, regions_needed);
- if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+ if (desc && is_vma_desc_resv_set(desc, HPAGE_RESV_OWNER)) {
kref_put(&resv_map->refs, resv_map_release);
- set_vma_resv_map(vma, NULL);
+ set_vma_desc_resv_map(desc, NULL);
}
return chg < 0 ? chg : add < 0 ? add : -EINVAL;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 12/14] mm: add shmem_zero_setup_desc()
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (10 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 21:38 ` Jason Gunthorpe
2025-09-17 19:11 ` [PATCH v4 13/14] mm: update mem char driver to use mmap_prepare Lorenzo Stoakes
` (2 subsequent siblings)
14 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Add the ability to set up a shared anonymous mapping based on a VMA
descriptor rather than a VMA.
This is a prerequisite for converting the mem char driver to use the
mmap_prepare hook.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/shmem_fs.h | 3 ++-
mm/shmem.c | 41 ++++++++++++++++++++++++++++++++--------
2 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 0e47465ef0fd..5b368f9549d6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -94,7 +94,8 @@ extern struct file *shmem_kernel_file_setup(const char *name, loff_t size,
unsigned long flags);
extern struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt,
const char *name, loff_t size, unsigned long flags);
-extern int shmem_zero_setup(struct vm_area_struct *);
+int shmem_zero_setup(struct vm_area_struct *vma);
+int shmem_zero_setup_desc(struct vm_area_desc *desc);
extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
diff --git a/mm/shmem.c b/mm/shmem.c
index df02a2e0ebbb..72aa176023de 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5893,14 +5893,9 @@ struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt, const char *name,
}
EXPORT_SYMBOL_GPL(shmem_file_setup_with_mnt);
-/**
- * shmem_zero_setup - setup a shared anonymous mapping
- * @vma: the vma to be mmapped is prepared by do_mmap
- */
-int shmem_zero_setup(struct vm_area_struct *vma)
+static struct file *__shmem_zero_setup(unsigned long start, unsigned long end, vm_flags_t vm_flags)
{
- struct file *file;
- loff_t size = vma->vm_end - vma->vm_start;
+ loff_t size = end - start;
/*
* Cloning a new file under mmap_lock leads to a lock ordering conflict
@@ -5908,7 +5903,18 @@ int shmem_zero_setup(struct vm_area_struct *vma)
* accessible to the user through its mapping, use S_PRIVATE flag to
* bypass file security, in the same way as shmem_kernel_file_setup().
*/
- file = shmem_kernel_file_setup("dev/zero", size, vma->vm_flags);
+ return shmem_kernel_file_setup("dev/zero", size, vm_flags);
+}
+
+/**
+ * shmem_zero_setup - setup a shared anonymous mapping
+ * @vma: the vma to be mmapped is prepared by do_mmap
+ * Returns: 0 on success, or error
+ */
+int shmem_zero_setup(struct vm_area_struct *vma)
+{
+ struct file *file = __shmem_zero_setup(vma->vm_start, vma->vm_end, vma->vm_flags);
+
if (IS_ERR(file))
return PTR_ERR(file);
@@ -5920,6 +5926,25 @@ int shmem_zero_setup(struct vm_area_struct *vma)
return 0;
}
+/**
+ * shmem_zero_setup_desc - same as shmem_zero_setup, but determined by VMA
+ * descriptor for convenience.
+ * @desc: Describes VMA
+ * Returns: 0 on success, or error
+ */
+int shmem_zero_setup_desc(struct vm_area_desc *desc)
+{
+ struct file *file = __shmem_zero_setup(desc->start, desc->end, desc->vm_flags);
+
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ desc->vm_file = file;
+ desc->vm_ops = &shmem_anon_vm_ops;
+
+ return 0;
+}
+
/**
* shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
* @mapping: the folio's address_space
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 13/14] mm: update mem char driver to use mmap_prepare
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (11 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 12/14] mm: add shmem_zero_setup_desc() Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 14/14] mm: update resctl " Lorenzo Stoakes
2025-09-17 20:31 ` [PATCH v4 00/14] expand mmap_prepare functionality, port more users Andrew Morton
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Update the mem char driver (backing /dev/mem and /dev/zero) to use
f_op->mmap_prepare hook rather than the deprecated f_op->mmap.
The /dev/zero implementation has an unusual and rather concerning
characteristic in that it marks MAP_PRIVATE mmap() mappings as anonymous
when they are, in fact, not.
The new f_op->mmap_prepare() can support this, but rather than introducing
a helper function to perform this hack (and risk introducing other users),
utilise the success hook to do so.
We utilise the newly introduced shmem_zero_setup_desc() to allow for the
shared mapping case via an f_op->mmap_prepare() hook.
We also use the desc->action.error_hook to filter the remap error to
-EAGAIN to keep behaviour consistent.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/char/mem.c | 84 +++++++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 34 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 34b815901b20..b67feb74b5da 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -304,13 +304,13 @@ static unsigned zero_mmap_capabilities(struct file *file)
}
/* can't do an in-place private mapping if there's no MMU */
-static inline int private_mapping_ok(struct vm_area_struct *vma)
+static inline int private_mapping_ok(struct vm_area_desc *desc)
{
- return is_nommu_shared_mapping(vma->vm_flags);
+ return is_nommu_shared_mapping(desc->vm_flags);
}
#else
-static inline int private_mapping_ok(struct vm_area_struct *vma)
+static inline int private_mapping_ok(struct vm_area_desc *desc)
{
return 1;
}
@@ -322,46 +322,49 @@ static const struct vm_operations_struct mmap_mem_ops = {
#endif
};
-static int mmap_mem(struct file *file, struct vm_area_struct *vma)
+static int mmap_filter_error(int err)
{
- size_t size = vma->vm_end - vma->vm_start;
- phys_addr_t offset = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
+ return -EAGAIN;
+}
+
+static int mmap_mem_prepare(struct vm_area_desc *desc)
+{
+ struct file *file = desc->file;
+ const size_t size = vma_desc_size(desc);
+ const phys_addr_t offset = (phys_addr_t)desc->pgoff << PAGE_SHIFT;
/* Does it even fit in phys_addr_t? */
- if (offset >> PAGE_SHIFT != vma->vm_pgoff)
+ if (offset >> PAGE_SHIFT != desc->pgoff)
return -EINVAL;
/* It's illegal to wrap around the end of the physical address space. */
if (offset + (phys_addr_t)size - 1 < offset)
return -EINVAL;
- if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
+ if (!valid_mmap_phys_addr_range(desc->pgoff, size))
return -EINVAL;
- if (!private_mapping_ok(vma))
+ if (!private_mapping_ok(desc))
return -ENOSYS;
- if (!range_is_allowed(vma->vm_pgoff, size))
+ if (!range_is_allowed(desc->pgoff, size))
return -EPERM;
- if (!phys_mem_access_prot_allowed(file, vma->vm_pgoff, size,
- &vma->vm_page_prot))
+ if (!phys_mem_access_prot_allowed(file, desc->pgoff, size,
+ &desc->page_prot))
return -EINVAL;
- vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
- size,
- vma->vm_page_prot);
+ desc->page_prot = phys_mem_access_prot(file, desc->pgoff,
+ size,
+ desc->page_prot);
- vma->vm_ops = &mmap_mem_ops;
+ desc->vm_ops = &mmap_mem_ops;
+
+ /* Remap-pfn-range will mark the range VM_IO. */
+ mmap_action_remap_full(desc, desc->pgoff);
+ /* We filter remap errors to -EAGAIN. */
+ desc->action.error_hook = mmap_filter_error;
- /* Remap-pfn-range will mark the range VM_IO */
- if (remap_pfn_range(vma,
- vma->vm_start,
- vma->vm_pgoff,
- size,
- vma->vm_page_prot)) {
- return -EAGAIN;
- }
return 0;
}
@@ -501,14 +504,26 @@ static ssize_t read_zero(struct file *file, char __user *buf,
return cleared;
}
-static int mmap_zero(struct file *file, struct vm_area_struct *vma)
+static int mmap_zero_private_success(const struct vm_area_struct *vma)
+{
+ /*
+ * This is a highly unusual situation where we mark a MAP_PRIVATE mapping
+ * of /dev/zero as anonymous, despite it not actually being so.
+ */
+ vma_set_anonymous((struct vm_area_struct *)vma);
+
+ return 0;
+}
+
+static int mmap_zero_prepare(struct vm_area_desc *desc)
{
#ifndef CONFIG_MMU
return -ENOSYS;
#endif
- if (vma->vm_flags & VM_SHARED)
- return shmem_zero_setup(vma);
- vma_set_anonymous(vma);
+ if (desc->vm_flags & VM_SHARED)
+ return shmem_zero_setup_desc(desc);
+
+ desc->action.success_hook = mmap_zero_private_success;
return 0;
}
@@ -526,10 +541,11 @@ static unsigned long get_unmapped_area_zero(struct file *file,
{
if (flags & MAP_SHARED) {
/*
- * mmap_zero() will call shmem_zero_setup() to create a file,
- * so use shmem's get_unmapped_area in case it can be huge;
- * and pass NULL for file as in mmap.c's get_unmapped_area(),
- * so as not to confuse shmem with our handle on "/dev/zero".
+ * mmap_zero_prepare() will call shmem_zero_setup() to create a
+ * file, so use shmem's get_unmapped_area in case it can be
+ * huge; and pass NULL for file as in mmap.c's
+ * get_unmapped_area(), so as not to confuse shmem with our
+ * handle on "/dev/zero".
*/
return shmem_get_unmapped_area(NULL, addr, len, pgoff, flags);
}
@@ -632,7 +648,7 @@ static const struct file_operations __maybe_unused mem_fops = {
.llseek = memory_lseek,
.read = read_mem,
.write = write_mem,
- .mmap = mmap_mem,
+ .mmap_prepare = mmap_mem_prepare,
.open = open_mem,
#ifndef CONFIG_MMU
.get_unmapped_area = get_unmapped_area_mem,
@@ -668,7 +684,7 @@ static const struct file_operations zero_fops = {
.write_iter = write_iter_zero,
.splice_read = copy_splice_read,
.splice_write = splice_write_zero,
- .mmap = mmap_zero,
+ .mmap_prepare = mmap_zero_prepare,
.get_unmapped_area = get_unmapped_area_zero,
#ifndef CONFIG_MMU
.mmap_capabilities = zero_mmap_capabilities,
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 14/14] mm: update resctl to use mmap_prepare
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (12 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 13/14] mm: update mem char driver to use mmap_prepare Lorenzo Stoakes
@ 2025-09-17 19:11 ` Lorenzo Stoakes
2025-09-17 20:31 ` [PATCH v4 00/14] expand mmap_prepare functionality, port more users Andrew Morton
14 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-17 19:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Make use of the ability to specify a remap action within mmap_prepare to
update the resctrl pseudo-lock to use mmap_prepare rather than the
deprecated mmap hook.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
fs/resctrl/pseudo_lock.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
index 87bbc2605de1..0bfc13c5b96d 100644
--- a/fs/resctrl/pseudo_lock.c
+++ b/fs/resctrl/pseudo_lock.c
@@ -995,10 +995,11 @@ static const struct vm_operations_struct pseudo_mmap_ops = {
.mremap = pseudo_lock_dev_mremap,
};
-static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
+static int pseudo_lock_dev_mmap_prepare(struct vm_area_desc *desc)
{
- unsigned long vsize = vma->vm_end - vma->vm_start;
- unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+ unsigned long off = desc->pgoff << PAGE_SHIFT;
+ unsigned long vsize = vma_desc_size(desc);
+ struct file *filp = desc->file;
struct pseudo_lock_region *plr;
struct rdtgroup *rdtgrp;
unsigned long physical;
@@ -1043,7 +1044,7 @@ static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
* Ensure changes are carried directly to the memory being mapped,
* do not allow copy-on-write mapping.
*/
- if (!(vma->vm_flags & VM_SHARED)) {
+ if (!(desc->vm_flags & VM_SHARED)) {
mutex_unlock(&rdtgroup_mutex);
return -EINVAL;
}
@@ -1055,12 +1056,9 @@ static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
memset(plr->kmem + off, 0, vsize);
- if (remap_pfn_range(vma, vma->vm_start, physical + vma->vm_pgoff,
- vsize, vma->vm_page_prot)) {
- mutex_unlock(&rdtgroup_mutex);
- return -EAGAIN;
- }
- vma->vm_ops = &pseudo_mmap_ops;
+ desc->vm_ops = &pseudo_mmap_ops;
+ mmap_action_remap_full(desc, physical + desc->pgoff);
+
mutex_unlock(&rdtgroup_mutex);
return 0;
}
@@ -1071,7 +1069,7 @@ static const struct file_operations pseudo_lock_dev_fops = {
.write = NULL,
.open = pseudo_lock_dev_open,
.release = pseudo_lock_dev_release,
- .mmap = pseudo_lock_dev_mmap,
+ .mmap_prepare = pseudo_lock_dev_mmap_prepare,
};
int rdt_pseudo_lock_init(void)
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v4 00/14] expand mmap_prepare functionality, port more users
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (13 preceding siblings ...)
2025-09-17 19:11 ` [PATCH v4 14/14] mm: update resctl " Lorenzo Stoakes
@ 2025-09-17 20:31 ` Andrew Morton
14 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2025-09-17 20:31 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
On Wed, 17 Sep 2025 20:11:02 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
> callback"), The f_op->mmap hook has been deprecated in favour of
> f_op->mmap_prepare.
>
> This was introduced in order to make it possible for us to eventually
> eliminate the f_op->mmap hook which is highly problematic as it allows
> drivers and filesystems raw access to a VMA which is not yet correctly
> initialised.
>
> This hook also introduced complexity for the memory mapping operation, as
> we must correctly unwind what we do should an error arises.
>
> Overall this interface being so open has caused significant problems for
> us, including security issues, it is important for us to simply eliminate
> this as a source of problems.
>
> Therefore this series continues what was established by extending the
> functionality further to permit more drivers and filesystems to use
> mmap_prepare.
Thanks, I updated mm.git's mm-new branch to this version.
> v4:
> * Dropped accidentally still-included reference to mmap_abort() in the
> commit message for the patch in which remap_pfn_range_[prepare,
> complete]() are introduced as per Jason.
> * Avoided set_vma boolean parameter in remap_pfn_range_internal() as per
> Jason.
> * Further refactored remap_pfn_range() et al. as per Jason - couldn't make
> IS_ENABLED() work nicely, as have to declare remap_pfn_range_track()
> otherwise, so did least-nasty thing.
> * Abstracted I/O remap on PFN calculation as suggested by Jason, however do
> this more generally across io_remap_pfn_range() as a whole, before
> introducing prepare/complete variants.
> * Made [io_]remap_pfn_range_[prepare, complete]() internal-only as per
> Pedro.
> * Renamed [__]compat_vma_prepare to [__]compat_vma as per Jason.
> * Dropped duplicated debug check in mmap_action_complete() as per Jason.
> * Added MMAP_IO_REMAP_PFN action type as per Jason.
> * Various small refactorings as suggested by Jason.
> * Shared code between mmu and nommu mmap_action_complete() as per Jason.
> * Add missing return in kdoc for shmem_zero_setup().
> * Separate out introduction of shmem_zero_setup_desc() into another patch
> as per Jason.
> * Looked into Jason's request re: using shmem_zero_setup_desc() in vma.c -
> It isn't really worthwhile for now as we'd have to set VMA fields from
> the desc after the fields were already set from the map, though once we
> convert all callers to mmap_prepare we can look at this again.
> * Fixed bug with char mem driver not correctly setting MAP_PRIVATE
> /dev/zero anonymous (with vma->vm_file still set), use success hook
> instead.
> * Renamed mmap_prepare_zero to mmap_zero_prepare to be consistent with
> mmap_mem_prepare.
For those following along at home, here's the overall v3->v4 diff.
It's quite substantial...
--- a/arch/csky/include/asm/pgtable.h~b
+++ a/arch/csky/include/asm/pgtable.h
@@ -263,12 +263,6 @@ void update_mmu_cache_range(struct vm_fa
#define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
-#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
- remap_pfn_range(vma, vaddr, pfn, size, prot)
-
-/* default io_remap_pfn_range_prepare can be used. */
-
-#define io_remap_pfn_range_complete(vma, addr, pfn, size, prot) \
- remap_pfn_range_complete(vma, addr, pfn, size, prot)
+#define io_remap_pfn_range_pfn(pfn, size) (pfn)
#endif /* __ASM_CSKY_PGTABLE_H */
--- a/arch/mips/alchemy/common/setup.c~b
+++ a/arch/mips/alchemy/common/setup.c
@@ -94,34 +94,13 @@ phys_addr_t fixup_bigphys_addr(phys_addr
return phys_addr;
}
-static unsigned long calc_pfn(unsigned long pfn, unsigned long size)
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
{
phys_addr_t phys_addr = fixup_bigphys_addr(pfn << PAGE_SHIFT, size);
return phys_addr >> PAGE_SHIFT;
}
-
-int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
-{
- return remap_pfn_range(vma, vaddr, calc_pfn(pfn, size), size, prot);
-}
-EXPORT_SYMBOL(io_remap_pfn_range);
-
-void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
- unsigned long size)
-{
- remap_pfn_range_prepare(desc, calc_pfn(pfn, size));
-}
-EXPORT_SYMBOL(io_remap_pfn_range_prepare);
-
-int io_remap_pfn_range_complete(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn, unsigned long size,
- pgprot_t prot)
-{
- return remap_pfn_range_complete(vma, addr, calc_pfn(pfn, size),
- size, prot);
-}
-EXPORT_SYMBOL(io_remap_pfn_range_complete);
+EXPORT_SYMBOL(io_remap_pfn_range_pfn);
#endif /* CONFIG_MIPS_FIXUP_BIGPHYS_ADDR */
--- a/arch/mips/include/asm/pgtable.h~b
+++ a/arch/mips/include/asm/pgtable.h
@@ -604,19 +604,8 @@ static inline void update_mmu_cache_pmd(
*/
#ifdef CONFIG_MIPS_FIXUP_BIGPHYS_ADDR
phys_addr_t fixup_bigphys_addr(phys_addr_t addr, phys_addr_t size);
-int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
- unsigned long pfn, unsigned long size, pgprot_t prot);
-#define io_remap_pfn_range io_remap_pfn_range
-
-void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
- unsigned long size);
-#define io_remap_pfn_range_prepare io_remap_pfn_range_prepare
-
-int io_remap_pfn_range_complete(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn, unsigned long size,
- pgprot_t prot);
-#define io_remap_pfn_range_complete io_remap_pfn_range_complete
-
+unsigned long io_remap_pfn_range_pfn(unsigned long pfn, unsigned long size);
+#define io_remap_pfn_range_pfn io_remap_pfn_range_pfn
#else
#define fixup_bigphys_addr(addr, size) (addr)
#endif /* CONFIG_MIPS_FIXUP_BIGPHYS_ADDR */
--- a/arch/sparc/include/asm/pgtable_32.h~b
+++ a/arch/sparc/include/asm/pgtable_32.h
@@ -395,13 +395,8 @@ __get_iospace (unsigned long addr)
#define GET_IOSPACE(pfn) (pfn >> (BITS_PER_LONG - 4))
#define GET_PFN(pfn) (pfn & 0x0fffffffUL)
-int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
- unsigned long, pgprot_t);
-void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
-int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t pgprot);
-
-static inline unsigned long calc_io_remap_pfn(unsigned long pfn)
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
{
unsigned long long offset, space, phys_base;
@@ -411,30 +406,7 @@ static inline unsigned long calc_io_rema
return phys_base >> PAGE_SHIFT;
}
-
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long from, unsigned long pfn,
- unsigned long size, pgprot_t prot)
-{
- return remap_pfn_range(vma, from, calc_io_remap_pfn(pfn), size, prot);
-}
-#define io_remap_pfn_range io_remap_pfn_range
-
-static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
- unsigned long size)
-{
- remap_pfn_range_prepare(desc, calc_io_remap_pfn(pfn));
-}
-#define io_remap_pfn_range_prepare io_remap_pfn_range_prepare
-
-static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn, unsigned long size,
- pgprot_t prot)
-{
- return remap_pfn_range_complete(vma, addr, calc_io_remap_pfn(pfn),
- size, prot);
-}
-#define io_remap_pfn_range_complete io_remap_pfn_range_complete
+#define io_remap_pfn_range_pfn io_remap_pfn_range_pfn
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
--- a/arch/sparc/include/asm/pgtable_64.h~b
+++ a/arch/sparc/include/asm/pgtable_64.h
@@ -1048,12 +1048,6 @@ int page_in_phys_avail(unsigned long pad
#define GET_IOSPACE(pfn) (pfn >> (BITS_PER_LONG - 4))
#define GET_PFN(pfn) (pfn & 0x0fffffffffffffffUL)
-int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
- unsigned long, pgprot_t);
-void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
-int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t pgprot);
-
void adi_restore_tags(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pte_t pte);
@@ -1087,7 +1081,8 @@ static inline int arch_unmap_one(struct
return 0;
}
-static inline unsigned long calc_io_remap_pfn(unsigned long pfn)
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
{
unsigned long offset = GET_PFN(pfn) << PAGE_SHIFT;
int space = GET_IOSPACE(pfn);
@@ -1097,30 +1092,7 @@ static inline unsigned long calc_io_rema
return phys_base >> PAGE_SHIFT;
}
-
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long from, unsigned long pfn,
- unsigned long size, pgprot_t prot)
-{
- return remap_pfn_range(vma, from, calc_io_remap_pfn(pfn), size, prot);
-}
-#define io_remap_pfn_range io_remap_pfn_range
-
-static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
- unsigned long size)
-{
- return remap_pfn_range_prepare(desc, calc_io_remap_pfn(pfn));
-}
-#define io_remap_pfn_range_prepare io_remap_pfn_range_prepare
-
-static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn, unsigned long size,
- pgprot_t prot)
-{
- return remap_pfn_range_complete(vma, addr, calc_io_remap_pfn(pfn),
- size, prot);
-}
-#define io_remap_pfn_range_complete io_remap_pfn_range_complete
+#define io_remap_pfn_range_pfn io_remap_pfn_range_pfn
static inline unsigned long __untagged_addr(unsigned long start)
{
--- a/drivers/char/mem.c~b
+++ a/drivers/char/mem.c
@@ -504,18 +504,26 @@ static ssize_t read_zero(struct file *fi
return cleared;
}
-static int mmap_prepare_zero(struct vm_area_desc *desc)
+static int mmap_zero_private_success(const struct vm_area_struct *vma)
+{
+ /*
+ * This is a highly unique situation where we mark a MAP_PRIVATE mapping
+ * of /dev/zero anonymous, despite it not being.
+ */
+ vma_set_anonymous((struct vm_area_struct *)vma);
+
+ return 0;
+}
+
+static int mmap_zero_prepare(struct vm_area_desc *desc)
{
#ifndef CONFIG_MMU
return -ENOSYS;
#endif
if (desc->vm_flags & VM_SHARED)
return shmem_zero_setup_desc(desc);
- /*
- * This is a highly unique situation where we mark a MAP_PRIVATE mapping
- * of /dev/zero anonymous, despite it not being.
- */
- desc->vm_ops = NULL;
+
+ desc->action.success_hook = mmap_zero_private_success;
return 0;
}
@@ -533,7 +541,7 @@ static unsigned long get_unmapped_area_z
{
if (flags & MAP_SHARED) {
/*
- * mmap_prepare_zero() will call shmem_zero_setup() to create a
+ * mmap_zero_prepare() will call shmem_zero_setup() to create a
* file, so use shmem's get_unmapped_area in case it can be
* huge; and pass NULL for file as in mmap.c's
* get_unmapped_area(), so as not to confuse shmem with our
@@ -676,7 +684,7 @@ static const struct file_operations zero
.write_iter = write_iter_zero,
.splice_read = copy_splice_read,
.splice_write = splice_write_zero,
- .mmap_prepare = mmap_prepare_zero,
+ .mmap_prepare = mmap_zero_prepare,
.get_unmapped_area = get_unmapped_area_zero,
#ifndef CONFIG_MMU
.mmap_capabilities = zero_mmap_capabilities,
--- a/include/linux/fs.h~b
+++ a/include/linux/fs.h
@@ -2279,14 +2279,14 @@ static inline bool can_mmap_file(struct
return true;
}
-int __compat_vma_mmap_prepare(const struct file_operations *f_op,
+int __compat_vma_mmap(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma);
-int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma);
+int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
{
if (file->f_op->mmap_prepare)
- return compat_vma_mmap_prepare(file, vma);
+ return compat_vma_mmap(file, vma);
return file->f_op->mmap(file, vma);
}
--- a/include/linux/mm.h~b
+++ a/include/linux/mm.h
@@ -3650,7 +3650,7 @@ static inline void mmap_action_ioremap(s
unsigned long size)
{
mmap_action_remap(desc, start, start_pfn, size);
- desc->action.remap.is_io_remap = true;
+ desc->action.type = MMAP_IO_REMAP_PFN;
}
/**
@@ -3713,9 +3713,6 @@ struct vm_area_struct *find_extend_vma_l
unsigned long addr);
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t pgprot);
-void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
-int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t pgprot);
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
@@ -3749,32 +3746,34 @@ static inline vm_fault_t vmf_insert_page
return VM_FAULT_NOPAGE;
}
-#ifndef io_remap_pfn_range
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn,
- unsigned long size, pgprot_t prot)
+#ifdef io_remap_pfn_range_pfn
+static inline unsigned long io_remap_pfn_range_prot(pgprot_t prot)
{
- return remap_pfn_range(vma, addr, pfn, size, pgprot_decrypted(prot));
+ /* We do not decrypt if arch customises PFN. */
+ return prot;
+}
+#else
+static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
+ unsigned long size)
+{
+ return pfn;
}
-#endif
-#ifndef io_remap_pfn_range_prepare
-static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
- unsigned long size)
+static inline pgprot_t io_remap_pfn_range_prot(pgprot_t prot)
{
- return remap_pfn_range_prepare(desc, pfn);
+ return pgprot_decrypted(prot);
}
#endif
-#ifndef io_remap_pfn_range_complete
-static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
- unsigned long addr, unsigned long pfn, unsigned long size,
- pgprot_t prot)
+static inline int io_remap_pfn_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long orig_pfn,
+ unsigned long size, pgprot_t orig_prot)
{
- return remap_pfn_range_complete(vma, addr, pfn, size,
- pgprot_decrypted(prot));
+ const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+ const pgprot_t prot = io_remap_pfn_range_prot(orig_prot);
+
+ return remap_pfn_range(vma, addr, pfn, size, prot);
}
-#endif
static inline vm_fault_t vmf_error(int err)
{
--- a/include/linux/mm_types.h~b
+++ a/include/linux/mm_types.h
@@ -777,6 +777,7 @@ struct pfnmap_track_ctx {
enum mmap_action_type {
MMAP_NOTHING, /* Mapping is complete, no further action. */
MMAP_REMAP_PFN, /* Remap PFN range. */
+ MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
};
/*
@@ -791,7 +792,6 @@ struct mmap_action {
unsigned long start_pfn;
unsigned long size;
pgprot_t pgprot;
- bool is_io_remap;
} remap;
};
enum mmap_action_type type;
--- a/mm/internal.h~b
+++ a/mm/internal.h
@@ -1653,4 +1653,26 @@ static inline bool reclaim_pt_is_enabled
void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
+
+static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
+ unsigned long orig_pfn, unsigned long size)
+{
+ const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+
+ return remap_pfn_range_prepare(desc, pfn);
+}
+
+static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long orig_pfn, unsigned long size,
+ pgprot_t orig_prot)
+{
+ const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+ const pgprot_t prot = io_remap_pfn_range_prot(orig_prot);
+
+ return remap_pfn_range_complete(vma, addr, pfn, size, prot);
+}
+
#endif /* __MM_INTERNAL_H */
--- a/mm/memory.c~b
+++ a/mm/memory.c
@@ -2919,7 +2919,7 @@ static int get_remap_pgoff(vm_flags_t vm
}
static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
pgd_t *pgd;
unsigned long next;
@@ -2930,16 +2930,7 @@ static int remap_pfn_range_internal(stru
if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
return -EINVAL;
- if (set_vma) {
- err = get_remap_pgoff(vma->vm_flags, addr, end,
- vma->vm_start, vma->vm_end,
- pfn, &vma->vm_pgoff);
- if (err)
- return err;
- vm_flags_set(vma, VM_REMAP_FLAGS);
- } else {
- VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
- }
+ VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
@@ -2961,9 +2952,9 @@ static int remap_pfn_range_internal(stru
* must have pre-validated the caching bits of the pgprot_t.
*/
static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
- int error = remap_pfn_range_internal(vma, addr, pfn, size, prot, set_vma);
+ int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
if (!error)
return 0;
@@ -2976,18 +2967,6 @@ static int remap_pfn_range_notrack(struc
return error;
}
-void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
-{
- /*
- * We set addr=VMA start, end=VMA end here, so this won't fail, but we
- * check it again on complete and will fail there if specified addr is
- * invalid.
- */
- get_remap_pgoff(desc->vm_flags, desc->start, desc->end,
- desc->start, desc->end, pfn, &desc->pgoff);
- desc->vm_flags |= VM_REMAP_FLAGS;
-}
-
#ifdef __HAVE_PFNMAP_TRACKING
static inline struct pfnmap_track_ctx *pfnmap_track_ctx_alloc(unsigned long pfn,
unsigned long size, pgprot_t *prot)
@@ -3018,7 +2997,7 @@ void pfnmap_track_ctx_release(struct kre
}
static int remap_pfn_range_track(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
struct pfnmap_track_ctx *ctx = NULL;
int err;
@@ -3044,7 +3023,7 @@ static int remap_pfn_range_track(struct
return -EINVAL;
}
- err = remap_pfn_range_notrack(vma, addr, pfn, size, prot, set_vma);
+ err = remap_pfn_range_notrack(vma, addr, pfn, size, prot);
if (ctx) {
if (err)
kref_put(&ctx->kref, pfnmap_track_ctx_release);
@@ -3054,6 +3033,47 @@ static int remap_pfn_range_track(struct
return err;
}
+static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range_track(vma, addr, pfn, size, prot);
+}
+#else
+static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
+}
+#endif
+
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+ /*
+ * We set addr=VMA start, end=VMA end here, so this won't fail, but we
+ * check it again on complete and will fail there if specified addr is
+ * invalid.
+ */
+ get_remap_pgoff(desc->vm_flags, desc->start, desc->end,
+ desc->start, desc->end, pfn, &desc->pgoff);
+ desc->vm_flags |= VM_REMAP_FLAGS;
+}
+
+static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size)
+{
+ unsigned long end = addr + PAGE_ALIGN(size);
+ int err;
+
+ err = get_remap_pgoff(vma->vm_flags, addr, end,
+ vma->vm_start, vma->vm_end,
+ pfn, &vma->vm_pgoff);
+ if (err)
+ return err;
+
+ vm_flags_set(vma, VM_REMAP_FLAGS);
+ return 0;
+}
+
/**
* remap_pfn_range - remap kernel memory to userspace
* @vma: user vma to map to
@@ -3069,32 +3089,21 @@ static int remap_pfn_range_track(struct
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot)
{
- return remap_pfn_range_track(vma, addr, pfn, size, prot,
- /* set_vma = */true);
-}
+ int err;
-int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
-{
- /* With set_vma = false, the VMA will not be modified. */
- return remap_pfn_range_track(vma, addr, pfn, size, prot,
- /* set_vma = */false);
-}
-#else
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
-{
- return remap_pfn_range_notrack(vma, addr, pfn, size, prot, /* set_vma = */true);
+ err = remap_pfn_range_prepare_vma(vma, addr, pfn, size);
+ if (err)
+ return err;
+
+ return do_remap_pfn_range(vma, addr, pfn, size, prot);
}
+EXPORT_SYMBOL(remap_pfn_range);
int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
- return remap_pfn_range_notrack(vma, addr, pfn, size, prot,
- /* set_vma = */false);
+ return do_remap_pfn_range(vma, addr, pfn, size, prot);
}
-#endif
-EXPORT_SYMBOL(remap_pfn_range);
/**
* vm_iomap_memory - remap memory to userspace
--- a/mm/shmem.c~b
+++ a/mm/shmem.c
@@ -5908,6 +5908,7 @@ static struct file *__shmem_zero_setup(u
/**
* shmem_zero_setup - setup a shared anonymous mapping
* @vma: the vma to be mmapped is prepared by do_mmap
+ * Returns: 0 on success, or error
*/
int shmem_zero_setup(struct vm_area_struct *vma)
{
--- a/mm/util.c~b
+++ a/mm/util.c
@@ -1134,7 +1134,7 @@ EXPORT_SYMBOL(flush_dcache_folio);
#endif
/**
- * __compat_vma_mmap_prepare() - See description for compat_vma_mmap_prepare()
+ * __compat_vma_mmap() - See description for compat_vma_mmap()
* for details. This is the same operation, only with a specific file operations
* struct which may or may not be the same as vma->vm_file->f_op.
* @f_op: The file operations whose .mmap_prepare() hook is specified.
@@ -1142,7 +1142,7 @@ EXPORT_SYMBOL(flush_dcache_folio);
* @vma: The VMA to apply the .mmap_prepare() hook to.
* Returns: 0 on success or error.
*/
-int __compat_vma_mmap_prepare(const struct file_operations *f_op,
+int __compat_vma_mmap(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma)
{
struct vm_area_desc desc = {
@@ -1168,11 +1168,11 @@ int __compat_vma_mmap_prepare(const stru
set_vma_from_desc(vma, &desc);
return mmap_action_complete(&desc.action, vma);
}
-EXPORT_SYMBOL(__compat_vma_mmap_prepare);
+EXPORT_SYMBOL(__compat_vma_mmap);
/**
- * compat_vma_mmap_prepare() - Apply the file's .mmap_prepare() hook to an
- * existing VMA.
+ * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
+ * existing VMA and execute any requested actions.
* @file: The file which possesss an f_op->mmap_prepare() hook.
* @vma: The VMA to apply the .mmap_prepare() hook to.
*
@@ -1187,7 +1187,7 @@ EXPORT_SYMBOL(__compat_vma_mmap_prepare)
* .mmap_prepare() hook, as we are in a different context when we invoke the
* .mmap() hook, already having a VMA to deal with.
*
- * compat_vma_mmap_prepare() is a compatibility function that takes VMA state,
+ * compat_vma_mmap() is a compatibility function that takes VMA state,
* establishes a struct vm_area_desc descriptor, passes to the underlying
* .mmap_prepare() hook and applies any changes performed by it.
*
@@ -1196,11 +1196,11 @@ EXPORT_SYMBOL(__compat_vma_mmap_prepare)
*
* Returns: 0 on success or error.
*/
-int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma)
+int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
{
- return __compat_vma_mmap_prepare(file->f_op, file, vma);
+ return __compat_vma_mmap(file->f_op, file, vma);
}
-EXPORT_SYMBOL(compat_vma_mmap_prepare);
+EXPORT_SYMBOL(compat_vma_mmap);
static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
const struct page *page)
@@ -1282,6 +1282,35 @@ again:
}
}
+static int mmap_action_finish(struct mmap_action *action,
+ const struct vm_area_struct *vma, int err)
+{
+ /*
+ * If an error occurs, unmap the VMA altogether and return an error. We
+ * only clear the newly allocated VMA, since this function is only
+ * invoked if we do NOT merge, so we only clean up the VMA we created.
+ */
+ if (err) {
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ do_munmap(current->mm, vma->vm_start, len, NULL);
+
+ if (action->error_hook) {
+ /* We may want to filter the error. */
+ err = action->error_hook(err);
+
+ /* The caller should not clear the error. */
+ VM_WARN_ON_ONCE(!err);
+ }
+ return err;
+ }
+
+ if (action->success_hook)
+ return action->success_hook(vma);
+
+ return 0;
+}
+
#ifdef CONFIG_MMU
/**
* mmap_action_prepare - Perform preparatory setup for an VMA descriptor
@@ -1296,11 +1325,11 @@ void mmap_action_prepare(struct mmap_act
case MMAP_NOTHING:
break;
case MMAP_REMAP_PFN:
- if (action->remap.is_io_remap)
- io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
- action->remap.size);
- else
- remap_pfn_range_prepare(desc, action->remap.start_pfn);
+ remap_pfn_range_prepare(desc, action->remap.start_pfn);
+ break;
+ case MMAP_IO_REMAP_PFN:
+ io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
+ action->remap.size);
break;
}
}
@@ -1324,44 +1353,18 @@ int mmap_action_complete(struct mmap_act
case MMAP_NOTHING:
break;
case MMAP_REMAP_PFN:
- VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
- VM_REMAP_FLAGS);
-
- if (action->remap.is_io_remap)
- err = io_remap_pfn_range_complete(vma, action->remap.start,
+ err = remap_pfn_range_complete(vma, action->remap.start,
action->remap.start_pfn, action->remap.size,
action->remap.pgprot);
- else
- err = remap_pfn_range_complete(vma, action->remap.start,
+ break;
+ case MMAP_IO_REMAP_PFN:
+ err = io_remap_pfn_range_complete(vma, action->remap.start,
action->remap.start_pfn, action->remap.size,
action->remap.pgprot);
break;
}
- /*
- * If an error occurs, unmap the VMA altogether and return an error. We
- * only clear the newly allocated VMA, since this function is only
- * invoked if we do NOT merge, so we only clean up the VMA we created.
- */
- if (err) {
- const size_t len = vma_pages(vma) << PAGE_SHIFT;
-
- do_munmap(current->mm, vma->vm_start, len, NULL);
-
- if (action->error_hook) {
- /* We may want to filter the error. */
- err = action->error_hook(err);
-
- /* The caller should not clear the error. */
- VM_WARN_ON_ONCE(!err);
- }
- return err;
- }
-
- if (action->success_hook)
- err = action->success_hook(vma);
-
- return err;
+ return mmap_action_finish(action, vma, err);
}
EXPORT_SYMBOL(mmap_action_complete);
#else
@@ -1372,6 +1375,7 @@ void mmap_action_prepare(struct mmap_act
case MMAP_NOTHING:
break;
case MMAP_REMAP_PFN:
+ case MMAP_IO_REMAP_PFN:
WARN_ON_ONCE(1); /* nommu cannot handle these. */
break;
}
@@ -1381,41 +1385,17 @@ EXPORT_SYMBOL(mmap_action_prepare);
int mmap_action_complete(struct mmap_action *action,
struct vm_area_struct *vma)
{
- int err = 0;
-
switch (action->type) {
case MMAP_NOTHING:
break;
case MMAP_REMAP_PFN:
+ case MMAP_IO_REMAP_PFN:
WARN_ON_ONCE(1); /* nommu cannot handle this. */
break;
}
- /*
- * If an error occurs, unmap the VMA altogether and return an error. We
- * only clear the newly allocated VMA, since this function is only
- * invoked if we do NOT merge, so we only clean up the VMA we created.
- */
- if (err) {
- const size_t len = vma_pages(vma) << PAGE_SHIFT;
-
- do_munmap(current->mm, vma->vm_start, len, NULL);
-
- if (action->error_hook) {
- /* We may want to filter the error. */
- err = action->error_hook(err);
-
- /* The caller should not clear the error. */
- VM_WARN_ON_ONCE(!err);
- }
- return err;
- }
-
- if (action->success_hook)
- err = action->success_hook(vma);
-
- return 0;
+ return mmap_action_finish(action, vma, /* err = */0);
}
EXPORT_SYMBOL(mmap_action_complete);
#endif
--- a/tools/testing/vma/vma_internal.h~b
+++ a/tools/testing/vma/vma_internal.h
@@ -293,7 +293,6 @@ struct mmap_action {
unsigned long start_pfn;
unsigned long size;
pgprot_t pgprot;
- bool is_io_remap;
} remap;
};
enum mmap_action_type type;
@@ -1524,7 +1523,7 @@ static inline int mmap_action_complete(s
return 0;
}
-static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
+static inline int __compat_vma_mmap(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma)
{
struct vm_area_desc desc = {
@@ -1551,10 +1550,10 @@ static inline int __compat_vma_mmap_prep
return mmap_action_complete(&desc.action, vma);
}
-static inline int compat_vma_mmap_prepare(struct file *file,
+static inline int compat_vma_mmap(struct file *file,
struct vm_area_struct *vma)
{
- return __compat_vma_mmap_prepare(file->f_op, file, vma);
+ return __compat_vma_mmap(file->f_op, file, vma);
}
/* Did the driver provide valid mmap hook configuration? */
@@ -1575,7 +1574,7 @@ static inline bool can_mmap_file(struct
static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
{
if (file->f_op->mmap_prepare)
- return compat_vma_mmap_prepare(file, vma);
+ return compat_vma_mmap(file, vma);
return file->f_op->mmap(file, vma);
}
_
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN
2025-09-17 19:11 ` [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN Lorenzo Stoakes
@ 2025-09-17 21:19 ` Jason Gunthorpe
2025-09-18 6:26 ` Lorenzo Stoakes
2025-09-18 9:11 ` Lorenzo Stoakes
1 sibling, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2025-09-17 21:19 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 08:11:09PM +0100, Lorenzo Stoakes wrote:
> -#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
> - remap_pfn_range(vma, vaddr, pfn, size, prot)
> +#define io_remap_pfn_range_pfn(pfn, size) (pfn)
??
Just delete it? Looks like cargo cult cruft, see below about
pgprot_decrypted().
> +#ifdef io_remap_pfn_range_pfn
> +static inline unsigned long io_remap_pfn_range_prot(pgprot_t prot)
> +{
> + /* We do not decrypt if arch customises PFN. */
> + return prot;
pgprot_decrypted() is a NOP on all the arches that use this override,
please drop this.
Soon future work will require something more complicated to compute if
pgprot_decrypted() should be called so this unused stuff isn't going
to hold up.
Otherwise looks good to me
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
2025-09-17 19:11 ` [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
@ 2025-09-17 21:32 ` Jason Gunthorpe
2025-09-18 6:09 ` Lorenzo Stoakes
0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2025-09-17 21:32 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 08:11:08PM +0100, Lorenzo Stoakes wrote:
> -int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
> +static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
> unsigned long pfn, unsigned long size, pgprot_t prot)
> {
> int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
> -
> if (!error)
> return 0;
Stray edit
Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc
2025-09-17 19:11 ` [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
@ 2025-09-17 21:37 ` Jason Gunthorpe
2025-09-18 6:09 ` Lorenzo Stoakes
2025-09-18 9:14 ` Lorenzo Stoakes
1 sibling, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2025-09-17 21:37 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 08:11:11PM +0100, Lorenzo Stoakes wrote:
> +static int mmap_action_finish(struct mmap_action *action,
> + const struct vm_area_struct *vma, int err)
> +{
> + /*
> + * If an error occurs, unmap the VMA altogether and return an error. We
> + * only clear the newly allocated VMA, since this function is only
> + * invoked if we do NOT merge, so we only clean up the VMA we created.
> + */
> + if (err) {
> + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> + do_munmap(current->mm, vma->vm_start, len, NULL);
> +
> + if (action->error_hook) {
> + /* We may want to filter the error. */
> + err = action->error_hook(err);
> +
> + /* The caller should not clear the error. */
> + VM_WARN_ON_ONCE(!err);
> + }
> + return err;
> + }
> +
> + if (action->success_hook)
> + return action->success_hook(vma);
I thought you were going to use a single hook function as was
suggested?
return action->finish_hook(vma, err);
> +int mmap_action_complete(struct mmap_action *action,
> + struct vm_area_struct *vma)
> +{
> + switch (action->type) {
> + case MMAP_NOTHING:
> + break;
> + case MMAP_REMAP_PFN:
> + case MMAP_IO_REMAP_PFN:
> + WARN_ON_ONCE(1); /* nommu cannot handle this. */
This should be:
if (WARN_ON_ONCE(true))
err = -EINVAL
To abort the thing and try to recover.
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index 07167446dcf4..22ed38e8714e 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -274,6 +274,49 @@ struct mm_struct {
>
> struct vm_area_struct;
>
> +
> +/* What action should be taken after an .mmap_prepare call is complete? */
> +enum mmap_action_type {
> + MMAP_NOTHING, /* Mapping is complete, no further action. */
> + MMAP_REMAP_PFN, /* Remap PFN range. */
> +};
> +
> +/*
> + * Describes an action an mmap_prepare hook can instruct to be taken to complete
> + * the mapping of a VMA. Specified in vm_area_desc.
> + */
> +struct mmap_action {
> + union {
> + /* Remap range. */
> + struct {
> + unsigned long start;
> + unsigned long start_pfn;
> + unsigned long size;
> + pgprot_t pgprot;
> + } remap;
> + };
> + enum mmap_action_type type;
> +
> + /*
> + * If specified, this hook is invoked after the selected action has been
> + * successfully completed. Note that the VMA write lock still held.
> + *
> + * The absolute minimum ought to be done here.
> + *
> + * Returns 0 on success, or an error code.
> + */
> + int (*success_hook)(const struct vm_area_struct *vma);
> +
> + /*
> + * If specified, this hook is invoked when an error occurred when
> + * attempting the selection action.
> + *
> + * The hook can return an error code in order to filter the error, but
> + * it is not valid to clear the error here.
> + */
> + int (*error_hook)(int err);
> +};
I didn't try to understand what vma_internal.h is for, but should this
block be an exact copy of the normal one? ie MMAP_IO_REMAP_PFN is missing?
Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 12/14] mm: add shmem_zero_setup_desc()
2025-09-17 19:11 ` [PATCH v4 12/14] mm: add shmem_zero_setup_desc() Lorenzo Stoakes
@ 2025-09-17 21:38 ` Jason Gunthorpe
0 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2025-09-17 21:38 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 08:11:14PM +0100, Lorenzo Stoakes wrote:
> Add the ability to set up a shared anonymous mapping based on a VMA
> descriptor rather than a VMA.
>
> This is a prerequisite for converting the char mem driver to use the
> mmap_prepare hook.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/shmem_fs.h | 3 ++-
> mm/shmem.c | 41 ++++++++++++++++++++++++++++++++--------
> 2 files changed, 35 insertions(+), 9 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc
2025-09-17 21:37 ` Jason Gunthorpe
@ 2025-09-18 6:09 ` Lorenzo Stoakes
0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-18 6:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 06:37:37PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 17, 2025 at 08:11:11PM +0100, Lorenzo Stoakes wrote:
> > +static int mmap_action_finish(struct mmap_action *action,
> > + const struct vm_area_struct *vma, int err)
> > +{
> > + /*
> > + * If an error occurs, unmap the VMA altogether and return an error. We
> > + * only clear the newly allocated VMA, since this function is only
> > + * invoked if we do NOT merge, so we only clean up the VMA we created.
> > + */
> > + if (err) {
> > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > + do_munmap(current->mm, vma->vm_start, len, NULL);
> > +
> > + if (action->error_hook) {
> > + /* We may want to filter the error. */
> > + err = action->error_hook(err);
> > +
> > + /* The caller should not clear the error. */
> > + VM_WARN_ON_ONCE(!err);
> > + }
> > + return err;
> > + }
> > +
> > + if (action->success_hook)
> > + return action->success_hook(vma);
>
> I thought you were going to use a single hook function as was
> suggested?
>
> return action->finish_hook(vma, err);
Err, no? I said no to this suggestion from Pedro? I don't like it.
In practice I've found callers need to EITHER do something on success or
filter errors. I think it's more expressive this way.
I also think you make it more likely that a driver will get things wrong if
they intend only to do something on success and you have an 'err'
parameter.
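For illustration only (this is not part of the series), here is a minimal sketch of how a
driver's .mmap_prepare might use the two hooks as separate callbacks. The "mydev" names and
the PFN value are invented; struct vm_area_desc, mmap_action_remap() and the
success_hook/error_hook fields are as introduced in the patches quoted above:

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical driver - names and PFN are illustrative, not from the series. */

static int mydev_mmap_success(const struct vm_area_struct *vma)
{
	/* Minimal post-setup work; the VMA write lock is still held here. */
	pr_debug("mydev: mapped %lu pages\n", vma_pages(vma));
	return 0;
}

static int mydev_mmap_error(int err)
{
	/* Filter the error, but never clear it. */
	return err == -ENOMEM ? -EAGAIN : err;
}

static int mydev_mmap_prepare(struct vm_area_desc *desc)
{
	const unsigned long size = desc->end - desc->start;
	const unsigned long pfn = 0x1000; /* hypothetical device PFN */

	/* Ask the mm core to perform the PFN remap once the VMA is established. */
	mmap_action_remap(desc, desc->start, pfn, size);

	desc->action.success_hook = mydev_mmap_success;
	desc->action.error_hook = mydev_mmap_error;
	return 0;
}

A caller that only cares about one of the two cases simply leaves the other hook unset, as
the /dev/zero change in the interdiff above does with only a success hook.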
>
> > +int mmap_action_complete(struct mmap_action *action,
> > + struct vm_area_struct *vma)
> > +{
> > + switch (action->type) {
> > + case MMAP_NOTHING:
> > + break;
> > + case MMAP_REMAP_PFN:
> > + case MMAP_IO_REMAP_PFN:
> > + WARN_ON_ONCE(1); /* nommu cannot handle this. */
>
> This should be:
>
> if (WARN_ON_ONCE(true))
> err = -EINVAL
>
> To abort the thing and try to recover.
'Try to recover'... how exactly...
It'd be a serious programmatic kernel bug, so I'm not sure going out of our way
to error out here is brilliantly valuable. You might even mask the bug this way,
because the mmap() will just fail instead of nuking the process on fault.
But fine, let me send a fix-patch...
>
> > diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> > index 07167446dcf4..22ed38e8714e 100644
> > --- a/tools/testing/vma/vma_internal.h
> > +++ b/tools/testing/vma/vma_internal.h
> > @@ -274,6 +274,49 @@ struct mm_struct {
> >
> > struct vm_area_struct;
> >
> > +
> > +/* What action should be taken after an .mmap_prepare call is complete? */
> > +enum mmap_action_type {
> > + MMAP_NOTHING, /* Mapping is complete, no further action. */
> > + MMAP_REMAP_PFN, /* Remap PFN range. */
> > +};
> > +
> > +/*
> > + * Describes an action an mmap_prepare hook can instruct to be taken to complete
> > + * the mapping of a VMA. Specified in vm_area_desc.
> > + */
> > +struct mmap_action {
> > + union {
> > + /* Remap range. */
> > + struct {
> > + unsigned long start;
> > + unsigned long start_pfn;
> > + unsigned long size;
> > + pgprot_t pgprot;
> > + } remap;
> > + };
> > + enum mmap_action_type type;
> > +
> > + /*
> > + * If specified, this hook is invoked after the selected action has been
> > + * successfully completed. Note that the VMA write lock still held.
> > + *
> > + * The absolute minimum ought to be done here.
> > + *
> > + * Returns 0 on success, or an error code.
> > + */
> > + int (*success_hook)(const struct vm_area_struct *vma);
> > +
> > + /*
> > + * If specified, this hook is invoked when an error occurred when
> > + * attempting the selection action.
> > + *
> > + * The hook can return an error code in order to filter the error, but
> > + * it is not valid to clear the error here.
> > + */
> > + int (*error_hook)(int err);
> > +};
>
> I didn't try to understand what vma_internal.h is for, but should this
> block be an exact copy of the normal one? ie MMAP_IO_REMAP_PFN is missing?
Right. Of course. I'll include that in the fix-patch...
>
> Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
2025-09-17 21:32 ` Jason Gunthorpe
@ 2025-09-18 6:09 ` Lorenzo Stoakes
0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-18 6:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 06:32:09PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 17, 2025 at 08:11:08PM +0100, Lorenzo Stoakes wrote:
> > -int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
> > +static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
> > unsigned long pfn, unsigned long size, pgprot_t prot)
> > {
> > int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
> > -
> > if (!error)
> > return 0;
>
> Stray edit
Andrew - can you fix that up? I can send a fix-patch if needed. Just an
accidental newline deletion. Thanks.
>
> Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN
2025-09-17 21:19 ` Jason Gunthorpe
@ 2025-09-18 6:26 ` Lorenzo Stoakes
0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-18 6:26 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 06:19:44PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 17, 2025 at 08:11:09PM +0100, Lorenzo Stoakes wrote:
>
> > -#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
> > - remap_pfn_range(vma, vaddr, pfn, size, prot)
> > +#define io_remap_pfn_range_pfn(pfn, size) (pfn)
>
> ??
>
> Just delete it? Looks like cargo cult cruft, see below about
> pgprot_decrypted().
?? yourself! I'm not responsible for the code I touch ;)
I very obviously did this to prevent pgprot_decrypted() being invoked,
keeping the code equivalent to the original.
I obviously didn't account for the fact that it's a no-op on these arches, which
is your main point here - and a great point that really neatly cleans
things up, thanks!
>
> > +#ifdef io_remap_pfn_range_pfn
> > +static inline unsigned long io_remap_pfn_range_prot(pgprot_t prot)
> > +{
> > + /* We do not decrypt if arch customises PFN. */
> > + return prot;
>
> pgprot_decrypted() is a NOP on all the arches that use this override,
> please drop this.
Yes that's a great insight that I missed, and radically simplifies this.
I think my discovering that the PFN is all that varies apart from this, plus
your pedan^W careful review, has led us somewhere nice once I drop this
stuff.
>
> Soon future work will require something more complicated to compute if
> pgprot_decrypted() should be called so this unused stuff isn't going
> to hold up.
Right, not sure what you're getting at here - for these arches it will be a no-op,
so we're all good?
>
> Otherwise looks good to me
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Thanks!
>
> Jason
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN
2025-09-17 19:11 ` [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN Lorenzo Stoakes
2025-09-17 21:19 ` Jason Gunthorpe
@ 2025-09-18 9:11 ` Lorenzo Stoakes
1 sibling, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-18 9:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Hi Andrew,
Could you apply the below fix-patch please?
Jason pointed out correctly that pgprot_decrypted() is a noop for the arches in
question so there's no need to do anything special with them.
Cheers, Lorenzo
----8<----
From 9bd1cafa84108a06db8e2135f5e5b0d3e0bf3859 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Thu, 18 Sep 2025 07:41:37 +0100
Subject: [PATCH] io_remap_pfn_range_pfn fixup
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
arch/csky/include/asm/pgtable.h | 2 --
include/linux/mm.h | 15 ++-------------
2 files changed, 2 insertions(+), 15 deletions(-)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 967c86b38f11..d606afbabce1 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -263,6 +263,4 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
#define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
-#define io_remap_pfn_range_pfn(pfn, size) (pfn)
-
#endif /* __ASM_CSKY_PGTABLE_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9b65c33bb31a..08261f2f6244 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3672,23 +3672,12 @@ static inline vm_fault_t vmf_insert_page(struct vm_area_struct *vma,
return VM_FAULT_NOPAGE;
}
-#ifdef io_remap_pfn_range_pfn
-static inline unsigned long io_remap_pfn_range_prot(pgprot_t prot)
-{
- /* We do not decrypt if arch customises PFN. */
- return prot;
-}
-#else
+#ifndef io_remap_pfn_range_pfn
static inline unsigned long io_remap_pfn_range_pfn(unsigned long pfn,
unsigned long size)
{
return pfn;
}
-
-static inline pgprot_t io_remap_pfn_range_prot(pgprot_t prot)
-{
- return pgprot_decrypted(prot);
-}
#endif
static inline int io_remap_pfn_range(struct vm_area_struct *vma,
@@ -3696,7 +3685,7 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
unsigned long size, pgprot_t orig_prot)
{
const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
- const pgprot_t prot = io_remap_pfn_range_prot(orig_prot);
+ const pgprot_t prot = pgprot_decrypted(orig_prot);
return remap_pfn_range(vma, addr, pfn, size, prot);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v4 08/14] mm: introduce io_remap_pfn_range_[prepare, complete]()
2025-09-17 19:11 ` [PATCH v4 08/14] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
@ 2025-09-18 9:12 ` Lorenzo Stoakes
0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-18 9:12 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Hi Andrew,
Could you also apply the below, so we propagate the fact that we don't need
io_remap_pfn_range_prot()?
Cheers, Lorenzo
----8<----
From cc311eeb5b155601e3223797000f13e07b28bc30 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Thu, 18 Sep 2025 07:43:21 +0100
Subject: [PATCH] fixup io_remap_pfn_range_[prepare, complete]
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/internal.h b/mm/internal.h
index 085e34f84bae..38607b2821d9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1670,7 +1670,7 @@ static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
pgprot_t orig_prot)
{
const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
- const pgprot_t prot = io_remap_pfn_range_prot(orig_prot);
+ const pgprot_t prot = pgprot_decrypted(orig_prot);
return remap_pfn_range_complete(vma, addr, pfn, size, prot);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc
2025-09-17 19:11 ` [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
2025-09-17 21:37 ` Jason Gunthorpe
@ 2025-09-18 9:14 ` Lorenzo Stoakes
1 sibling, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-18 9:14 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe, iommu, Kevin Tian,
Will Deacon, Robin Murphy
Hi Andrew,
Finally, could you apply the below, which has us return an error in case
somebody implements a buggy nommu action.
I also include a fix for the VMA unit tests where an enum declaration was not
correctly propagated.
Cheers, Lorenzo
----8<----
From 17c8037bc3bfd5cdd52369dc6140d0fbbd03480d Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Thu, 18 Sep 2025 08:08:31 +0100
Subject: [PATCH] fixup: return error on broken path, update vma_internal.h
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/util.c | 6 ++++--
tools/testing/vma/vma_internal.h | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index 0c1c68285675..30ed284bb819 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1385,17 +1385,19 @@ EXPORT_SYMBOL(mmap_action_prepare);
int mmap_action_complete(struct mmap_action *action,
struct vm_area_struct *vma)
{
+ int err = 0;
+
switch (action->type) {
case MMAP_NOTHING:
break;
case MMAP_REMAP_PFN:
case MMAP_IO_REMAP_PFN:
WARN_ON_ONCE(1); /* nommu cannot handle this. */
-
+ err = -EINVAL;
break;
}
- return mmap_action_finish(action, vma, /* err = */0);
+ return mmap_action_finish(action, vma, err);
}
EXPORT_SYMBOL(mmap_action_complete);
#endif
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 22ed38e8714e..d5028e5e905b 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -279,6 +279,7 @@ struct vm_area_struct;
enum mmap_action_type {
MMAP_NOTHING, /* Mapping is complete, no further action. */
MMAP_REMAP_PFN, /* Remap PFN range. */
+ MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
};
/*
--
2.51.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare
2025-09-17 19:11 ` [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
@ 2025-09-23 11:52 ` Sumanth Korikkar
2025-09-23 21:17 ` Andrew Morton
0 siblings, 1 reply; 29+ messages in thread
From: Sumanth Korikkar @ 2025-09-23 11:52 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe, iommu, Kevin Tian, Will Deacon, Robin Murphy
On Wed, Sep 17, 2025 at 08:11:13PM +0100, Lorenzo Stoakes wrote:
> Since we can now perform actions after the VMA is established via
> mmap_prepare, use desc->action_success_hook to set up the hugetlb lock
> once the VMA is setup.
>
> We also make changes throughout hugetlbfs to make this possible.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> fs/hugetlbfs/inode.c | 36 ++++++++++------
> include/linux/hugetlb.h | 9 +++-
> include/linux/hugetlb_inline.h | 15 ++++---
> mm/hugetlb.c | 77 ++++++++++++++++++++--------------
> 4 files changed, 85 insertions(+), 52 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index f42548ee9083..9e0625167517 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -96,8 +96,15 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
> #define PGOFF_LOFFT_MAX \
> (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
>
> -static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> +static int hugetlb_file_mmap_prepare_success(const struct vm_area_struct *vma)
> {
> + /* Unfortunate we have to reassign vma->vm_private_data. */
> + return hugetlb_vma_lock_alloc((struct vm_area_struct *)vma);
> +}
Hi Lorenzo,
The following test causes the kernel to enter a blocked state,
suggesting an issue related to locking order. I was able to reproduce
this behavior in certain test runs.
Test case:
git clone https://github.com/libhugetlbfs/libhugetlbfs.git
cd libhugetlbfs ; ./configure
make -j32
cd tests
echo 100 > /proc/sys/vm/nr_hugepages
mkdir -p /test-hugepages && mount -t hugetlbfs nodev /test-hugepages
./run_tests.py <in a loop>
...
shm-fork 10 100 (1024K: 64): PASS
set shmmax limit to 104857600
shm-getraw 100 /dev/full (1024K: 32):
shm-getraw 100 /dev/full (1024K: 64): PASS
fallocate_stress.sh (1024K: 64): <blocked>
Blocked task state below:
task:fallocate_stres state:D stack:0 pid:5106 tgid:5106 ppid:5103
task_flags:0x400000 flags:0x00000001
Call Trace:
[<00000255adc646f0>] __schedule+0x370/0x7f0
[<00000255adc64bb0>] schedule+0x40/0xc0
[<00000255adc64d32>] schedule_preempt_disabled+0x22/0x30
[<00000255adc68492>] rwsem_down_write_slowpath+0x232/0x610
[<00000255adc68922>] down_write_killable+0x52/0x80
[<00000255ad12c980>] vm_mmap_pgoff+0xc0/0x1f0
[<00000255ad164bbe>] ksys_mmap_pgoff+0x17e/0x220
[<00000255ad164d3c>] __s390x_sys_old_mmap+0x7c/0xa0
[<00000255adc60e4e>] __do_syscall+0x12e/0x350
[<00000255adc6cfee>] system_call+0x6e/0x90
task:fallocate_stres state:D stack:0 pid:5109 tgid:5106 ppid:5103
task_flags:0x400040 flags:0x00000001
Call Trace:
[<00000255adc646f0>] __schedule+0x370/0x7f0
[<00000255adc64bb0>] schedule+0x40/0xc0
[<00000255adc64d32>] schedule_preempt_disabled+0x22/0x30
[<00000255adc68492>] rwsem_down_write_slowpath+0x232/0x610
[<00000255adc688be>] down_write+0x4e/0x60
[<00000255ad1c11ec>] __hugetlb_zap_begin+0x3c/0x70
[<00000255ad158b9c>] unmap_vmas+0x10c/0x1a0
[<00000255ad180844>] vms_complete_munmap_vmas+0x134/0x2e0
[<00000255ad1811be>] do_vmi_align_munmap+0x13e/0x170
[<00000255ad1812ae>] do_vmi_munmap+0xbe/0x140
[<00000255ad183f86>] __vm_munmap+0xe6/0x190
[<00000255ad166832>] __s390x_sys_munmap+0x32/0x40
[<00000255adc60e4e>] __do_syscall+0x12e/0x350
[<00000255adc6cfee>] system_call+0x6e/0x90
Thanks,
Sumanth
* Re: [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare
2025-09-23 11:52 ` Sumanth Korikkar
@ 2025-09-23 21:17 ` Andrew Morton
2025-09-24 12:03 ` Lorenzo Stoakes
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2025-09-23 21:17 UTC (permalink / raw)
To: Sumanth Korikkar
Cc: Lorenzo Stoakes, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe, iommu, Kevin Tian, Will Deacon, Robin Murphy
On Tue, 23 Sep 2025 13:52:09 +0200 Sumanth Korikkar <sumanthk@linux.ibm.com> wrote:
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -96,8 +96,15 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
> > #define PGOFF_LOFFT_MAX \
> > (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
> >
> > -static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> > +static int hugetlb_file_mmap_prepare_success(const struct vm_area_struct *vma)
> > {
> > + /* Unfortunately we have to reassign vma->vm_private_data. */
> > + return hugetlb_vma_lock_alloc((struct vm_area_struct *)vma);
> > +}
>
> Hi Lorenzo,
>
> The following test causes the kernel to enter a blocked state,
> suggesting an issue related to locking order. I was able to reproduce
> this behavior in certain test runs.
Thanks. I pulled this series out of mm.git's mm-stable branch and put it
back into mm-unstable.
* Re: [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare
2025-09-23 21:17 ` Andrew Morton
@ 2025-09-24 12:03 ` Lorenzo Stoakes
0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes @ 2025-09-24 12:03 UTC (permalink / raw)
To: Andrew Morton
Cc: Sumanth Korikkar, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe, iommu, Kevin Tian, Will Deacon, Robin Murphy
On Tue, Sep 23, 2025 at 02:17:04PM -0700, Andrew Morton wrote:
> On Tue, 23 Sep 2025 13:52:09 +0200 Sumanth Korikkar <sumanthk@linux.ibm.com> wrote:
>
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -96,8 +96,15 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
> > > #define PGOFF_LOFFT_MAX \
> > > (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
> > >
> > > -static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> > > +static int hugetlb_file_mmap_prepare_success(const struct vm_area_struct *vma)
> > > {
> > > + /* Unfortunately we have to reassign vma->vm_private_data. */
> > > + return hugetlb_vma_lock_alloc((struct vm_area_struct *)vma);
> > > +}
> >
> > Hi Lorenzo,
> >
> > The following test causes the kernel to enter a blocked state,
> > suggesting an issue related to locking order. I was able to reproduce
> > this behavior in certain test runs.
>
> Thanks. I pulled this series out of mm.git's mm-stable branch and put it
> back into mm-unstable.
I'm at a conference right now and after that I'm on leave for a couple of weeks,
returning in the first week of 6.18-rc1, so I think it's best to delay this series
for a cycle so I can properly dig in here and determine the best way forward then :)
Cheers, Lorenzo
Thread overview: 29+ messages
2025-09-17 19:11 [PATCH v4 00/14] expand mmap_prepare functionality, port more users Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 01/14] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 02/14] device/dax: update devdax " Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 03/14] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 04/14] relay: update relay to use mmap_prepare Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 05/14] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
2025-09-17 21:32 ` Jason Gunthorpe
2025-09-18 6:09 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN Lorenzo Stoakes
2025-09-17 21:19 ` Jason Gunthorpe
2025-09-18 6:26 ` Lorenzo Stoakes
2025-09-18 9:11 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 08/14] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
2025-09-18 9:12 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
2025-09-17 21:37 ` Jason Gunthorpe
2025-09-18 6:09 ` Lorenzo Stoakes
2025-09-18 9:14 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 10/14] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 11/14] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
2025-09-23 11:52 ` Sumanth Korikkar
2025-09-23 21:17 ` Andrew Morton
2025-09-24 12:03 ` Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 12/14] mm: add shmem_zero_setup_desc() Lorenzo Stoakes
2025-09-17 21:38 ` Jason Gunthorpe
2025-09-17 19:11 ` [PATCH v4 13/14] mm: update mem char driver to use mmap_prepare Lorenzo Stoakes
2025-09-17 19:11 ` [PATCH v4 14/14] mm: update resctl " Lorenzo Stoakes
2025-09-17 20:31 ` [PATCH v4 00/14] expand mmap_prepare functionality, port more users Andrew Morton