All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Ralph Campbell <rcampbell@nvidia.com>
To: Alex Sierra <alex.sierra@amd.com>, <akpm@linux-foundation.org>,
	<Felix.Kuehling@amd.com>, <linux-mm@kvack.org>,
	<linux-ext4@vger.kernel.org>,  <linux-xfs@vger.kernel.org>
Cc: jglisse@redhat.com, jgg@nvidia.com,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	hch@lst.de
Subject: Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount
Date: Thu, 17 Jun 2021 12:16:26 -0700	[thread overview]
Message-ID: <7163dbb6-67b5-6eef-5772-500fd2107e5c@nvidia.com> (raw)
In-Reply-To: <20210617151705.15367-3-alex.sierra@amd.com>


On 6/17/21 8:16 AM, Alex Sierra wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
>
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.
>
> v2:
> AS: merged this patch in linux 5.11 version
>
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>   arch/powerpc/kvm/book3s_hv_uvmem.c     |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
>   fs/dax.c                               |  4 +-
>   include/linux/dax.h                    |  2 +-
>   include/linux/memremap.h               |  7 +--
>   include/linux/mm.h                     | 44 -----------------
>   lib/test_hmm.c                         |  2 +-
>   mm/internal.h                          |  8 +++
>   mm/memremap.c                          | 68 +++++++-------------------
>   mm/migrate.c                           |  5 --
>   mm/page_alloc.c                        |  3 ++
>   mm/swap.c                              | 45 ++---------------
>   12 files changed, 45 insertions(+), 147 deletions(-)
>
I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates MEMORY_DEVICE_GENERIC
struct pages and then:
   dev_dax_fault()
     dev_dax_huge_fault()
       __dev_dax_pte_fault()
         vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing the page
refcount so it is zero (whereas it was one before). But using get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current notion of
free verses allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that though.

I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File systems have clear
allocate and free states for backing storage but there are the complications with
the page cache references, etc. to consider. The >1 to 1 reference count seems to
be used to tell when a page is idle (no I/O, reclaim scanners) rather than free
(not allocated to any file) but I'm not 100% sure about that since I don't really
understand all the issues around why a file system needs to have a DAX mount option
besides knowing that the storage block size has to be a multiple of the page size.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

WARNING: multiple messages have this Message-ID (diff)
From: Ralph Campbell <rcampbell@nvidia.com>
To: Alex Sierra <alex.sierra@amd.com>, <akpm@linux-foundation.org>,
	<Felix.Kuehling@amd.com>, <linux-mm@kvack.org>,
	<linux-ext4@vger.kernel.org>, <linux-xfs@vger.kernel.org>
Cc: <amd-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>, <hch@lst.de>, <jgg@nvidia.com>,
	<jglisse@redhat.com>
Subject: Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount
Date: Thu, 17 Jun 2021 12:16:26 -0700	[thread overview]
Message-ID: <7163dbb6-67b5-6eef-5772-500fd2107e5c@nvidia.com> (raw)
In-Reply-To: <20210617151705.15367-3-alex.sierra@amd.com>


On 6/17/21 8:16 AM, Alex Sierra wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
>
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.
>
> v2:
> AS: merged this patch in linux 5.11 version
>
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>   arch/powerpc/kvm/book3s_hv_uvmem.c     |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
>   fs/dax.c                               |  4 +-
>   include/linux/dax.h                    |  2 +-
>   include/linux/memremap.h               |  7 +--
>   include/linux/mm.h                     | 44 -----------------
>   lib/test_hmm.c                         |  2 +-
>   mm/internal.h                          |  8 +++
>   mm/memremap.c                          | 68 +++++++-------------------
>   mm/migrate.c                           |  5 --
>   mm/page_alloc.c                        |  3 ++
>   mm/swap.c                              | 45 ++---------------
>   12 files changed, 45 insertions(+), 147 deletions(-)
>
I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates MEMORY_DEVICE_GENERIC
struct pages and then:
   dev_dax_fault()
     dev_dax_huge_fault()
       __dev_dax_pte_fault()
         vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing the page
refcount so it is zero (whereas it was one before). But using get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current notion of
free verses allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that though.

I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File systems have clear
allocate and free states for backing storage but there are the complications with
the page cache references, etc. to consider. The >1 to 1 reference count seems to
be used to tell when a page is idle (no I/O, reclaim scanners) rather than free
(not allocated to any file) but I'm not 100% sure about that since I don't really
understand all the issues around why a file system needs to have a DAX mount option
besides knowing that the storage block size has to be a multiple of the page size.


WARNING: multiple messages have this Message-ID (diff)
From: Ralph Campbell <rcampbell@nvidia.com>
To: Alex Sierra <alex.sierra@amd.com>, <akpm@linux-foundation.org>,
	<Felix.Kuehling@amd.com>, <linux-mm@kvack.org>,
	<linux-ext4@vger.kernel.org>,  <linux-xfs@vger.kernel.org>
Cc: jglisse@redhat.com, jgg@nvidia.com,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	hch@lst.de
Subject: Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount
Date: Thu, 17 Jun 2021 12:16:26 -0700	[thread overview]
Message-ID: <7163dbb6-67b5-6eef-5772-500fd2107e5c@nvidia.com> (raw)
In-Reply-To: <20210617151705.15367-3-alex.sierra@amd.com>


On 6/17/21 8:16 AM, Alex Sierra wrote:
> From: Ralph Campbell <rcampbell@nvidia.com>
>
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.
>
> v2:
> AS: merged this patch in linux 5.11 version
>
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>   arch/powerpc/kvm/book3s_hv_uvmem.c     |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
>   fs/dax.c                               |  4 +-
>   include/linux/dax.h                    |  2 +-
>   include/linux/memremap.h               |  7 +--
>   include/linux/mm.h                     | 44 -----------------
>   lib/test_hmm.c                         |  2 +-
>   mm/internal.h                          |  8 +++
>   mm/memremap.c                          | 68 +++++++-------------------
>   mm/migrate.c                           |  5 --
>   mm/page_alloc.c                        |  3 ++
>   mm/swap.c                              | 45 ++---------------
>   12 files changed, 45 insertions(+), 147 deletions(-)
>
I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates MEMORY_DEVICE_GENERIC
struct pages and then:
   dev_dax_fault()
     dev_dax_huge_fault()
       __dev_dax_pte_fault()
         vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing the page
refcount so it is zero (whereas it was one before). But using get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current notion of
free verses allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that though.

I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File systems have clear
allocate and free states for backing storage but there are the complications with
the page cache references, etc. to consider. The >1 to 1 reference count seems to
be used to tell when a page is idle (no I/O, reclaim scanners) rather than free
(not allocated to any file) but I'm not 100% sure about that since I don't really
understand all the issues around why a file system needs to have a DAX mount option
besides knowing that the storage block size has to be a multiple of the page size.


  reply	other threads:[~2021-06-17 19:16 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-17 15:16 [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_* Alex Sierra
2021-06-17 15:16 ` Alex Sierra
2021-06-17 15:16 ` Alex Sierra
2021-06-17 15:16 ` [PATCH v3 1/8] ext4/xfs: add page refcount helper Alex Sierra
2021-06-17 15:16   ` Alex Sierra
2021-06-17 15:16   ` Alex Sierra
2021-06-17 16:11   ` Darrick J. Wong
2021-06-17 16:11     ` Darrick J. Wong
2021-06-17 16:11     ` Darrick J. Wong
2021-06-17 23:52   ` Dave Chinner
2021-06-17 23:52     ` Dave Chinner
2021-06-17 23:52     ` Dave Chinner
2021-06-19 16:14     ` Sierra Guiza, Alejandro (Alex)
2021-06-19 16:14       ` Sierra Guiza, Alejandro (Alex)
2021-06-19 16:14       ` Sierra Guiza, Alejandro (Alex)
2021-06-19 18:31   ` Alex Sierra
2021-06-19 18:31     ` Alex Sierra
2021-06-19 18:31     ` Alex Sierra
2021-06-17 15:16 ` [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount Alex Sierra
2021-06-17 15:16   ` Alex Sierra
2021-06-17 15:16   ` Alex Sierra
2021-06-17 19:16   ` Ralph Campbell [this message]
2021-06-17 19:16     ` Ralph Campbell
2021-06-17 19:16     ` Ralph Campbell
2021-06-28 16:46     ` Felix Kuehling
2021-06-28 16:46       ` Felix Kuehling
2021-06-28 16:46       ` Felix Kuehling
2021-06-30  0:23       ` Ralph Campbell
2021-06-30  0:23         ` Ralph Campbell
2021-06-30  0:23         ` Ralph Campbell
2021-06-17 15:17 ` [PATCH v3 3/8] kernel: resource: lookup_resource as exported symbol Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17 ` [PATCH v3 4/8] drm/amdkfd: add SPM support for SVM Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17 ` [PATCH v3 5/8] drm/amdkfd: generic type as sys mem on migration to ram Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17 ` [PATCH v3 6/8] include/linux/mm.h: helpers to check zone device generic type Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17 ` [PATCH v3 7/8] mm: add generic type support to migrate_vma helpers Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17 ` [PATCH v3 8/8] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:17   ` Alex Sierra
2021-06-17 15:56 ` [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_* Sierra Guiza, Alejandro (Alex)
2021-06-17 15:56   ` Sierra Guiza, Alejandro (Alex)
2021-06-17 15:56   ` Sierra Guiza, Alejandro (Alex)
2021-06-20 14:14 ` Theodore Ts'o
2021-06-20 14:14   ` Theodore Ts'o
2021-06-20 14:14   ` Theodore Ts'o
2021-06-23 21:49   ` Felix Kuehling
2021-06-23 21:49     ` Felix Kuehling
2021-06-23 21:49     ` Felix Kuehling
2021-06-24  5:30     ` Christoph Hellwig
2021-06-24  5:30       ` Christoph Hellwig
2021-06-24 15:08       ` Felix Kuehling
2021-06-24 15:08         ` Felix Kuehling
2021-06-24 15:08         ` Felix Kuehling
2021-07-16 15:07     ` Theodore Y. Ts'o
2021-07-16 15:07       ` Theodore Y. Ts'o
2021-07-16 15:07       ` Theodore Y. Ts'o
2021-07-16 22:14       ` Felix Kuehling
2021-07-16 22:14         ` Felix Kuehling
2021-07-16 22:14         ` Felix Kuehling
2021-07-17 19:54         ` Sierra Guiza, Alejandro (Alex)
2021-07-17 19:54           ` Sierra Guiza, Alejandro (Alex)
2021-07-17 19:54           ` Sierra Guiza, Alejandro (Alex)
2021-07-23 22:46           ` Sierra Guiza, Alejandro (Alex)
2021-07-23 22:46             ` Sierra Guiza, Alejandro (Alex)
2021-07-23 22:46             ` Sierra Guiza, Alejandro (Alex)
2021-07-30 19:02             ` Felix Kuehling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7163dbb6-67b5-6eef-5772-500fd2107e5c@nvidia.com \
    --to=rcampbell@nvidia.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@lst.de \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.