All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	kernel test robot <oliver.sang@intel.com>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, lkp@lists.01.org,
	Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH] mm: relocate 'write_protect_seq' in struct mm_struct
Date: Mon, 14 Jun 2021 11:27:39 +0800	[thread overview]
Message-ID: <20210614032738.GA72794@shbuild999.sh.intel.com> (raw)
In-Reply-To: <20210611170917.GW1002214@nvidia.com>

Hi Jason,

On Fri, Jun 11, 2021 at 02:09:17PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 11, 2021 at 09:54:42AM +0800, Feng Tang wrote:
> > 0day robot reported a 9.2% regression for will-it-scale mmap1 test
> > case[1], caused by commit 57efa1fe5957 ("mm/gup: prevent gup_fast
> > from racing with COW during fork").
> > 
> > Further debug shows the regression is due to that commit changes
> > the offset of hot fields 'mmap_lock' inside structure 'mm_struct',
> > thus some cache alignment changes.
> > 
> > From the perf data, the contention for 'mmap_lock' is very severe
> > and takes around 95% cpu cycles, and it is a rw_semaphore
> > 
> >         struct rw_semaphore {
> >                 atomic_long_t count;	/* 8 bytes */
> >                 atomic_long_t owner;	/* 8 bytes */
> >                 struct optimistic_spin_queue osq; /* spinner MCS lock */
> >                 ...
> > 
> > Before commit 57efa1fe5957 adds the 'write_protect_seq', it
> > happens to have a very optimal cache alignment layout, as
> > Linus explained:
> > 
> >  "and before the addition of the 'write_protect_seq' field, the
> >   mmap_sem was at offset 120 in 'struct mm_struct'.
> > 
> >   Which meant that count and owner were in two different cachelines,
> >   and then when you have contention and spend time in
> >   rwsem_down_write_slowpath(), this is probably *exactly* the kind
> >   of layout you want.
> > 
> >   Because first the rwsem_write_trylock() will do a cmpxchg on the
> >   first cacheline (for the optimistic fast-path), and then in the
> >   case of contention, rwsem_down_write_slowpath() will just access
> >   the second cacheline.
> > 
> >   Which is probably just optimal for a load that spends a lot of
> >   time contended - new waiters touch that first cacheline, and then
> >   they queue themselves up on the second cacheline."
> > 
> > After the commit, the rw_semaphore is at offset 128, which means
> > the 'count' and 'owner' fields are now in the same cacheline,
> > and causes more cache bouncing.
> > 
> > Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which
> > will affect its offset:
> > 
> >   CONFIG_MMU
> >   CONFIG_MEMBARRIER
> >   CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
> > 
> > The layout above is on 64 bits system with 0day's default kernel
> > config (similar to RHEL-8.3's config), in which all these 3 options
> > are 'y'. And the layout can vary with different kernel configs.
> > 
> > Relayouting a structure is usually a double-edged sword, as sometimes
> > it can helps one case, but hurt other cases. For this case, one
> > solution is, as the newly added 'write_protect_seq' is a 4 bytes long
> > seqcount_t (when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an
> > existing 4 bytes hole in 'mm_struct' will not change other fields'
> > alignment, while restoring the regression. 
> > 
> > [1]. https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Peter Xu <peterx@redhat.com>
> > ---
> >  include/linux/mm_types.h | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> It seems Ok to me, but didn't we earlier add the has_pinned which
> would have changed the layout too? Are we chasing performance delta's
> nobody cares about?

Good point! I checked my email folder for 0day's reports, and haven't
found a report related with Peter's commit 008cfe4418b3 ("mm: Introduce
mm_struct.has_pinned) which adds 'has_pinned' field. 

Will run the same test for it and report back.

> Still it is mechanically fine, so:
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks for the review!

- Feng

> Jason

WARNING: multiple messages have this Message-ID (diff)
From: Feng Tang <feng.tang@intel.com>
To: lkp@lists.01.org
Subject: Re: [PATCH] mm: relocate 'write_protect_seq' in struct mm_struct
Date: Mon, 14 Jun 2021 11:27:39 +0800	[thread overview]
Message-ID: <20210614032738.GA72794@shbuild999.sh.intel.com> (raw)
In-Reply-To: <20210611170917.GW1002214@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 3980 bytes --]

Hi Jason,

On Fri, Jun 11, 2021 at 02:09:17PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 11, 2021 at 09:54:42AM +0800, Feng Tang wrote:
> > 0day robot reported a 9.2% regression for will-it-scale mmap1 test
> > case[1], caused by commit 57efa1fe5957 ("mm/gup: prevent gup_fast
> > from racing with COW during fork").
> > 
> > Further debug shows the regression is due to that commit changes
> > the offset of hot fields 'mmap_lock' inside structure 'mm_struct',
> > thus some cache alignment changes.
> > 
> > From the perf data, the contention for 'mmap_lock' is very severe
> > and takes around 95% cpu cycles, and it is a rw_semaphore
> > 
> >         struct rw_semaphore {
> >                 atomic_long_t count;	/* 8 bytes */
> >                 atomic_long_t owner;	/* 8 bytes */
> >                 struct optimistic_spin_queue osq; /* spinner MCS lock */
> >                 ...
> > 
> > Before commit 57efa1fe5957 adds the 'write_protect_seq', it
> > happens to have a very optimal cache alignment layout, as
> > Linus explained:
> > 
> >  "and before the addition of the 'write_protect_seq' field, the
> >   mmap_sem was at offset 120 in 'struct mm_struct'.
> > 
> >   Which meant that count and owner were in two different cachelines,
> >   and then when you have contention and spend time in
> >   rwsem_down_write_slowpath(), this is probably *exactly* the kind
> >   of layout you want.
> > 
> >   Because first the rwsem_write_trylock() will do a cmpxchg on the
> >   first cacheline (for the optimistic fast-path), and then in the
> >   case of contention, rwsem_down_write_slowpath() will just access
> >   the second cacheline.
> > 
> >   Which is probably just optimal for a load that spends a lot of
> >   time contended - new waiters touch that first cacheline, and then
> >   they queue themselves up on the second cacheline."
> > 
> > After the commit, the rw_semaphore is at offset 128, which means
> > the 'count' and 'owner' fields are now in the same cacheline,
> > and causes more cache bouncing.
> > 
> > Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which
> > will affect its offset:
> > 
> >   CONFIG_MMU
> >   CONFIG_MEMBARRIER
> >   CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
> > 
> > The layout above is on 64 bits system with 0day's default kernel
> > config (similar to RHEL-8.3's config), in which all these 3 options
> > are 'y'. And the layout can vary with different kernel configs.
> > 
> > Relayouting a structure is usually a double-edged sword, as sometimes
> > it can helps one case, but hurt other cases. For this case, one
> > solution is, as the newly added 'write_protect_seq' is a 4 bytes long
> > seqcount_t (when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an
> > existing 4 bytes hole in 'mm_struct' will not change other fields'
> > alignment, while restoring the regression. 
> > 
> > [1]. https://lore.kernel.org/lkml/20210525031636.GB7744(a)xsang-OptiPlex-9020/
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Peter Xu <peterx@redhat.com>
> > ---
> >  include/linux/mm_types.h | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> It seems Ok to me, but didn't we earlier add the has_pinned which
> would have changed the layout too? Are we chasing performance delta's
> nobody cares about?

Good point! I checked my email folder for 0day's reports, and haven't
found a report related with Peter's commit 008cfe4418b3 ("mm: Introduce
mm_struct.has_pinned) which adds 'has_pinned' field. 

Will run the same test for it and report back.

> Still it is mechanically fine, so:
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks for the review!

- Feng

> Jason

  reply	other threads:[~2021-06-14  3:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-11  1:54 [PATCH] mm: relocate 'write_protect_seq' in struct mm_struct Feng Tang
2021-06-11  1:54 ` Feng Tang
2021-06-11 17:09 ` Jason Gunthorpe
2021-06-11 17:09   ` Jason Gunthorpe
2021-06-14  3:27   ` Feng Tang [this message]
2021-06-14  3:27     ` Feng Tang
2021-06-15  1:11     ` Feng Tang
2021-06-15  1:11       ` Feng Tang
2021-06-15 18:52       ` Peter Xu
2021-06-15 18:52         ` Peter Xu
2021-06-16  1:51         ` Feng Tang
2021-06-16  1:51           ` Feng Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210614032738.GA72794@shbuild999.sh.intel.com \
    --to=feng.tang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@lists.01.org \
    --cc=oliver.sang@intel.com \
    --cc=peterx@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.