Re: [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving

Linux-api Archive mirror
 help / color / mirror / Atom feed

From: Gregory Price <gregory.price@memverge.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Gregory Price <gourry.memverge@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	akpm@linux-foundation.org, arnd@arndb.de, tglx@linutronix.de,
	luto@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mhocko@kernel.org,
	tj@kernel.org, corbet@lwn.net, rakie.kim@sk.com,
	hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
	peterz@infradead.org, jgroves@micron.com,
	ravis.opensrc@micron.com, sthanneeru@micron.com,
	emirakhur@micron.com, Hasan.Maruf@amd.com,
	seungjun.ha@samsung.com,
	Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Subject: Re: [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving
Date: Tue, 2 Jan 2024 15:30:36 -0500	[thread overview]
Message-ID: <ZZRybDPSoLme8Ldh@memverge.com> (raw)
In-Reply-To: <878r58dt31.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Tue, Jan 02, 2024 at 04:42:42PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@memverge.com> writes:
> 
> > On Wed, Dec 27, 2023 at 04:32:37PM +0800, Huang, Ying wrote:
> >> Gregory Price <gourry.memverge@gmail.com> writes:
> >> 
> >> > +static unsigned int weighted_interleave_nid(struct mempolicy *pol, pgoff_t ilx)
> >> > +{
> >> > +	nodemask_t nodemask = pol->nodes;
> >> > +	unsigned int target, weight_total = 0;
> >> > +	int nid;
> >> > +	unsigned char weights[MAX_NUMNODES];
> >> 
> >> MAX_NUMNODSE could be as large as 1024.  1KB stack space may be too
> >> large?
> >> 
> >
> > I've been struggling with a good solution to this.  We need a local copy
> > of weights to prevent weights from changing out from under us during
> > allocation (which may take quite some time), but it seemed unwise to
> > to allocate 1KB heap in this particular path.
> >
> > Is my concern unfounded?  If so, I can go ahead and add the allocation
> > code.
> 
> Please take a look at NODEMASK_ALLOC().
>

This is not my question. NODEMASK_ALLOC calls kmalloc/kfree. 

Some of the allocations on the stack can be replaced with a scratch
allocation, that's no big deal.

I'm specifically concerned about:
	weighted_interleave_nid
	alloc_pages_bulk_array_weighted_interleave

I'm unsure whether kmalloc/kfree is safe (and non-offensive) in those
contexts. If kmalloc/kfree is safe fine, this problem is trivial.

If not, there is no good solution to this without pre-allocating a
scratch area per-task.

> >> I don't think barrier() is needed to wait for memory operations for
> >> stack.  It's usually used for cross-processor memory order.
> >>
> >
> > This is present in the old interleave code.  To the best of my
> > understanding, the concern is for mempolicy->nodemask rebinding that can
> > occur when cgroups.cpusets.mems_allowed changes.
> >
> > so we can't iterate over (mempolicy->nodemask), we have to take a local
> > copy.
> >
> > My *best* understanding of the barrier here is to prevent the compiler
> > from reordering operations such that it attempts to optimize out the
> > local copy (or do lazy-fetch).
> >
> > It is present in the original interleave code, so I pulled it forward to
> > this, but I have not tested whether this is a bit paranoid or not.
> >
> > from `interleave_nid`:
> >
> >  /*
> >   * The barrier will stabilize the nodemask in a register or on
> >   * the stack so that it will stop changing under the code.
> >   *
> >   * Between first_node() and next_node(), pol->nodes could be changed
> >   * by other threads. So we put pol->nodes in a local stack.
> >   */
> >  barrier();
> 
> Got it.  This is kind of READ_ONCE() for nodemask.  To avoid to add
> comments all over the place.  Can we implement a wrapper for it?  For
> example, memcpy_once().  __read_once_size() in
> tools/include/linux/compiler.h can be used as reference.
> 
> Because node_weights[] may be changed simultaneously too.  We may need
> to consider similar issue for it too.  But RCU seems more appropriate
> for node_weights[].
> 

Weights are collected individually onto the stack because we have to sum
them up before we actually apply the weights.

A stale weight is not offensive.  RCU is not needed and doesn't help.

The reason the barrier is needed is not weights, it's the nodemask.

So you basically just want to replace barrier() with this and drop the
copy/pasted comments:

static void read_once_policy_nodemask(struct mempolicy *pol, nodemask_t *mask)
{
        /*
         * The barrier will stabilize the nodemask in a register or on
         * the stack so that it will stop changing under the code.
         *
         * Between first_node() and next_node(), pol->nodes could be changed
         * by other threads. So we put pol->nodes in a local stack.
         */
        barrier();
        __builtin_memcpy(mask, &pol->nodes, sizeof(nodemask_t));
        barrier();
}

- nodemask_t nodemask = pol->nodemask
- barrier()
+ nodemask_t nodemask;
+ read_once_policy_nodemask(pol, &nodemask)

Is that right?

~Gregory

next prev parent reply	other threads:[~2024-01-02 20:30 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-23 18:10 [PATCH v5 00/11] mempolicy2, mbind2, and weighted interleave Gregory Price
2023-12-23 18:10 ` [PATCH v5 01/11] mm/mempolicy: implement the sysfs-based weighted_interleave interface Gregory Price
2023-12-27  6:42   ` Huang, Ying
2023-12-26  6:48     ` Gregory Price
2024-01-02  7:41       ` Huang, Ying
2024-01-02 19:45         ` Gregory Price
2024-01-03  2:45           ` Huang, Ying
2024-01-03  2:59             ` Gregory Price
2024-01-03  6:03               ` Huang, Ying
2024-01-03  2:46         ` Gregory Price
2023-12-23 18:10 ` [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving Gregory Price
2023-12-27  8:32   ` Huang, Ying
2023-12-26  7:01     ` Gregory Price
2023-12-26  8:06       ` Gregory Price
2023-12-26 11:32       ` Gregory Price
2024-01-02  8:42       ` Huang, Ying
2024-01-02 20:30         ` Gregory Price [this message]
2024-01-03  5:46           ` Huang, Ying
2024-01-03 22:09             ` Gregory Price
2024-01-04  5:39               ` Huang, Ying
2024-01-04 18:59                 ` Gregory Price
2024-01-05  6:51                   ` Huang, Ying
2024-01-05  7:25                     ` Gregory Price
2024-01-08  7:08                       ` Huang, Ying
2023-12-23 18:10 ` [PATCH v5 03/11] mm/mempolicy: refactor sanitize_mpol_flags for reuse Gregory Price
2023-12-27  8:39   ` Huang, Ying
2023-12-26  7:05     ` Gregory Price
2023-12-26 11:48       ` Gregory Price
2024-01-02  9:09         ` Huang, Ying
2024-01-02 20:32           ` Gregory Price
2023-12-23 18:10 ` [PATCH v5 04/11] mm/mempolicy: create struct mempolicy_args for creating new mempolicies Gregory Price
2023-12-23 18:10 ` [PATCH v5 05/11] mm/mempolicy: refactor kernel_get_mempolicy for code re-use Gregory Price
2023-12-23 18:10 ` [PATCH v5 06/11] mm/mempolicy: allow home_node to be set by mpol_new Gregory Price
2023-12-23 18:10 ` [PATCH v5 07/11] mm/mempolicy: add userland mempolicy arg structure Gregory Price
2023-12-23 18:10 ` [PATCH v5 08/11] mm/mempolicy: add set_mempolicy2 syscall Gregory Price
2024-01-02 14:38   ` Geert Uytterhoeven
2023-12-23 18:10 ` [PATCH v5 09/11] mm/mempolicy: add get_mempolicy2 syscall Gregory Price
2024-01-02 14:46   ` Geert Uytterhoeven
2023-12-23 18:11 ` [PATCH v5 10/11] mm/mempolicy: add the mbind2 syscall Gregory Price
2024-01-02 14:47   ` Geert Uytterhoeven
2023-12-23 18:11 ` [PATCH v5 11/11] mm/mempolicy: extend set_mempolicy2 and mbind2 to support weighted interleave Gregory Price
2023-12-25  7:54 ` [PATCH v5 00/11] mempolicy2, mbind2, and " Huang, Ying
2023-12-26  7:45   ` Gregory Price
2024-01-02  4:27     ` Huang, Ying
2024-01-02 19:06       ` Gregory Price
2024-01-03  3:15         ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZRybDPSoLme8Ldh@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=Hasan.Maruf@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=emirakhur@micron.com \
    --cc=gourry.memverge@gmail.com \
    --cc=honggyu.kim@sk.com \
    --cc=hpa@zytor.com \
    --cc=hyeongtak.ji@sk.com \
    --cc=jgroves@micron.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rakie.kim@sk.com \
    --cc=ravis.opensrc@micron.com \
    --cc=seungjun.ha@samsung.com \
    --cc=sthanneeru.opensrc@micron.com \
    --cc=sthanneeru@micron.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vtavarespetr@micron.com \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).