Re: [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave mempolicy

Linux-api Archive mirror

From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Gregory Price <gourry.memverge@gmail.com>
Cc: <linux-mm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-api@vger.kernel.org>,
	<linux-cxl@vger.kernel.org>, <luto@kernel.org>,
	<tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
	<dave.hansen@linux.intel.com>, <hpa@zytor.com>, <arnd@arndb.de>,
	<akpm@linux-foundation.org>, <x86@kernel.org>,
	Gregory Price <gregory.price@memverge.com>
Subject: Re: [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave mempolicy
Date: Mon, 2 Oct 2023 14:40:35 +0100
Message-ID: <20231002144035.00000b36@Huawei.com>
In-Reply-To: <20230914235457.482710-4-gregory.price@memverge.com>

On Thu, 14 Sep 2023 19:54:57 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> The partial-interleave mempolicy implements interleave on an

I'm not sure 'partial' really conveys what is going on here.
Weighted, or uneven-interleave maybe?

> allocation interval. The default node is the local node, for
> which N pages will be allocated before an interleave pass occurs.
> 
> For example:
>   nodes=0,1,2
>   interval=3
>   cpunode=0
> 
> Over 10 consecutive allocations, the following nodes will be selected:
> [0,0,0,1,2,0,0,0,1,2]
> 
> In this example, there is a 60%/20%/20% distribution of memory.
> 
> Using this mechanism, it becomes possible to define an approximate
> distribution percentage of memory across a set of nodes:
> 
> local_node% : interval/((nr_nodes-1)+interval)
> other_node% : (1-local_node%)/(nr_nodes-1)

I'd like to see more discussion here of why you would do this...


A few trivial bits inline,

Jonathan

...

> +static unsigned long alloc_pages_bulk_array_partial_interleave(gfp_t gfp,
> +		struct mempolicy *pol, unsigned long nr_pages,
> +		struct page **page_array)
> +{
> +	nodemask_t nodemask = pol->nodes;
> +	unsigned long nr_pages_main;
> +	unsigned long nr_pages_other;
> +	unsigned long total_cycle;
> +	unsigned long delta;
> +	unsigned long interval;
> +	int allocated = 0;
> +	int start_nid;
> +	int nnodes;
> +	int prev, next;
> +	int i;
> +
> +	/* This stabilizes nodes on the stack in case pol->nodes changes */
> +	barrier();
> +
> +	nnodes = nodes_weight(nodemask);
> +	start_nid = numa_node_id();
> +
> +	if (!node_isset(start_nid, nodemask))
> +		start_nid = first_node(nodemask);
> +
> +	if (nnodes == 1) {
> +		allocated = __alloc_pages_bulk(gfp, start_nid,
> +					       NULL, nr_pages,
> +					       NULL, page_array);
> +		return allocated;
		return __alloc_pages_bulk(...)

> +	}
> +	/* We don't want to double-count the main node in calculations */
> +	nnodes--;
> +
> +	interval = pol->part_int.interval;
> +	total_cycle = (interval + nnodes);

excess brackets. Same in various other places.


> +	/* Number of pages on main node: (cycles*interval + up to interval) */
> +	nr_pages_main = ((nr_pages / total_cycle) * interval);
> +	nr_pages_main += min(nr_pages % total_cycle, interval);


> +	/* Number of pages on others: (remaining/nodes) + 1 page if delta */
> +	nr_pages_other = (nr_pages - nr_pages_main) / nnodes;
> +	/* Delta is number of pages beyond interval up to full cycle */
> +	delta = nr_pages - (nr_pages_main + (nr_pages_other * nnodes));
> +
> +	/* start by allocating for the main node, then interleave rest */
> +	prev = start_nid;
> +	allocated = __alloc_pages_bulk(gfp, start_nid, NULL, nr_pages_main,
> +				       NULL, page_array);
> +	for (i = 0; i < nnodes; i++) {
> +		unsigned long pages = nr_pages_other + (delta ? 1 : 0);
> +
> +		if (delta)
> +			delta--;
> +		next = next_node_in(prev, nodemask);
> +		if (next < MAX_NUMNODES)
> +			prev = next;
> +		allocated += __alloc_pages_bulk(gfp, next, NULL, pages,
> +						NULL, page_array + allocated);
> +	}
> +
> +	return allocated;
> +}
> +

Thread overview: 10+ messages
2023-09-14 23:54 [RFC PATCH 0/3] mm/mempolicy: set/get_mempolicy2 Gregory Price
2023-09-14 23:54 ` [RFC PATCH 1/3] mm/mempolicy: refactor do_set_mempolicy for code re-use Gregory Price
2023-10-02 11:03   ` Jonathan Cameron
2023-09-14 23:54 ` [RFC PATCH 2/3] mm/mempolicy: Implement set_mempolicy2 and get_mempolicy2 syscalls Gregory Price
2023-10-02 13:30   ` Jonathan Cameron
2023-10-02 15:30     ` Gregory Price
2023-10-02 18:03     ` Gregory Price
2023-09-14 23:54 ` [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave mempolicy Gregory Price
2023-10-02 13:40   ` Jonathan Cameron [this message]
2023-10-02 16:10     ` Gregory Price
