From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
akpm@linux-foundation.org, Nishanth Aravamudan <nacc@us.ibm.com>,
andi@firstfloor.org, David Rientjes <rientjes@google.com>,
Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>,
eric.whitney@hp.com
Subject: Re: [PATCH 3/4] hugetlb: derive huge pages nodes allowed from task mempolicy
Date: Fri, 31 Jul 2009 14:49:55 -0400
Message-ID: <1249066195.4674.207.camel@useless.americas.hpqcorp.net>
In-Reply-To: <20090730111512.GC4831@csn.ul.ie>
An incremental patch restoring the missing kfree() is at the end of this
message. I'll roll it into the next respin of the series.
On Thu, 2009-07-30 at 12:15 +0100, Mel Gorman wrote:
> On Wed, Jul 29, 2009 at 01:55:11PM -0400, Lee Schermerhorn wrote:
> > [PATCH 3/4] hugetlb: derive huge pages nodes allowed from task mempolicy
> >
> > Against: 2.6.31-rc3-mmotm-090716-1432
> > atop the alloc_bootmem_huge_page() fix patch
> > [http://marc.info/?l=linux-mm&m=124775468226290&w=4]
> >
> > V2:
> > + cleaned up comments, removed some deemed unnecessary,
> >   added some suggested by review
> > + removed check for !current in huge_mpol_nodes_allowed().
> > + added 'current->comm' to warning message in huge_mpol_nodes_allowed().
> > + added VM_BUG_ON() assertion in hugetlb.c next_node_allowed() to
> > catch out of range node id.
> > + add examples to patch description
> >
> > V3: Factored this patch from V2 patch 2/3
> >
> > This patch derives a "nodes_allowed" node mask from the NUMA
> > mempolicy of the task that is modifying the number of persistent
> > huge pages, and uses it to control the allocation, freeing and
> > surplus adjustment of huge pages. This mask is derived as follows:
> >
> > * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
> > is produced. This will cause the hugetlb subsystem to use
> > node_online_map as the "nodes_allowed". This preserves the
> > behavior before this patch.
> > * For "preferred" mempolicy, including explicit local allocation,
> > a nodemask with the single preferred node will be produced.
> > "local" policy will NOT track any internode migrations of the
> > task adjusting nr_hugepages.
> > * For "bind" and "interleave" policy, the mempolicy's nodemask
> > will be used.
> > * Other than to inform the construction of the nodes_allowed node
> > mask, the actual mempolicy mode is ignored. That is, all modes
> > behave like interleave over the resulting nodes_allowed mask
> > with no "fallback".
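
For reference, "interleave over the resulting nodes_allowed mask with no
fallback" amounts to a round-robin walk of the mask. A minimal sketch of
the next_node_allowed() helper; patch 1/4 carries the real reworked
version, so take the exact shape here as illustrative only:

	/*
	 * Advance round-robin to the next node set in nodes_allowed,
	 * wrapping back to the first node in the mask.  An empty or
	 * corrupt mask trips the VM_BUG_ON.
	 */
	static int next_node_allowed(int nid, nodemask_t *nodes_allowed)
	{
		nid = next_node(nid, *nodes_allowed);
		if (nid == MAX_NUMNODES)
			nid = first_node(*nodes_allowed);
		VM_BUG_ON(nid >= MAX_NUMNODES);
		return nid;
	}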
> >
> > Notes:
> >
> > 1) This patch introduces a subtle change in behavior: huge page
> > allocation and freeing will be constrained by any mempolicy
> > that the task adjusting the huge page pool inherits from its
> > parent. This policy could come from a distant ancestor. The
> > administrator adjusting the huge page pool without explicitly
> > specifying a mempolicy via numactl might be surprised by this.
> > Additionally, any mempolicy specified by numactl will be
> > constrained by the cpuset in which numactl is invoked.
> >
> > 2) Hugepages allocated at boot time use the node_online_map.
> > An additional patch could implement a temporary boot time
> > huge pages nodes_allowed command line parameter.
> >
> > 3) Using mempolicy to control persistent huge page allocation
> > and freeing requires no change to hugeadm when invoking
> > it via numactl, as shown in the examples below. However,
> > hugeadm could be enhanced to take the allowed nodes as an
> > argument and set its task mempolicy itself. This would allow
> > it to detect and warn about any non-default mempolicy that it
> > inherited from its parent, thus alleviating the issue described
> > in Note 1 above.
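
To illustrate Note 3, a sketch of how hugeadm might bind itself with the
libnuma v2 API before adjusting the pool. The option name and helper are
hypothetical, but numa_parse_nodestring(), numa_set_membind() and
get_mempolicy() are the real library/syscall interfaces:

	#include <numa.h>	/* libnuma v2; link with -lnuma */
	#include <numaif.h>	/* get_mempolicy(), MPOL_DEFAULT */
	#include <stdio.h>
	#include <stdlib.h>

	/* Hypothetical handler for a "--pool-nodes=<nodestring>" option. */
	static void hugeadm_set_pool_nodes(const char *nodestring)
	{
		struct bitmask *nodes;
		int mode;

		/* Detect and warn about an inherited non-default policy
		 * (the issue described in Note 1). */
		if (get_mempolicy(&mode, NULL, 0, NULL, 0) == 0 &&
		    mode != MPOL_DEFAULT)
			fprintf(stderr, "warning: inherited mempolicy "
				"(mode %d)\n", mode);

		nodes = numa_parse_nodestring(nodestring);
		if (!nodes) {
			fprintf(stderr, "invalid node list: %s\n", nodestring);
			exit(EXIT_FAILURE);
		}
		numa_set_membind(nodes);	/* same effect as numactl -m */
		numa_bitmask_free(nodes);
	}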
> >
> > See the updated documentation [next patch] for more information
> > about the implications of this patch.
> >
> > Examples:
> >
> > Starting with:
> >
> > Node 0 HugePages_Total: 0
> > Node 1 HugePages_Total: 0
> > Node 2 HugePages_Total: 0
> > Node 3 HugePages_Total: 0
> >
> > Default behavior [with or without this patch] balances persistent
> > hugepage allocation across nodes [with sufficient contiguous memory]:
> >
> > hugeadm --pool-pages-min=2048Kb:32
> >
> > yields:
> >
> > Node 0 HugePages_Total: 8
> > Node 1 HugePages_Total: 8
> > Node 2 HugePages_Total: 8
> > Node 3 HugePages_Total: 8
> >
> > Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
> > '--membind' because it allows multiple nodes to be specified
> > and it's easy to type]--we can allocate huge pages on
> > individual nodes or sets of nodes. So, starting from the
> > condition above, with 8 huge pages per node:
> >
> > numactl -m 2 hugeadm --pool-pages-min=2048Kb:+8
> >
> > yields:
> >
> > Node 0 HugePages_Total: 8
> > Node 1 HugePages_Total: 8
> > Node 2 HugePages_Total: 16
> > Node 3 HugePages_Total: 8
> >
> > The incremental 8 huge pages were restricted to node 2 by the
> > specified mempolicy.
> >
> > Similarly, we can use mempolicy to free persistent huge pages
> > from specified nodes:
> >
> > numactl -m 0,1 hugeadm --pool-pages-min=2048Kb:-8
> >
> > yields:
> >
> > Node 0 HugePages_Total: 4
> > Node 1 HugePages_Total: 4
> > Node 2 HugePages_Total: 16
> > Node 3 HugePages_Total: 8
> >
> > The 8 huge pages freed were balanced over nodes 0 and 1.
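
As an aside, the per-node counts shown in these examples come from the
per-node meminfo files under sysfs; a trivial reader, assuming nodes are
numbered contiguously from 0:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char path[64], line[128];
		int nid;

		for (nid = 0; ; nid++) {
			FILE *f;

			snprintf(path, sizeof(path),
				 "/sys/devices/system/node/node%d/meminfo",
				 nid);
			f = fopen(path, "r");
			if (!f)		/* no such node: assume we're done */
				break;
			/* print the "Node N HugePages_Total: X" line */
			while (fgets(line, sizeof(line), f))
				if (strstr(line, "HugePages_Total"))
					fputs(line, stdout);
			fclose(f);
		}
		return 0;
	}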
> >
> > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> >
> > include/linux/mempolicy.h | 3 ++
> > mm/hugetlb.c | 13 ++++++---
> > mm/mempolicy.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 72 insertions(+), 5 deletions(-)
> >
> > Index: linux-2.6.31-rc3-mmotm-090716-1432/mm/mempolicy.c
> > ===================================================================
> > --- linux-2.6.31-rc3-mmotm-090716-1432.orig/mm/mempolicy.c 2009-07-28 11:15:10.000000000 -0400
> > +++ linux-2.6.31-rc3-mmotm-090716-1432/mm/mempolicy.c 2009-07-28 11:23:43.000000000 -0400
> > @@ -1544,6 +1544,67 @@ struct zonelist *huge_zonelist(struct vm
> > }
> > return zl;
> > }
> > +
> > +/*
> > + * huge_mpol_nodes_allowed -- mempolicy extension for huge pages.
> > + *
> > + * Returns a [pointer to a] nodemask based on the current task's mempolicy
> > + * to constrain the allocation and freeing of persistent huge pages.
> > + * 'Preferred', 'local' and 'interleave' mempolicy will behave more like
> > + * 'bind' policy in this context. An attempt to allocate a persistent huge
> > + * page will never "fallback" to another node inside the buddy system
> > + * allocator.
> > + *
> > + * If the task's mempolicy is "default" [NULL], just return NULL for
> > + * default behavior. Otherwise, extract the policy nodemask for 'bind'
> > + * or 'interleave' policy or construct a nodemask for 'preferred' or
> > + * 'local' policy and return a pointer to a kmalloc()ed nodemask_t.
> > + *
> > + * N.B., it is the caller's responsibility to free a returned nodemask.
> > + */
> > +nodemask_t *huge_mpol_nodes_allowed(void)
> > +{
> > + nodemask_t *nodes_allowed = NULL;
> > + struct mempolicy *mempolicy;
> > + int nid;
> > +
> > + if (!current->mempolicy)
> > + return NULL;
> > +
> > + mpol_get(current->mempolicy);
> > + nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL);
> > + if (!nodes_allowed) {
> > + printk(KERN_WARNING "%s unable to allocate nodes allowed mask "
> > + "for huge page allocation.\nFalling back to default.\n",
> > + current->comm);
> > + goto out;
> > + }
> > + nodes_clear(*nodes_allowed);
> > +
> > + mempolicy = current->mempolicy;
> > + switch (mempolicy->mode) {
> > + case MPOL_PREFERRED:
> > + if (mempolicy->flags & MPOL_F_LOCAL)
> > + nid = numa_node_id();
> > + else
> > + nid = mempolicy->v.preferred_node;
> > + node_set(nid, *nodes_allowed);
> > + break;
> > +
> > + case MPOL_BIND:
> > + /* Fall through */
> > + case MPOL_INTERLEAVE:
> > + *nodes_allowed = mempolicy->v.nodes;
> > + break;
> > +
> > + default:
> > + BUG();
> > + }
> > +
> > +out:
> > + mpol_put(current->mempolicy);
> > + return nodes_allowed;
> > +}
> > #endif
> >
> > /* Allocate a page in interleaved policy.
> > Index: linux-2.6.31-rc3-mmotm-090716-1432/include/linux/mempolicy.h
> > ===================================================================
> > --- linux-2.6.31-rc3-mmotm-090716-1432.orig/include/linux/mempolicy.h 2009-07-28 11:15:10.000000000 -0400
> > +++ linux-2.6.31-rc3-mmotm-090716-1432/include/linux/mempolicy.h 2009-07-28 11:23:43.000000000 -0400
> > @@ -201,6 +201,7 @@ extern void mpol_fix_fork_child_flag(str
> > extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
> > unsigned long addr, gfp_t gfp_flags,
> > struct mempolicy **mpol, nodemask_t **nodemask);
> > +extern nodemask_t *huge_mpol_nodes_allowed(void);
> > extern unsigned slab_node(struct mempolicy *policy);
> >
> > extern enum zone_type policy_zone;
> > @@ -328,6 +329,8 @@ static inline struct zonelist *huge_zone
> > return node_zonelist(0, gfp_flags);
> > }
> >
> > +static inline nodemask_t *huge_mpol_nodes_allowed(void) { return NULL; }
> > +
> > static inline int do_migrate_pages(struct mm_struct *mm,
> > const nodemask_t *from_nodes,
> > const nodemask_t *to_nodes, int flags)
> > Index: linux-2.6.31-rc3-mmotm-090716-1432/mm/hugetlb.c
> > ===================================================================
> > --- linux-2.6.31-rc3-mmotm-090716-1432.orig/mm/hugetlb.c 2009-07-28 11:23:18.000000000 -0400
> > +++ linux-2.6.31-rc3-mmotm-090716-1432/mm/hugetlb.c 2009-07-28 11:24:43.000000000 -0400
> > @@ -1257,10 +1257,13 @@ static int adjust_pool_surplus(struct hs
> > static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count)
> > {
> > unsigned long min_count, ret;
> > + nodemask_t *nodes_allowed;
> >
> > if (h->order >= MAX_ORDER)
> > return h->max_huge_pages;
> >
> > + nodes_allowed = huge_mpol_nodes_allowed();
> > +
> > /*
> > * Increase the pool size
> > * First take pages out of surplus state. Then make up the
> > @@ -1274,7 +1277,7 @@ static unsigned long set_max_huge_pages(
> > */
> > spin_lock(&hugetlb_lock);
> > while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
> > - if (!adjust_pool_surplus(h, NULL, -1))
> > + if (!adjust_pool_surplus(h, nodes_allowed, -1))
> > break;
> > }
> >
> > @@ -1285,7 +1288,7 @@ static unsigned long set_max_huge_pages(
> > * and reducing the surplus.
> > */
> > spin_unlock(&hugetlb_lock);
> > - ret = alloc_fresh_huge_page(h, NULL);
> > + ret = alloc_fresh_huge_page(h, nodes_allowed);
> > spin_lock(&hugetlb_lock);
> > if (!ret)
> > goto out;
> > @@ -1309,13 +1312,13 @@ static unsigned long set_max_huge_pages(
> > */
> > min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
> > min_count = max(count, min_count);
> > - try_to_free_low(h, min_count, NULL);
> > + try_to_free_low(h, min_count, nodes_allowed);
> > while (min_count < persistent_huge_pages(h)) {
> > - if (!free_pool_huge_page(h, NULL, 0))
> > + if (!free_pool_huge_page(h, nodes_allowed, 0))
> > break;
> > }
> > while (count < persistent_huge_pages(h)) {
> > - if (!adjust_pool_surplus(h, NULL, 1))
> > + if (!adjust_pool_surplus(h, nodes_allowed, 1))
> > break;
> > }
> > out:
>
> Where did the hunk go that calls kfree(nodes_allowed)? I think we might
> be leaking in this version.
>
> Otherwise it looks good.
>
[PATCH] hugetlb: derive huge pages nodes allowed from task mempolicy FIX

Against: 2.6.31-rc3-mmotm-090716-1432 with
hugetlb: derive huge pages nodes allowed from task mempolicy

Restore the kfree() of the dynamically allocated nodes_allowed nodemask
that was dropped while refactoring the patches. Caught by Mel Gorman.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6.31-rc4-mmotm-090730-0501/mm/hugetlb.c
===================================================================
--- linux-2.6.31-rc4-mmotm-090730-0501.orig/mm/hugetlb.c 2009-07-30 09:09:28.000000000 -0400
+++ linux-2.6.31-rc4-mmotm-090730-0501/mm/hugetlb.c 2009-07-30 09:11:07.000000000 -0400
@@ -1324,6 +1324,7 @@ static unsigned long set_max_huge_pages(
out:
ret = persistent_huge_pages(h);
spin_unlock(&hugetlb_lock);
+ kfree(nodes_allowed);
return ret;
}