All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Justin He <Justin.He@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>
Cc: "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	nd <nd@arm.com>
Subject: RE: [PATCH] device-dax: use fallback nid when numa_node is invalid
Date: Thu, 29 Jul 2021 00:20:38 +0000	[thread overview]
Message-ID: <AM6PR08MB437663A6F8ABE7FCBC22B4E0F7EB9@AM6PR08MB4376.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <fc31c6ab-d147-10c0-7678-d820bc8ec96e@redhat.com>

Hi David

> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Thursday, July 29, 2021 4:17 AM
> To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>;
> Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>
> Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com>
> Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is invalid
> 
> On 28.07.21 10:22, Jia He wrote:
> > Previously, numa_off was set unconditionally in dummy_numa_init()
> > even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1)
> > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > arm64 with fake numa.
> >
> > Without this patch, pmem can't be probed as a RAM device on arm64 if
> > SRAT table isn't present:
> >    $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g
> -a 64K
> >    kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> >    kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >   drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
> >   1 file changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index ac231cc36359..749674909e51 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >   	struct dax_kmem_data *data;
> >   	int rc = -ENOMEM;
> >   	int i, mapped = 0;
> > -	int numa_node;
> > -
> > -	/*
> > -	 * Ensure good NUMA information for the persistent memory.
> > -	 * Without this check, there is a risk that slow memory
> > -	 * could be mixed in a node with faster memory, causing
> > -	 * unavoidable performance issues.
> > -	 */
> > -	numa_node = dev_dax->target_node;
> > -	if (numa_node < 0) {
> > -		dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> > -				numa_node);
> > -		return -EINVAL;
> > -	}
> > +	int numa_node = dev_dax->target_node, new_node;
> >
> >   	data = kzalloc(struct_size(data, res, dev_dax->nr_range),
> GFP_KERNEL);
> >   	if (!data)
> > @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >   		 */
> >   		res->flags = IORESOURCE_SYSTEM_RAM;
> >
> > +		/*
> > +		 * Ensure good NUMA information for the persistent memory.
> > +		 * Without this check, there is a risk but not fatal that slow
> > +		 * memory could be mixed in a node with faster memory, causing
> > +		 * unavoidable performance issues. Furthermore, fallback node
> > +		 * id can be used when numa_node is invalid.
> > +		 */
> > +		if (numa_node < 0) {
> > +			new_node = memory_add_physaddr_to_nid(range.start);
> > +			dev_info(dev, "changing nid from %d to %d for DAX
> region %pR\n",
> > +				numa_node, new_node, res);
> > +			numa_node = new_node;
> > +		}
> > +
> >   		/*
> >   		 * Ensure that future kexec'd kernels will not treat
> >   		 * this as RAM automatically.
> > @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax
> *dev_dax)
> >   	int i, success = 0;
> >   	struct device *dev = &dev_dax->dev;
> >   	struct dax_kmem_data *data = dev_get_drvdata(dev);
> > +	int numa_node = dev_dax->target_node;
> >
> >   	/*
> >   	 * We have one shot for removing memory, if some memory blocks were
> not
> > @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax
> *dev_dax)
> >   		if (rc)
> >   			continue;
> >
> > -		rc = remove_memory(dev_dax->target_node, range.start,
> > -				range_len(&range));
> > +		if (numa_node < 0)
> > +			numa_node = memory_add_physaddr_to_nid(range.start);
> > +
> > +		rc = remove_memory(numa_node, range.start, range_len(&range));
> >   		if (rc == 0) {
> >   			release_resource(data->res[i]);
> >   			kfree(data->res[i]);
> >
> 
> Note that this patch conflicts with:
> 
> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
> 
> But nothing fundamental. Determining a single NID is similar to how I'm
> handling it for ACPI:
> 
> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
> 

Okay, got it. Thanks for the reminder.
Seems my patch is not useful after your patch.


--
Cheers,
Justin (Jia He)



  reply	other threads:[~2021-07-29  0:21 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-28  8:22 [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE Jia He
2021-07-28  8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He
2021-07-28 20:17   ` David Hildenbrand
2021-07-29  0:20     ` Justin He [this message]
2021-07-29  7:59       ` David Hildenbrand
2021-07-29 14:44         ` Justin He
2021-08-02 15:47           ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM6PR08MB437663A6F8ABE7FCBC22B4E0F7EB9@AM6PR08MB4376.eurprd08.prod.outlook.com \
    --to=justin.he@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nd@arm.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.