From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Gavin Shan <gshan@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>, <linux-pm@vger.kernel.org>,
	<loongarch@lists.linux.dev>, <linux-acpi@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <kvmarm@lists.linux.dev>,
	<x86@kernel.org>, Russell King <linux@armlinux.org.uk>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Miguel Luis <miguel.luis@oracle.com>,
	James Morse <james.morse@arm.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	"Jean-Philippe Brucker" <jean-philippe@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Hanjun Guo <guohanjun@huawei.com>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>, <linuxarm@huawei.com>,
	<justin.he@arm.com>, <jianyong.wu@arm.com>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	"Sudeep Holla" <sudeep.holla@arm.com>
Subject: Re: [PATCH v8 04/16] ACPI: processor: Move checks and availability of acpi_processor earlier
Date: Tue, 30 Apr 2024 10:28:38 +0100
Message-ID: <20240430102838.00006e04@Huawei.com>
In-Reply-To: <80a2e07f-ecb2-48af-b2be-646f17e0e63e@redhat.com>

On Tue, 30 Apr 2024 14:17:24 +1000
Gavin Shan <gshan@redhat.com> wrote:

> On 4/26/24 23:51, Jonathan Cameron wrote:
> > Make the per_cpu(processors, cpu) entries available earlier so that
> > they are available in arch_register_cpu() as ARM64 will need access
> > to the acpi_handle to distinguish between acpi_processor_add()
> > and earlier registration attempts (which will fail as _STA cannot
> > be checked).
> > 
> > Reorder the remove flow to clear this per_cpu() after
> > arch_unregister_cpu() has completed, allowing it to be used in
> > there as well.
> > 
> > Note that on x86 for the CPU hotplug case, the pr->id prior to
> > acpi_map_cpu() may be invalid. Thus the per_cpu() structures
> > must be initialized after that call or after checking the ID
> > is valid (not hotplug path).
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > ---
> > v8: On buggy bios detection when setting per_cpu structures
> >      do not carry on.
> >      Fix up the clearing of per cpu structures to remove unwanted
> >      side effects and ensure an error code isn't use to reference them.
> > ---
> >   drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++--------------
> >   1 file changed, 48 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index ba0a6f0ac841..3b180e21f325 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {}
> >   #endif /* CONFIG_X86 */
> >   
> >   /* Initialization */
> > +static DEFINE_PER_CPU(void *, processor_device_array);
> > +
> > +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr,
> > +				       struct acpi_device *device)
> > +{
> > +	BUG_ON(pr->id >= nr_cpu_ids);  
> 
> One blank line after BUG_ON() if we need to follow original implementation.

Sure, that was unintentional - I'll put that back.

> 
> > +	/*
> > +	 * Buggy BIOS check.
> > +	 * ACPI id of processors can be reported wrongly by the BIOS.
> > +	 * Don't trust it blindly
> > +	 */
> > +	if (per_cpu(processor_device_array, pr->id) != NULL &&
> > +	    per_cpu(processor_device_array, pr->id) != device) {
> > +		dev_warn(&device->dev,
> > +			 "BIOS reported wrong ACPI id %d for the processor\n",
> > +			 pr->id);
> > +		/* Give up, but do not abort the namespace scan. */  
> 
> It depends on how the return value is handled by the caller if the namespace
> is continued to be scanned. The caller can be acpi_processor_hotadd_init()
> and acpi_processor_get_info() after this patch is applied. So I think this
> specific comment need to be moved to the caller.

Good point. This gets messy and was an unintended change.

Previously the options were:
1) acpi_processor_get_info() failed for other reasons - this code was never called.
2) acpi_processor_get_info() succeeded without calling acpi_processor_hotadd_init()
   (non-hotplug). This code then ran and would paper over the problem, doing a bunch
   of cleanup under err.
3) acpi_processor_get_info() succeeded with acpi_processor_hotadd_init() called.
   This code then ran and would paper over the problem, doing a bunch of cleanup under err.

We should maintain that or argue cleanly against it.
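
To make the control flow under discussion concrete, here is a rough userspace
sketch, not the kernel code and not a proposed fix. It assumes Gavin's
suggestion (return an errno so the caller can run its 'err' cleanup) is taken.
All model_* names are hypothetical, plain arrays stand in for the per_cpu()
storage, assert() stands in for BUG_ON(), and -EINVAL is an arbitrary choice
of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4			/* hypothetical stand-in for nr_cpu_ids */

struct model_device { int unused; };	/* stand-in for struct acpi_device */
struct model_processor { int id; };	/* stand-in for struct acpi_processor */

/* Plain arrays standing in for the two per-CPU variables. */
static struct model_device *device_array[MODEL_NR_CPUS];
static struct model_processor *processors[MODEL_NR_CPUS];

/* Mirrors the shape of the buggy BIOS check: refuse a second device on one id. */
static bool model_set_per_cpu(struct model_processor *pr,
			      struct model_device *device)
{
	assert(pr->id >= 0 && pr->id < MODEL_NR_CPUS);

	if (device_array[pr->id] != NULL && device_array[pr->id] != device)
		return false;

	/* device_array is deliberately left set on later errors. */
	device_array[pr->id] = device;
	processors[pr->id] = pr;

	return true;
}

/* Assumption: the failure is reported as an errno instead of returning 0. */
static int model_get_info(struct model_processor *pr,
			  struct model_device *device)
{
	if (!model_set_per_cpu(pr, device))
		return -EINVAL;

	return 0;
}

/* The caller owns pr, so only it can free pr when get_info fails. */
static int model_add(struct model_device *device, int id)
{
	struct model_processor *pr = calloc(1, sizeof(*pr));
	int result;

	if (!pr)
		return -ENOMEM;
	pr->id = id;

	result = model_get_info(pr, device);
	if (result)
		goto err;

	return 0;

err:
	free(pr);
	return result;
}
```

In this model a second device claiming an already used id is rejected, pr is
freed, and the first device's entry in device_array is left alone so later
duplicates are still detected. Whether the real code should abort or continue
the namespace scan at that point is a separate question.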

This isn't helped by the fact that I have no idea which cases we care about for that BIOS
bug handling.  Do any of those BIOSes ever do hotplug?  I guess we have to try to maintain
whatever protection this was offering.

Also, the original code leaks data in some paths and I have little idea
whether that is intentional or not. So to tidy up the issue that you've identified,
I'll need to try to make that code consistent first.

I suspect the only way to do that is going to be to duplicate the allocations we
'want' to leak to deal with the BIOS bug detection.

For example, acpi_processor_get_info() failing leaks pr and pr->throttling.shared_cpu_map
before this series. After this series we need pr to leak because it's used for the detection
via processor_device_array.

I'll work through this, but it's going to be tricky to tell if we get it right.
Step 1 will be closing the existing leaks and then we will have something
consistent to build on.
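
As an illustration of what "closing the existing leaks" means for step 1, here
is a minimal userspace sketch. It is not the actual change. It assumes an
error path that unwinds both allocations in reverse order when the get_info
step fails. The model_* and tracked_* names, the live_allocs counter and the
fail_get_info switch are all hypothetical and only there so the behaviour can
be checked.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts outstanding allocations so a leak shows up as a non-zero value. */
static int live_allocs;

static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Stand-in for struct acpi_processor and its throttling.shared_cpu_map. */
struct model_pr {
	unsigned long *shared_cpu_map;
};

/* Stand-in for acpi_processor_get_info(); fails on request. */
static int model_get_info(bool fail)
{
	return fail ? -ENODEV : 0;
}

static int model_add(bool fail_get_info, struct model_pr **out)
{
	struct model_pr *pr;
	int result;

	*out = NULL;

	pr = tracked_alloc(sizeof(*pr));
	if (!pr)
		return -ENOMEM;

	pr->shared_cpu_map = tracked_alloc(sizeof(*pr->shared_cpu_map));
	if (!pr->shared_cpu_map) {
		result = -ENOMEM;
		goto err_free_pr;
	}

	result = model_get_info(fail_get_info);
	if (result)
		goto err_free_map;	/* the path that leaks in the model of today's code */

	*out = pr;
	return 0;

err_free_map:
	tracked_free(pr->shared_cpu_map);
err_free_pr:
	tracked_free(pr);
	return result;
}

static void model_remove(struct model_pr *pr)
{
	if (!pr)
		return;
	tracked_free(pr->shared_cpu_map);
	tracked_free(pr);
}
```

With the unwinding in place, a failed model_add() leaves live_allocs at zero.
Anything the BIOS bug detection needs to outlive that failure would then have
to be a separate, deliberately retained allocation rather than a side effect
of the leak.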

> 
> Besides, it seems acpi_processor_set_per_cpu() isn't properly called and
> memory leakage can happen. More details are given below.
> 
> > +		return false;
> > +	}
> > +	/*
> > +	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > +	 * checks.
> > +	 */
> > +	per_cpu(processor_device_array, pr->id) = device;
> > +	per_cpu(processors, pr->id) = pr;
> > +
> > +	return true;
> > +}
> > +
> >   #ifdef CONFIG_ACPI_HOTPLUG_CPU
> > -static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > +static int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > +				      struct acpi_device *device)
> >   {
> >   	int ret;
> >   
> > @@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >   	if (ret)
> >   		goto out;
> >   
> > +	if (!acpi_processor_set_per_cpu(pr, device)) {
> > +		acpi_unmap_cpu(pr->id);
> > +		goto out;
> > +	}
> > +  
> 
> With the 'goto out', zero is returned from acpi_processor_hotadd_init() to acpi_processor_get_info().
> The zero return value is carried from acpi_map_cpu() in acpi_processor_hotadd_init(). If I'm correct,
> we need return errno from acpi_processor_get_info() to acpi_processor_add() so that cleanup can be
> done. For example, the cleanup corresponding to the 'err' tag can be done in acpi_processor_add().
> Otherwise, we will have memory leakage.
> 
> >   	ret = arch_register_cpu(pr->id);
> >   	if (ret) {
> > +		/* Leave the processor device array in place to detect buggy bios */
> > +		per_cpu(processors, pr->id) = NULL;
> >   		acpi_unmap_cpu(pr->id);
> >   		goto out;
> >   	}
> > @@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >   	return ret;
> >   }
> >   #else
> > -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > +static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > +					     struct acpi_device *device)
> >   {
> >   	return -ENODEV;
> >   }
> > @@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >   	 *  because cpuid <-> apicid mapping is persistent now.
> >   	 */
> >   	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > -		int ret = acpi_processor_hotadd_init(pr);
> > +		int ret = acpi_processor_hotadd_init(pr, device);
> >   
> >   		if (ret)
> >   			return ret;
> > +	} else {
> > +		if (!acpi_processor_set_per_cpu(pr, device))
> > +			return 0;
> >   	}
> >     
> 
> For non-hotplug case, we still need pass the error to acpi_processor_add() so that
> cleanup corresponding 'err' tag can be done. Otherwise, we will have memory leakage.
> 
> >   	/*
> > @@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >    * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc.
> >    * Such things have to be put in and set up by the processor driver's .probe().
> >    */
> > -static DEFINE_PER_CPU(void *, processor_device_array);
> > -
> >   static int acpi_processor_add(struct acpi_device *device,
> >   					const struct acpi_device_id *id)
> >   {
> > @@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device,
> >   	if (result) /* Processor is not physically present or unavailable */
> >   		return 0;
> >   
> > -	BUG_ON(pr->id >= nr_cpu_ids);
> > -
> > -	/*
> > -	 * Buggy BIOS check.
> > -	 * ACPI id of processors can be reported wrongly by the BIOS.
> > -	 * Don't trust it blindly
> > -	 */
> > -	if (per_cpu(processor_device_array, pr->id) != NULL &&
> > -	    per_cpu(processor_device_array, pr->id) != device) {
> > -		dev_warn(&device->dev,
> > -			"BIOS reported wrong ACPI id %d for the processor\n",
> > -			pr->id);
> > -		/* Give up, but do not abort the namespace scan. */
> > -		goto err;
> > -	}
> > -	/*
> > -	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > -	 * checks.
> > -	 */
> > -	per_cpu(processor_device_array, pr->id) = device;
> > -	per_cpu(processors, pr->id) = pr;
> > -
> >   	dev = get_cpu_device(pr->id);
> >   	if (!dev) {
> >   		result = -ENODEV;
> > @@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device)
> >   	device_release_driver(pr->dev);
> >   	acpi_unbind_one(pr->dev);
> >   
> > -	/* Clean up. */
> > -	per_cpu(processor_device_array, pr->id) = NULL;
> > -	per_cpu(processors, pr->id) = NULL;
> > -
> >   	cpu_maps_update_begin();
> >   	cpus_write_lock();
> >   
> > @@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device)
> >   	arch_unregister_cpu(pr->id);
> >   	acpi_unmap_cpu(pr->id);
> >   
> > +	/* Clean up. */
> > +	per_cpu(processor_device_array, pr->id) = NULL;
> > +	per_cpu(processors, pr->id) = NULL;
> > +
> >   	cpus_write_unlock();
> >   	cpu_maps_update_done();
> >     
> 
> Thanks,
> Gavin
> 

