LKML Archive mirror
 help / color / mirror / Atom feed
From: Shiju Jose <shiju.jose@huawei.com>
To: Daniel Ferguson <danielf@os.amperecomputing.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"david@redhat.com" <david@redhat.com>,
	"Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"jiaqiyan@google.com" <jiaqiyan@google.com>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"jthoughton@google.com" <jthoughton@google.com>,
	"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
	"erdemaktas@google.com" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"duenwen@google.com" <duenwen@google.com>,
	"mike.malvestuto@intel.com" <mike.malvestuto@intel.com>,
	"gthelen@google.com" <gthelen@google.com>,
	"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
	"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
	"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>,
	Linuxarm <linuxarm@huawei.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Subject: RE: [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver
Date: Fri, 7 Jun 2024 15:46:02 +0000	[thread overview]
Message-ID: <d1986e8e1d8549c588f7488dfd5dd374@huawei.com> (raw)
In-Reply-To: <fcd0621b-dd68-4e0d-96e1-15c16a3278d0@os.amperecomputing.com>

Hi Daniel,

Thanks for the feedback.

>-----Original Message-----
>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>Sent: 05 June 2024 22:33
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>tony.luck@intel.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com;
>james.morse@arm.com; jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>; ira.weiny@intel.com;
>vishal.l.verma@intel.com; alison.schofield@intel.com; dave.jiang@intel.com;
>Jonathan Cameron <jonathan.cameron@huawei.com>; dave@stgolabs.net;
>dan.j.williams@intel.com; linux-mm@kvack.org; linux-acpi@vger.kernel.org;
>linux-cxl@vger.kernel.org
>Subject: Re: [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI
>RAS2 driver
>
>> +/* Context - lock must be held */
>> +static int ras2_get_patrol_scrub_running(struct ras2_scrub_ctx *ras2_ctx,
>> +					 bool *running)
>> +{
>> +	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
>> +					ras2_ctx->pcc_subspace-
>>pcc_comm_addr;
>> +	int ret;
>> +
>> +	if (ras2_ctx->bg)
>> +		*running = true;
>> +
>> +	ps_sm->common.set_capabilities[0] =
>RAS2_SUPPORT_HW_PARTOL_SCRUB;
>> +	ps_sm->params.patrol_scrub_command =
>RAS2_GET_PATROL_PARAMETERS;
>
>Need to reset the address range (base and size). A user may have previously
>called "Enable Background" where the code zeros out these parameters.
>	ps_sm->params.requested_address_range[0] = ras2_ctx->base;
>	ps_sm->params.requested_address_range[1] = ras2_ctx->size;
The address range is being set to the above in the ras2_hw_scrub_set_enabled_od(), because they are
valid for on-demand scrubbing only. 

However the ras2_ctx->base and ras2_ctx->size are set to the  
ras2_ctx->base = ps_sm->params.actual_address_range[0];
ras2_ctx->size = ps_sm->params.actual_address_range[1];
in the ras2_update_patrol_scrub_params_cache(), which is called after enabling bg scrub and on-demand scrub. 
Thus ras2_ctx->base and ras2_ctx->size may have a 0 or garbage value for bg scrub because address range is not valid for bg scrubbing as perc ACPI specification. I will add checks to retain the cached address range if bg scrub is enabled. 
>
>
>> +
>> +	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
>> +	if (ret) {
>> +		dev_err(ras2_ctx->dev, "failed to read parameters\n");
>> +		return ret;
>> +	}
>> +
>> +	*running = ps_sm->params.flags &
>> +RAS2_PATROL_SCRUB_FLAG_SCRUBBER_RUNNING;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_write_rate(struct device *dev, u64 rate) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +	bool running;
>> +	int ret;
>> +
>> +	guard(mutex)(&ras2_ctx->lock);
>> +	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (running)
>> +		return -EBUSY;
>
>
>I suggest we do not check if the patrol scrub is running when we are merely
>updating cached values. More importantly, if we had previously wrote an invalid
>value (that is only invalidated by firmware after executing a command), then
>when we try to write a correct value, this "ras2_get_patrol_scrub_running"
>check will always fail, therefore preventing us from correcting our error.

In our opinion, write the rate and range etc, though updating the cached values, should be allowed only when the scrub is NOT running to avoid confusion thinking they are actually set in the running scrubber, when read them back in the userspace.
>
>> +
>> +	if (rate < ras2_ctx->rate_min || rate > ras2_ctx->rate_max)
>> +		return -EINVAL;
>> +
>> +	ras2_ctx->rate = rate;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_read_rate(struct device *dev, u64 *rate) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +
>> +	*rate = ras2_ctx->rate;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_read_rate_avail(struct device *dev, u64
>> +*min, u64 *max) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +
>> +	*min = ras2_ctx->rate_min;
>> +	*max = ras2_ctx->rate_max;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_read_range(struct device *dev, u64 *base,
>> +u64 *size) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +
>> +	*base = ras2_ctx->base;
>> +	*size = ras2_ctx->size;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_write_range(struct device *dev, u64 base,
>> +u64 size) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +	bool running;
>> +	int ret;
>> +
>> +	guard(mutex)(&ras2_ctx->lock);
>> +	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (running)
>> +		return -EBUSY;
>
>I suggest we do not check if the patrol scrub is running. See previous comment
>above.
Same as above.

>
>> +
>> +	ras2_ctx->base = base;
>> +	ras2_ctx->size = size;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_set_enabled_bg(struct device *dev, bool
>> +enable) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
>> +					ras2_ctx->pcc_subspace-
>>pcc_comm_addr;
>> +	int ret;
>> +
>> +	guard(mutex)(&ras2_ctx->lock);
>> +	ps_sm->common.set_capabilities[0] =
>RAS2_SUPPORT_HW_PARTOL_SCRUB;
>> +	if (enable) {
>> +		ps_sm->params.requested_address_range[0] = 0;
>> +		ps_sm->params.requested_address_range[1] = 0;
>> +		ps_sm->params.scrub_params_in &=
>~RAS2_PATROL_SCRUB_RATE_IN_MASK;
>> +		ps_sm->params.scrub_params_in |=
>FIELD_PREP(RAS2_PATROL_SCRUB_RATE_IN_MASK,
>> +							    ras2_ctx->rate);
>> +		ps_sm->params.patrol_scrub_command =
>RAS2_START_PATROL_SCRUBBER;
>> +	} else {
>> +		ps_sm->params.patrol_scrub_command =
>RAS2_STOP_PATROL_SCRUBBER;
>> +	}
>> +	ps_sm->params.scrub_params_in &=
>~RAS2_PATROL_SCRUB_EN_BACKGROUND;
>> +	ps_sm->params.scrub_params_in |=
>FIELD_PREP(RAS2_PATROL_SCRUB_EN_BACKGROUND,
>> +						    enable);
>> +
>> +	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
>> +	if (ret) {
>> +		dev_err(ras2_ctx->dev, "%s: failed to enable(%d) background
>scrubbing\n",
>> +			__func__, enable);
>> +		return ret;
>> +	}
>> +	ras2_ctx->bg = true;
>> +
>> +	/* Update the cache to account for rounding of supplied parameters and
>similar */
>> +	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
>> +}
>> +
>> +static int ras2_hw_scrub_get_enabled_bg(struct device *dev, bool
>> +*enabled) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +
>> +	*enabled = ras2_ctx->bg;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ras2_hw_scrub_set_enabled_od(struct device *dev, bool
>> +enable) {
>> +	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
>> +	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
>> +					ras2_ctx->pcc_subspace-
>>pcc_comm_addr;
>> +	bool enabled;
>> +	int ret;
>> +
>> +	guard(mutex)(&ras2_ctx->lock);
>> +	ps_sm->common.set_capabilities[0] =
>RAS2_SUPPORT_HW_PARTOL_SCRUB;
>> +	if (enable) {
>> +		if (!ras2_ctx->size) {
>> +			dev_warn(ras2_ctx->dev,
>> +				 "%s: Invalid requested address range,
>requested_address_range[0]=0x%llx "
>> +				 "requested_address_range[1]=0x%llx\n",
>__func__,
>> +				 ps_sm->params.requested_address_range[0],
>> +				 ps_sm->params.requested_address_range[1]);
>> +			return -ERANGE;
>> +		}
>> +		ret = ras2_get_patrol_scrub_running(ras2_ctx, &enabled);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (enabled)
>> +			return 0;
>> +
>> +		ps_sm->params.scrub_params_in &=
>~RAS2_PATROL_SCRUB_RATE_IN_MASK;
>> +		ps_sm->params.scrub_params_in |=
>FIELD_PREP(RAS2_PATROL_SCRUB_RATE_IN_MASK,
>> +							    ras2_ctx->rate);
>> +		ps_sm->params.requested_address_range[0] = ras2_ctx->base;
>> +		ps_sm->params.requested_address_range[1] = ras2_ctx->size;
>
>
>We need to clear the RAS2_PATROL_SCRUB_EN_BACKGROUND bit in the input
>parameters.
>This is in case "Enable Background" was previously called, and this bit was set.
>
>		ps_sm->params.scrub_params_in &=
>~RAS2_PATROL_SCRUB_EN_BACKGROUND;
We need to stop background scrub if it is already running before start an on-demand scrubbing. 
The RAS2_PATROL_SCRUB_EN_BACKGROUND bit would be cleared with disable  bg scrub
with the following code
in ras2_hw_scrub_set_enabled_bg() when disable background scrub('enable' is 0 in this case).
ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_EN_BACKGROUND;
ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_EN_BACKGROUND,
						    enable);
Hope it make sense?
>
>
>> +		ps_sm->params.patrol_scrub_command =
>RAS2_START_PATROL_SCRUBBER;
>> +	} else {
>> +		ps_sm->params.patrol_scrub_command =
>RAS2_STOP_PATROL_SCRUBBER;
>> +	}
>> +
>> +	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
>> +	if (ret) {
>> +		dev_err(ras2_ctx->dev, "failed to enable(%d) the demand
>scrubbing\n", enable);
>> +		return ret;
>> +	}
>> +	ras2_ctx->bg = false;
>> +
>> +	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
>> +}
>
>
Thanks,
Shiju

  reply	other threads:[~2024-06-07 15:46 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
2024-04-24 20:25   ` fan
2024-04-25 10:38     ` Shiju Jose
2024-04-25 10:15   ` Borislav Petkov
2024-04-25 18:11     ` Shiju Jose
2024-05-06 10:30       ` Borislav Petkov
2024-05-08 16:59         ` Shiju Jose
2024-05-08 17:20           ` Borislav Petkov
2024-05-08 17:44             ` Shiju Jose
2024-05-08 19:25               ` Borislav Petkov
2024-05-09  9:19                 ` Jonathan Cameron
2024-05-09 15:52                   ` Borislav Petkov
2024-05-09 20:03                     ` Borislav Petkov
2024-05-09 21:21                       ` Dan Williams
2024-05-09 21:51                         ` Borislav Petkov
2024-05-09 22:59                           ` Dan Williams
2024-05-10  9:25                             ` Borislav Petkov
2024-05-10 17:13                               ` Dan Williams
2024-05-11 10:17                                 ` Borislav Petkov
2024-05-17 11:15                                   ` Jonathan Cameron
2024-05-17 11:44                                     ` Jonathan Cameron
2024-05-21  8:06                                       ` Borislav Petkov
2024-05-22  9:40                                         ` Jonathan Cameron
2024-05-27  9:09                                           ` Borislav Petkov
2024-05-20 10:54                                   ` Shiju Jose
2024-05-20 11:58                                     ` Jonathan Cameron
2024-05-27  9:21                                       ` Borislav Petkov
2024-05-28  9:06                                         ` Jonathan Cameron
2024-06-06 16:05                                           ` Borislav Petkov
2024-05-10 13:31                     ` Jonathan Cameron
2024-05-09 21:47   ` Dan Williams
2024-05-10  9:03     ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 02/10] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE " shiju.jose
2024-04-24 23:19   ` fan
2024-04-25 10:38     ` Shiju Jose
2024-04-19 16:47 ` [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-04-25 17:26   ` fan
2024-04-19 16:47 ` [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
2024-04-26 23:56   ` fan
2024-04-29 11:20     ` Shiju Jose
2024-04-29 12:21       ` Jonathan Cameron
2024-05-10  0:26   ` Dan Williams
2024-05-10 11:23     ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table shiju.jose
2024-04-19 18:06   ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 07/10] platform: Add __free() based cleanup function for platform_device_put shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 08/10] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-06-05 21:32   ` Daniel Ferguson
2024-04-19 16:47 ` [RFC PATCH v8 09/10] ras: scrub: Add scrub control attributes for ACPI RAS2 shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver shiju.jose
2024-06-05 21:33   ` Daniel Ferguson
2024-06-07 15:46     ` Shiju Jose [this message]
2024-06-21 18:06       ` Daniel Ferguson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d1986e8e1d8549c588f7488dfd5dd374@huawei.com \
    --to=shiju.jose@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=danielf@os.amperecomputing.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mike.malvestuto@intel.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).