* [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers
@ 2024-04-19 16:47 shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)
  To: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny
  Cc: linux-edac, linux-kernel, david, Vilas.Sridharan, leo.duran,
	Yazen.Ghannam, rientjes, jiaqiyan, tony.luck, Jon.Grimm,
	dave.hansen, rafael, lenb, naoya.horiguchi, james.morse,
	jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
	mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
	yazen.ghannam, tanxiaofei, prime.zeng, kangkang.shen,
	wanghuiqiang, linuxarm, shiju.jose

From: Shiju Jose <shiju.jose@huawei.com>

Memory Scrub
============

Increasing DRAM size and cost have made memory subsystem reliability
an important concern. These modules are used where potentially
corrupted data could cause expensive or fatal issues. Memory errors are
among the top hardware failures that cause server and workload crashes.

Memory scrub is a feature where an ECC engine reads data from
each memory media location, corrects with an ECC if necessary and
writes the corrected data back to the same memory media location.

Memory DIMMs can be scrubbed at a configurable rate to detect
uncorrected memory errors and to attempt recovery from them, which
provides the following benefits:
- Proactively scrubbing memory DIMMs reduces the chance of a correctable
  error becoming uncorrectable.
- Once detected, uncorrected errors caught in unallocated memory pages are
  isolated and prevented from being allocated to an application or the OS.
- The probability of software/hardware products encountering memory
  errors is reduced.
Further background can be found in Reference [5].

There are two types of memory scrubbing:
1. Background (patrol) scrubbing of the RAM while the RAM is otherwise
   idle.
2. On-demand scrubbing for a specific address range/region of memory.

Several types of interfaces to HW memory scrubbers have been
identified, such as ACPI NVDIMM ARS (Address Range Scrub), CXL memory
device patrol scrub, CXL DDR5 ECS and ACPI RAS2 memory scrubbing.

The scrub controls vary between different memory scrubbers. To allow
for standard userspace tooling, these controls need to be presented
through a standard ABI.

Introduce a generic memory scrub subsystem which allows the user to
control the underlying scrubbers in the system via a generic sysfs scrub
control interface.
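
As a rough illustration of how a userspace tool might consume this
interface, the sketch below reads an attribute and parses the text of
rate_available, which patch 01 in this series emits as "0x<min>-0x<max>".
The helper names (scrub_read_attr, scrub_parse_rate_range,
scrub_rate_in_range) are hypothetical and not part of the series; the
ABI is still RFC and may change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Read the first line of one sysfs attribute; 0 on success, -1 on failure. */
static int scrub_read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;
	char *line;

	assert(path && buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	line = fgets(buf, (int)len, f);
	fclose(f);
	return line ? 0 : -1;
}

/*
 * Parse rate_available text of the form "0x<min>-0x<max>\n".
 * Returns true and fills min/max only when both values are present.
 */
static bool scrub_parse_rate_range(const char *text,
				   unsigned long long *min,
				   unsigned long long *max)
{
	assert(text && min && max);
	return sscanf(text, "0x%llx-0x%llx", min, max) == 2;
}

/* Check a requested rate against the range reported by the scrubber. */
static bool scrub_rate_in_range(unsigned long long rate,
				unsigned long long min,
				unsigned long long max)
{
	return rate >= min && rate <= max;
}
```

A tool would typically call scrub_read_attr() on
/sys/class/ras/rasX/scrub/rate_available, parse the result, and only
write to the rate attribute once scrub_rate_in_range() passes.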

Use case of scrub control feature
=================================
1. Scrub controls in user space allow a user space tool to disable or
enable scrubbing, or to change the scrub rate, when this is needed for
other purposes such as performance-aware operations that require
background operations to be turned off or reduced.
2. They also allow on-demand scrubbing of a specific address range,
if supported by the scrubber.

Comparison of scrubbing features
================================
 ................................................................
 .              .   ACPI    . CXL patrol.  CXL ECS  .  ARS      .
 .  Name        .   RAS2    . scrub     .           .           .
 ................................................................
 .              .           .           .           .           .
 . On-demand    . Supported . No        . No        . Supported .
 . Scrubbing    .           .           .           .           .
 .              .           .           .           .           .  
 ................................................................
 .              .           .           .           .           .
 . Background   . Supported . Supported . Supported . No        .
 . scrubbing    .           .           .           .           .
 .              .           .           .           .           .
 ................................................................
 .              .           .           .           .           .
 . Mode of      . Scrub ctrl. per device. per memory.  Unknown  .
 . scrubbing    . per NUMA  .           . media     .           .
 .              . domain.   .           .           .           .
 ................................................................
 .              .           .           .           .           . 
 . Query scrub  . Supported . Supported . Supported . Supported .       
 . capabilities .           .           .           .           .
 .              .           .           .           .           .
 ................................................................
 .              .           .           .           .           . 
 . Setting      . Supported . No        . No        . Supported .       
 . address range.           .           .           .           .
 .              .           .           .           .           .
 ................................................................
 .              .           .           .           .           . 
 . Setting      . Supported . Supported . No        . No        .       
 . scrub rate   .           .           .           .           .
 .              .           .           .           .           .
 ................................................................
 .              .           .           .           .           . 
 . Unit for     . Not       . in hours  . No        . No        .       
 . scrub rate   . Defined   .           .           .           .
 .              .           .           .           .           .
 ................................................................
 .              . Supported .           .           .           .
 . Scrub        . on-demand . No        . No        . Supported .
 . status/      . scrubbing .           .           .           .
 . Completion   . only      .           .           .           .
 ................................................................
 . UC error     .           .CXL general.CXL general. ACPI UCE  .
 . reporting    . Exception .media/DRAM .media/DRAM . notify and.
 .              .           .event/media.event/media. query     .
 .              .           .scan?      .scan?      . ARS status.
 ................................................................
 .              .           .           .           .           .      
 . Clear UC     .  No       . No        .  No       . Supported .
 . error        .           .           .           .           .
 .              .           .           .           .           .  
 ................................................................
 .              .           .           .           .           .
 . Translate    . No        . No        . No        . Supported .
 . *(1)SPA to   .           .           .           .           .
 . *(2)DPA      .           .           .           .           .  
 ................................................................
 .              .           .           .           .           .
 . Error inject . No        . Can inject. No        . Supported .
 .              .           . poison for.           .           .
 .              .           . CXL       .           .           .  
 ................................................................
*(1) - SPA - System Physical Address. See section 9.19.7.8
       Function Index 5 - Translate SPA of ACPI spec r6.5.  
*(2) - DPA - Device Physical Address. See section 9.19.7.8
       Function Index 5 - Translate SPA of ACPI spec r6.5.  

CXL Scrubbing features
======================
Add support for controlling the CXL patrol scrubber and the ACPI RAS2 HW
based memory patrol scrubber, and register them with the scrub subsystem to
expose the scrub controls to the userspace tool.

CXL spec r3.1 section 8.2.9.9.11.1 describes the memory device patrol scrub
control feature. The device patrol scrub proactively locates and corrects
errors in a regular cycle. The patrol scrub control allows the requester to
configure the patrol scrubber's input configurations.

The patrol scrub control allows the requester to specify the number of
hours in which the patrol scrub cycles must be completed, provided that
the requested number is not less than the minimum number of hours for the
patrol scrub cycle that the device is capable of. In addition, the patrol
scrub controls allow the host to disable and enable the feature in case
disabling of the feature is needed for other purposes such as
performance-aware operations which require the background operations to be
turned off.
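
The constraint on the requested cycle time can be sketched as below.
This is a userspace illustration only, not the CXL driver code from this
series; struct patrol_scrub_state, its field names and
patrol_scrub_set_cycle() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical snapshot of a device's patrol scrub settings. */
struct patrol_scrub_state {
	unsigned int min_cycle_hours;	/* shortest cycle the device can do */
	unsigned int cur_cycle_hours;	/* cycle currently configured */
	int enabled;			/* non-zero while patrol scrub runs */
};

/*
 * Accept a requested cycle length only if it is not shorter than the
 * minimum the device reports; otherwise leave the state untouched.
 */
static int patrol_scrub_set_cycle(struct patrol_scrub_state *ps,
				  unsigned int hours)
{
	assert(ps);
	if (hours < ps->min_cycle_hours)
		return -EINVAL;
	ps->cur_cycle_hours = hours;
	return 0;
}

/* Enable or disable the feature, e.g. around performance-aware work. */
static void patrol_scrub_set_enabled(struct patrol_scrub_state *ps, int on)
{
	assert(ps);
	ps->enabled = on ? 1 : 0;
}
```

Rejecting a too-short cycle instead of clamping it keeps the caller
aware that the device could not honour the request.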

ACPI RAS2 Hardware-based Memory Scrubbing
=========================================
ACPI spec r6.5 section 5.2.21 describes the ACPI RAS2 table, which
provides interfaces for platform RAS features and supports independent
RAS controls and capabilities for a given RAS feature for multiple
instances of the same component in a given system.
Memory RAS features apply to RAS capabilities, controls and operations
that are specific to memory. RAS2 PCC sub-spaces for memory-specific RAS
features have a Feature Type of 0x00 (Memory).

The platform can use the hardware-based memory scrubbing feature to expose
controls and capabilities associated with hardware-based memory scrub
engines. The RAS2 memory scrubbing feature supports the following as per
the spec:
 - Independent memory scrubbing controls for each NUMA domain, identified
   using its proximity domain.
   Note: AmpereComputing hardware, however, has a single entry repeated
         because it has centralized controls.
 - Provision for background (patrol) scrubbing of the entire memory system,
   as well as on-demand scrubbing for a specific region of memory.

ACPI Address Range Scrubbing (ARS)
==================================
ARS allows the platform to communicate memory errors to system software.
This capability allows system software to prevent accesses to addresses
with uncorrectable errors in memory. ARS functions manage all NVDIMMs
present in the system. Only one scrub can be in progress system wide
at any given time.
The following functions are supported as per the specification:
1. Query ARS Capabilities for a given address range, which also indicates
   whether the platform supports the ACPI NVDIMM Root Device Unconsumed
   Error Notification.
2. Start ARS triggers an Address Range Scrub for the given memory range.
   Address scrubbing can be done for volatile memory, persistent memory,
   or both.
3. Query ARS Status command allows software to get the status of ARS,  
   including the progress of ARS and ARS error record.
4. Clear Uncorrectable Error.
5. Translate SPA
6. ARS Error Inject etc.
Note: Support for ARS is not added in this series, to reduce the lines
of code for review; it could be added after the initial code is merged.

The series adds:
1. A scrub subsystem driver that supports configuring memory scrubbers
   in the system.
2. Support for the CXL feature mailbox commands, which are used by
   the CXL device scrubbing features.
3. A CXL device scrub driver that supports patrol scrub control and
   registers with the scrub subsystem.
4. An ACPI RAS2 driver that adds the OS interface for RAS2 communication
   through the PCC mailbox, extracts the ACPI RAS2 feature table (RAS2)
   and creates a platform device for the RAS memory features, which
   binds to the memory ACPI RAS2 driver.
5. A memory ACPI RAS2 driver that gets the PCC subspace for communicating
   with an ACPI compliant platform that supports ACPI RAS2. It adds
   callback functions and registers with the scrub subsystem so that
   the user can control the HW patrol scrubbers exposed to the kernel
   via the ACPI RAS2 table.
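
The way registered callbacks determine which sysfs controls the user
sees can be sketched in userspace as follows. This mirrors the
visibility rule in patch 01 (read-write when both callbacks exist,
read-only when only the getter exists, hidden otherwise), but
struct mock_scrub_ops, mock_attr_mode() and the demo_* callbacks are
hypothetical stand-ins, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a subset of scrub callbacks; any may be NULL. */
struct mock_scrub_ops {
	int (*get_enabled_bg)(int *enable);
	int (*set_enabled_bg)(int enable);
	int (*rate_read)(unsigned long long *rate);
	int (*rate_write)(unsigned long long rate);
};

enum mock_attr { MOCK_ATTR_ENABLE_BACKGROUND, MOCK_ATTR_RATE };

/* Return the file mode an attribute would get, or 0 if it stays hidden. */
static unsigned int mock_attr_mode(const struct mock_scrub_ops *ops,
				   enum mock_attr attr)
{
	int can_read = 0, can_write = 0;

	assert(ops);
	switch (attr) {
	case MOCK_ATTR_ENABLE_BACKGROUND:
		can_read = ops->get_enabled_bg != NULL;
		can_write = ops->set_enabled_bg != NULL;
		break;
	case MOCK_ATTR_RATE:
		can_read = ops->rate_read != NULL;
		can_write = ops->rate_write != NULL;
		break;
	}
	if (can_read && can_write)
		return 0644;	/* read-write */
	if (can_read)
		return 0444;	/* read-only */
	return 0;		/* write-only makes little sense: hide it */
}

/* Example callbacks for a hypothetical scrubber with a 12 hour cycle. */
static int demo_get_enabled_bg(int *enable) { *enable = 1; return 0; }
static int demo_rate_read(unsigned long long *rate) { *rate = 12; return 0; }
static int demo_rate_write(unsigned long long rate) { (void)rate; return 0; }
```

A scrubber that can only report its rate would therefore expose a
read-only rate file, while one without any rate callbacks would expose
none.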

Note: The RAS2 feature has dependency on the patch
      "ACPICA: ACPI 6.5: Add support for RAS2 table" which Rafael has queued. 
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=9726d821f88e284ecd998b76ae5f2174721cd9dc

The QEMU series to support the CXL specific scrub features is
available here,
https://lore.kernel.org/qemu-devel/20240223085902.1549-1-shiju.jose@huawei.com/

Open questions based on feedback from the community:
1. Leo: Standardize the unit for the scrub rate; for example, ACPI RAS2 does
   not define a unit for the scrub rate. RAS2 clarification needed.
2. Jonathan:
   Should the scrub class be renamed to RASCTL or something similar, to allow
   for the wider control options etc. that the open compute RAS API supports?
3. Jonathan: Is there any need for discoverability, to userspace, of the
   capability to scan different regions, such as the global PA space?
   Left as a future extension.
4. Jiaqi:
   - STOP_PATROL_SCRUBBER from RAS2 must be blocked and must not be exposed to
     OS/userspace. Stopping the patrol scrubber is unacceptable for platforms
     where the OEM has enabled the patrol scrubber, because the patrol scrubber
     is a key part of logging and is repurposed for other RAS actions.
   If the OEM does not want to expose this control, they should lock it down so the
   interface is not exposed to the OS. These features are optional after all.
   - "Requested Address Range"/"Actual Address Range" (region to scrub) is a
      similarly bad thing to expose in RAS2.
   If the OEM does not want to expose this, they should lock it down so the
   interface is not exposed to the OS. These features are optional after all.
5. Shiju: How to determine the initial status (background scrub / stopped etc.).
   
References:
1. ACPI spec r6.5 section 5.2.21 ACPI RAS2.
2. ACPI spec r6.5 section 9.19.7.2 ARS.
3. CXL spec  r3.1 8.2.9.9.11.1 Device patrol scrub control feature
4. CXL spec  r3.1 8.2.9.9.11.2 DDR5 ECS feature
5. Background information about kernel support for memory scan, memory
   error detection and ACPI RASF.
   https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
6. Discussions on RASF:
   https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r 

Changes
=======
v7 -> v8:
1. Add more detailed cover letter and add info for basic analysis
   of ACPI ARS for comment from Dan Williams.
2. Changed file name etc from ras2 to acpi_ras2 in memory ACPI RAS2
   driver for comment from Boris.
3. Add documents for usage for comment from Jonathan.
4. Changed logic in memory/acpi_ras2.c for enable background
   scrubbing to allow setting the scrub rate.
5. Merged memory/acpi_ras2_common.c with memory/acpi_ras2.c and
   obsolete code, suggested by Jonathan.
6. Initial optimizations and cleanup especially in the memory/acpi_ras2.
7. Removed CXL ECS support for time being. 
8. Removed support for region based scrub control from the scrub
   subsystem, which was needed for the CXL ECS, can be added later
   if required.
9. Fixed the format of a few comments and a definition in the CXL feature
   code for the feedback from Fan.
10. Jonathan did several optimizations, interface changes and
    cleanups all over the code.
11. Fixes for feedback from Daniel Ferguson (AmpereComputing)
    for RAS2.
12. Workaround for a RAS2 case of only one actual controller as
    reported by Daniel Ferguson (AmpereComputing) in their hardware.
13. Feedback from Yazen, move the common scrub and ras2 changes
    under /drivers/ras/.
14. Drop patch ACPICA: ACPI 6.5: Add support for RAS2 table because
    Rafael queued the patch.
    https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=9726d821f88e284ecd998b76ae5f2174721cd9dc
 
v6 -> v7:
1. Main changes for comments from Jonathan, Thanks.
1.1. CXL
 - Changes to deal with a small mailbox and to support multipart
   feature data transfers.
 - Provide more specific parameters to mbox supported/get/set features
   interface functions.
 - kvmalloc -> kmalloc in CXL scrub mem allocation for feature commands.
 - Changed the way using __free(kfree)
 - Removed readback and verify for setting CXL scrub patrol and ECS
   parameters. Could be added later if needed.
 - In is_visible() callback functions for scrub control sysfs attrs
   changed to writeback the default attribute mode value instead of
   setting per attrs.
 - Add documentation for sysfs interfaces for CXL ECS scrub control. 
1.2. RAS2
 - In rasf common code, rename rasf to ras2 because RASF seems obsolete.
 - Replace pr_* with dev_* log function calls from ACPI RAS2 and
   memory RAS2 drivers.
 - Removed including unnecessary .h file from memory RAS2 driver.
 - In is_visible() callback functions for scrub control sysfs attrs
   changed to writeback the default attribute mode value instead of
   setting per attribute.

2. Changes for comments from Fan, Thanks.
 - Add debug message if cxl patrol scrub and ecs init function
   calls fail.
3. Updated cover letter for feedback from Dan Williams. 
   
v5 -> v6:
1. Changes for comments from Davidlohr, Thanks.
 - Update CXL feature code based on spec 3.1.
 - attrb -> attr
 - Use enums with default counting.  
2. Rebased to the latest kernel.

v4 -> v5:
1. Following are the main changes made based on the feedback from Dan Williams on v4.
1.1. In the scrub subsystem the common scrub control attributes are statically defined
     instead of dynamically created.
1.2. Add scrub subsystem support for externally defined attribute groups.
     Add CXL ECS driver defining an ECS specific attribute group and pass
     it to the scrub subsystem.
1.3. Move cxl_mem_ecs_init() to cxl/core/region.c so that the CXL region_id
     is used in the registration with the scrub subsystem.
1.4. Add previously posted RASF common and RAS2 patches to this scrub series.
	 
2. Add support for the 'enable_background_scrub' attribute
   for RAS2, on request from Bill Schwartz(wschwartz@amperecomputing.com).

v3 -> v4:
1. Fixes for the warnings/errors reported by kernel test robot.
2. Add support for reading the 'enable' attribute of CXL patrol scrub.

v2 -> v3:
1. Changes for comments from Davidlohr, Thanks.
 - Updated cxl scrub kconfig
 - removed usage of the flag is_support_feature from
   the function cxl_mem_get_supported_feature_entry().
 - corrected spelling error.
 - removed unnecessary debug message.
 - removed export feature commands to the userspace.
2. Possible fix for the warnings/errors reported by kernel
   test robot.
3. Add documentation for the common scrub configure attributes.

v1 -> v2:
1. Changes for comments from Dave Jiang, Thanks.
 - Split patches.
 - reversed xmas tree declarations.
 - declared flags as enums.
 - removed few unnecessary variable initializations.
 - replaced PTR_ERR_OR_ZERO() with IS_ERR() and PTR_ERR().
 - add auto clean declarations.
 - replaced while loop with for loop.
 - Removed allocation from cxl_get_supported_features() and
   cxl_get_feature() and make change to take allocated memory
   pointer from the caller.
 - replaced if/else with switch case.
 - replaced sprintf() with sysfs_emit() in 2 places.
 - replaced goto label with return in few functions.
2. removed unused code for supported attributes from ecs.
3. Included following common patch for scrub configure driver
   to this series.
   "memory: scrub: Add scrub driver supports configuring memory scrubbers
    in the system"


Jonathan Cameron (2):
  ACPICA: Add __free() based cleanup function for acpi_put_table
  platform: Add __free() based cleanup function for platform_device_put

Shiju Jose (8):
  ras: scrub: Add scrub subsystem
  cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command
  cxl/mbox: Add GET_FEATURE mailbox command
  cxl/mbox: Add SET_FEATURE mailbox command
  cxl/memscrub: Add CXL device patrol scrub control feature
  ACPI:RAS2: Add ACPI RAS2 driver
  ras: scrub: Add scrub control attributes for ACPI RAS2
  ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver

 .../ABI/testing/sysfs-class-scrub-configure   |  71 ++++
 Documentation/scrub/scrub-configure.rst       |  85 ++++
 drivers/acpi/Kconfig                          |  10 +
 drivers/acpi/Makefile                         |   1 +
 drivers/acpi/ras2.c                           | 366 ++++++++++++++++
 drivers/cxl/Kconfig                           |  19 +
 drivers/cxl/core/Makefile                     |   1 +
 drivers/cxl/core/mbox.c                       | 153 +++++++
 drivers/cxl/core/memscrub.c                   | 314 ++++++++++++++
 drivers/cxl/cxlmem.h                          | 130 ++++++
 drivers/cxl/mem.c                             |   6 +
 drivers/ras/Kconfig                           |  17 +
 drivers/ras/Makefile                          |   2 +
 drivers/ras/acpi_ras2.c                       | 358 ++++++++++++++++
 drivers/ras/memory_scrub.c                    | 402 ++++++++++++++++++
 include/acpi/acpixf.h                         |   2 +
 include/acpi/ras2_acpi.h                      |  59 +++
 include/linux/memory_scrub.h                  |  45 ++
 include/linux/platform_device.h               |   1 +
 19 files changed, 2042 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
 create mode 100644 Documentation/scrub/scrub-configure.rst
 create mode 100755 drivers/acpi/ras2.c
 create mode 100644 drivers/cxl/core/memscrub.c
 create mode 100644 drivers/ras/acpi_ras2.c
 create mode 100755 drivers/ras/memory_scrub.c
 create mode 100644 include/acpi/ras2_acpi.h
 create mode 100755 include/linux/memory_scrub.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-24 20:25   ` fan
  2024-04-25 10:15   ` Borislav Petkov
  2024-04-19 16:47 ` [RFC PATCH v8 02/10] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command shiju.jose
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)
  To: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny
  Cc: linux-edac, linux-kernel, david, Vilas.Sridharan, leo.duran,
	Yazen.Ghannam, rientjes, jiaqiyan, tony.luck, Jon.Grimm,
	dave.hansen, rafael, lenb, naoya.horiguchi, james.morse,
	jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
	mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
	yazen.ghannam, tanxiaofei, prime.zeng, kangkang.shen,
	wanghuiqiang, linuxarm, shiju.jose

From: Shiju Jose <shiju.jose@huawei.com>

Add a scrub subsystem that supports configuring the memory scrubbers
in the system. The scrub subsystem provides the interface for
registering the scrub devices. The scrub control attributes
are provided to the user in /sys/class/ras/rasX/scrub.

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 .../ABI/testing/sysfs-class-scrub-configure   |  47 +++
 drivers/ras/Kconfig                           |   7 +
 drivers/ras/Makefile                          |   1 +
 drivers/ras/memory_scrub.c                    | 271 ++++++++++++++++++
 include/linux/memory_scrub.h                  |  37 +++
 5 files changed, 363 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
 create mode 100755 drivers/ras/memory_scrub.c
 create mode 100755 include/linux/memory_scrub.h

diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
new file mode 100644
index 000000000000..3ed77dbb00ad
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
@@ -0,0 +1,47 @@
+What:		/sys/class/ras/
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		The ras/ class subdirectory belongs to the
+		common ras features such as scrub subsystem.
+
+What:		/sys/class/ras/rasX/scrub/
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		The /sys/class/ras/ras{0,1,2,3,...}/scrub directories
+		correspond to each scrub device registered with the
+		scrub subsystem.
+
+What:		/sys/class/ras/rasX/scrub/name
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RO) name of the memory scrubber
+
+What:		/sys/class/ras/rasX/scrub/enable_background
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) Enable/Disable background(patrol) scrubbing if supported.
+
+What:		/sys/class/ras/rasX/scrub/rate_available
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RO) Range of scrub rates supported by the scrubber.
+		The scrub rate is expressed in hours.
+
+What:		/sys/class/ras/rasX/scrub/rate
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) The scrub rate to use; it must be within the
+		range supported by the scrubber.
+		The scrub rate is expressed in hours.
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..181701479564 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -46,4 +46,11 @@ config RAS_FMPM
 	  Memory will be retired during boot time and run time depending on
 	  platform-specific policies.
 
+config SCRUB
+	tristate "Memory scrub driver"
+	help
+	  This option selects the memory scrub subsystem, which supports
+	  configuring the parameters of the underlying scrubbers in the
+	  system for the DRAM memories.
+
 endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..89bcf0d84355 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_RAS)	+= ras.o
 obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_RAS_CEC)	+= cec.o
+obj-$(CONFIG_SCRUB)	+= memory_scrub.o
 
 obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
 obj-y			+= amd/atl/
diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
new file mode 100755
index 000000000000..7e995380ec3a
--- /dev/null
+++ b/drivers/ras/memory_scrub.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory scrub subsystem supports configuring the registered
+ * memory scrubbers.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ */
+
+#define pr_fmt(fmt)     "MEM SCRUB: " fmt
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/delay.h>
+#include <linux/kfifo.h>
+#include <linux/memory_scrub.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+
+/* memory scrubber config definitions */
+#define SCRUB_ID_PREFIX "ras"
+#define SCRUB_ID_FORMAT SCRUB_ID_PREFIX "%d"
+
+static DEFINE_IDA(scrub_ida);
+
+struct scrub_device {
+	int id;
+	struct device dev;
+	const struct scrub_ops *ops;
+};
+
+#define to_scrub_device(d) container_of(d, struct scrub_device, dev)
+static ssize_t enable_background_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t len)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	bool enable;
+	int ret;
+
+	ret = kstrtobool(buf, &enable);
+	if (ret < 0)
+		return ret;
+
+	ret = scrub_dev->ops->set_enabled_bg(dev, enable);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
+static ssize_t enable_background_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	bool enable;
+	int ret;
+
+	ret = scrub_dev->ops->get_enabled_bg(dev, &enable);
+	if (ret)
+		return ret;
+
+	return sysfs_emit(buf, "%d\n", enable);
+}
+
+static ssize_t name_show(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	int ret;
+
+	ret = scrub_dev->ops->get_name(dev, buf);
+	if (ret)
+		return ret;
+
+	return strlen(buf);
+}
+
+static ssize_t rate_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 val;
+	int ret;
+
+	ret = scrub_dev->ops->rate_read(dev, &val);
+	if (ret)
+		return ret;
+
+	return sysfs_emit(buf, "0x%llx\n", val);
+}
+
+static ssize_t rate_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 val;
+	int ret;
+
+	ret = kstrtou64(buf, 0, &val);
+	if (ret < 0)
+		return ret;
+
+	ret = scrub_dev->ops->rate_write(dev, val);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
+static ssize_t rate_available_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 min_sr, max_sr;
+	int ret;
+
+	ret = scrub_dev->ops->rate_avail_range(dev, &min_sr, &max_sr);
+	if (ret)
+		return ret;
+
+	return sysfs_emit(buf, "0x%llx-0x%llx\n", min_sr, max_sr);
+}
+
+DEVICE_ATTR_RW(enable_background);
+DEVICE_ATTR_RO(name);
+DEVICE_ATTR_RW(rate);
+DEVICE_ATTR_RO(rate_available);
+
+static struct attribute *scrub_attrs[] = {
+	&dev_attr_enable_background.attr,
+	&dev_attr_name.attr,
+	&dev_attr_rate.attr,
+	&dev_attr_rate_available.attr,
+	NULL
+};
+
+static umode_t scrub_attr_visible(struct kobject *kobj,
+				  struct attribute *a, int attr_id)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	const struct scrub_ops *ops = scrub_dev->ops;
+
+	if (a == &dev_attr_enable_background.attr) {
+		if (ops->set_enabled_bg && ops->get_enabled_bg)
+			return a->mode;
+		if (ops->get_enabled_bg)
+			return 0444;
+		return 0;
+	}
+	if (a == &dev_attr_name.attr)
+		return ops->get_name ? a->mode : 0;
+	if (a == &dev_attr_rate_available.attr)
+		return ops->rate_avail_range ? a->mode : 0;
+	if (a == &dev_attr_rate.attr) { /* Write only makes little sense */
+		if (ops->rate_read && ops->rate_write)
+			return a->mode;
+		if (ops->rate_read)
+			return 0444;
+		return 0;
+	}
+
+	return 0;
+}
+
+static const struct attribute_group scrub_attr_group = {
+	.name		= "scrub",
+	.attrs		= scrub_attrs,
+	.is_visible	= scrub_attr_visible,
+};
+
+static const struct attribute_group *scrub_attr_groups[] = {
+	&scrub_attr_group,
+	NULL
+};
+
+static void scrub_dev_release(struct device *dev)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+
+	ida_free(&scrub_ida, scrub_dev->id);
+	kfree(scrub_dev);
+}
+
+static struct class scrub_class = {
+	.name = "ras",
+	.dev_groups = scrub_attr_groups,
+	.dev_release = scrub_dev_release,
+};
+
+static struct device *
+scrub_device_register(struct device *parent, void *drvdata,
+		      const struct scrub_ops *ops)
+{
+	struct scrub_device *scrub_dev;
+	struct device *hdev;
+	int err;
+
+	scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
+	if (!scrub_dev)
+		return ERR_PTR(-ENOMEM);
+	hdev = &scrub_dev->dev;
+
+	scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);
+	if (scrub_dev->id < 0) {
+		kfree(scrub_dev);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	scrub_dev->ops = ops;
+	hdev->class = &scrub_class;
+	hdev->parent = parent;
+	dev_set_drvdata(hdev, drvdata);
+	dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);
+	err = device_register(hdev);
+	if (err) {
+		put_device(hdev);
+		return ERR_PTR(err);
+	}
+
+	return hdev;
+}
+
+static void devm_scrub_release(void *dev)
+{
+	device_unregister(dev);
+}
+
+/**
+ * devm_scrub_device_register - register scrubber device
+ * @dev: the parent device
+ * @drvdata: driver data to attach to the scrub device
+ * @ops: pointer to scrub_ops structure (optional)
+ *
+ * Returns the pointer to the new device on success, ERR_PTR() otherwise.
+ * The new device would be automatically unregistered with the parent device.
+ */
+struct device *
+devm_scrub_device_register(struct device *dev, void *drvdata,
+			   const struct scrub_ops *ops)
+{
+	struct device *hdev;
+	int ret;
+
+	if (!dev)
+		return ERR_PTR(-EINVAL);
+
+	hdev = scrub_device_register(dev, drvdata, ops);
+	if (IS_ERR(hdev))
+		return hdev;
+
+	ret = devm_add_action_or_reset(dev, devm_scrub_release, hdev);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return hdev;
+}
+EXPORT_SYMBOL_GPL(devm_scrub_device_register);
+
+static int __init memory_scrub_control_init(void)
+{
+	return class_register(&scrub_class);
+}
+subsys_initcall(memory_scrub_control_init);
+
+static void memory_scrub_control_exit(void)
+{
+	class_unregister(&scrub_class);
+}
+module_exit(memory_scrub_control_exit);
diff --git a/include/linux/memory_scrub.h b/include/linux/memory_scrub.h
new file mode 100755
index 000000000000..f0e1657a5072
--- /dev/null
+++ b/include/linux/memory_scrub.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Memory scrub subsystem driver supports controlling
+ * the memory scrubbers in the system.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ */
+
+#ifndef __MEMORY_SCRUB_H
+#define __MEMORY_SCRUB_H
+
+#include <linux/types.h>
+
+struct device;
+
+/**
+ * struct scrub_ops - scrub device operations (all elements optional)
+ * @get_enabled_bg: check if currently performing background scrub.
+ * @set_enabled_bg: start or stop a bg-scrub.
+ * @get_name: get the memory scrubber name.
+ * @rate_avail_range: retrieve limits on supported rates.
+ * @rate_read: read the scrub rate
+ * @rate_write: set the scrub rate
+ */
+struct scrub_ops {
+	int (*get_enabled_bg)(struct device *dev, bool *enable);
+	int (*set_enabled_bg)(struct device *dev, bool enable);
+	int (*get_name)(struct device *dev, char *buf);
+	int (*rate_avail_range)(struct device *dev, u64 *min, u64 *max);
+	int (*rate_read)(struct device *dev, u64 *rate);
+	int (*rate_write)(struct device *dev, u64 rate);
+};
+
+struct device *
+devm_scrub_device_register(struct device *dev, void *drvdata,
+			   const struct scrub_ops *ops);
+#endif /* __MEMORY_SCRUB_H */
-- 
2.34.1



* [RFC PATCH v8 02/10] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE " shiju.jose
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Shiju Jose <shiju.jose@huawei.com>

Add support for GET_SUPPORTED_FEATURES mailbox command.

CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
CXL devices support features with changeable attributes.
Get Supported Features retrieves the list of supported device specific
features. The settings of a feature can be retrieved using Get Feature
and optionally modified using Set Feature.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/cxl/core/mbox.c | 27 ++++++++++++++++++
 drivers/cxl/cxlmem.h    | 61 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)
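
Reading aid, not part of the patch: a minimal userspace sketch of how the
Get Supported Features output payload added below could be walked to find
one feature by UUID. The struct and function names (supp_feats_hdr,
supp_feat_entry, find_feat_entry, get_le16) are hypothetical; only the field
order and sizes are taken from the structures in this patch, and every
multi-byte field is kept as a byte array so the layout is packed by
construction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of struct cxl_mbox_supp_feat_entry. */
struct supp_feat_entry {
	uint8_t uuid[16];
	uint8_t index[2];	/* little-endian u16 */
	uint8_t get_size[2];	/* little-endian u16 */
	uint8_t set_size[2];	/* little-endian u16 */
	uint8_t attr_flags[4];	/* little-endian u32 */
	uint8_t get_version;
	uint8_t set_version;
	uint8_t set_effects[2];	/* little-endian u16 */
	uint8_t rsvd[18];
};

/* Hypothetical mirror of the fixed part of the output payload. */
struct supp_feats_hdr {
	uint8_t nr_entries[2];	/* little-endian u16 */
	uint8_t nr_supported[2];	/* little-endian u16 */
	uint8_t rsvd[4];
};

_Static_assert(sizeof(struct supp_feat_entry) == 48, "entry is 48 bytes");
_Static_assert(sizeof(struct supp_feats_hdr) == 8, "header is 8 bytes");

/* Decode a little-endian 16-bit value from two bytes. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the entry in buf whose UUID matches, or NULL if there is none. */
static const struct supp_feat_entry *
find_feat_entry(const uint8_t *buf, size_t len, const uint8_t uuid[16])
{
	const struct supp_feat_entry *entries;
	size_t i, n, fit;

	assert(buf != NULL && uuid != NULL);
	if (len < sizeof(struct supp_feats_hdr))
		return NULL;

	/* Never trust nr_entries beyond what the buffer can hold. */
	n = get_le16(buf);
	fit = (len - sizeof(struct supp_feats_hdr)) / sizeof(*entries);
	if (n > fit)
		n = fit;

	entries = (const struct supp_feat_entry *)
			(buf + sizeof(struct supp_feats_hdr));
	for (i = 0; i < n; i++) {
		if (memcmp(entries[i].uuid, uuid, 16) == 0)
			return &entries[i];
	}
	return NULL;
}
```

Clamping nr_entries to the buffer length is a choice made for this sketch;
the kernel code in this series relies on the mailbox layer's size checks.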

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f0f54aeccc87..82e279b821e2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1291,6 +1291,33 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
 
+int cxl_get_supported_features(struct cxl_memdev_state *mds,
+			       u32 count, u16 start_index,
+			       struct cxl_mbox_get_supp_feats_out *feats_out)
+{
+	struct cxl_mbox_get_supp_feats_in pi;
+	struct cxl_mbox_cmd mbox_cmd;
+	int rc;
+
+	pi.count = cpu_to_le32(count);
+	pi.start_index = cpu_to_le16(start_index);
+
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
+		.size_in = sizeof(pi),
+		.payload_in = &pi,
+		.size_out = count,
+		.payload_out = feats_out,
+		.min_out = sizeof(*feats_out),
+	};
+	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
+
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 36cee9c30ceb..06231e63373e 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -527,6 +527,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_SET_TIMESTAMP	= 0x0301,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
 	CXL_MBOX_OP_GET_LOG		= 0x0401,
+	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
 	CXL_MBOX_OP_IDENTIFY		= 0x4000,
 	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
 	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
@@ -696,6 +697,63 @@ struct cxl_mbox_set_timestamp_in {
 
 } __packed;
 
+/*
+ * Get Supported Features CXL 3.1 Spec 8.2.9.6.1
+ */
+
+/*
+ * Get Supported Features input payload
+ * CXL rev 3.1 section 8.2.9.6.1 Table 8-95
+ */
+struct cxl_mbox_get_supp_feats_in {
+	__le32 count;
+	__le16 start_index;
+	u8 rsvd[2];
+} __packed;
+
+/*
+ * Get Supported Features Supported Feature Entry
+ * CXL rev 3.1 section 8.2.9.6.1 Table 8-97
+ */
+/* Supported Feature Entry : Payload out attribute flags */
+#define CXL_FEAT_ENTRY_FLAG_CHANGABLE	BIT(0)
+#define CXL_FEAT_ENTRY_FLAG_DEEPEST_RESET_PERSISTENCE_MASK	GENMASK(3, 1)
+#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE	BIT(4)
+#define CXL_FEAT_ENTRY_FLAG_SUPPORT_DEFAULT_SELECTION	BIT(5)
+#define CXL_FEAT_ENTRY_FLAG_SUPPORT_SAVED_SELECTION	BIT(6)
+
+enum cxl_feat_attr_value_persistence {
+	CXL_FEAT_ATTR_VALUE_PERSISTENCE_NONE,
+	CXL_FEAT_ATTR_VALUE_PERSISTENCE_CXL_RESET,
+	CXL_FEAT_ATTR_VALUE_PERSISTENCE_HOT_RESET,
+	CXL_FEAT_ATTR_VALUE_PERSISTENCE_WARM_RESET,
+	CXL_FEAT_ATTR_VALUE_PERSISTENCE_COLD_RESET,
+	CXL_FEAT_ATTR_VALUE_PERSISTENCE_MAX
+};
+
+struct cxl_mbox_supp_feat_entry {
+	uuid_t uuid;
+	__le16 index;
+	__le16 get_size;
+	__le16 set_size;
+	__le32 attr_flags;
+	u8 get_version;
+	u8 set_version;
+	__le16 set_effects;
+	u8 rsvd[18];
+}  __packed;
+
+/*
+ * Get Supported Features output payload
+ * CXL rev 3.1 section 8.2.9.6.1 Table 8-96
+ */
+struct cxl_mbox_get_supp_feats_out {
+	__le16 nr_entries;
+	__le16 nr_supported;
+	u8 rsvd[4];
+	struct cxl_mbox_supp_feat_entry feat_entries[];
+} __packed;
+
 /* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
 struct cxl_mbox_poison_in {
 	__le64 offset;
@@ -827,6 +885,9 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 			    enum cxl_event_type event_type,
 			    const uuid_t *uuid, union cxl_event *evt);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
+int cxl_get_supported_features(struct cxl_memdev_state *mds,
+			       u32 count, u16 start_index,
+			       struct cxl_mbox_get_supp_feats_out *feats_out);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr);
-- 
2.34.1



* [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE mailbox command
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 02/10] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-24 23:19   ` fan
  2024-04-19 16:47 ` [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE " shiju.jose
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Shiju Jose <shiju.jose@huawei.com>

Add support for GET_FEATURE mailbox command.

CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
The settings of a feature can be retrieved using Get Feature command.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/cxl/core/mbox.c | 53 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    | 28 ++++++++++++++++++++++
 2 files changed, 81 insertions(+)
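
Reading aid, not part of the patch: a minimal userspace sketch of the
chunked read that cxl_get_feature() performs when the feature data is
larger than the mailbox payload. The device is replaced by a hypothetical
in-memory mock (struct mock_feature, mock_get_feature), and read_feature is
an illustrative name; only the offset/count stepping and the "return 0 on
failure" convention follow the function below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the feature data held by a device. */
struct mock_feature {
	const uint8_t *data;
	size_t size;
};

/* Mimic one Get Feature command: copy up to count bytes from offset. */
static size_t mock_get_feature(const struct mock_feature *f, size_t offset,
			       size_t count, uint8_t *out)
{
	if (offset >= f->size)
		return 0;
	if (count > f->size - offset)
		count = f->size - offset;
	memcpy(out, f->data + offset, count);
	return count;
}

/*
 * Read want bytes in chunks of at most payload_size bytes.
 * Returns the number of bytes read, or 0 if any chunk comes back empty.
 */
static size_t read_feature(const struct mock_feature *f, uint8_t *out,
			   size_t want, size_t payload_size)
{
	size_t rcvd = 0;

	assert(f != NULL && out != NULL && payload_size > 0);
	while (rcvd < want) {
		size_t chunk = want - rcvd;
		size_t got;

		if (chunk > payload_size)
			chunk = payload_size;
		got = mock_get_feature(f, rcvd, chunk, out + rcvd);
		if (got == 0)
			return 0;
		rcvd += got;
	}
	return rcvd;
}
```

With a 10-byte feature and a 4-byte payload, the mock is queried at offsets
0, 4 and 8; asking for more bytes than the feature holds yields 0.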

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 82e279b821e2..999965871048 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1318,6 +1318,59 @@ int cxl_get_supported_features(struct cxl_memdev_state *mds,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
 
+size_t cxl_get_feature(struct cxl_memdev_state *mds,
+		       const uuid_t feat_uuid, void *feat_out,
+		       size_t feat_out_size,
+		       size_t feat_out_min_size,
+		       enum cxl_get_feat_selection selection)
+{
+	struct cxl_dev_state *cxlds = &mds->cxlds;
+	struct cxl_mbox_get_feat_in pi;
+	struct cxl_mbox_cmd mbox_cmd;
+	size_t data_rcvd_size = 0;
+	size_t data_to_rd_size, size_out;
+	int rc;
+
+	if (feat_out_size < feat_out_min_size) {
+		dev_err(cxlds->dev,
+			"%s: feature out buffer size(%zu) is not big enough\n",
+			__func__, feat_out_size);
+		return 0;
+	}
+
+	if (feat_out_size <= mds->payload_size)
+		size_out = feat_out_size;
+	else
+		size_out = mds->payload_size;
+	pi.uuid = feat_uuid;
+	pi.selection = selection;
+	do {
+		if ((feat_out_min_size - data_rcvd_size) <= mds->payload_size)
+			data_to_rd_size = feat_out_min_size - data_rcvd_size;
+		else
+			data_to_rd_size = mds->payload_size;
+
+		pi.offset = cpu_to_le16(data_rcvd_size);
+		pi.count = cpu_to_le16(data_to_rd_size);
+
+		mbox_cmd = (struct cxl_mbox_cmd) {
+			.opcode = CXL_MBOX_OP_GET_FEATURE,
+			.size_in = sizeof(pi),
+			.payload_in = &pi,
+			.size_out = size_out,
+			.payload_out = feat_out + data_rcvd_size,
+			.min_out = data_to_rd_size,
+		};
+		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+		if (rc < 0 || mbox_cmd.size_out == 0)
+			return 0;
+		data_rcvd_size += mbox_cmd.size_out;
+	} while (data_rcvd_size < feat_out_min_size);
+
+	return data_rcvd_size;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
+
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 06231e63373e..c822eb30e6d1 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -528,6 +528,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
 	CXL_MBOX_OP_GET_LOG		= 0x0401,
 	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
+	CXL_MBOX_OP_GET_FEATURE		= 0x0501,
 	CXL_MBOX_OP_IDENTIFY		= 0x4000,
 	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
 	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
@@ -754,6 +755,28 @@ struct cxl_mbox_get_supp_feats_out {
 	struct cxl_mbox_supp_feat_entry feat_entries[];
 } __packed;
 
+/*
+ * Get Feature CXL 3.1 Spec 8.2.9.6.2
+ */
+
+/*
+ * Get Feature input payload
+ * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
+ */
+enum cxl_get_feat_selection {
+	CXL_GET_FEAT_SEL_CURRENT_VALUE,
+	CXL_GET_FEAT_SEL_DEFAULT_VALUE,
+	CXL_GET_FEAT_SEL_SAVED_VALUE,
+	CXL_GET_FEAT_SEL_MAX
+};
+
+struct cxl_mbox_get_feat_in {
+	uuid_t uuid;
+	__le16 offset;
+	__le16 count;
+	u8 selection;
+}  __packed;
+
 /* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
 struct cxl_mbox_poison_in {
 	__le64 offset;
@@ -888,6 +911,11 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_get_supported_features(struct cxl_memdev_state *mds,
 			       u32 count, u16 start_index,
 			       struct cxl_mbox_get_supp_feats_out *feats_out);
+size_t cxl_get_feature(struct cxl_memdev_state *mds,
+		       const uuid_t feat_uuid, void *feat_out,
+		       size_t feat_out_size,
+		       size_t feat_out_min_size,
+		       enum cxl_get_feat_selection selection);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr);
-- 
2.34.1



* [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE mailbox command
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (2 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE " shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-25 17:26   ` fan
  2024-04-19 16:47 ` [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Shiju Jose <shiju.jose@huawei.com>

Add support for SET_FEATURE mailbox command.

CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
CXL devices support features with changeable attributes.
The settings of a feature can be optionally modified using Set Feature
command.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/cxl/core/mbox.c | 73 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    | 33 +++++++++++++++++++
 2 files changed, 106 insertions(+)
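
Reading aid, not part of the patch: a minimal userspace sketch of how
cxl_set_feature() splits feature data across mailbox commands and which
data-transfer flag each command carries. The names (xfer_flag, xfer_chunk,
plan_set_feature) are hypothetical; the enum order mirrors
enum cxl_set_feat_flag_data_transfer below, and the sketch only plans the
chunks without sending anything.

```c
#include <assert.h>
#include <stddef.h>

/* Same order as the data-transfer values in the patch (abort omitted). */
enum xfer_flag { XFER_FULL, XFER_INITIATE, XFER_CONTINUE, XFER_FINISH };

/* One planned Set Feature command: flag, data offset, data length. */
struct xfer_chunk {
	enum xfer_flag flag;
	size_t offset;
	size_t len;
};

/*
 * Fill chunks[] with the commands needed to send data_size bytes when
 * each command can carry payload_size - hdr_size bytes of data.
 * Returns the number of chunks, or 0 if the plan does not fit.
 */
static size_t plan_set_feature(size_t hdr_size, size_t payload_size,
			       size_t data_size, struct xfer_chunk *chunks,
			       size_t max_chunks)
{
	size_t room, sent = 0, n = 0;

	assert(chunks != NULL);
	if (payload_size <= hdr_size || data_size == 0)
		return 0;

	room = payload_size - hdr_size;
	while (sent < data_size && n < max_chunks) {
		size_t left = data_size - sent;
		struct xfer_chunk *c = &chunks[n++];

		c->offset = sent;
		c->len = left < room ? left : room;
		if (sent == 0)
			c->flag = left <= room ? XFER_FULL : XFER_INITIATE;
		else
			c->flag = left <= room ? XFER_FINISH : XFER_CONTINUE;
		sent += c->len;
	}
	return sent == data_size ? n : 0;
}
```

For a 32-byte header, a 64-byte payload and 80 bytes of data, the plan is
INITIATE (offset 0, 32 bytes), CONTINUE (offset 32, 32 bytes) and FINISH
(offset 64, 16 bytes); data that fits in one command uses FULL.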

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 999965871048..4ca1238e8fec 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1371,6 +1371,79 @@ size_t cxl_get_feature(struct cxl_memdev_state *mds,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
 
+/*
+ * FEAT_DATA_MIN_PAYLOAD_SIZE - minimum number of bytes that must be
+ * available in the mailbox payload, beyond the Set Feature header, so
+ * that the feature data transfer can make progress.
+ */
+#define FEAT_DATA_MIN_PAYLOAD_SIZE 10
+int cxl_set_feature(struct cxl_memdev_state *mds,
+		    const uuid_t feat_uuid, u8 feat_version,
+		    void *feat_data, size_t feat_data_size,
+		    u8 feat_flag)
+{
+	struct cxl_memdev_set_feat_pi {
+		struct cxl_mbox_set_feat_hdr hdr;
+		u8 feat_data[];
+	}  __packed;
+	size_t data_in_size, data_sent_size = 0;
+	struct cxl_mbox_cmd mbox_cmd;
+	size_t hdr_size;
+	int rc = 0;
+
+	struct cxl_memdev_set_feat_pi *pi __free(kfree) =
+					kmalloc(mds->payload_size, GFP_KERNEL);
+	pi->hdr.uuid = feat_uuid;
+	pi->hdr.version = feat_version;
+	feat_flag &= ~CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK;
+	hdr_size = sizeof(pi->hdr);
+	/*
+	 * Check minimum mbox payload size is available for
+	 * the feature data transfer.
+	 */
+	if (hdr_size + FEAT_DATA_MIN_PAYLOAD_SIZE > mds->payload_size)
+		return -ENOMEM;
+
+	if ((hdr_size + feat_data_size) <= mds->payload_size) {
+		pi->hdr.flags = cpu_to_le32(feat_flag |
+				       CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER);
+		data_in_size = feat_data_size;
+	} else {
+		pi->hdr.flags = cpu_to_le32(feat_flag |
+				       CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER);
+		data_in_size = mds->payload_size - hdr_size;
+	}
+
+	do {
+		pi->hdr.offset = cpu_to_le16(data_sent_size);
+		memcpy(pi->feat_data, feat_data + data_sent_size, data_in_size);
+		mbox_cmd = (struct cxl_mbox_cmd) {
+			.opcode = CXL_MBOX_OP_SET_FEATURE,
+			.size_in = hdr_size + data_in_size,
+			.payload_in = pi,
+		};
+		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+		if (rc < 0)
+			return rc;
+
+		data_sent_size += data_in_size;
+		if (data_sent_size >= feat_data_size)
+			return 0;
+
+		if ((feat_data_size - data_sent_size) <= (mds->payload_size - hdr_size)) {
+			data_in_size = feat_data_size - data_sent_size;
+			pi->hdr.flags = cpu_to_le32(feat_flag |
+					       CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER);
+		} else {
+			pi->hdr.flags = cpu_to_le32(feat_flag |
+					       CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER);
+		}
+	} while (true);
+
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_set_feature, CXL);
+
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index c822eb30e6d1..1c50a3e2eced 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -529,6 +529,7 @@ enum cxl_opcode {
 	CXL_MBOX_OP_GET_LOG		= 0x0401,
 	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
 	CXL_MBOX_OP_GET_FEATURE		= 0x0501,
+	CXL_MBOX_OP_SET_FEATURE		= 0x0502,
 	CXL_MBOX_OP_IDENTIFY		= 0x4000,
 	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
 	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
@@ -777,6 +778,34 @@ struct cxl_mbox_get_feat_in {
 	u8 selection;
 }  __packed;
 
+/*
+ * Set Feature CXL 3.1 Spec 8.2.9.6.3
+ */
+
+/*
+ * Set Feature input payload
+ * CXL rev 3.1 section 8.2.9.6.3 Table 8-101
+ */
+/* Set Feature : Payload in flags */
+#define CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK	GENMASK(2, 0)
+enum cxl_set_feat_flag_data_transfer {
+	CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER,
+	CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER,
+	CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER,
+	CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER,
+	CXL_SET_FEAT_FLAG_ABORT_DATA_TRANSFER,
+	CXL_SET_FEAT_FLAG_DATA_TRANSFER_MAX
+};
+#define CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET	BIT(3)
+
+struct cxl_mbox_set_feat_hdr {
+	uuid_t uuid;
+	__le32 flags;
+	__le16 offset;
+	u8 version;
+	u8 rsvd[9];
+}  __packed;
+
 /* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
 struct cxl_mbox_poison_in {
 	__le64 offset;
@@ -916,6 +945,10 @@ size_t cxl_get_feature(struct cxl_memdev_state *mds,
 		       size_t feat_out_size,
 		       size_t feat_out_min_size,
 		       enum cxl_get_feat_selection selection);
+int cxl_set_feature(struct cxl_memdev_state *mds,
+		    const uuid_t feat_uuid, u8 feat_version,
+		    void *feat_data, size_t feat_data_size,
+		    u8 feat_flag);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr);
-- 
2.34.1



* [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (3 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE " shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-26 23:56   ` fan
  2024-04-19 16:47 ` [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table shiju.jose
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Shiju Jose <shiju.jose@huawei.com>

CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
feature. The device patrol scrub proactively locates and corrects errors on a
regular cycle.

Allow specifying the number of hours within which the patrol scrub must be
completed, subject to minimum and maximum limits reported by the device.
Also allow disabling scrub, so that error rates can be traded off against
performance.

Register with scrub subsystem to provide scrub control attributes to the
user.

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 Documentation/scrub/scrub-configure.rst |  52 ++++
 drivers/cxl/Kconfig                     |  19 ++
 drivers/cxl/core/Makefile               |   1 +
 drivers/cxl/core/memscrub.c             | 314 ++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |   8 +
 drivers/cxl/mem.c                       |   6 +
 6 files changed, 400 insertions(+)
 create mode 100644 Documentation/scrub/scrub-configure.rst
 create mode 100644 drivers/cxl/core/memscrub.c
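
Reading aid, not part of the patch: a minimal userspace sketch of how the
four-byte patrol scrub read attributes are decoded, following the masks
defined in memscrub.c below (capability bit 0, current cycle in bits 7:0 and
minimum cycle in bits 15:8 of the little-endian scrub_cycle field, enable in
flags bit 0). The names ps_params, decode_ps_attrs and ps_cycle_valid are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the patrol scrub read attributes. */
struct ps_params {
	bool enable;
	bool cycle_changeable;
	uint8_t cycle_hours;
	uint8_t min_cycle_hours;
};

/*
 * raw[0]    scrub cycle capabilities
 * raw[1..2] scrub cycle, little-endian 16 bits
 * raw[3]    scrub flags
 */
static void decode_ps_attrs(const uint8_t raw[4], struct ps_params *p)
{
	uint16_t cycle;

	assert(raw != NULL && p != NULL);
	cycle = (uint16_t)(raw[1] | (raw[2] << 8));
	p->cycle_changeable = raw[0] & 0x01;
	p->cycle_hours = cycle & 0xff;
	p->min_cycle_hours = (cycle >> 8) & 0xff;
	p->enable = raw[3] & 0x01;
}

/* A requested cycle must lie between the device minimum and 255 hours. */
static bool ps_cycle_valid(const struct ps_params *p, unsigned int hours)
{
	return hours >= p->min_cycle_hours && hours <= UINT8_MAX;
}
```

Assembling scrub_cycle from bytes makes the result independent of host
endianness, which is what le16_to_cpu() provides in the kernel.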

diff --git a/Documentation/scrub/scrub-configure.rst b/Documentation/scrub/scrub-configure.rst
new file mode 100644
index 000000000000..2275366b60d3
--- /dev/null
+++ b/Documentation/scrub/scrub-configure.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Scrub subsystem
+================
+
+Copyright (c) 2024 HiSilicon Limited.
+
+:Author:   Shiju Jose <shiju.jose@huawei.com>
+:License:  The GNU Free Documentation License, Version 1.2
+          (dual licensed under the GPL v2)
+:Original Reviewers:
+
+- Written for: 6.9
+- Updated for:
+
+Introduction
+------------
+The scrub subsystem provides an interface for controlling attributes
+of memory scrubbers in the system. The scrub device drivers
+in the system register with the scrub subsystem. The scrub subsystem
+driver exposes the scrub controls to the user in sysfs.
+
+The File System
+---------------
+
+The control attributes of the registered scrubbers can be
+accessed under /sys/class/ras/rasX/scrub/.
+
+sysfs
+-----
+
+Sysfs files are documented in
+`Documentation/ABI/testing/sysfs-class-scrub-configure`.
+
+Example
+-------
+
+The usage takes the form shown in this example::
+
+1. CXL patrol scrubber
+    # cat /sys/class/ras/ras0/scrub/rate_available
+    # 0x1-0xff
+    # echo 30 > /sys/class/ras/ras0/scrub/rate
+    # cat /sys/class/ras/ras0/scrub/rate
+    # 0x1e
+    # echo 1 > /sys/class/ras/ras0/scrub/enable_background
+    # cat /sys/class/ras/ras0/scrub/enable_background
+    # 1
+    # echo 0 > /sys/class/ras/ras0/scrub/enable_background
+    # cat /sys/class/ras/ras0/scrub/enable_background
+    # 0
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 5f3c9c5529b9..3621b9f27e80 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -144,4 +144,23 @@ config CXL_REGION_INVALIDATION_TEST
 	  If unsure, or if this kernel is meant for production environments,
 	  say N.
 
+config CXL_SCRUB
+	bool "CXL: Memory scrub feature"
+	depends on CXL_PCI
+	depends on CXL_MEM
+	depends on SCRUB
+	help
+	  The CXL memory scrub control is an optional feature that allows the
+	  host to control the scrub configuration of CXL Type 3 devices that
+	  support patrol scrubbing.
+
+	  Registers with the scrub subsystem to provide control attributes
+	  of CXL memory device scrubber to the user.
+	  Provides interface functions to support configuring the CXL memory
+	  device patrol scrubber.
+
+	  Say 'y/n' to enable/disable control of memory scrub parameters for
+	  CXL.mem devices. See section 8.2.9.9.11.1 of CXL 3.1 specification
+	  for detailed description of CXL memory patrol scrub control feature.
+
 endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..e0fc814c3983 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -16,3 +16,4 @@ cxl_core-y += pmu.o
 cxl_core-y += cdat.o
 cxl_core-$(CONFIG_TRACING) += trace.o
 cxl_core-$(CONFIG_CXL_REGION) += region.o
+cxl_core-$(CONFIG_CXL_SCRUB) += memscrub.o
diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
new file mode 100644
index 000000000000..a50f6e384394
--- /dev/null
+++ b/drivers/cxl/core/memscrub.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * CXL memory scrub driver.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ *
+ *  - Provides functions to configure patrol scrub feature of the
+ *    CXL memory devices.
+ *  - Registers with the scrub subsystem driver to expose the sysfs attributes
+ *    to the user for configuring the CXL memory patrol scrub feature.
+ */
+
+#define pr_fmt(fmt)	"CXL_MEM_SCRUB: " fmt
+
+#include <cxlmem.h>
+#include <linux/cleanup.h>
+#include <linux/limits.h>
+#include <linux/memory_scrub.h>
+
+static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const uuid_t *feat_uuid,
+					       struct cxl_mbox_supp_feat_entry *feat_entry_out)
+{
+	struct cxl_mbox_supp_feat_entry *feat_entry;
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	int feat_index, feats_out_size;
+	int nentries, count;
+	int ret;
+
+	feat_index = 0;
+	feats_out_size = sizeof(struct cxl_mbox_get_supp_feats_out) +
+			  sizeof(struct cxl_mbox_supp_feat_entry);
+	struct cxl_mbox_get_supp_feats_out *feats_out __free(kfree) =
+					kmalloc(feats_out_size, GFP_KERNEL);
+	if (!feats_out)
+		return -ENOMEM;
+
+	while (true) {
+		memset(feats_out, 0, feats_out_size);
+		ret = cxl_get_supported_features(mds, feats_out_size,
+						 feat_index, feats_out);
+		if (ret)
+			return ret;
+
+		nentries = le16_to_cpu(feats_out->nr_entries);
+		if (!nentries)
+			return -EOPNOTSUPP;
+
+		/* Check CXL memdev supports the feature */
+		feat_entry = feats_out->feat_entries;
+		for (count = 0; count < nentries; count++, feat_entry++) {
+			if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
+				memcpy(feat_entry_out, feat_entry,
+				       sizeof(*feat_entry_out));
+				return 0;
+			}
+		}
+		feat_index += nentries;
+	}
+}
+
+/* CXL memory patrol scrub control definitions */
+#define CXL_MEMDEV_PS_GET_FEAT_VERSION	0x01
+#define CXL_MEMDEV_PS_SET_FEAT_VERSION	0x01
+
+static const uuid_t cxl_patrol_scrub_uuid =
+	UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e,     \
+		  0x06, 0xdb, 0x8a);
+
+/* CXL memory patrol scrub control functions */
+struct cxl_patrol_scrub_context {
+	struct device *dev;
+	u16 get_feat_size;
+	u16 set_feat_size;
+	bool scrub_cycle_changeable;
+};
+
+/**
+ * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
+ * @enable:     [IN & OUT] enable(1)/disable(0) patrol scrub.
+ * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
+ * @rate:       [IN] Requested patrol scrub cycle in hours.
+ *              [OUT] Current patrol scrub cycle in hours.
+ * @min_rate:   [OUT] minimum supported patrol scrub cycle, in hours.
+ */
+struct cxl_memdev_ps_params {
+	bool enable;
+	bool scrub_cycle_changeable;
+	u16 rate;
+	u16 min_rate;
+};
+
+enum cxl_scrub_param {
+	cxl_ps_param_enable,
+	cxl_ps_param_rate,
+};
+
+#define	CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK	BIT(0)
+#define	CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK	BIT(1)
+#define	CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK	GENMASK(7, 0)
+#define	CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK	GENMASK(15, 8)
+#define	CXL_MEMDEV_PS_FLAG_ENABLED_MASK	BIT(0)
+
+struct cxl_memdev_ps_rd_attrs {
+	u8 scrub_cycle_cap;
+	__le16 scrub_cycle;
+	u8 scrub_flags;
+}  __packed;
+
+struct cxl_memdev_ps_wr_attrs {
+	u8 scrub_cycle_hr;
+	u8 scrub_flags;
+}  __packed;
+
+static int cxl_mem_ps_get_attrs(struct device *dev,
+				struct cxl_memdev_ps_params *params)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
+	size_t data_size;
+
+	if (!mds)
+		return -EFAULT;
+
+	struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
+						kmalloc(rd_data_size, GFP_KERNEL);
+	if (!rd_attrs)
+		return -ENOMEM;
+
+	data_size = cxl_get_feature(mds, cxl_patrol_scrub_uuid, rd_attrs,
+				    rd_data_size, rd_data_size,
+				    CXL_GET_FEAT_SEL_CURRENT_VALUE);
+	if (!data_size)
+		return -EIO;
+
+	params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
+						   rd_attrs->scrub_cycle_cap);
+	params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+				   rd_attrs->scrub_flags);
+	params->rate = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+				 le16_to_cpu(rd_attrs->scrub_cycle));
+	params->min_rate = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
+				     le16_to_cpu(rd_attrs->scrub_cycle));
+
+	return 0;
+}
+
+static int cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
+				enum cxl_scrub_param param_type)
+{
+	struct cxl_memdev_ps_wr_attrs wr_attrs;
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	struct cxl_memdev_ps_params rd_params;
+	int ret;
+
+	ret = cxl_mem_ps_get_attrs(dev, &rd_params);
+	if (ret) {
+		dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
+			ret);
+		return ret;
+	}
+
+	switch (param_type) {
+	case cxl_ps_param_enable:
+		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+						   params->enable);
+		wr_attrs.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+						      rd_params.rate);
+		break;
+	case cxl_ps_param_rate:
+		if (params->rate < rd_params.min_rate) {
+			dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
+				params->rate);
+			dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
+				rd_params.min_rate);
+			return -EINVAL;
+		}
+		wr_attrs.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+						     params->rate);
+		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+						  rd_params.enable);
+		break;
+	}
+
+	ret = cxl_set_feature(mds, cxl_patrol_scrub_uuid, CXL_MEMDEV_PS_SET_FEAT_VERSION,
+			      &wr_attrs, sizeof(wr_attrs),
+			      CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
+	if (ret)
+		dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n",
+			ret);
+
+	return ret;
+}
+
+static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, bool *enabled)
+{
+	struct cxl_memdev_ps_params params;
+	int ret;
+
+	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
+	if (ret)
+		return ret;
+
+	*enabled = params.enable;
+
+	return 0;
+}
+
+static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, bool enable)
+{
+	struct cxl_memdev_ps_params params = {
+		.enable = enable,
+	};
+
+	return cxl_mem_ps_set_attrs(dev->parent, &params, cxl_ps_param_enable);
+}
+
+static int cxl_patrol_scrub_get_name(struct device *dev, char *name)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
+
+	return sysfs_emit(name, "%s_%s\n", "cxl_patrol_scrub",
+			  dev_name(&cxlmd->dev));
+}
+
+static int cxl_patrol_scrub_write_rate(struct device *dev, u64 rate)
+{
+	struct cxl_memdev_ps_params params = {
+		.rate = rate,
+	};
+
+	return cxl_mem_ps_set_attrs(dev->parent, &params, cxl_ps_param_rate);
+}
+
+static int cxl_patrol_scrub_read_rate(struct device *dev, u64 *rate)
+{
+	struct cxl_memdev_ps_params params;
+	int ret;
+
+	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
+	if (ret)
+		return ret;
+
+	*rate = params.rate;
+
+	return 0;
+}
+
+static int cxl_patrol_scrub_read_rate_avail(struct device *dev, u64 *min, u64 *max)
+{
+	struct cxl_memdev_ps_params params;
+	int ret;
+
+	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
+	if (ret)
+		return ret;
+	*min = params.min_rate;
+	*max = U8_MAX; /* Max set by register size */
+
+	return 0;
+}
+
+static const struct scrub_ops cxl_ps_scrub_ops = {
+	.get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
+	.set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
+	.get_name = cxl_patrol_scrub_get_name,
+	.rate_read = cxl_patrol_scrub_read_rate,
+	.rate_write = cxl_patrol_scrub_write_rate,
+	.rate_avail_range = cxl_patrol_scrub_read_rate_avail,
+};
+
+int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
+{
+	struct cxl_patrol_scrub_context *cxl_ps_ctx;
+	struct cxl_mbox_supp_feat_entry feat_entry;
+	struct cxl_memdev_ps_params params;
+	struct device *cxl_scrub_dev;
+	int ret;
+
+	ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_patrol_scrub_uuid,
+						  &feat_entry);
+	if (ret < 0)
+		return ret;
+
+	if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+		return -EOPNOTSUPP;
+
+	ret = cxl_mem_ps_get_attrs(&cxlmd->dev, &params);
+	if (ret)
+		return dev_err_probe(&cxlmd->dev, ret,
+				     "Get CXL patrol scrub params failed\n");
+
+	cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+	if (!cxl_ps_ctx)
+		return -ENOMEM;
+
+	*cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
+		.get_feat_size = feat_entry.get_size,
+		.set_feat_size = feat_entry.set_size,
+		.scrub_cycle_changeable = params.scrub_cycle_changeable,
+	};
+
+	cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, cxl_ps_ctx,
+						   &cxl_ps_scrub_ops);
+	if (IS_ERR(cxl_scrub_dev))
+		return PTR_ERR(cxl_scrub_dev);
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 1c50a3e2eced..f95e39febd73 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -956,6 +956,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
 int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
 int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
 
+/* cxl memory scrub functions */
+#ifdef CONFIG_CXL_SCRUB
+int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
+#else
+static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
+{ return 0; }
+#endif
+
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
 void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 0c79d9ce877c..399e43463626 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
 	if (!cxlds->media_ready)
 		return -EBUSY;
 
+	rc = cxl_mem_patrol_scrub_init(cxlmd);
+	if (rc) {
+		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
+		return rc;
+	}
+
 	/*
 	 * Someone is trying to reattach this device after it lost its port
 	 * connection (an endpoint port previously registered by this memdev was
-- 
2.34.1



* [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (4 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-19 18:06   ` Jonathan Cameron
  2024-04-19 16:47 ` [RFC PATCH v8 07/10] platform: Add __free() based cleanup function for platform_device_put shiju.jose
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Add __free() based cleanup function for acpi_put_table.
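
For readers unfamiliar with the pattern, the following is a minimal
user-space sketch of scope-based cleanup, assuming a GCC or Clang
toolchain that supports __attribute__((cleanup)). The names struct table,
table_put(), auto_put_table and use_table() are hypothetical stand-ins
and are not part of this patch or of ACPICA.

```c
/* Hypothetical stand-in for a reference-counted ACPI table. */
struct table { int refs; };

static int put_calls;	/* counts releases so the behaviour can be observed */

static void table_put(struct table *t)
{
	t->refs--;
	put_calls++;
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void table_put_cleanup(struct table **tp)
{
	if (*tp)	/* guard, similar in spirit to the DEFINE_FREE() body */
		table_put(*tp);
}

#define auto_put_table __attribute__((cleanup(table_put_cleanup)))

/* Takes a reference on src; the reference is dropped on every return path. */
static int use_table(struct table *src, int fail_early)
{
	struct table *t auto_put_table = src;

	if (t)
		t->refs++;
	if (!t || fail_early)
		return -1;	/* cleanup runs after this return ... */
	return t->refs;		/* ... and after this one, once the value is read */
}
```

The point of the sketch is that no explicit "put" call is needed on the
early-exit path, which is what the __free() annotation is meant to provide.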

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 include/acpi/acpixf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index 3d90716f9522..fc64d903a703 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -492,6 +492,8 @@ ACPI_EXTERNAL_RETURN_STATUS(acpi_status
 					    **out_table))
 ACPI_EXTERNAL_RETURN_VOID(void acpi_put_table(struct acpi_table_header *table))
 
+DEFINE_FREE(acpi_put_table, struct acpi_table_header *, if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))
+
 ACPI_EXTERNAL_RETURN_STATUS(acpi_status
 			    acpi_get_table_by_index(u32 table_index,
 						    struct acpi_table_header
-- 
2.34.1



* [RFC PATCH v8 07/10] platform: Add __free() based cleanup function for platform_device_put
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (5 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 08/10] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Add __free() based cleanup function for platform_device_put().
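
A typical use is a constructor that must release the object on failure but
hand it to the caller on success. Below is a minimal user-space sketch of
that ownership hand-off, assuming GCC or Clang; struct pdev, pdev_alloc(),
pdev_put(), steal_pdev() and make_pdev() are hypothetical names chosen for
illustration, with steal_pdev() playing the role that return_ptr() plays in
the kernel.

```c
#include <stdlib.h>

struct pdev { int added; };

static int live_objects;	/* allocations not yet released */

static struct pdev *pdev_alloc(void)
{
	struct pdev *p = calloc(1, sizeof(*p));

	if (p)
		live_objects++;
	return p;
}

static void pdev_put(struct pdev *p)
{
	free(p);
	live_objects--;
}

static void pdev_put_cleanup(struct pdev **pp)
{
	if (*pp)	/* same NULL guard as the DEFINE_FREE() in this patch */
		pdev_put(*pp);
}

#define auto_pdev_put __attribute__((cleanup(pdev_put_cleanup)))

/* Hand the pointer to the caller and disarm the cleanup. */
static struct pdev *steal_pdev(struct pdev **pp)
{
	struct pdev *p = *pp;

	*pp = NULL;
	return p;
}

static struct pdev *make_pdev(int fail_add)
{
	struct pdev *p auto_pdev_put = pdev_alloc();

	if (!p)
		return NULL;
	if (fail_add)
		return NULL;	/* cleanup releases p */
	p->added = 1;
	return steal_pdev(&p);	/* p is NULL now, so cleanup is a no-op */
}
```

On the failure path the object is released automatically; on success the
caller owns it and is responsible for the final put.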

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 include/linux/platform_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 7a41c72c1959..1ddc35623b4c 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
 extern int platform_device_add(struct platform_device *pdev);
 extern void platform_device_del(struct platform_device *pdev);
 extern void platform_device_put(struct platform_device *pdev);
+DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))
 
 struct platform_driver {
 	int (*probe)(struct platform_device *);
-- 
2.34.1



* [RFC PATCH v8 08/10] ACPI:RAS2: Add ACPI RAS2 driver
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (6 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 07/10] platform: Add __free() based cleanup function for platform_device_put shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 09/10] ras: scrub: Add scrub control attributes for ACPI RAS2 shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver shiju.jose
  9 siblings, 0 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Shiju Jose <shiju.jose@huawei.com>

Add support for the ACPI RAS2 feature table (RAS2) defined in the
ACPI 6.5 Specification, section 5.2.21.
The driver's RAS2 init extracts the RAS2 table and adds a platform
device for each memory feature, which binds to the RAS2 memory
driver.

The driver uses a PCC mailbox to communicate with the ACPI HW and
adds OSPM interfaces to send RAS2 commands.
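
The command path also honours the PCC Maximum Periodic Access Rate (MPAR).
The following is a minimal user-space sketch of that kind of per-minute
budget, not the driver code itself; struct mpar_state, mpar_allow() and
WINDOW_MS are hypothetical names, and the caller is assumed to supply a
monotonic timestamp in milliseconds and to start with a full budget.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_MS 60000u	/* MPAR is expressed in commands per minute */

struct mpar_state {
	uint32_t limit;		/* commands allowed per window; 0 = unlimited */
	uint32_t remaining;	/* budget left in the current window */
	uint64_t window_start;	/* timestamp (ms) at which the window opened */
};

/* Returns true if a command may be sent at time now_ms. */
static bool mpar_allow(struct mpar_state *s, uint64_t now_ms)
{
	if (!s->limit)
		return true;
	if (s->remaining == 0) {
		if (now_ms - s->window_start < WINDOW_MS)
			return false;	/* budget spent, window still open */
		/* A full minute has passed: open a new window. */
		s->window_start = now_ms;
		s->remaining = s->limit;
	}
	s->remaining--;
	return true;
}
```

With a limit of two, the third request inside the same minute is refused,
and requests are accepted again once 60 seconds have elapsed since the
window opened.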

Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/Kconfig     |  10 ++
 drivers/acpi/Makefile    |   1 +
 drivers/acpi/ras2.c      | 366 +++++++++++++++++++++++++++++++++++++++
 include/acpi/ras2_acpi.h |  59 +++++++
 4 files changed, 436 insertions(+)
 create mode 100755 drivers/acpi/ras2.c
 create mode 100644 include/acpi/ras2_acpi.h

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ff1689bb3124..638f1e38f961 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -284,6 +284,16 @@ config ACPI_CPPC_LIB
 	  If your platform does not support CPPC in firmware,
 	  leave this option disabled.
 
+config ACPI_RAS2
+	bool "ACPI RAS2 driver"
+	select MAILBOX
+	select PCC
+	help
+	  The driver adds support for the ACPI RAS2 feature table (extracted
+	  from the system tables) and OSPM interfaces to send RAS2
+	  commands via a PCC mailbox subspace. It adds a platform device for
+	  each RAS2 memory feature, which binds to the RAS2 memory driver.
+
 config ACPI_PROCESSOR
 	tristate "Processor"
 	depends on X86 || ARM64 || LOONGARCH || RISCV
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 8cc8c0d9c873..1df9de524c62 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -104,6 +104,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
 obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
+obj-$(CONFIG_ACPI_RAS2)		+= ras2.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
 obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 obj-$(CONFIG_ACPI_PFRUT)	+= pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
new file mode 100755
index 000000000000..f4282aad5174
--- /dev/null
+++ b/drivers/acpi/ras2.c
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Implementation of ACPI RAS2 driver.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ *
+ * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21
+ *
+ * The driver contains the ACPI RAS2 init, which extracts the ACPI RAS2 table
+ * and gets the PCC channel subspace for communicating with the ACPI compliant
+ * HW platform which supports ACPI RAS2. The driver adds a platform device
+ * for each RAS2 memory feature, which binds to the memory ACPI RAS2 driver.
+ */
+
+#define pr_fmt(fmt)    "ACPI RAS2: " fmt
+
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/ktime.h>
+#include <linux/platform_device.h>
+#include <acpi/pcc.h>
+#include <acpi/ras2_acpi.h>
+
+/*
+ * Arbitrary Retries for PCC commands because the
+ * remote processor could be much slower to reply.
+ */
+#define RAS2_NUM_RETRIES 600
+
+#define RAS2_FEATURE_TYPE_MEMORY        0x00
+
+/* global variables for the RAS2 PCC subspaces */
+static DEFINE_MUTEX(ras2_pcc_subspace_lock);
+static LIST_HEAD(ras2_pcc_subspaces);
+
+static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
+{
+	struct acpi_ras2_shared_memory __iomem *generic_comm_base = pcc_subspace->pcc_comm_addr;
+	ktime_t next_deadline = ktime_add(ktime_get(), pcc_subspace->deadline);
+	u16 status;
+
+	while (!ktime_after(ktime_get(), next_deadline)) {
+		/*
+		 * As per ACPI spec, the PCC space will be initialized by
+		 * platform and should have set the command completion bit when
+		 * PCC can be used by OSPM
+		 */
+		status = readw_relaxed(&generic_comm_base->status);
+		if (status & RAS2_PCC_CMD_ERROR)
+			return -EIO;
+		if (status & RAS2_PCC_CMD_COMPLETE)
+			return 0;
+		/*
+		 * Reducing the bus traffic in case this loop takes longer than
+		 * a few retries.
+		 */
+		msleep(10);
+	}
+
+	return -EIO;
+}
+
+/**
+ * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel
+ * @ras2_ctx:	pointer to the ras2 context structure
+ * @cmd:	command to send
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int ras2_send_pcc_cmd(struct ras2_scrub_ctx *ras2_ctx, u16 cmd)
+{
+	struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+	struct acpi_ras2_shared_memory __iomem *generic_comm_base = pcc_subspace->pcc_comm_addr;
+	static ktime_t last_cmd_cmpl_time, last_mpar_reset;
+	struct mbox_chan *pcc_channel;
+	unsigned int time_delta;
+	static int mpar_count;
+	int ret;
+
+	guard(mutex)(&ras2_pcc_subspace_lock);
+	ret = ras2_check_pcc_chan(pcc_subspace);
+	if (ret)
+		return ret;
+	pcc_channel = pcc_subspace->pcc_chan->mchan;
+
+	/*
+	 * Handle the Minimum Request Turnaround Time(MRTT)
+	 * "The minimum amount of time that OSPM must wait after the completion
+	 * of a command before issuing the next command, in microseconds"
+	 */
+	if (pcc_subspace->pcc_mrtt) {
+		time_delta = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
+		if (pcc_subspace->pcc_mrtt > time_delta)
+			udelay(pcc_subspace->pcc_mrtt - time_delta);
+	}
+
+	/*
+	 * Handle the non-zero Maximum Periodic Access Rate(MPAR)
+	 * "The maximum number of periodic requests that the subspace channel can
+	 * support, reported in commands per minute. 0 indicates no limitation."
+	 *
+	 * This parameter should be ideally zero or large enough so that it can
+	 * handle maximum number of requests that all the cores in the system can
+	 * collectively generate. If it is not, we will follow the spec and just
+	 * not send the request to the platform after hitting the MPAR limit in
+	 * any 60s window
+	 */
+	if (pcc_subspace->pcc_mpar) {
+		if (mpar_count == 0) {
+			time_delta = ktime_ms_delta(ktime_get(), last_mpar_reset);
+			if (time_delta < 60 * MSEC_PER_SEC) {
+				dev_dbg(ras2_ctx->dev,
+					"PCC cmd not sent due to MPAR limit\n");
+				return -EIO;
+			}
+			last_mpar_reset = ktime_get();
+			mpar_count = pcc_subspace->pcc_mpar;
+		}
+		mpar_count--;
+	}
+
+	/* Write to the shared comm region. */
+	writew_relaxed(cmd, &generic_comm_base->command);
+
+	/* Flip CMD COMPLETE bit */
+	writew_relaxed(0, &generic_comm_base->status);
+
+	/* Ring doorbell */
+	ret = mbox_send_message(pcc_channel, &cmd);
+	if (ret < 0) {
+		dev_err(ras2_ctx->dev,
+			"Err sending PCC mbox message. cmd:%d, ret:%d\n",
+			cmd, ret);
+		return ret;
+	}
+
+	/*
+	 * If Minimum Request Turnaround Time is non-zero, we need
+	 * to record the completion time of every command for proper
+	 * handling of MRTT, so we need to check for pcc_mrtt in
+	 * addition to RAS2_PCC_CMD_EXEC
+	 */
+	if (cmd == RAS2_PCC_CMD_EXEC || pcc_subspace->pcc_mrtt) {
+		ret = ras2_check_pcc_chan(pcc_subspace);
+		if (pcc_subspace->pcc_mrtt)
+			last_cmd_cmpl_time = ktime_get();
+	}
+
+	if (pcc_channel->mbox->txdone_irq)
+		mbox_chan_txdone(pcc_channel, ret);
+	else
+		mbox_client_txdone(pcc_channel, ret);
+
+	return ret < 0 ? ret : 0;
+}
+EXPORT_SYMBOL_GPL(ras2_send_pcc_cmd);
+
+static int ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
+				     int pcc_subspace_id)
+{
+	struct acpi_pcct_hw_reduced *ras2_ss;
+	struct mbox_client *ras2_mbox_cl;
+	struct pcc_mbox_chan *pcc_chan;
+	struct ras2_pcc_subspace *pcc_subspace;
+
+	if (pcc_subspace_id < 0)
+		return -EINVAL;
+
+	mutex_lock(&ras2_pcc_subspace_lock);
+	list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
+		if (pcc_subspace->pcc_subspace_id == pcc_subspace_id) {
+			ras2_ctx->pcc_subspace = pcc_subspace;
+			pcc_subspace->ref_count++;
+			mutex_unlock(&ras2_pcc_subspace_lock);
+			return 0;
+		}
+	}
+	mutex_unlock(&ras2_pcc_subspace_lock);
+
+	pcc_subspace = kzalloc(sizeof(*pcc_subspace), GFP_KERNEL);
+	if (!pcc_subspace)
+		return -ENOMEM;
+	pcc_subspace->pcc_subspace_id = pcc_subspace_id;
+	ras2_mbox_cl = &pcc_subspace->mbox_client;
+	ras2_mbox_cl->dev = dev;
+	ras2_mbox_cl->knows_txdone = true;
+
+	pcc_chan = pcc_mbox_request_channel(ras2_mbox_cl, pcc_subspace_id);
+	if (IS_ERR(pcc_chan)) {
+		kfree(pcc_subspace);
+		return PTR_ERR(pcc_chan);
+	}
+	pcc_subspace->pcc_chan = pcc_chan;
+	ras2_ss = pcc_chan->mchan->con_priv;
+	pcc_subspace->comm_base_addr = ras2_ss->base_address;
+
+	/*
+	 * ras2_ss->latency is just a Nominal value. In reality
+	 * the remote processor could be much slower to reply.
+	 * So add an arbitrary amount of wait on top of Nominal.
+	 */
+	pcc_subspace->deadline = ns_to_ktime(RAS2_NUM_RETRIES * ras2_ss->latency *
+					     NSEC_PER_USEC);
+	pcc_subspace->pcc_mrtt = ras2_ss->min_turnaround_time;
+	pcc_subspace->pcc_mpar = ras2_ss->max_access_rate;
+	pcc_subspace->pcc_comm_addr = acpi_os_ioremap(pcc_subspace->comm_base_addr,
+						      ras2_ss->length);
+	/* Set flag so that we don't come here for each CPU. */
+	pcc_subspace->pcc_channel_acquired = true;
+
+	mutex_lock(&ras2_pcc_subspace_lock);
+	list_add(&pcc_subspace->elem, &ras2_pcc_subspaces);
+	pcc_subspace->ref_count++;
+	mutex_unlock(&ras2_pcc_subspace_lock);
+	ras2_ctx->pcc_subspace = pcc_subspace;
+
+	return 0;
+}
+
+static void ras2_unregister_pcc_channel(void *ctx)
+{
+	struct ras2_scrub_ctx *ras2_ctx = ctx;
+	struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+
+	if (!pcc_subspace || !pcc_subspace->pcc_chan)
+		return;
+
+	guard(mutex)(&ras2_pcc_subspace_lock);
+	if (pcc_subspace->ref_count > 0)
+		pcc_subspace->ref_count--;
+	if (!pcc_subspace->ref_count) {
+		list_del(&pcc_subspace->elem);
+		pcc_mbox_free_channel(pcc_subspace->pcc_chan);
+		kfree(pcc_subspace);
+	}
+}
+
+/**
+ * devm_ras2_register_pcc_channel() - Register RAS2 PCC channel
+ * @dev:		pointer to the ras2 device
+ * @ras2_ctx:		pointer to the ras2 context structure
+ * @pcc_subspace_id:	identifier of the RAS2 PCC channel.
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int devm_ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
+				   int pcc_subspace_id)
+{
+	int ret;
+
+	ret = ras2_register_pcc_channel(dev, ras2_ctx, pcc_subspace_id);
+	if (ret)
+		return ret;
+
+	return devm_add_action_or_reset(dev, ras2_unregister_pcc_channel, ras2_ctx);
+}
+EXPORT_SYMBOL_NS_GPL(devm_ras2_register_pcc_channel, ACPI_RAS2);
+
+static struct platform_device *ras2_add_platform_device(char *name, int channel)
+{
+	int ret;
+	struct platform_device *pdev __free(platform_device_put) =
+		platform_device_alloc(name, PLATFORM_DEVID_AUTO);
+	if (!pdev)
+		return ERR_PTR(-ENOMEM);
+
+	ret = platform_device_add_data(pdev, &channel, sizeof(channel));
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = platform_device_add(pdev);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return_ptr(pdev);
+}
+
+static struct acpi_table_header *acpi_get_table2(acpi_string signature,
+						 u32 instance)
+{
+	struct acpi_table_header *header = NULL;
+	acpi_status status = acpi_get_table(signature, instance, &header);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-EINVAL);
+
+	return header;
+}
+
+static int __init ras2_acpi_init(void)
+{
+	struct acpi_ras2_pcc_desc *pcc_desc_list;
+	struct acpi_table_ras2 *pRas2Table;
+	struct platform_device *pdev;
+	int pcc_subspace_id;
+	acpi_size ras2_size;
+	u8 count = 0, i;
+
+	struct acpi_table_header *pAcpiTable __free(acpi_put_table) =
+						acpi_get_table2("RAS2", 0);
+	if (IS_ERR_OR_NULL(pAcpiTable)) {
+		pr_err("ACPI RAS2 driver failed to initialize, get table failed\n");
+		return -ENODEV;
+	}
+
+	ras2_size = pAcpiTable->length;
+	if (ras2_size < sizeof(struct acpi_table_ras2)) {
+		pr_err("ACPI RAS2 table present but broken (too short #1)\n");
+		return -EINVAL;
+	}
+
+	pRas2Table = (struct acpi_table_ras2 *)pAcpiTable;
+	if (pRas2Table->num_pcc_descs <= 0) {
+		pr_err("ACPI RAS2 table does not contain PCC descriptors\n");
+		return -EINVAL;
+	}
+
+	struct platform_device **pdev_list __free(kfree) =
+			kcalloc(pRas2Table->num_pcc_descs, sizeof(*pdev_list),
+				GFP_KERNEL);
+	if (!pdev_list)
+		return -ENOMEM;
+
+	pcc_desc_list = (struct acpi_ras2_pcc_desc *)(pRas2Table + 1);
+	/* Double scan for the case of only one actual controller */
+	pcc_subspace_id = -1;
+	count = 0;
+	for (i = 0; i < pRas2Table->num_pcc_descs; i++, pcc_desc_list++) {
+		if (pcc_desc_list->feature_type != RAS2_FEATURE_TYPE_MEMORY)
+			continue;
+		if (pcc_subspace_id == -1) {
+			pcc_subspace_id = pcc_desc_list->channel_id;
+			count++;
+		}
+		if (pcc_desc_list->channel_id != pcc_subspace_id)
+			count++;
+	}
+	if (count == 1) {
+		pdev = ras2_add_platform_device("acpi_ras2", pcc_subspace_id);
+		if (IS_ERR(pdev))
+			return PTR_ERR(pdev);
+		return 0;
+	}
+
+	pcc_desc_list = (struct acpi_ras2_pcc_desc *)(pRas2Table + 1);
+	count = 0;
+	for (i = 0; i < pRas2Table->num_pcc_descs; i++, pcc_desc_list++) {
+		if (pcc_desc_list->feature_type != RAS2_FEATURE_TYPE_MEMORY)
+			continue;
+		pcc_subspace_id = pcc_desc_list->channel_id;
+		/* Add the platform device and bind ACPI RAS2 memory driver */
+		pdev = ras2_add_platform_device("acpi_ras2", pcc_subspace_id);
+		if (IS_ERR(pdev))
+			goto free_ras2_pdev;
+		pdev_list[count++] = pdev;
+	}
+
+	return 0;
+
+free_ras2_pdev:
+	while (count--)
+		platform_device_unregister(pdev_list[count]);
+
+	return -ENODEV;
+}
+late_initcall(ras2_acpi_init);
diff --git a/include/acpi/ras2_acpi.h b/include/acpi/ras2_acpi.h
new file mode 100644
index 000000000000..8c9430e6383e
--- /dev/null
+++ b/include/acpi/ras2_acpi.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * RAS2 ACPI driver header file
+ *
+ * (C) Copyright 2014, 2015 Hewlett-Packard Enterprises
+ *
+ * Copyright (c) 2024 HiSilicon Limited
+ */
+
+#ifndef _RAS2_ACPI_H
+#define _RAS2_ACPI_H
+
+#include <linux/acpi.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+#define RAS2_PCC_CMD_COMPLETE	BIT(0)
+#define RAS2_PCC_CMD_ERROR	BIT(2)
+
+/* RAS2 specific PCC commands */
+#define RAS2_PCC_CMD_EXEC 0x01
+
+struct device;
+
+/* Data structures for PCC communication and RAS2 table */
+struct pcc_mbox_chan;
+
+struct ras2_pcc_subspace {
+	int pcc_subspace_id;
+	struct mbox_client mbox_client;
+	struct pcc_mbox_chan *pcc_chan;
+	struct acpi_ras2_shared_memory __iomem *pcc_comm_addr;
+	u64 comm_base_addr;
+	bool pcc_channel_acquired;
+	ktime_t deadline;
+	unsigned int pcc_mpar;
+	unsigned int pcc_mrtt;
+	struct list_head elem;
+	u16 ref_count;
+};
+
+struct ras2_scrub_ctx {
+	struct device *dev;
+	struct ras2_pcc_subspace *pcc_subspace;
+	int id;
+	struct device *scrub_dev;
+	bool bg;
+	u64 base, size;
+	u8 rate, rate_min, rate_max;
+	/* Lock to provide mutually exclusive access to PCC channel */
+	struct mutex lock;
+};
+
+int ras2_send_pcc_cmd(struct ras2_scrub_ctx *ras2_ctx, u16 cmd);
+int devm_ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
+				   int pcc_subspace_id);
+
+#endif /* _RAS2_ACPI_H */
-- 
2.34.1



* [RFC PATCH v8 09/10] ras: scrub: Add scrub control attributes for ACPI RAS2
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (7 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 08/10] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  2024-04-19 16:47 ` [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver shiju.jose
  9 siblings, 0 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)

From: Shiju Jose <shiju.jose@huawei.com>

Add scrub control attributes for ACPI RAS2 patrol scrub feature.
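
Each new attribute is exposed only when the driver supplies the matching
callbacks. The rule applied in scrub_attr_visible() can be summarised by
the sketch below; scrub_attr_mode() is a hypothetical helper written for
illustration and does not exist in this patch.

```c
#include <stdbool.h>

/* Hypothetical helper: permission bits for a get/set attribute pair. */
static unsigned int scrub_attr_mode(bool has_get, bool has_set,
				    unsigned int default_mode)
{
	if (has_get && has_set)
		return default_mode;	/* e.g. 0644 from DEVICE_ATTR_RW() */
	if (has_get)
		return 0444;		/* read-only: no setter provided */
	return 0;			/* hidden, including the setter-only case */
}
```

A driver that implements only the read callback therefore gets a read-only
file, and a driver that implements neither gets no file at all.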

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 .../ABI/testing/sysfs-class-scrub-configure   |  28 +++-
 drivers/ras/memory_scrub.c                    | 131 ++++++++++++++++++
 include/linux/memory_scrub.h                  |   8 ++
 3 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
index 3ed77dbb00ad..7178776249f8 100644
--- a/Documentation/ABI/testing/sysfs-class-scrub-configure
+++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
@@ -15,12 +15,21 @@ Description:
 		correspond to each scrub device registered with the
 		scrub subsystem.
 
-What:		/sys/class/ras/rasX/scrub/name
+What:		/sys/class/ras/rasX/scrub/addr_range_base
 Date:		March 2024
 KernelVersion:	6.9
 Contact:	linux-kernel@vger.kernel.org
 Description:
-		(RO) name of the memory scrubber
+		(RW) The base of the address range of the memory region
+		to be scrubbed (on-demand scrubbing).
+
+What:		/sys/class/ras/rasX/scrub/addr_range_size
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) The size of the address range of the memory region
+		to be scrubbed (on-demand scrubbing).
 
 What:		/sys/class/ras/rasX/scrub/enable_background
 Date:		March 2024
@@ -29,6 +38,21 @@ Contact:	linux-kernel@vger.kernel.org
 Description:
 		(RW) Enable/Disable background(patrol) scrubbing if supported.
 
+What:		/sys/class/ras/rasX/scrub/enable_on_demand
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) Enable/Disable on-demand scrubbing of the memory region
+		if supported.
+
+What:		/sys/class/ras/rasX/scrub/name
+Date:		March 2024
+KernelVersion:	6.9
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RO) name of the memory scrubber
+
 What:		/sys/class/ras/rasX/scrub/rate_available
 Date:		March 2024
 KernelVersion:	6.9
diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
index 7e995380ec3a..ace6c59b8993 100755
--- a/drivers/ras/memory_scrub.c
+++ b/drivers/ras/memory_scrub.c
@@ -29,6 +29,83 @@ struct scrub_device {
 };
 
 #define to_scrub_device(d) container_of(d, struct scrub_device, dev)
+static ssize_t addr_range_base_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 base, size;
+	int ret;
+
+	ret = scrub_dev->ops->read_range(dev, &base, &size);
+	if (ret)
+		return ret;
+
+	return sysfs_emit(buf, "0x%llx\n", base);
+}
+
+static ssize_t addr_range_size_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 base, size;
+	int ret;
+
+	ret = scrub_dev->ops->read_range(dev, &base, &size);
+	if (ret)
+		return ret;
+
+	return sysfs_emit(buf, "0x%llx\n", size);
+}
+
+static ssize_t addr_range_base_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t len)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 base, size;
+	int ret;
+
+	ret = scrub_dev->ops->read_range(dev, &base, &size);
+	if (ret)
+		return ret;
+
+	ret = kstrtou64(buf, 16, &base);
+	if (ret < 0)
+		return ret;
+
+	ret = scrub_dev->ops->write_range(dev, base, size);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
+static ssize_t addr_range_size_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf,
+				     size_t len)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	u64 base, size;
+	int ret;
+
+	ret = scrub_dev->ops->read_range(dev, &base, &size);
+	if (ret)
+		return ret;
+
+	ret = kstrtou64(buf, 16, &size);
+	if (ret < 0)
+		return ret;
+
+	ret = scrub_dev->ops->write_range(dev, base, size);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
 static ssize_t enable_background_store(struct device *dev,
 				       struct device_attribute *attr,
 				       const char *buf, size_t len)
@@ -62,6 +139,39 @@ static ssize_t enable_background_show(struct device *dev,
 	return sysfs_emit(buf, "%d\n", enable);
 }
 
+static ssize_t enable_on_demand_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	bool enable;
+	int ret;
+
+	ret = scrub_dev->ops->get_enabled_od(dev, &enable);
+	if (ret)
+		return ret;
+
+	return sysfs_emit(buf, "%d\n", enable);
+}
+
+static ssize_t enable_on_demand_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t len)
+{
+	struct scrub_device *scrub_dev = to_scrub_device(dev);
+	bool enable;
+	int ret;
+
+	ret = kstrtobool(buf, &enable);
+	if (ret < 0)
+		return ret;
+
+	ret = scrub_dev->ops->set_enabled_od(dev, enable);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
 static ssize_t name_show(struct device *dev,
 			 struct device_attribute *attr, char *buf)
 {
@@ -122,13 +232,19 @@ static ssize_t rate_available_show(struct device *dev,
 	return sysfs_emit(buf, "0x%llx-0x%llx\n", min_sr, max_sr);
 }
 
+DEVICE_ATTR_RW(addr_range_base);
+DEVICE_ATTR_RW(addr_range_size);
 DEVICE_ATTR_RW(enable_background);
+DEVICE_ATTR_RW(enable_on_demand);
 DEVICE_ATTR_RO(name);
 DEVICE_ATTR_RW(rate);
 DEVICE_ATTR_RO(rate_available);
 
 static struct attribute *scrub_attrs[] = {
+	&dev_attr_addr_range_base.attr,
+	&dev_attr_addr_range_size.attr,
 	&dev_attr_enable_background.attr,
+	&dev_attr_enable_on_demand.attr,
 	&dev_attr_name.attr,
 	&dev_attr_rate.attr,
 	&dev_attr_rate_available.attr,
@@ -142,6 +258,14 @@ static umode_t scrub_attr_visible(struct kobject *kobj,
 	struct scrub_device *scrub_dev = to_scrub_device(dev);
 	const struct scrub_ops *ops = scrub_dev->ops;
 
+	if (a == &dev_attr_addr_range_base.attr ||
+	    a == &dev_attr_addr_range_size.attr) {
+		if (ops->read_range && ops->write_range)
+			return a->mode;
+		if (ops->read_range)
+			return 0444;
+		return 0;
+	}
 	if (a == &dev_attr_enable_background.attr) {
 		if (ops->set_enabled_bg && ops->get_enabled_bg)
 			return a->mode;
@@ -149,6 +273,13 @@ static umode_t scrub_attr_visible(struct kobject *kobj,
 			return 0444;
 		return 0;
 	}
+	if (a == &dev_attr_enable_on_demand.attr) {
+		if (ops->set_enabled_od && ops->get_enabled_od)
+			return a->mode;
+		if (ops->get_enabled_od)
+			return 0444;
+		return 0;
+	}
 	if (a == &dev_attr_name.attr)
 		return ops->get_name ? a->mode : 0;
 	if (a == &dev_attr_rate_available.attr)
diff --git a/include/linux/memory_scrub.h b/include/linux/memory_scrub.h
index f0e1657a5072..d8edb48677c9 100755
--- a/include/linux/memory_scrub.h
+++ b/include/linux/memory_scrub.h
@@ -15,16 +15,24 @@ struct device;
 
 /**
  * struct scrub_ops - scrub device operations (all elements optional)
+ * @read_range: read base and size of scrubbing range.
+ * @write_range: set the base and size of the scrubbing range.
  * @get_enabled_bg: check if currently performing background scrub.
  * @set_enabled_bg: start or stop a bg-scrub.
+ * @get_enabled_od: check if currently performing on-demand scrub.
+ * @set_enabled_od: start or stop an on-demand scrub.
  * @get_name: get the memory scrubber name.
  * @rate_avail_range: retrieve limits on supported rates.
  * @rate_read: read the scrub rate
  * @rate_write: set the scrub rate
  */
 struct scrub_ops {
+	int (*read_range)(struct device *dev, u64 *base, u64 *size);
+	int (*write_range)(struct device *dev, u64 base, u64 size);
 	int (*get_enabled_bg)(struct device *dev, bool *enable);
 	int (*set_enabled_bg)(struct device *dev, bool enable);
+	int (*get_enabled_od)(struct device *dev, bool *enable);
+	int (*set_enabled_od)(struct device *dev, bool enable);
 	int (*get_name)(struct device *dev, char *buf);
 	int (*rate_avail_range)(struct device *dev, u64 *min, u64 *max);
 	int (*rate_read)(struct device *dev, u64 *rate);
-- 
2.34.1
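
The visibility rule that scrub_attr_visible() applies to each read/write
attribute in this patch can be sketched as a pure function. This is a
userspace illustration only: mode_bits_t and rw_attr_mode() are hypothetical
names, not part of the patch, and the real code compares attribute pointers
and checks the scrub_ops callbacks directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's umode_t; not the real typedef. */
typedef unsigned short mode_bits_t;

/*
 * Decide the sysfs mode of a read/write attribute from which callbacks
 * the driver supplied:
 *   getter and setter -> the attribute's declared mode (e.g. 0644)
 *   getter only       -> read-only (0444)
 *   otherwise         -> 0, i.e. the attribute is hidden
 */
static mode_bits_t rw_attr_mode(bool has_get, bool has_set,
				mode_bits_t declared_mode)
{
	/* A declared mode of 0 would make "visible" and "hidden" collide. */
	assert(declared_mode != 0);

	if (has_get && has_set)
		return declared_mode;
	if (has_get)
		return 0444;
	return 0;
}
```

A setter without a getter yields 0, which matches the "Write only makes
little sense" comment in patch 01.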



* [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver
  2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
                   ` (8 preceding siblings ...)
  2024-04-19 16:47 ` [RFC PATCH v8 09/10] ras: scrub: Add scrub control attributes for ACPI RAS2 shiju.jose
@ 2024-04-19 16:47 ` shiju.jose
  9 siblings, 0 replies; 22+ messages in thread
From: shiju.jose @ 2024-04-19 16:47 UTC (permalink / raw)
  To: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny
  Cc: linux-edac, linux-kernel, david, Vilas.Sridharan, leo.duran,
	Yazen.Ghannam, rientjes, jiaqiyan, tony.luck, Jon.Grimm,
	dave.hansen, rafael, lenb, naoya.horiguchi, james.morse,
	jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
	mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
	yazen.ghannam, tanxiaofei, prime.zeng, kangkang.shen,
	wanghuiqiang, linuxarm, shiju.jose

From: Shiju Jose <shiju.jose@huawei.com>

The memory ACPI RAS2 driver binds to the platform device added by the
ACPI RAS2 table parser.

The driver uses a PCC subspace to communicate with the ACPI-compliant
platform and provides control of memory scrub parameters via the scrub
subsystem.

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 Documentation/scrub/scrub-configure.rst |  33 +++
 drivers/ras/Kconfig                     |  10 +
 drivers/ras/Makefile                    |   1 +
 drivers/ras/acpi_ras2.c                 | 358 ++++++++++++++++++++++++
 4 files changed, 402 insertions(+)
 create mode 100644 drivers/ras/acpi_ras2.c

diff --git a/Documentation/scrub/scrub-configure.rst b/Documentation/scrub/scrub-configure.rst
index 2275366b60d3..7a1bf87bc6d7 100644
--- a/Documentation/scrub/scrub-configure.rst
+++ b/Documentation/scrub/scrub-configure.rst
@@ -50,3 +50,36 @@ The usage takes the form shown in this example::
     # echo 0 > /sys/class/ras/ras0/scrub/enable_background
     # cat /sys/class/ras/ras0/scrub/enable_background
     # 0
+
+2. RAS2
+2.1 On demand scrubbing for a specific memory region.
+    # echo 0x120000 > /sys/class/ras/ras1/scrub/addr_range_base
+    # echo 0x150000 > /sys/class/ras/ras1/scrub/addr_range_size
+    # cat /sys/class/ras/ras1/scrub/rate_available
+    # 0x1-0x18
+    # echo 20 > /sys/class/ras/ras1/scrub/rate
+    # echo 1 > /sys/class/ras/ras1/scrub/enable_on_demand
+    # cat /sys/class/ras/ras1/scrub/enable_on_demand
+    # 1
+    # cat /sys/class/ras/ras1/scrub/rate
+    # 0x14
+    # cat /sys/class/ras/ras1/scrub/addr_range_base
+    # 0x120000
+    # cat /sys/class/ras/ras1/scrub/addr_range_size
+    # 0x150000
+    # echo 0 > /sys/class/ras/ras1/scrub/enable_on_demand
+    # cat /sys/class/ras/ras1/scrub/enable_on_demand
+    # 0
+
+2.2 Background scrubbing the entire memory
+    # cat /sys/class/ras/ras1/scrub/rate_available
+    # 0x1-0x18
+    # echo 3 > /sys/class/ras/ras1/scrub/rate
+    # echo 1 > /sys/class/ras/ras1/scrub/enable_background
+    # cat /sys/class/ras/ras1/scrub/enable_background
+    # 1
+    # cat /sys/class/ras/ras1/scrub/rate
+    # 0x3
+    # echo 0 > /sys/class/ras/ras1/scrub/enable_background
+    # cat /sys/class/ras/ras1/scrub/enable_background
+    # 0
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index 181701479564..57c346dfc01f 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -53,4 +53,14 @@ config SCRUB
 	  configuring the parameters of underlying scrubbers in the
 	  system for the DRAM memories.
 
+config MEM_ACPI_RAS2
+	tristate "Memory ACPI RAS2 driver"
+	depends on ACPI_RAS2
+	depends on SCRUB
+	help
+	  The driver binds to the platform device added by the ACPI RAS2
+	  table parser. It uses a PCC channel subspace to communicate with
+	  the ACPI compliant platform to provide control of memory scrub
+	  parameters via the scrub subsystem.
+
 endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 89bcf0d84355..48339fee1cb3 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_RAS)	+= ras.o
 obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_RAS_CEC)	+= cec.o
 obj-$(CONFIG_SCRUB)	+= memory_scrub.o
+obj-$(CONFIG_MEM_ACPI_RAS2)	+= acpi_ras2.o
 
 obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
 obj-y			+= amd/atl/
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
new file mode 100644
index 000000000000..b3e9b61367bb
--- /dev/null
+++ b/drivers/ras/acpi_ras2.c
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ *
+ */
+
+#define pr_fmt(fmt)	"MEMORY ACPI RAS2: " fmt
+
+#include <linux/memory_scrub.h>
+#include <linux/platform_device.h>
+#include <acpi/ras2_acpi.h>
+
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB	BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB	0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS	0x01
+#define	RAS2_START_PATROL_SCRUBBER	0x02
+#define	RAS2_STOP_PATROL_SCRUBBER	0x03
+
+#define RAS2_PATROL_SCRUB_RATE_IN_MASK	GENMASK(15, 8)
+#define RAS2_PATROL_SCRUB_EN_BACKGROUND	BIT(0)
+#define RAS2_PATROL_SCRUB_RATE_OUT_MASK	GENMASK(7, 0)
+#define RAS2_PATROL_SCRUB_MIN_RATE_OUT_MASK	GENMASK(15, 8)
+#define RAS2_PATROL_SCRUB_MAX_RATE_OUT_MASK	GENMASK(23, 16)
+#define RAS2_PATROL_SCRUB_FLAG_SCRUBBER_RUNNING	BIT(0)
+
+struct acpi_ras2_ps_shared_mem {
+	struct acpi_ras2_shared_memory common;
+	struct acpi_ras2_patrol_scrub_parameter params;
+};
+
+static int ras2_is_patrol_scrub_support(struct ras2_scrub_ctx *ras2_ctx)
+{
+	struct acpi_ras2_shared_memory __iomem *common = (void *)
+				ras2_ctx->pcc_subspace->pcc_comm_addr;
+
+	guard(mutex)(&ras2_ctx->lock);
+	common->set_capabilities[0] = 0;
+
+	return common->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_update_patrol_scrub_params_cache(struct ras2_scrub_ctx *ras2_ctx)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+					ras2_ctx->pcc_subspace->pcc_comm_addr;
+	int ret;
+
+	ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ps_sm->params.patrol_scrub_command = RAS2_GET_PATROL_PARAMETERS;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to read parameters\n");
+		return ret;
+	}
+
+	ras2_ctx->rate_min = FIELD_GET(RAS2_PATROL_SCRUB_MIN_RATE_OUT_MASK,
+				       ps_sm->params.scrub_params_out);
+	ras2_ctx->rate_max = FIELD_GET(RAS2_PATROL_SCRUB_MAX_RATE_OUT_MASK,
+				       ps_sm->params.scrub_params_out);
+	ras2_ctx->base = ps_sm->params.actual_address_range[0];
+	ras2_ctx->size = ps_sm->params.actual_address_range[1];
+	ras2_ctx->rate = FIELD_GET(RAS2_PATROL_SCRUB_RATE_OUT_MASK,
+				   ps_sm->params.scrub_params_out);
+	return 0;
+}
+
+/* Context - lock must be held */
+static int ras2_get_patrol_scrub_running(struct ras2_scrub_ctx *ras2_ctx,
+					 bool *running)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+					ras2_ctx->pcc_subspace->pcc_comm_addr;
+	int ret;
+
+	if (ras2_ctx->bg)
+		*running = true;
+
+	ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ps_sm->params.patrol_scrub_command = RAS2_GET_PATROL_PARAMETERS;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to read parameters\n");
+		return ret;
+	}
+
+	*running = ps_sm->params.flags & RAS2_PATROL_SCRUB_FLAG_SCRUBBER_RUNNING;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_write_rate(struct device *dev, u64 rate)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+	bool running;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	if (rate < ras2_ctx->rate_min || rate > ras2_ctx->rate_max)
+		return -EINVAL;
+
+	ras2_ctx->rate = rate;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_rate(struct device *dev, u64 *rate)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+
+	*rate = ras2_ctx->rate;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_rate_avail(struct device *dev, u64 *min, u64 *max)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+
+	*min = ras2_ctx->rate_min;
+	*max = ras2_ctx->rate_max;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_range(struct device *dev, u64 *base, u64 *size)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+
+	*base = ras2_ctx->base;
+	*size = ras2_ctx->size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_write_range(struct device *dev, u64 base, u64 size)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+	bool running;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	ras2_ctx->base = base;
+	ras2_ctx->size = size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, bool enable)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+					ras2_ctx->pcc_subspace->pcc_comm_addr;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	if (enable) {
+		ps_sm->params.requested_address_range[0] = 0;
+		ps_sm->params.requested_address_range[1] = 0;
+		ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_RATE_IN_MASK;
+		ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_RATE_IN_MASK,
+							    ras2_ctx->rate);
+		ps_sm->params.patrol_scrub_command = RAS2_START_PATROL_SCRUBBER;
+	} else {
+		ps_sm->params.patrol_scrub_command = RAS2_STOP_PATROL_SCRUBBER;
+	}
+	ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_EN_BACKGROUND;
+	ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_EN_BACKGROUND,
+						    enable);
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "%s: failed to enable(%d) background scrubbing\n",
+			__func__, enable);
+		return ret;
+	}
+	ras2_ctx->bg = enable;
+
+	/* Update the cache to account for rounding of supplied parameters and similar */
+	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+}
+
+static int ras2_hw_scrub_get_enabled_bg(struct device *dev, bool *enabled)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+
+	*enabled = ras2_ctx->bg;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_od(struct device *dev, bool enable)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+					ras2_ctx->pcc_subspace->pcc_comm_addr;
+	bool enabled;
+	int ret;
+
+	guard(mutex)(&ras2_ctx->lock);
+	ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	if (enable) {
+		if (!ras2_ctx->size) {
+			dev_warn(ras2_ctx->dev,
+				 "%s: Invalid requested address range, requested_address_range[0]=0x%llx "
+				 "requested_address_range[1]=0x%llx\n", __func__,
+				 ps_sm->params.requested_address_range[0],
+				 ps_sm->params.requested_address_range[1]);
+			return -ERANGE;
+		}
+		ret = ras2_get_patrol_scrub_running(ras2_ctx, &enabled);
+		if (ret)
+			return ret;
+
+		if (enabled)
+			return 0;
+
+		ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_RATE_IN_MASK;
+		ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_RATE_IN_MASK,
+							    ras2_ctx->rate);
+		ps_sm->params.requested_address_range[0] = ras2_ctx->base;
+		ps_sm->params.requested_address_range[1] = ras2_ctx->size;
+		ps_sm->params.patrol_scrub_command = RAS2_START_PATROL_SCRUBBER;
+	} else {
+		ps_sm->params.patrol_scrub_command = RAS2_STOP_PATROL_SCRUBBER;
+	}
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to enable(%d) the demand scrubbing\n", enable);
+		return ret;
+	}
+	ras2_ctx->bg = false;
+
+	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+}
+
+static int ras2_hw_scrub_get_enabled_od(struct device *dev, bool *enabled)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+
+	guard(mutex)(&ras2_ctx->lock);
+	if (ras2_ctx->bg) {
+		*enabled = false;
+		return 0;
+	}
+
+	return ras2_get_patrol_scrub_running(ras2_ctx, enabled);
+}
+
+static int ras2_hw_scrub_get_name(struct device *dev, char *name)
+{
+	struct ras2_scrub_ctx *ras2_ctx = dev_get_drvdata(dev);
+
+	return sysfs_emit(name, "ras2_scrub%d\n", ras2_ctx->id);
+}
+
+static const struct scrub_ops ras2_scrub_ops = {
+	.read_range = ras2_hw_scrub_read_range,
+	.write_range = ras2_hw_scrub_write_range,
+	.get_enabled_bg = ras2_hw_scrub_get_enabled_bg,
+	.set_enabled_bg = ras2_hw_scrub_set_enabled_bg,
+	.get_enabled_od = ras2_hw_scrub_get_enabled_od,
+	.set_enabled_od = ras2_hw_scrub_set_enabled_od,
+	.get_name = ras2_hw_scrub_get_name,
+	.rate_avail_range = ras2_hw_scrub_read_rate_avail,
+	.rate_read = ras2_hw_scrub_read_rate,
+	.rate_write = ras2_hw_scrub_write_rate,
+};
+
+static DEFINE_IDA(ras2_ida);
+
+static void ida_release(void *ctx)
+{
+	struct ras2_scrub_ctx *ras2_ctx = ctx;
+
+	ida_free(&ras2_ida, ras2_ctx->id);
+}
+
+static int ras2_probe(struct platform_device *pdev)
+{
+	struct ras2_scrub_ctx *ras2_ctx;
+	struct device *hw_scrub_dev;
+	int ret, id;
+
+	/* RAS2 PCC Channel and Scrub specific context */
+	ras2_ctx = devm_kzalloc(&pdev->dev, sizeof(*ras2_ctx), GFP_KERNEL);
+	if (!ras2_ctx)
+		return -ENOMEM;
+
+	ras2_ctx->dev = &pdev->dev;
+	mutex_init(&ras2_ctx->lock);
+
+	ret = devm_ras2_register_pcc_channel(&pdev->dev, ras2_ctx,
+					     *((int *)dev_get_platdata(&pdev->dev)));
+	if (ret < 0) {
+		dev_dbg(ras2_ctx->dev,
+			"failed to register pcc channel ret=%d\n", ret);
+		return ret;
+	}
+	if (!ras2_is_patrol_scrub_support(ras2_ctx))
+		return -EOPNOTSUPP;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	id = ida_alloc(&ras2_ida, GFP_KERNEL);
+	if (id < 0)
+		return id;
+
+	ras2_ctx->id = id;
+
+	ret = devm_add_action_or_reset(&pdev->dev, ida_release, ras2_ctx);
+	if (ret < 0)
+		return ret;
+
+	hw_scrub_dev = devm_scrub_device_register(&pdev->dev, ras2_ctx, &ras2_scrub_ops);
+	if (IS_ERR(hw_scrub_dev))
+		return PTR_ERR(hw_scrub_dev);
+
+	ras2_ctx->scrub_dev = hw_scrub_dev;
+
+	return 0;
+}
+
+static const struct platform_device_id ras2_id_table[] = {
+	{ .name = "acpi_ras2", },
+	{ }
+};
+MODULE_DEVICE_TABLE(platform, ras2_id_table);
+
+static struct platform_driver ras2_driver = {
+	.probe = ras2_probe,
+	.driver = {
+		.name = "acpi_ras2",
+	},
+	.id_table = ras2_id_table,
+};
+module_driver(ras2_driver, platform_driver_register, platform_driver_unregister);
+
+MODULE_IMPORT_NS(ACPI_RAS2);
+MODULE_DESCRIPTION("ACPI RAS2 memory driver");
+MODULE_LICENSE("GPL");
-- 
2.34.1
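
For reference, the way the driver packs the requested rate into
scrub_params_in and unpacks the current/min/max rates from scrub_params_out
can be sketched in userspace C. The bit positions are taken from the
RAS2_PATROL_SCRUB_* masks in the patch; the PS_* names, the helper functions
and the assumption that both fields are 32 bits wide are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the RAS2_PATROL_SCRUB_* masks in the patch. */
#define PS_RATE_IN_MASK		0x0000ff00u	/* GENMASK(15, 8)  */
#define PS_EN_BACKGROUND	0x00000001u	/* BIT(0)          */
#define PS_RATE_OUT_MASK	0x000000ffu	/* GENMASK(7, 0)   */
#define PS_MIN_RATE_OUT_MASK	0x0000ff00u	/* GENMASK(15, 8)  */
#define PS_MAX_RATE_OUT_MASK	0x00ff0000u	/* GENMASK(23, 16) */

/* Userspace equivalents of FIELD_PREP()/FIELD_GET() with explicit shifts. */
static uint32_t field_prep(uint32_t mask, unsigned int shift, uint32_t val)
{
	return (val << shift) & mask;
}

static uint32_t field_get(uint32_t mask, unsigned int shift, uint32_t reg)
{
	return (reg & mask) >> shift;
}

/* Replace the rate and background-enable fields, keep all other bits. */
static uint32_t ps_pack_params_in(uint32_t old, uint32_t rate, int background)
{
	/* The rate field is 8 bits wide; larger values would be truncated. */
	assert(rate <= 0xff);

	old &= ~(PS_RATE_IN_MASK | PS_EN_BACKGROUND);
	old |= field_prep(PS_RATE_IN_MASK, 8, rate);
	old |= field_prep(PS_EN_BACKGROUND, 0, background ? 1 : 0);
	return old;
}

struct ps_rates { uint32_t cur, min, max; };

/* Split scrub_params_out into the three rate fields the driver caches. */
static struct ps_rates ps_unpack_params_out(uint32_t out)
{
	struct ps_rates r = {
		.cur = field_get(PS_RATE_OUT_MASK, 0, out),
		.min = field_get(PS_MIN_RATE_OUT_MASK, 8, out),
		.max = field_get(PS_MAX_RATE_OUT_MASK, 16, out),
	};
	return r;
}
```

With the values from the documentation example (rate 0x14, available range
0x1-0x18), an output word of 0x00180114 would decode to cur=0x14, min=0x1,
max=0x18.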



* Re: [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table
  2024-04-19 16:47 ` [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table shiju.jose
@ 2024-04-19 18:06   ` Jonathan Cameron
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2024-04-19 18:06 UTC (permalink / raw)
  To: shiju.jose
  Cc: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, linux-edac,
	linux-kernel, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
	rientjes, jiaqiyan, tony.luck, Jon.Grimm, dave.hansen, rafael,
	lenb, naoya.horiguchi, james.morse, jthoughton, somasundaram.a,
	erdemaktas, pgonda, duenwen, mike.malvestuto, gthelen, wschwartz,
	dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, kangkang.shen,
	wanghuiqiang, linuxarm

On Sat, 20 Apr 2024 00:47:15 +0800
<shiju.jose@huawei.com> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Add __free() based cleanup function for acpi_put_table.
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---

Reviewing (and rejecting) my own patch time ;(

I was thinking this would be useful more widely but hadn't looked
as closely as I should have done.  Sorry Shiju for sending you
down a bad path.

>  include/acpi/acpixf.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
> index 3d90716f9522..fc64d903a703 100644
> --- a/include/acpi/acpixf.h
> +++ b/include/acpi/acpixf.h
> @@ -492,6 +492,8 @@ ACPI_EXTERNAL_RETURN_STATUS(acpi_status
>  					    **out_table))
>  ACPI_EXTERNAL_RETURN_VOID(void acpi_put_table(struct acpi_table_header *table))
>  
> +DEFINE_FREE(acpi_put_table, struct acpi_table_header *, if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))

This is reliant on acpi_get_table2() in patch 8 / below being used as acpi_get_table()
doesn't return the table.

Maybe we are better off treating acpi_get_table() / acpi_put_table() as if they were a
conditional lock? Or change the 93 instances of acpi_get_table() to deal with it returning
a copy of the table handle pointer.

That would bring it in line with many other get functions in the kernel and make our life
easier when using tooling like this.


+static struct acpi_table_header *acpi_get_table2(acpi_string signature,
+						  u32 instance)
+{
+	struct acpi_table_header *header = NULL;
+	acpi_status status = acpi_get_table(signature, instance, &header);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-EINVAL);
+
+	return header;
+}

So that we could do things like:

+	struct acpi_table_header *pAcpiTable __free(acpi_put_table) =
+						acpi_get_table2("RAS2", 0);

and avoid having to call acpi_put_table() in error paths etc.

The snag is that acpi_get_table() is from acpica (via this wrapper) so any
modification would be a little messy. Also a number of cases use the status
value via 
const char *msg = acpi_format_exception(status);

Which we'd need to return via some path (a parameter probably). We 'could'
do that but the advantages of this are getting eroded.

Upshot, this is messier than I thought, so we probably shouldn't do it.

The code in ras2 can be done reasonably neatly with an outer wrapper function
that gets the table and an inner one that deals with the actual processing
of the entries.

Pity as there are some messy bits of code this would tidy up. In most of
those a helper function also works.
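
The outer/inner split described above could look roughly like the following
userspace sketch. Everything here is hypothetical: mock_get_table() and
mock_put_table() only stand in for acpi_get_table() / acpi_put_table() and do
not share their signatures, and ras2_process_entries() / ras2_parse_table()
are made-up names rather than functions from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ACPI table API; not the real signatures. */
struct mock_table { int nr_entries; };

static struct mock_table mock_ras2 = { .nr_entries = 2 };
static int mock_refcount;	/* outstanding gets without a matching put */

static int mock_get_table(struct mock_table **out)
{
	*out = &mock_ras2;
	mock_refcount++;
	return 0;
}

static void mock_put_table(struct mock_table *table)
{
	assert(table && mock_refcount > 0);
	mock_refcount--;
}

/* Inner helper: only processes entries, free to return early on error. */
static int ras2_process_entries(const struct mock_table *table, int fail_at)
{
	for (int i = 0; i < table->nr_entries; i++) {
		if (i == fail_at)
			return -1;	/* no put needed here */
	}
	return 0;
}

/* Outer wrapper: the only place that gets and puts the table. */
static int ras2_parse_table(int fail_at)
{
	struct mock_table *table;
	int ret;

	ret = mock_get_table(&table);
	if (ret)
		return ret;

	ret = ras2_process_entries(table, fail_at);
	mock_put_table(table);	/* single release point for every outcome */

	return ret;
}
```

Because the inner function never owns the reference, its error paths can
return directly, which gives the same effect as __free() here without
touching the ACPICA wrapper.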

Jonathan

p.s. As an aside, whilst looking at this I noticed that acpi_has_watchdog(),
if it succeeds, doesn't put the wdat table, which seems suspicious.

> +
>  ACPI_EXTERNAL_RETURN_STATUS(acpi_status
>  			    acpi_get_table_by_index(u32 table_index,
>  						    struct acpi_table_header



* Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
  2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
@ 2024-04-24 20:25   ` fan
  2024-04-25 10:38     ` Shiju Jose
  2024-04-25 10:15   ` Borislav Petkov
  1 sibling, 1 reply; 22+ messages in thread
From: fan @ 2024-04-24 20:25 UTC (permalink / raw)
  To: shiju.jose
  Cc: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny, linux-edac, linux-kernel, david, Vilas.Sridharan,
	leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, tony.luck,
	Jon.Grimm, dave.hansen, rafael, lenb, naoya.horiguchi,
	james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
	duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
	nifan.cxl, tanxiaofei, prime.zeng, kangkang.shen, wanghuiqiang,
	linuxarm

On Sat, Apr 20, 2024 at 12:47:10AM +0800, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add scrub subsystem supports configuring the memory scrubbers
> in the system. The scrub subsystem provides the interface for
> registering the scrub devices. The scrub control attributes
> are provided to the user in /sys/class/ras/rasX/scrub
> 
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  .../ABI/testing/sysfs-class-scrub-configure   |  47 +++
>  drivers/ras/Kconfig                           |   7 +
>  drivers/ras/Makefile                          |   1 +
>  drivers/ras/memory_scrub.c                    | 271 ++++++++++++++++++
>  include/linux/memory_scrub.h                  |  37 +++
>  5 files changed, 363 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
>  create mode 100755 drivers/ras/memory_scrub.c
>  create mode 100755 include/linux/memory_scrub.h
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> new file mode 100644
> index 000000000000..3ed77dbb00ad
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> @@ -0,0 +1,47 @@
> +What:		/sys/class/ras/
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		The ras/ class subdirectory belongs to the
> +		common ras features such as scrub subsystem.
> +
> +What:		/sys/class/ras/rasX/scrub/
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		The /sys/class/ras/ras{0,1,2,3,...}/scrub directories
> +		correspond to each scrub device registered with the
> +		scrub subsystem.
> +
> +What:		/sys/class/ras/rasX/scrub/name
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RO) name of the memory scrubber
> +
> +What:		/sys/class/ras/rasX/scrub/enable_background
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RW) Enable/Disable background(patrol) scrubbing if supported.
> +
> +What:		/sys/class/ras/rasX/scrub/rate_available
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RO) Supported range for the scrub rate by the scrubber.
> +		The scrub rate is expressed in hours.
> +
> +What:		/sys/class/ras/rasX/scrub/rate
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RW) The scrub rate to use; it must be within the
> +		range supported by the scrubber.
> +		The scrub rate is expressed in hours.
> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c..181701479564 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -46,4 +46,11 @@ config RAS_FMPM
>  	  Memory will be retired during boot time and run time depending on
>  	  platform-specific policies.
>  
> +config SCRUB
> +	tristate "Memory scrub driver"
> +	help
> +	  This option selects the memory scrub subsystem, supports
> +	  configuring the parameters of underlying scrubbers in the
> +	  system for the DRAM memories.
> +
>  endif
> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
> index 11f95d59d397..89bcf0d84355 100644
> --- a/drivers/ras/Makefile
> +++ b/drivers/ras/Makefile
> @@ -2,6 +2,7 @@
>  obj-$(CONFIG_RAS)	+= ras.o
>  obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
>  obj-$(CONFIG_RAS_CEC)	+= cec.o
> +obj-$(CONFIG_SCRUB)	+= memory_scrub.o
>  
>  obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
>  obj-y			+= amd/atl/
> diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
> new file mode 100755
> index 000000000000..7e995380ec3a
> --- /dev/null
> +++ b/drivers/ras/memory_scrub.c
> @@ -0,0 +1,271 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Memory scrub subsystem supports configuring the registered
> + * memory scrubbers.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#define pr_fmt(fmt)     "MEM SCRUB: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/delay.h>
> +#include <linux/kfifo.h>
> +#include <linux/memory_scrub.h>
> +#include <linux/platform_device.h>
> +#include <linux/spinlock.h>
> +
> +/* memory scrubber config definitions */
> +#define SCRUB_ID_PREFIX "ras"
> +#define SCRUB_ID_FORMAT SCRUB_ID_PREFIX "%d"
> +
> +static DEFINE_IDA(scrub_ida);
> +
> +struct scrub_device {
> +	int id;
> +	struct device dev;
> +	const struct scrub_ops *ops;
> +};
> +
> +#define to_scrub_device(d) container_of(d, struct scrub_device, dev)
> +static ssize_t enable_background_store(struct device *dev,
> +				       struct device_attribute *attr,
> +				       const char *buf, size_t len)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	bool enable;
> +	int ret;
> +
> +	ret = kstrtobool(buf, &enable);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = scrub_dev->ops->set_enabled_bg(dev, enable);
> +	if (ret)
> +		return ret;
> +
> +	return len;
> +}
> +
> +static ssize_t enable_background_show(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	bool enable;
> +	int ret;
> +
> +	ret = scrub_dev->ops->get_enabled_bg(dev, &enable);
> +	if (ret)
> +		return ret;
> +
> +	return sysfs_emit(buf, "%d\n", enable);
> +}
> +
> +static ssize_t name_show(struct device *dev,
> +			 struct device_attribute *attr, char *buf)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	int ret;
> +
> +	ret = scrub_dev->ops->get_name(dev, buf);
> +	if (ret)
> +		return ret;
> +
> +	return strlen(buf);
> +}
> +
> +static ssize_t rate_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	u64 val;
> +	int ret;
> +
> +	ret = scrub_dev->ops->rate_read(dev, &val);
> +	if (ret)
> +		return ret;
> +
> +	return sysfs_emit(buf, "0x%llx\n", val);
> +}
> +
> +static ssize_t rate_store(struct device *dev, struct device_attribute *attr,
> +			  const char *buf, size_t len)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	long val;
> +	int ret;
> +
> +	ret = kstrtol(buf, 10, &val);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = scrub_dev->ops->rate_write(dev, val);
> +	if (ret)
> +		return ret;
> +
> +	return len;
> +}
> +
> +static ssize_t rate_available_show(struct device *dev,
> +				   struct device_attribute *attr,
> +				   char *buf)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	u64 min_sr, max_sr;
> +	int ret;
> +
> +	ret = scrub_dev->ops->rate_avail_range(dev, &min_sr, &max_sr);
> +	if (ret)
> +		return ret;
> +
> +	return sysfs_emit(buf, "0x%llx-0x%llx\n", min_sr, max_sr);
> +}
> +
> +DEVICE_ATTR_RW(enable_background);
> +DEVICE_ATTR_RO(name);
> +DEVICE_ATTR_RW(rate);
> +DEVICE_ATTR_RO(rate_available);
> +
> +static struct attribute *scrub_attrs[] = {
> +	&dev_attr_enable_background.attr,
> +	&dev_attr_name.attr,
> +	&dev_attr_rate.attr,
> +	&dev_attr_rate_available.attr,
> +	NULL
> +};
> +
> +static umode_t scrub_attr_visible(struct kobject *kobj,
> +				  struct attribute *a, int attr_id)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	const struct scrub_ops *ops = scrub_dev->ops;
> +
> +	if (a == &dev_attr_enable_background.attr) {
> +		if (ops->set_enabled_bg && ops->get_enabled_bg)
> +			return a->mode;
> +		if (ops->get_enabled_bg)
> +			return 0444;
> +		return 0;
> +	}
> +	if (a == &dev_attr_name.attr)
> +		return ops->get_name ? a->mode : 0;
> +	if (a == &dev_attr_rate_available.attr)
> +		return ops->rate_avail_range ? a->mode : 0;
> +	if (a == &dev_attr_rate.attr) { /* Write only makes little sense */
> +		if (ops->rate_read && ops->rate_write)
> +			return a->mode;
> +		if (ops->rate_read)
> +			return 0444;
> +		return 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct attribute_group scrub_attr_group = {
> +	.name		= "scrub",
> +	.attrs		= scrub_attrs,
> +	.is_visible	= scrub_attr_visible,
> +};
> +
> +static const struct attribute_group *scrub_attr_groups[] = {
> +	&scrub_attr_group,
> +	NULL
> +};
> +
> +static void scrub_dev_release(struct device *dev)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +
> +	ida_free(&scrub_ida, scrub_dev->id);
> +	kfree(scrub_dev);
> +}
> +
> +static struct class scrub_class = {
> +	.name = "ras",
> +	.dev_groups = scrub_attr_groups,
> +	.dev_release = scrub_dev_release,
> +};
> +
> +static struct device *
> +scrub_device_register(struct device *parent, void *drvdata,
> +		      const struct scrub_ops *ops)
> +{
> +	struct scrub_device *scrub_dev;
> +	struct device *hdev;
> +	int err;
> +
> +	scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
> +	if (!scrub_dev)
> +		return ERR_PTR(-ENOMEM);
> +	hdev = &scrub_dev->dev;
> +
> +	scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);
> +	if (scrub_dev->id < 0) {
> +		kfree(scrub_dev);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	scrub_dev->ops = ops;
> +	hdev->class = &scrub_class;
> +	hdev->parent = parent;
> +	dev_set_drvdata(hdev, drvdata);
> +	dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);

Need to check the return value of dev_set_name?

fan
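
One way to act on this comment is to treat the naming step like any other
fallible step and unwind what was acquired before it. The sketch below is a
userspace mock, not kernel code: every mock_* name is hypothetical and only
models success, failure and resource counts. In the kernel the exact cleanup
would also depend on whether device_initialize() has already run for the
device.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical mocks; they only model success/failure and resource counts. */
static int mock_ids_in_use;
static int mock_objs_live;
static int mock_set_name_fails;	/* test knob: make naming fail */

struct mock_scrub_dev { int id; };

static int mock_id_alloc(void) { return mock_ids_in_use++; }
static void mock_id_free(int id) { (void)id; mock_ids_in_use--; }

static int mock_set_name(struct mock_scrub_dev *sd)
{
	(void)sd;
	return mock_set_name_fails ? -ENOMEM : 0;
}

/* Returns the new object, or NULL with *err set to a negative errno. */
static struct mock_scrub_dev *mock_scrub_register(int *err)
{
	struct mock_scrub_dev *sd = calloc(1, sizeof(*sd));

	assert(err);
	if (!sd) {
		*err = -ENOMEM;
		return NULL;
	}
	mock_objs_live++;
	sd->id = mock_id_alloc();

	*err = mock_set_name(sd);
	if (*err) {
		/* Undo, in reverse order, everything acquired so far. */
		mock_id_free(sd->id);
		mock_objs_live--;
		free(sd);
		return NULL;
	}
	return sd;
}
```

The point of the sketch is only the ordering: check the result of the naming
step before registration, and release the ID and the allocation on failure.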

> +	err = device_register(hdev);
> +	if (err) {
> +		put_device(hdev);
> +		return ERR_PTR(err);
> +	}
> +
> +	return hdev;
> +}
> +
> +static void devm_scrub_release(void *dev)
> +{
> +	device_unregister(dev);
> +}
> +
> +/**
> + * devm_scrub_device_register - register scrubber device
> + * @dev: the parent device
> + * @drvdata: driver data to attach to the scrub device
> + * @ops: pointer to scrub_ops structure (optional)
> + *
> + * Returns the pointer to the new device on success, ERR_PTR() otherwise.
> + * The new device would be automatically unregistered with the parent device.
> + */
> +struct device *
> +devm_scrub_device_register(struct device *dev, void *drvdata,
> +			   const struct scrub_ops *ops)
> +{
> +	struct device *hdev;
> +	int ret;
> +
> +	if (!dev)
> +		return ERR_PTR(-EINVAL);
> +
> +	hdev = scrub_device_register(dev, drvdata, ops);
> +	if (IS_ERR(hdev))
> +		return hdev;
> +
> +	ret = devm_add_action_or_reset(dev, devm_scrub_release, hdev);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	return hdev;
> +}
> +EXPORT_SYMBOL_GPL(devm_scrub_device_register);
> +
> +static int __init memory_scrub_control_init(void)
> +{
> +	return class_register(&scrub_class);
> +}
> +subsys_initcall(memory_scrub_control_init);
> +
> +static void memory_scrub_control_exit(void)
> +{
> +	class_unregister(&scrub_class);
> +}
> +module_exit(memory_scrub_control_exit);
> diff --git a/include/linux/memory_scrub.h b/include/linux/memory_scrub.h
> new file mode 100755
> index 000000000000..f0e1657a5072
> --- /dev/null
> +++ b/include/linux/memory_scrub.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Memory scrub subsystem driver supports controlling
> + * the memory scrubbers in the system.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#ifndef __MEMORY_SCRUB_H
> +#define __MEMORY_SCRUB_H
> +
> +#include <linux/types.h>
> +
> +struct device;
> +
> +/**
> + * struct scrub_ops - scrub device operations (all elements optional)
> + * @get_enabled_bg: check if currently performing background scrub.
> + * @set_enabled_bg: start or stop a bg-scrub.
> + * @get_name: get the memory scrubber name.
> + * @rate_avail_range: retrieve limits on supported rates.
> + * @rate_read: read the scrub rate
> + * @rate_write: set the scrub rate
> + */
> +struct scrub_ops {
> +	int (*get_enabled_bg)(struct device *dev, bool *enable);
> +	int (*set_enabled_bg)(struct device *dev, bool enable);
> +	int (*get_name)(struct device *dev, char *buf);
> +	int (*rate_avail_range)(struct device *dev, u64 *min, u64 *max);
> +	int (*rate_read)(struct device *dev, u64 *rate);
> +	int (*rate_write)(struct device *dev, u64 rate);
> +};
> +
> +struct device *
> +devm_scrub_device_register(struct device *dev, void *drvdata,
> +			   const struct scrub_ops *ops);
> +#endif /* __MEMORY_SCRUB_H */
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE mailbox command
  2024-04-19 16:47 ` [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE " shiju.jose
@ 2024-04-24 23:19   ` fan
  2024-04-25 10:38     ` Shiju Jose
  0 siblings, 1 reply; 22+ messages in thread
From: fan @ 2024-04-24 23:19 UTC (permalink / raw)
  To: shiju.jose
  Cc: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny, linux-edac, linux-kernel, david, Vilas.Sridharan,
	leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, tony.luck,
	Jon.Grimm, dave.hansen, rafael, lenb, naoya.horiguchi,
	james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
	duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
	nifan.cxl, tanxiaofei, prime.zeng, kangkang.shen, wanghuiqiang,
	linuxarm

On Sat, Apr 20, 2024 at 12:47:12AM +0800, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add support for GET_FEATURE mailbox command.
> 
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> The settings of a feature can be retrieved using Get Feature command.
> 
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  drivers/cxl/core/mbox.c | 53 +++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h    | 28 ++++++++++++++++++++++
>  2 files changed, 81 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 82e279b821e2..999965871048 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1318,6 +1318,59 @@ int cxl_get_supported_features(struct cxl_memdev_state *mds,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
>  
> +size_t cxl_get_feature(struct cxl_memdev_state *mds,
> +		       const uuid_t feat_uuid, void *feat_out,
> +		       size_t feat_out_size,
> +		       size_t feat_out_min_size,
> +		       enum cxl_get_feat_selection selection)
> +{
> +	struct cxl_dev_state *cxlds = &mds->cxlds;
> +	struct cxl_mbox_get_feat_in pi;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	size_t data_rcvd_size = 0;
> +	size_t data_to_rd_size, size_out;
> +	int rc;
> +
> +	if (feat_out_size < feat_out_min_size) {
> +		dev_err(cxlds->dev,
> +			"%s: feature out buffer size(%lu) is not big enough\n",
> +			__func__, feat_out_size);
> +		return 0;
> +	}
> +
> +	if (feat_out_size <= mds->payload_size)
> +		size_out = feat_out_size;
> +	else
> +		size_out = mds->payload_size;

Use min() here instead?
    size_out = min(feat_out_size, mds->payload_size);

> +	pi.uuid = feat_uuid;
> +	pi.selection = selection;
> +	do {
> +		if ((feat_out_min_size - data_rcvd_size) <= mds->payload_size)
> +			data_to_rd_size = feat_out_min_size - data_rcvd_size;
> +		else
> +			data_to_rd_size = mds->payload_size;

Same here:
    data_to_rd_size = min(feat_out_min_size - data_rcvd_size, mds->payload_size);

It seems feat_out_min_size is always the same as feat_out_size in this series.
What is it for? My understanding of the loop is that we fill the out buffer
over multiple calls when the feature does not fit in a single call, so it
seems feat_out_min_size should be feat_out_size here.
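
For illustration, the chunked read could be sketched roughly as below. This
is a userspace mock-up, not the driver code: mock_get_feature() is a
hypothetical stand-in for cxl_internal_send_cmd(), MIN() stands in for the
kernel's min(), and the loop is bounded by the full output buffer size
rather than by feat_out_min_size, which is an assumption about the intent.

```c
#include <stddef.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical stand-in for the mailbox command: copies up to 'count'
 * bytes of the feature data starting at 'offset' into 'out' and returns
 * the number of bytes actually produced (0 when nothing is left).
 */
static size_t mock_get_feature(const unsigned char *feat, size_t feat_len,
			       size_t offset, size_t count,
			       unsigned char *out)
{
	if (offset >= feat_len)
		return 0;
	count = MIN(count, feat_len - offset);
	memcpy(out, feat + offset, count);
	return count;
}

/*
 * Fill 'out' (out_size bytes) in chunks of at most payload_size bytes.
 * Returns the number of bytes received, or 0 if any chunk comes back empty.
 */
static size_t read_feature(const unsigned char *feat, size_t feat_len,
			   unsigned char *out, size_t out_size,
			   size_t payload_size)
{
	size_t rcvd = 0;

	while (rcvd < out_size) {
		/* Never ask for more than one mailbox payload at a time. */
		size_t to_rd = MIN(out_size - rcvd, payload_size);
		size_t got = mock_get_feature(feat, feat_len, rcvd, to_rd,
					      out + rcvd);

		if (got == 0)
			return 0;
		rcvd += got;
	}

	return rcvd;
}
```

With a 10-byte feature and a 4-byte payload limit, this issues three reads
of 4, 4 and 2 bytes, so a single size parameter is enough to drive the loop.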

Fan

> +
> +		pi.offset = cpu_to_le16(data_rcvd_size);
> +		pi.count = cpu_to_le16(data_to_rd_size);
> +
> +		mbox_cmd = (struct cxl_mbox_cmd) {
> +			.opcode = CXL_MBOX_OP_GET_FEATURE,
> +			.size_in = sizeof(pi),
> +			.payload_in = &pi,
> +			.size_out = size_out,
> +			.payload_out = feat_out + data_rcvd_size,
> +			.min_out = data_to_rd_size,
> +		};
> +		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> +		if (rc < 0 || mbox_cmd.size_out == 0)
> +			return 0;
> +		data_rcvd_size += mbox_cmd.size_out;
> +	} while (data_rcvd_size < feat_out_min_size);
> +
> +	return data_rcvd_size;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
> +
>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>  		       struct cxl_region *cxlr)
>  {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 06231e63373e..c822eb30e6d1 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -528,6 +528,7 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
>  	CXL_MBOX_OP_GET_LOG		= 0x0401,
>  	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
> +	CXL_MBOX_OP_GET_FEATURE		= 0x0501,
>  	CXL_MBOX_OP_IDENTIFY		= 0x4000,
>  	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
>  	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
> @@ -754,6 +755,28 @@ struct cxl_mbox_get_supp_feats_out {
>  	struct cxl_mbox_supp_feat_entry feat_entries[];
>  } __packed;
>  
> +/*
> + * Get Feature CXL 3.1 Spec 8.2.9.6.2
> + */
> +
> +/*
> + * Get Feature input payload
> + * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
> + */
> +enum cxl_get_feat_selection {
> +	CXL_GET_FEAT_SEL_CURRENT_VALUE,
> +	CXL_GET_FEAT_SEL_DEFAULT_VALUE,
> +	CXL_GET_FEAT_SEL_SAVED_VALUE,
> +	CXL_GET_FEAT_SEL_MAX
> +};
> +
> +struct cxl_mbox_get_feat_in {
> +	uuid_t uuid;
> +	__le16 offset;
> +	__le16 count;
> +	u8 selection;
> +}  __packed;
> +
>  /* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
>  struct cxl_mbox_poison_in {
>  	__le64 offset;
> @@ -888,6 +911,11 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds);
>  int cxl_get_supported_features(struct cxl_memdev_state *mds,
>  			       u32 count, u16 start_index,
>  			       struct cxl_mbox_get_supp_feats_out *feats_out);
> +size_t cxl_get_feature(struct cxl_memdev_state *mds,
> +		       const uuid_t feat_uuid, void *feat_out,
> +		       size_t feat_out_size,
> +		       size_t feat_out_min_size,
> +		       enum cxl_get_feat_selection selection);
>  int cxl_poison_state_init(struct cxl_memdev_state *mds);
>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>  		       struct cxl_region *cxlr);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
  2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
  2024-04-24 20:25   ` fan
@ 2024-04-25 10:15   ` Borislav Petkov
  2024-04-25 18:11     ` Shiju Jose
  1 sibling, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2024-04-25 10:15 UTC (permalink / raw)
  To: shiju.jose
  Cc: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny, linux-edac, linux-kernel, david, Vilas.Sridharan,
	leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, tony.luck,
	Jon.Grimm, dave.hansen, rafael, lenb, naoya.horiguchi,
	james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
	duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
	nifan.cxl, tanxiaofei, prime.zeng, kangkang.shen, wanghuiqiang,
	linuxarm

On Sat, Apr 20, 2024 at 12:47:10AM +0800, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add scrub subsystem supports configuring the memory scrubbers
> in the system. The scrub subsystem provides the interface for
> registering the scrub devices. The scrub control attributes
> are provided to the user in /sys/class/ras/rasX/scrub
> 
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  .../ABI/testing/sysfs-class-scrub-configure   |  47 +++
>  drivers/ras/Kconfig                           |   7 +
>  drivers/ras/Makefile                          |   1 +
>  drivers/ras/memory_scrub.c                    | 271 ++++++++++++++++++
>  include/linux/memory_scrub.h                  |  37 +++
>  5 files changed, 363 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
>  create mode 100755 drivers/ras/memory_scrub.c
>  create mode 100755 include/linux/memory_scrub.h

ERROR: modpost: missing MODULE_LICENSE() in drivers/ras/memory_scrub.o
make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Error 1
make[1]: *** [/mnt/kernel/kernel/2nd/linux/Makefile:1871: modpost] Error 2
make: *** [Makefile:240: __sub-make] Error 2

Each patch of yours needs to build.

> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> new file mode 100644
> index 000000000000..3ed77dbb00ad
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> @@ -0,0 +1,47 @@
> +What:		/sys/class/ras/
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		The ras/ class subdirectory belongs to the
> +		common ras features such as scrub subsystem.
> +
> +What:		/sys/class/ras/rasX/scrub/
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		The /sys/class/ras/ras{0,1,2,3,...}/scrub directories

You have different scrubbers.

I'd prefer if you put their names in here instead and do this structure:

/sys/class/ras/scrub/cxl-patrol
		    /ars
		    /cxl-ecs
		    /acpi-ras2

and so on.

Unless the idea is for those devices to have more RAS-specific
functionality than just scrubbing. Then you want to do

/sys/class/ras/cxl/scrub
		  /other_function

/sys/class/ras/ars/scrub
		  /...

You get the idea.

> +		correspond to each scrub device registered with the
> +		scrub subsystem.
> +
> +What:		/sys/class/ras/rasX/scrub/name
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RO) name of the memory scrubber
> +
> +What:		/sys/class/ras/rasX/scrub/enable_background
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RW) Enable/Disable background(patrol) scrubbing if supported.
> +
> +What:		/sys/class/ras/rasX/scrub/rate_available

That's dumping a range, so it should probably be called
"possible_rates" or similar, so that it is clear what it means.

If some scrubbers support only a discrete set of rate values, then
"possible_rates" fits too if you dump them as a list of values.

> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RO) Supported range for the scrub rate by the scrubber.
> +		The scrub rate represents in hours.
> +
> +What:		/sys/class/ras/rasX/scrub/rate
> +Date:		March 2024
> +KernelVersion:	6.9
> +Contact:	linux-kernel@vger.kernel.org
> +Description:
> +		(RW) The scrub rate specified and it must be with in the
> +		supported range by the scrubber.
> +		The scrub rate represents in hours.
> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c..181701479564 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -46,4 +46,11 @@ config RAS_FMPM
>  	  Memory will be retired during boot time and run time depending on
>  	  platform-specific policies.
>  
> +config SCRUB
> +	tristate "Memory scrub driver"
> +	help
> +	  This option selects the memory scrub subsystem, supports

s/This option selects/Enable/

> +	  configuring the parameters of underlying scrubbers in the
> +	  system for the DRAM memories.
> +
>  endif
> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
> index 11f95d59d397..89bcf0d84355 100644
> --- a/drivers/ras/Makefile
> +++ b/drivers/ras/Makefile
> @@ -2,6 +2,7 @@
>  obj-$(CONFIG_RAS)	+= ras.o
>  obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
>  obj-$(CONFIG_RAS_CEC)	+= cec.o
> +obj-$(CONFIG_SCRUB)	+= memory_scrub.o
>  
>  obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
>  obj-y			+= amd/atl/
> diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
> new file mode 100755
> index 000000000000..7e995380ec3a
> --- /dev/null
> +++ b/drivers/ras/memory_scrub.c
> @@ -0,0 +1,271 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Memory scrub subsystem supports configuring the registered
> + * memory scrubbers.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#define pr_fmt(fmt)     "MEM SCRUB: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/delay.h>
> +#include <linux/kfifo.h>
> +#include <linux/memory_scrub.h>
> +#include <linux/platform_device.h>
> +#include <linux/spinlock.h>
> +
> +/* memory scrubber config definitions */

No need for that comment.

> +static ssize_t rate_available_show(struct device *dev,
> +				   struct device_attribute *attr,
> +				   char *buf)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	u64 min_sr, max_sr;
> +	int ret;
> +
> +	ret = scrub_dev->ops->rate_avail_range(dev, &min_sr, &max_sr);
> +	if (ret)
> +		return ret;
> +
> +	return sysfs_emit(buf, "0x%llx-0x%llx\n", min_sr, max_sr);
> +}

This glue driver will need to store the min and max scrub rates on init
and rate_store() will have to verify the newly supplied rate is within
that range before writing it.

Not the user, nor the underlying hw driver.
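
A minimal sketch of that check, written as a userspace mock-up so it can be
compiled standalone. struct scrub_limits, scrub_rate_store() and
mock_rate_write() are hypothetical names, not part of the patch; the
assumption is that min/max are read once at registration time and cached
next to the ops pointer. The choice of -ERANGE is also an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: limits cached by the glue driver when the device registers. */
struct scrub_limits {
	uint64_t min_rate;
	uint64_t max_rate;
};

/* Hypothetical callback type, mirroring the shape of scrub_ops->rate_write. */
typedef int (*rate_write_fn)(void *hw, uint64_t rate);

/* Reject out-of-range rates before they ever reach the hw driver. */
static int scrub_rate_store(const struct scrub_limits *lim, void *hw,
			    rate_write_fn write, uint64_t rate)
{
	if (rate < lim->min_rate || rate > lim->max_rate)
		return -ERANGE;

	return write(hw, rate);
}

/* Mock hw driver: records the last rate it was asked to program. */
static int mock_rate_write(void *hw, uint64_t rate)
{
	*(uint64_t *)hw = rate;
	return 0;
}
```

With limits of 1..24, a write of 12 reaches the mock hw driver, while 0 or
48 is rejected in the glue layer and the hw state stays untouched.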

> +
> +DEVICE_ATTR_RW(enable_background);
> +DEVICE_ATTR_RO(name);
> +DEVICE_ATTR_RW(rate);
> +DEVICE_ATTR_RO(rate_available);

These should be static.

> +
> +static struct attribute *scrub_attrs[] = {
> +	&dev_attr_enable_background.attr,
> +	&dev_attr_name.attr,
> +	&dev_attr_rate.attr,
> +	&dev_attr_rate_available.attr,
> +	NULL
> +};
> +
> +static umode_t scrub_attr_visible(struct kobject *kobj,
> +				  struct attribute *a, int attr_id)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +	const struct scrub_ops *ops = scrub_dev->ops;
> +
> +	if (a == &dev_attr_enable_background.attr) {
> +		if (ops->set_enabled_bg && ops->get_enabled_bg)
> +			return a->mode;
> +		if (ops->get_enabled_bg)
> +			return 0444;
> +		return 0;
> +	}
> +	if (a == &dev_attr_name.attr)
> +		return ops->get_name ? a->mode : 0;
> +	if (a == &dev_attr_rate_available.attr)
> +		return ops->rate_avail_range ? a->mode : 0;
> +	if (a == &dev_attr_rate.attr) { /* Write only makes little sense */
> +		if (ops->rate_read && ops->rate_write)
> +			return a->mode;
> +		if (ops->rate_read)
> +			return 0444;
> +		return 0;
> +	}

All of that stuff's permissions should be root-only.
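
Roughly, the visibility logic would then hand out 0600/0400 instead of
a->mode/0444. A standalone sketch of just that decision follows;
scrub_attr_mode() is a hypothetical helper, not something in the patch. If
I remember correctly, the DEVICE_ATTR_ADMIN_RW()/DEVICE_ATTR_ADMIN_RO()
helpers in the kernel use the same modes, which would cover the a->mode
side.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: pick a root-only sysfs mode from which callbacks
 * the hw driver provides. A write-only attribute makes little sense, so
 * it is hidden, matching the existing is_visible() logic.
 */
static unsigned int scrub_attr_mode(bool can_read, bool can_write)
{
	if (!can_read)
		return 0;		/* attribute not shown at all */

	return can_write ? 0600 : 0400;	/* owner (root) only */
}
```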

> +
> +	return 0;
> +}
> +
> +static const struct attribute_group scrub_attr_group = {
> +	.name		= "scrub",
> +	.attrs		= scrub_attrs,
> +	.is_visible	= scrub_attr_visible,
> +};
> +
> +static const struct attribute_group *scrub_attr_groups[] = {
> +	&scrub_attr_group,
> +	NULL
> +};
> +
> +static void scrub_dev_release(struct device *dev)
> +{
> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
> +
> +	ida_free(&scrub_ida, scrub_dev->id);
> +	kfree(scrub_dev);
> +}
> +
> +static struct class scrub_class = {
> +	.name = "ras",
> +	.dev_groups = scrub_attr_groups,
> +	.dev_release = scrub_dev_release,
> +};
> +
> +static struct device *
> +scrub_device_register(struct device *parent, void *drvdata,
> +		      const struct scrub_ops *ops)
> +{
> +	struct scrub_device *scrub_dev;
> +	struct device *hdev;
> +	int err;
> +
> +	scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
> +	if (!scrub_dev)
> +		return ERR_PTR(-ENOMEM);
> +	hdev = &scrub_dev->dev;
> +
> +	scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);

What's that silly thing for?

> +	if (scrub_dev->id < 0) {
> +		kfree(scrub_dev);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	scrub_dev->ops = ops;
> +	hdev->class = &scrub_class;
> +	hdev->parent = parent;
> +	dev_set_drvdata(hdev, drvdata);
> +	dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);
> +	err = device_register(hdev);
> +	if (err) {
> +		put_device(hdev);
> +		return ERR_PTR(err);
> +	}
> +
> +	return hdev;
> +}
> +
> +static void devm_scrub_release(void *dev)
> +{
> +	device_unregister(dev);
> +}
> +
> +/**
> + * devm_scrub_device_register - register scrubber device
> + * @dev: the parent device
> + * @drvdata: driver data to attach to the scrub device
> + * @ops: pointer to scrub_ops structure (optional)
> + *
> + * Returns the pointer to the new device on success, ERR_PTR() otherwise.
> + * The new device would be automatically unregistered with the parent device.
> + */
> +struct device *
> +devm_scrub_device_register(struct device *dev, void *drvdata,
> +			   const struct scrub_ops *ops)
> +{
> +	struct device *hdev;
> +	int ret;
> +
> +	if (!dev)
> +		return ERR_PTR(-EINVAL);
> +
> +	hdev = scrub_device_register(dev, drvdata, ops);
> +	if (IS_ERR(hdev))
> +		return hdev;
> +
> +	ret = devm_add_action_or_reset(dev, devm_scrub_release, hdev);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	return hdev;
> +}
> +EXPORT_SYMBOL_GPL(devm_scrub_device_register);
> +
> +static int __init memory_scrub_control_init(void)
> +{
> +	return class_register(&scrub_class);
> +}
> +subsys_initcall(memory_scrub_control_init);

You can't just blindly register this thing without checking whether
there are even any hw scrubber devices on the system.
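
One possible shape for that, sketched as a userspace mock-up: register the
class when the first scrubber shows up and drop it when the last one goes
away. Everything here is hypothetical (mock_class_register() and friends
only stand in for class_register()/class_unregister()), and a real version
would need a mutex around the counter.

```c
/* Mock state standing in for the registered class. */
static int class_registered;
static unsigned int nr_scrubbers;

static int mock_class_register(void)
{
	class_registered = 1;
	return 0;
}

static void mock_class_unregister(void)
{
	class_registered = 0;
}

/* Called from the device registration path; the first user creates the class. */
static int scrub_device_add(void)
{
	if (nr_scrubbers == 0) {
		int ret = mock_class_register();

		if (ret)
			return ret;
	}
	nr_scrubbers++;
	return 0;
}

/* Called from the release path; the last user tears the class down. */
static void scrub_device_del(void)
{
	if (nr_scrubbers == 0)
		return;
	if (--nr_scrubbers == 0)
		mock_class_unregister();
}
```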

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
  2024-04-24 20:25   ` fan
@ 2024-04-25 10:38     ` Shiju Jose
  0 siblings, 0 replies; 22+ messages in thread
From: Shiju Jose @ 2024-04-25 10:38 UTC (permalink / raw)
  To: fan
  Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com, dave@stgolabs.net,
	Jonathan Cameron, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org, david@redhat.com,
	Vilas.Sridharan@amd.com, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	rientjes@google.com, jiaqiyan@google.com, tony.luck@intel.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com, rafael@kernel.org,
	lenb@kernel.org, naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	mike.malvestuto@intel.com, gthelen@google.com,
	wschwartz@amperecomputing.com, dferguson@amperecomputing.com,
	wbs@os.amperecomputing.com, tanxiaofei, Zengtao (B),
	kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm

>-----Original Message-----
>From: fan <nifan.cxl@gmail.com>
>Sent: 24 April 2024 21:26
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>mm@kvack.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; david@redhat.com;
>Vilas.Sridharan@amd.com; leo.duran@amd.com; Yazen.Ghannam@amd.com;
>rientjes@google.com; jiaqiyan@google.com; tony.luck@intel.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com; rafael@kernel.org;
>lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com;
>jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
>
>On Sat, Apr 20, 2024 at 12:47:10AM +0800, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add scrub subsystem supports configuring the memory scrubbers in the
>> system. The scrub subsystem provides the interface for registering the
>> scrub devices. The scrub control attributes are provided to the user
>> in /sys/class/ras/rasX/scrub
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  .../ABI/testing/sysfs-class-scrub-configure   |  47 +++
>>  drivers/ras/Kconfig                           |   7 +
>>  drivers/ras/Makefile                          |   1 +
>>  drivers/ras/memory_scrub.c                    | 271 ++++++++++++++++++
>>  include/linux/memory_scrub.h                  |  37 +++
>>  5 files changed, 363 insertions(+)
>>  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
>>  create mode 100755 drivers/ras/memory_scrub.c
>>  create mode 100755 include/linux/memory_scrub.h
>>
>> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
>> new file mode 100644
>> index 000000000000..3ed77dbb00ad
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
>> @@ -0,0 +1,47 @@
>> +What:		/sys/class/ras/
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		The ras/ class subdirectory belongs to the
>> +		common ras features such as scrub subsystem.
>> +
>> +What:		/sys/class/ras/rasX/scrub/
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		The /sys/class/ras/ras{0,1,2,3,...}/scrub directories
>> +		correspond to each scrub device registered with the
>> +		scrub subsystem.
>> +
>> +What:		/sys/class/ras/rasX/scrub/name
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RO) name of the memory scrubber
>> +
>> +What:		/sys/class/ras/rasX/scrub/enable_background
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RW) Enable/Disable background(patrol) scrubbing if supported.
>> +
>> +What:		/sys/class/ras/rasX/scrub/rate_available
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RO) Supported range for the scrub rate by the scrubber.
>> +		The scrub rate represents in hours.
>> +
>> +What:		/sys/class/ras/rasX/scrub/rate
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RW) The scrub rate specified and it must be with in the
>> +		supported range by the scrubber.
>> +		The scrub rate represents in hours.
>> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
>> index fc4f4bb94a4c..181701479564 100644
>> --- a/drivers/ras/Kconfig
>> +++ b/drivers/ras/Kconfig
>> @@ -46,4 +46,11 @@ config RAS_FMPM
>>  	  Memory will be retired during boot time and run time depending on
>>  	  platform-specific policies.
>>
>> +config SCRUB
>> +	tristate "Memory scrub driver"
>> +	help
>> +	  This option selects the memory scrub subsystem, supports
>> +	  configuring the parameters of underlying scrubbers in the
>> +	  system for the DRAM memories.
>> +
>>  endif
>> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
>> index 11f95d59d397..89bcf0d84355 100644
>> --- a/drivers/ras/Makefile
>> +++ b/drivers/ras/Makefile
>> @@ -2,6 +2,7 @@
>>  obj-$(CONFIG_RAS)	+= ras.o
>>  obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
>>  obj-$(CONFIG_RAS_CEC)	+= cec.o
>> +obj-$(CONFIG_SCRUB)	+= memory_scrub.o
>>
>>  obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
>>  obj-y			+= amd/atl/
>> diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
>> new file mode 100755
>> index 000000000000..7e995380ec3a
>> --- /dev/null
>> +++ b/drivers/ras/memory_scrub.c
>> @@ -0,0 +1,271 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Memory scrub subsystem supports configuring the registered
>> + * memory scrubbers.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>> +#define pr_fmt(fmt)     "MEM SCRUB: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/bitops.h>
>> +#include <linux/delay.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/memory_scrub.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/spinlock.h>
>> +
>> +/* memory scrubber config definitions */
>> +#define SCRUB_ID_PREFIX "ras"
>> +#define SCRUB_ID_FORMAT SCRUB_ID_PREFIX "%d"
>> +
>> +static DEFINE_IDA(scrub_ida);
>> +
>> +struct scrub_device {
>> +	int id;
>> +	struct device dev;
>> +	const struct scrub_ops *ops;
>> +};
>> +
>> +#define to_scrub_device(d) container_of(d, struct scrub_device, dev)
>> +static ssize_t enable_background_store(struct device *dev,
>> +				       struct device_attribute *attr,
>> +				       const char *buf, size_t len)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	bool enable;
>> +	int ret;
>> +
>> +	ret = kstrtobool(buf, &enable);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ret = scrub_dev->ops->set_enabled_bg(dev, enable);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return len;
>> +}
>> +
>> +static ssize_t enable_background_show(struct device *dev,
>> +				      struct device_attribute *attr, char *buf)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	bool enable;
>> +	int ret;
>> +
>> +	ret = scrub_dev->ops->get_enabled_bg(dev, &enable);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return sysfs_emit(buf, "%d\n", enable);
>> +}
>> +
>> +static ssize_t name_show(struct device *dev,
>> +			 struct device_attribute *attr, char *buf)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	int ret;
>> +
>> +	ret = scrub_dev->ops->get_name(dev, buf);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return strlen(buf);
>> +}
>> +
>> +static ssize_t rate_show(struct device *dev, struct device_attribute *attr,
>> +			 char *buf)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	u64 val;
>> +	int ret;
>> +
>> +	ret = scrub_dev->ops->rate_read(dev, &val);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return sysfs_emit(buf, "0x%llx\n", val);
>> +}
>> +
>> +static ssize_t rate_store(struct device *dev, struct device_attribute *attr,
>> +			  const char *buf, size_t len)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	long val;
>> +	int ret;
>> +
>> +	ret = kstrtol(buf, 10, &val);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ret = scrub_dev->ops->rate_write(dev, val);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return len;
>> +}
>> +
>> +static ssize_t rate_available_show(struct device *dev,
>> +				   struct device_attribute *attr,
>> +				   char *buf)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	u64 min_sr, max_sr;
>> +	int ret;
>> +
>> +	ret = scrub_dev->ops->rate_avail_range(dev, &min_sr, &max_sr);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return sysfs_emit(buf, "0x%llx-0x%llx\n", min_sr, max_sr);
>> +}
>> +
>> +DEVICE_ATTR_RW(enable_background);
>> +DEVICE_ATTR_RO(name);
>> +DEVICE_ATTR_RW(rate);
>> +DEVICE_ATTR_RO(rate_available);
>> +
>> +static struct attribute *scrub_attrs[] = {
>> +	&dev_attr_enable_background.attr,
>> +	&dev_attr_name.attr,
>> +	&dev_attr_rate.attr,
>> +	&dev_attr_rate_available.attr,
>> +	NULL
>> +};
>> +
>> +static umode_t scrub_attr_visible(struct kobject *kobj,
>> +				  struct attribute *a, int attr_id)
>> +{
>> +	struct device *dev = kobj_to_dev(kobj);
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	const struct scrub_ops *ops = scrub_dev->ops;
>> +
>> +	if (a == &dev_attr_enable_background.attr) {
>> +		if (ops->set_enabled_bg && ops->get_enabled_bg)
>> +			return a->mode;
>> +		if (ops->get_enabled_bg)
>> +			return 0444;
>> +		return 0;
>> +	}
>> +	if (a == &dev_attr_name.attr)
>> +		return ops->get_name ? a->mode : 0;
>> +	if (a == &dev_attr_rate_available.attr)
>> +		return ops->rate_avail_range ? a->mode : 0;
>> +	if (a == &dev_attr_rate.attr) { /* Write only makes little sense */
>> +		if (ops->rate_read && ops->rate_write)
>> +			return a->mode;
>> +		if (ops->rate_read)
>> +			return 0444;
>> +		return 0;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct attribute_group scrub_attr_group = {
>> +	.name		= "scrub",
>> +	.attrs		= scrub_attrs,
>> +	.is_visible	= scrub_attr_visible,
>> +};
>> +
>> +static const struct attribute_group *scrub_attr_groups[] = {
>> +	&scrub_attr_group,
>> +	NULL
>> +};
>> +
>> +static void scrub_dev_release(struct device *dev)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +
>> +	ida_free(&scrub_ida, scrub_dev->id);
>> +	kfree(scrub_dev);
>> +}
>> +
>> +static struct class scrub_class = {
>> +	.name = "ras",
>> +	.dev_groups = scrub_attr_groups,
>> +	.dev_release = scrub_dev_release,
>> +};
>> +
>> +static struct device *
>> +scrub_device_register(struct device *parent, void *drvdata,
>> +		      const struct scrub_ops *ops)
>> +{
>> +	struct scrub_device *scrub_dev;
>> +	struct device *hdev;
>> +	int err;
>> +
>> +	scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
>> +	if (!scrub_dev)
>> +		return ERR_PTR(-ENOMEM);
>> +	hdev = &scrub_dev->dev;
>> +
>> +	scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);
>> +	if (scrub_dev->id < 0) {
>> +		kfree(scrub_dev);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	scrub_dev->ops = ops;
>> +	hdev->class = &scrub_class;
>> +	hdev->parent = parent;
>> +	dev_set_drvdata(hdev, drvdata);
>> +	dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);
>
>Need to check the return value of dev_set_name?
Will do, though checking the return value of dev_set_name() is not common in the kernel.

>
>fan
>
>> +	err = device_register(hdev);
>> +	if (err) {

Thanks,
Shiju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE mailbox command
  2024-04-24 23:19   ` fan
@ 2024-04-25 10:38     ` Shiju Jose
  0 siblings, 0 replies; 22+ messages in thread
From: Shiju Jose @ 2024-04-25 10:38 UTC (permalink / raw)
  To: fan
  Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com, dave@stgolabs.net,
	Jonathan Cameron, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org, david@redhat.com,
	Vilas.Sridharan@amd.com, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	rientjes@google.com, jiaqiyan@google.com, tony.luck@intel.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com, rafael@kernel.org,
	lenb@kernel.org, naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	mike.malvestuto@intel.com, gthelen@google.com,
	wschwartz@amperecomputing.com, dferguson@amperecomputing.com,
	wbs@os.amperecomputing.com, tanxiaofei, Zengtao (B),
	kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm

>-----Original Message-----
>From: fan <nifan.cxl@gmail.com>
>Sent: 25 April 2024 00:19
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>mm@kvack.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; david@redhat.com;
>Vilas.Sridharan@amd.com; leo.duran@amd.com; Yazen.Ghannam@amd.com;
>rientjes@google.com; jiaqiyan@google.com; tony.luck@intel.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com; rafael@kernel.org;
>lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com;
>jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE mailbox
>command
>
>On Sat, Apr 20, 2024 at 12:47:12AM +0800, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add support for GET_FEATURE mailbox command.
>>
>> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
>> The settings of a feature can be retrieved using Get Feature command.
>>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  drivers/cxl/core/mbox.c | 53 +++++++++++++++++++++++++++++++++++++++++
>>  drivers/cxl/cxlmem.h    | 28 ++++++++++++++++++++++
>>  2 files changed, 81 insertions(+)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index 82e279b821e2..999965871048 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1318,6 +1318,59 @@ int cxl_get_supported_features(struct cxl_memdev_state *mds,
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
>>
>> +size_t cxl_get_feature(struct cxl_memdev_state *mds,
>> +		       const uuid_t feat_uuid, void *feat_out,
>> +		       size_t feat_out_size,
>> +		       size_t feat_out_min_size,
>> +		       enum cxl_get_feat_selection selection)
>> +{
>> +	struct cxl_dev_state *cxlds = &mds->cxlds;
>> +	struct cxl_mbox_get_feat_in pi;
>> +	struct cxl_mbox_cmd mbox_cmd;
>> +	size_t data_rcvd_size = 0;
>> +	size_t data_to_rd_size, size_out;
>> +	int rc;
>> +
>> +	if (feat_out_size < feat_out_min_size) {
>> +		dev_err(cxlds->dev,
>> +			"%s: feature out buffer size(%lu) is not big enough\n",
>> +			__func__, feat_out_size);
>> +		return 0;
>> +	}
>> +
>> +	if (feat_out_size <= mds->payload_size)
>> +		size_out = feat_out_size;
>> +	else
>> +		size_out = mds->payload_size;
>
>Using min() instead?
>    size_out = min(feat_out_size, mds->payload_size)
Will do.
>
>> +	pi.uuid = feat_uuid;
>> +	pi.selection = selection;
>> +	do {
>> +		if ((feat_out_min_size - data_rcvd_size) <= mds->payload_size)
>> +			data_to_rd_size = feat_out_min_size - data_rcvd_size;
>> +		else
>> +			data_to_rd_size = mds->payload_size;
>
>data_to_rd_size = min(feat_out_min_size - data_rcvd_size, mds->payload_size);

Will do.
>
>It seems feat_out_min_size is always the same as feat_out_size in this series,
>what is it for? For the loop here, my understanding is we need to fill up the out
>buffer multiple times if the feature cannot be held in a call, so it seems
>feat_out_min_size should be feat_out_size here.
feat_out_size and feat_out_min_size are passed separately because this function
is a common interface. The distinction may be useful for features like DDR5 ECS
Control, where the Get Feature output payload is relatively large (it contains
the readable DDR5 ECS control attributes for N memory media FRUs) but the data
actually required is small.
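
To illustrate the distinction, here is a minimal userspace sketch, not the
kernel code. It assumes a pretend 8-byte mailbox payload and a 20-byte feature.
The names PAYLOAD_SIZE, fake_feature, fake_get_feature() and read_feature() are
hypothetical stand-ins. The caller supplies a buffer capacity (out_size) and
the number of bytes it actually needs (want), and the loop uses a min() helper
to size each chunk, as suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; none of this is kernel API. */
#define PAYLOAD_SIZE 8			/* pretend mailbox payload size */

static const unsigned char fake_feature[20] = {
	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
};

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Pretend mailbox call: copy at most PAYLOAD_SIZE bytes starting at offset. */
static size_t fake_get_feature(size_t offset, size_t count, unsigned char *out)
{
	if (offset >= sizeof(fake_feature))
		return 0;
	count = min_sz(count, sizeof(fake_feature) - offset);
	count = min_sz(count, PAYLOAD_SIZE);
	memcpy(out, fake_feature + offset, count);
	return count;
}

/*
 * Read the first 'want' bytes of the feature into 'out' (capacity out_size),
 * one mailbox-sized chunk at a time. Returns bytes read, or 0 on failure.
 */
static size_t read_feature(unsigned char *out, size_t out_size, size_t want)
{
	size_t rcvd = 0;

	assert(out != NULL);
	if (out_size < want)
		return 0;

	while (rcvd < want) {
		size_t chunk = min_sz(want - rcvd, PAYLOAD_SIZE);
		size_t got = fake_get_feature(rcvd, chunk, out + rcvd);

		if (got == 0)
			return 0;
		rcvd += got;
	}
	return rcvd;
}
```

In this sketch a caller that only needs the first few bytes passes a small
"want" and the loop stops early, even if the buffer could hold more.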

>
>Fan
>
>> +
>> +		pi.offset = cpu_to_le16(data_rcvd_size);
>> +		pi.count = cpu_to_le16(data_to_rd_size);
>> +
>> +		mbox_cmd = (struct cxl_mbox_cmd) {
>> +			.opcode = CXL_MBOX_OP_GET_FEATURE,
>> +			.size_in = sizeof(pi),
>> +			.payload_in = &pi,
>> +			.size_out = size_out,
>> +			.payload_out = feat_out + data_rcvd_size,
>> +			.min_out = data_to_rd_size,
>> +		};
>> +		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>> +		if (rc < 0 || mbox_cmd.size_out == 0)
>> +			return 0;
>> +		data_rcvd_size += mbox_cmd.size_out;
>> +	} while (data_rcvd_size < feat_out_min_size);
>> +
>> +	return data_rcvd_size;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
>> +
>>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>>  		       struct cxl_region *cxlr)
>>  {
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 06231e63373e..c822eb30e6d1 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -528,6 +528,7 @@ enum cxl_opcode {
>>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
>>  	CXL_MBOX_OP_GET_LOG		= 0x0401,
>>  	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
>> +	CXL_MBOX_OP_GET_FEATURE		= 0x0501,
>>  	CXL_MBOX_OP_IDENTIFY		= 0x4000,
>>  	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
>>  	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
>> @@ -754,6 +755,28 @@ struct cxl_mbox_get_supp_feats_out {
>>  	struct cxl_mbox_supp_feat_entry feat_entries[];
>>  } __packed;
>>
>> +/*
>> + * Get Feature CXL 3.1 Spec 8.2.9.6.2
>> + */
>> +
>> +/*
>> + * Get Feature input payload
>> + * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
>> + */
>> +enum cxl_get_feat_selection {
>> +	CXL_GET_FEAT_SEL_CURRENT_VALUE,
>> +	CXL_GET_FEAT_SEL_DEFAULT_VALUE,
>> +	CXL_GET_FEAT_SEL_SAVED_VALUE,
>> +	CXL_GET_FEAT_SEL_MAX
>> +};
>> +
>> +struct cxl_mbox_get_feat_in {
>> +	uuid_t uuid;
>> +	__le16 offset;
>> +	__le16 count;
>> +	u8 selection;
>> +}  __packed;
>> +
>>  /* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
>>  struct cxl_mbox_poison_in {
>>  	__le64 offset;
>> @@ -888,6 +911,11 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds);
>>  int cxl_get_supported_features(struct cxl_memdev_state *mds,
>>  			       u32 count, u16 start_index,
>>  			       struct cxl_mbox_get_supp_feats_out *feats_out);
>> +size_t cxl_get_feature(struct cxl_memdev_state *mds,
>> +		       const uuid_t feat_uuid, void *feat_out,
>> +		       size_t feat_out_size,
>> +		       size_t feat_out_min_size,
>> +		       enum cxl_get_feat_selection selection);
>>  int cxl_poison_state_init(struct cxl_memdev_state *mds);
>>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>>  		       struct cxl_region *cxlr);
>> --
>> 2.34.1
>>
Thanks,
Shiju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE mailbox command
  2024-04-19 16:47 ` [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE " shiju.jose
@ 2024-04-25 17:26   ` fan
  0 siblings, 0 replies; 22+ messages in thread
From: fan @ 2024-04-25 17:26 UTC (permalink / raw)
  To: shiju.jose
  Cc: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny, linux-edac, linux-kernel, david, Vilas.Sridharan,
	leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, tony.luck,
	Jon.Grimm, dave.hansen, rafael, lenb, naoya.horiguchi,
	james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
	duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
	nifan.cxl, tanxiaofei, prime.zeng, kangkang.shen, wanghuiqiang,
	linuxarm

On Sat, Apr 20, 2024 at 12:47:13AM +0800, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add support for SET_FEATURE mailbox command.
> 
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> CXL devices supports features with changeable attributes.
> The settings of a feature can be optionally modified using Set Feature
> command.
> 
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  drivers/cxl/core/mbox.c | 73 +++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h    | 33 +++++++++++++++++++
>  2 files changed, 106 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 999965871048..4ca1238e8fec 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1371,6 +1371,79 @@ size_t cxl_get_feature(struct cxl_memdev_state *mds,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
>  
> +/*
> + * FEAT_DATA_MIN_PAYLOAD_SIZE - min extra number of bytes should be
> + * available in the mailbox for storing the actual feature data so that
> + * the feature data transfer would work as expected.
> + */
> +#define FEAT_DATA_MIN_PAYLOAD_SIZE 10
> +int cxl_set_feature(struct cxl_memdev_state *mds,
> +		    const uuid_t feat_uuid, u8 feat_version,
> +		    void *feat_data, size_t feat_data_size,
> +		    u8 feat_flag)
> +{
> +	struct cxl_memdev_set_feat_pi {
> +		struct cxl_mbox_set_feat_hdr hdr;
> +		u8 feat_data[];
> +	}  __packed;
> +	size_t data_in_size, data_sent_size = 0;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	size_t hdr_size;
> +	int rc = 0;
> +
> +	struct cxl_memdev_set_feat_pi *pi __free(kfree) =
> +					kmalloc(mds->payload_size, GFP_KERNEL);
> +	pi->hdr.uuid = feat_uuid;
> +	pi->hdr.version = feat_version;
> +	feat_flag &= ~CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK;
> +	hdr_size = sizeof(pi->hdr);
> +	/*
> +	 * Check minimum mbox payload size is available for
> +	 * the feature data transfer.
> +	 */
> +	if (hdr_size + FEAT_DATA_MIN_PAYLOAD_SIZE > mds->payload_size)
> +		return -ENOMEM;
> +
> +	if ((hdr_size + feat_data_size) <= mds->payload_size) {
> +		pi->hdr.flags = cpu_to_le32(feat_flag |
> +				       CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER);
> +		data_in_size = feat_data_size;
> +	} else {
> +		pi->hdr.flags = cpu_to_le32(feat_flag |
> +				       CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER);
> +		data_in_size = mds->payload_size - hdr_size;
> +	}
> +
> +	do {
> +		pi->hdr.offset = cpu_to_le16(data_sent_size);
> +		memcpy(pi->feat_data, feat_data + data_sent_size, data_in_size);
> +		mbox_cmd = (struct cxl_mbox_cmd) {
> +			.opcode = CXL_MBOX_OP_SET_FEATURE,
> +			.size_in = hdr_size + data_in_size,
> +			.payload_in = pi,
> +		};
> +		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> +		if (rc < 0)
> +			return rc;
> +
> +		data_sent_size += data_in_size;
> +		if (data_sent_size >= feat_data_size)
> +			return 0;
> +
> +		if ((feat_data_size - data_sent_size) <= (mds->payload_size - hdr_size)) {
> +			data_in_size = feat_data_size - data_sent_size;
> +			pi->hdr.flags = cpu_to_le32(feat_flag |
> +					       CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER);
> +		} else {
> +			pi->hdr.flags = cpu_to_le32(feat_flag |
> +					       CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER);
> +		}
> +	} while (true);
> +
> +	return rc;
Dead code.
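
For illustration, a minimal userspace sketch of the same chunking logic with
the loop condition carrying the exit, so no statement after the loop is
unreachable. This is an assumption about one possible restructuring, not the
patch author's code. The names xfer_flag and plan_transfer() are hypothetical.
The enum only mirrors the order of the data transfer flags in the quoted
cxlmem.h hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the Set Feature data transfer actions. */
enum xfer_flag {
	XFER_FULL,	/* whole payload fits in one command */
	XFER_INITIATE,	/* first part of a multi-part transfer */
	XFER_CONTINUE,	/* middle part */
	XFER_FINISH,	/* last part */
};

/*
 * Record the flag each chunk would carry when data_size bytes are sent in
 * chunks of at most chunk_cap bytes. Returns the number of chunks, or -1 if
 * the flags array (max_chunks entries) is too small.
 */
static int plan_transfer(size_t data_size, size_t chunk_cap,
			 enum xfer_flag *flags, size_t max_chunks)
{
	size_t sent = 0;
	size_t n = 0;

	assert(chunk_cap > 0);

	while (sent < data_size) {
		size_t remaining = data_size - sent;
		size_t chunk = remaining < chunk_cap ? remaining : chunk_cap;
		int last = (chunk == remaining);

		if (n == max_chunks)
			return -1;

		if (sent == 0)
			flags[n] = last ? XFER_FULL : XFER_INITIATE;
		else
			flags[n] = last ? XFER_FINISH : XFER_CONTINUE;

		n++;
		sent += chunk;
	}

	return (int)n;
}
```

Unlike the quoted do/while loop, this sketch sends nothing when data_size is
zero. Whether a zero-length Set Feature should still be issued is left open
here.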

Fan

> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_set_feature, CXL);
> +
>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>  		       struct cxl_region *cxlr)
>  {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index c822eb30e6d1..1c50a3e2eced 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -529,6 +529,7 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_GET_LOG		= 0x0401,
>  	CXL_MBOX_OP_GET_SUPPORTED_FEATURES	= 0x0500,
>  	CXL_MBOX_OP_GET_FEATURE		= 0x0501,
> +	CXL_MBOX_OP_SET_FEATURE		= 0x0502,
>  	CXL_MBOX_OP_IDENTIFY		= 0x4000,
>  	CXL_MBOX_OP_GET_PARTITION_INFO	= 0x4100,
>  	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
> @@ -777,6 +778,34 @@ struct cxl_mbox_get_feat_in {
>  	u8 selection;
>  }  __packed;
>  
> +/*
> + * Set Feature CXL 3.1 Spec 8.2.9.6.3
> + */
> +
> +/*
> + * Set Feature input payload
> + * CXL rev 3.1 section 8.2.9.6.3 Table 8-101
> + */
> +/* Set Feature : Payload in flags */
> +#define CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK	GENMASK(2, 0)
> +enum cxl_set_feat_flag_data_transfer {
> +	CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER,
> +	CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER,
> +	CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER,
> +	CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER,
> +	CXL_SET_FEAT_FLAG_ABORT_DATA_TRANSFER,
> +	CXL_SET_FEAT_FLAG_DATA_TRANSFER_MAX
> +};
> +#define CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET	BIT(3)
> +
> +struct cxl_mbox_set_feat_hdr {
> +	uuid_t uuid;
> +	__le32 flags;
> +	__le16 offset;
> +	u8 version;
> +	u8 rsvd[9];
> +}  __packed;
> +
>  /* Get Poison List  CXL 3.0 Spec 8.2.9.8.4.1 */
>  struct cxl_mbox_poison_in {
>  	__le64 offset;
> @@ -916,6 +945,10 @@ size_t cxl_get_feature(struct cxl_memdev_state *mds,
>  		       size_t feat_out_size,
>  		       size_t feat_out_min_size,
>  		       enum cxl_get_feat_selection selection);
> +int cxl_set_feature(struct cxl_memdev_state *mds,
> +		    const uuid_t feat_uuid, u8 feat_version,
> +		    void *feat_data, size_t feat_data_size,
> +		    u8 feat_flag);
>  int cxl_poison_state_init(struct cxl_memdev_state *mds);
>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>  		       struct cxl_region *cxlr);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
  2024-04-25 10:15   ` Borislav Petkov
@ 2024-04-25 18:11     ` Shiju Jose
  0 siblings, 0 replies; 22+ messages in thread
From: Shiju Jose @ 2024-04-25 18:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com, dave@stgolabs.net,
	Jonathan Cameron, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org, david@redhat.com,
	Vilas.Sridharan@amd.com, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	rientjes@google.com, jiaqiyan@google.com, tony.luck@intel.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com, rafael@kernel.org,
	lenb@kernel.org, naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	mike.malvestuto@intel.com, gthelen@google.com,
	wschwartz@amperecomputing.com, dferguson@amperecomputing.com,
	wbs@os.amperecomputing.com, nifan.cxl@gmail.com, tanxiaofei,
	Zengtao (B), kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm

Hi Boris,

Thanks for the feedback.

Please find my replies inline.

Thanks,
Shiju
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 25 April 2024 11:16
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>mm@kvack.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; david@redhat.com;
>Vilas.Sridharan@amd.com; leo.duran@amd.com; Yazen.Ghannam@amd.com;
>rientjes@google.com; jiaqiyan@google.com; tony.luck@intel.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com; rafael@kernel.org;
>lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com;
>jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
>
>On Sat, Apr 20, 2024 at 12:47:10AM +0800, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add scrub subsystem supports configuring the memory scrubbers in the
>> system. The scrub subsystem provides the interface for registering the
>> scrub devices. The scrub control attributes are provided to the user
>> in /sys/class/ras/rasX/scrub
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  .../ABI/testing/sysfs-class-scrub-configure   |  47 +++
>>  drivers/ras/Kconfig                           |   7 +
>>  drivers/ras/Makefile                          |   1 +
>>  drivers/ras/memory_scrub.c                    | 271 ++++++++++++++++++
>>  include/linux/memory_scrub.h                  |  37 +++
>>  5 files changed, 363 insertions(+)
>>  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
>>  create mode 100755 drivers/ras/memory_scrub.c
>>  create mode 100755 include/linux/memory_scrub.h
>
>ERROR: modpost: missing MODULE_LICENSE() in drivers/ras/memory_scrub.o
>make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Error 1
>make[1]: *** [/mnt/kernel/kernel/2nd/linux/Makefile:1871: modpost] Error 2
>make: *** [Makefile:240: __sub-make] Error 2
>
>Each patch of yours needs to build.

Fixed.

>
>> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure
>> b/Documentation/ABI/testing/sysfs-class-scrub-configure
>> new file mode 100644
>> index 000000000000..3ed77dbb00ad
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
>> @@ -0,0 +1,47 @@
>> +What:		/sys/class/ras/
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		The ras/ class subdirectory belongs to the
>> +		common ras features such as scrub subsystem.
>> +
>> +What:		/sys/class/ras/rasX/scrub/
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		The /sys/class/ras/ras{0,1,2,3,...}/scrub directories
>
>You have different scrubbers.
>
>I'd prefer if you put their names in here instead and do this structure:
>
>/sys/class/ras/scrub/cxl-patrol
>		    /ars
>		    /cxl-ecs
>		    /acpi-ras2
>
>and so on.
>
>Unless the idea is for those devices to have multiple RAS-specific functionality
>than just scrubbing. Then you want to do
>
>/sys/class/ras/cxl/scrub
>		  /other_function
>
>/sys/class/ras/ars/scrub
>		  /...
>
>You get the idea.
In the long run, these devices are expected to have multiple RAS-specific
functions besides scrubbing.
Most classes in the kernel follow the layout /sys/class/<class-name>/<class-name>X/.

If that layout is not acceptable, /sys/class/ras/<module-name>X/<feature> would
be more suitable, because there can be multiple device instances, such as
several CXL devices with the scrub control feature.
For example, /sys/class/ras/cxlX/scrub
 
>
>> +		correspond to each scrub device registered with the
>> +		scrub subsystem.
>> +
>> +What:		/sys/class/ras/rasX/scrub/name
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RO) name of the memory scrubber
>> +
>> +What:		/sys/class/ras/rasX/scrub/enable_background
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RW) Enable/Disable background(patrol) scrubbing if supported.
>> +
>> +What:		/sys/class/ras/rasX/scrub/rate_available
>
>That's dumping a range so I guess it should be called probably "possible_rates"
>or so, so that it is clear what it means.
>
>If some scrubbers support only a discrete set of rate values, then
>"possible_rates" fits too if you dump them as a list of values.
Sure. Will check.

>
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RO) Supported range for the scrub rate by the scrubber.
>> +		The scrub rate represents in hours.
>> +
>> +What:		/sys/class/ras/rasX/scrub/rate
>> +Date:		March 2024
>> +KernelVersion:	6.9
>> +Contact:	linux-kernel@vger.kernel.org
>> +Description:
>> +		(RW) The scrub rate specified and it must be with in the
>> +		supported range by the scrubber.
>> +		The scrub rate represents in hours.
>> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
>> index fc4f4bb94a4c..181701479564 100644
>> --- a/drivers/ras/Kconfig
>> +++ b/drivers/ras/Kconfig
>> @@ -46,4 +46,11 @@ config RAS_FMPM
>>  	  Memory will be retired during boot time and run time depending on
>>  	  platform-specific policies.
>>
>> +config SCRUB
>> +	tristate "Memory scrub driver"
>> +	help
>> +	  This option selects the memory scrub subsystem, supports
>
>s/This option selects/Enable/
Sure.

>
>> +	  configuring the parameters of underlying scrubbers in the
>> +	  system for the DRAM memories.
>> +
>>  endif
>> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
>> index 11f95d59d397..89bcf0d84355 100644
>> --- a/drivers/ras/Makefile
>> +++ b/drivers/ras/Makefile
>> @@ -2,6 +2,7 @@
>>  obj-$(CONFIG_RAS)	+= ras.o
>>  obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
>>  obj-$(CONFIG_RAS_CEC)	+= cec.o
>> +obj-$(CONFIG_SCRUB)	+= memory_scrub.o
>>
>>  obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
>>  obj-y			+= amd/atl/
>> diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
>> new file mode 100755
>> index 000000000000..7e995380ec3a
>> --- /dev/null
>> +++ b/drivers/ras/memory_scrub.c
>> @@ -0,0 +1,271 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Memory scrub subsystem supports configuring the registered
>> + * memory scrubbers.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>> +#define pr_fmt(fmt)     "MEM SCRUB: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/bitops.h>
>> +#include <linux/delay.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/memory_scrub.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/spinlock.h>
>> +
>> +/* memory scrubber config definitions */
>
>No need for that comment.
Will remove.
>
>> +static ssize_t rate_available_show(struct device *dev,
>> +				   struct device_attribute *attr,
>> +				   char *buf)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	u64 min_sr, max_sr;
>> +	int ret;
>> +
>> +	ret = scrub_dev->ops->rate_avail_range(dev, &min_sr, &max_sr);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return sysfs_emit(buf, "0x%llx-0x%llx\n", min_sr, max_sr);
>> +}
>
>This glue driver will need to store the min and max scrub rates on init and
>rate_store() will have to verify the newly supplied rate is within that range
>before writing it.
>
>Not the user, nor the underlying hw driver.
Presently the underlying hw driver does the check. I think doing it in the
common rate_store() would be more complex, because we would have to check
against either a list of possible rates or the min and max rates.
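
As a rough userspace sketch of what a common check covering both cases could
look like (an illustration only, not part of this series): the struct
scrub_rate_caps and the helper scrub_rate_valid() are hypothetical names. A
scrubber would describe either an inclusive min/max range or a discrete list
at registration time, and the common store path would validate against
whichever was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the rates a scrubber accepts. */
struct scrub_rate_caps {
	uint64_t min;		/* inclusive range, used when nr_rates == 0 */
	uint64_t max;
	const uint64_t *rates;	/* optional discrete list of allowed rates */
	size_t nr_rates;
};

/* Return true if 'rate' is acceptable according to 'caps'. */
static bool scrub_rate_valid(const struct scrub_rate_caps *caps, uint64_t rate)
{
	size_t i;

	assert(caps != NULL);

	if (caps->nr_rates) {
		/* Discrete list: the rate must match one entry exactly. */
		for (i = 0; i < caps->nr_rates; i++) {
			if (caps->rates[i] == rate)
				return true;
		}
		return false;
	}

	/* Otherwise fall back to the inclusive min/max range. */
	return rate >= caps->min && rate <= caps->max;
}
```

With such a helper, the store path would reject an out-of-range value before
calling into the hw driver, at the cost of each driver having to describe its
limits up front.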

>
>> +
>> +DEVICE_ATTR_RW(enable_background);
>> +DEVICE_ATTR_RO(name);
>> +DEVICE_ATTR_RW(rate);
>> +DEVICE_ATTR_RO(rate_available);
>
>static
>
>> +
>> +static struct attribute *scrub_attrs[] = {
>> +	&dev_attr_enable_background.attr,
>> +	&dev_attr_name.attr,
>> +	&dev_attr_rate.attr,
>> +	&dev_attr_rate_available.attr,
>> +	NULL
>> +};
>> +
>> +static umode_t scrub_attr_visible(struct kobject *kobj,
>> +				  struct attribute *a, int attr_id)
>> +{
>> +	struct device *dev = kobj_to_dev(kobj);
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +	const struct scrub_ops *ops = scrub_dev->ops;
>> +
>> +	if (a == &dev_attr_enable_background.attr) {
>> +		if (ops->set_enabled_bg && ops->get_enabled_bg)
>> +			return a->mode;
>> +		if (ops->get_enabled_bg)
>> +			return 0444;
>> +		return 0;
>> +	}
>> +	if (a == &dev_attr_name.attr)
>> +		return ops->get_name ? a->mode : 0;
>> +	if (a == &dev_attr_rate_available.attr)
>> +		return ops->rate_avail_range ? a->mode : 0;
>> +	if (a == &dev_attr_rate.attr) { /* Write only makes little sense */
>> +		if (ops->rate_read && ops->rate_write)
>> +			return a->mode;
>> +		if (ops->rate_read)
>> +			return 0444;
>> +		return 0;
>> +	}
>
>All of that stuff's permissions should be root-only.
Sure.

>
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct attribute_group scrub_attr_group = {
>> +	.name		= "scrub",
>> +	.attrs		= scrub_attrs,
>> +	.is_visible	= scrub_attr_visible,
>> +};
>> +
>> +static const struct attribute_group *scrub_attr_groups[] = {
>> +	&scrub_attr_group,
>> +	NULL
>> +};
>> +
>> +static void scrub_dev_release(struct device *dev)
>> +{
>> +	struct scrub_device *scrub_dev = to_scrub_device(dev);
>> +
>> +	ida_free(&scrub_ida, scrub_dev->id);
>> +	kfree(scrub_dev);
>> +}
>> +
>> +static struct class scrub_class = {
>> +	.name = "ras",
>> +	.dev_groups = scrub_attr_groups,
>> +	.dev_release = scrub_dev_release,
>> +};
>> +
>> +static struct device *
>> +scrub_device_register(struct device *parent, void *drvdata,
>> +		      const struct scrub_ops *ops)
>> +{
>> +	struct scrub_device *scrub_dev;
>> +	struct device *hdev;
>> +	int err;
>> +
>> +	scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
>> +	if (!scrub_dev)
>> +		return ERR_PTR(-ENOMEM);
>> +	hdev = &scrub_dev->dev;
>> +
>> +	scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);
>
>What's that silly thing for?
This is the RAS instance ID (the X in /sys/class/ras/rasX/scrub/) used for the scrub control feature.
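
A small userspace sketch of the idea, for illustration only: each registered
device takes the lowest free instance number and frees it again on release.
The bitmap allocator below is a hypothetical stand-in for the kernel IDA, and
the "ras%d" name format is an assumption, since the definition of
SCRUB_ID_FORMAT is not shown in the quoted text.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_IDS 32

/* Bit n set means instance id n is in use (stand-in for an IDA). */
static unsigned int id_bitmap;

/* Allocate the lowest free id, or return -1 if all are taken. */
static int id_alloc(void)
{
	int i;

	for (i = 0; i < MAX_IDS; i++) {
		if (!(id_bitmap & (1u << i))) {
			id_bitmap |= 1u << i;
			return i;
		}
	}
	return -1;
}

/* Release a previously allocated id so it can be reused. */
static void id_free(int id)
{
	assert(id >= 0 && id < MAX_IDS);
	id_bitmap &= ~(1u << id);
}

/* Build the device name from the id; "ras%d" is an assumed format. */
static int format_name(char *buf, size_t len, int id)
{
	return snprintf(buf, len, "ras%d", id);
}
```

The release callback in the quoted patch plays the role of id_free() here, so
an id becomes available again once the device goes away.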

>
>> +	if (scrub_dev->id < 0) {
>> +		kfree(scrub_dev);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	scrub_dev->ops = ops;
>> +	hdev->class = &scrub_class;
>> +	hdev->parent = parent;
>> +	dev_set_drvdata(hdev, drvdata);
>> +	dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);
>> +	err = device_register(hdev);
>> +	if (err) {
>> +		put_device(hdev);
>> +		return ERR_PTR(err);
>> +	}
>> +
>> +	return hdev;
>> +}
>> +
>> +static void devm_scrub_release(void *dev)
>> +{
>> +	device_unregister(dev);
>> +}
>> +
>> +/**
>> + * devm_scrub_device_register - register scrubber device
>> + * @dev: the parent device
>> + * @drvdata: driver data to attach to the scrub device
>> + * @ops: pointer to scrub_ops structure (optional)
>> + *
>> + * Returns the pointer to the new device on success, ERR_PTR() otherwise.
>> + * The new device would be automatically unregistered with the parent device.
>> + */
>> +struct device *
>> +devm_scrub_device_register(struct device *dev, void *drvdata,
>> +			   const struct scrub_ops *ops)
>> +{
>> +	struct device *hdev;
>> +	int ret;
>> +
>> +	if (!dev)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	hdev = scrub_device_register(dev, drvdata, ops);
>> +	if (IS_ERR(hdev))
>> +		return hdev;
>> +
>> +	ret = devm_add_action_or_reset(dev, devm_scrub_release, hdev);
>> +	if (ret)
>> +		return ERR_PTR(ret);
>> +
>> +	return hdev;
>> +}
>> +EXPORT_SYMBOL_GPL(devm_scrub_device_register);
>> +
>> +static int __init memory_scrub_control_init(void)
>> +{
>> +	return class_register(&scrub_class);
>> +}
>> +subsys_initcall(memory_scrub_control_init);
>
>You can't just blindly register this thing without checking whether there are even
>any hw scrubber devices on the system.
I think this happens only when a dependent module is autoloaded because a scrub
device exists, with the exception of the memory scrub control being built in,
and who would build this in?

>
>--
>Regards/Gruss,
>    Boris.
>
Thanks,
Shiju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
  2024-04-19 16:47 ` [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
@ 2024-04-26 23:56   ` fan
  2024-04-29 11:20     ` Shiju Jose
  0 siblings, 1 reply; 22+ messages in thread
From: fan @ 2024-04-26 23:56 UTC (permalink / raw)
  Cc: linux-cxl, linux-acpi, linux-mm, dan.j.williams, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny, linux-edac, linux-kernel, david, Vilas.Sridharan,
	leo.duran, Yazen.Ghannam, rientjes, jiaqiyan, tony.luck,
	Jon.Grimm, dave.hansen, rafael, lenb, naoya.horiguchi,
	james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
	duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
	nifan.cxl, tanxiaofei, prime.zeng, kangkang.shen, wanghuiqiang,
	linuxarm

On Sat, Apr 20, 2024 at 12:47:14AM +0800, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> feature. The device patrol scrub proactively locates and makes corrections
> to errors in regular cycle.
> 
> Allow specifying the number of hours within which the patrol scrub must be
> completed, subject to minimum and maximum limits reported by the device.
> Also allow disabling scrub allowing trade-off error rates against
> performance.
> 
> Register with scrub subsystem to provide scrub control attributes to the
> user.
> 
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  Documentation/scrub/scrub-configure.rst |  52 ++++
>  drivers/cxl/Kconfig                     |  19 ++
>  drivers/cxl/core/Makefile               |   1 +
>  drivers/cxl/core/memscrub.c             | 314 ++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |   8 +
>  drivers/cxl/mem.c                       |   6 +
>  6 files changed, 400 insertions(+)
>  create mode 100644 Documentation/scrub/scrub-configure.rst
>  create mode 100644 drivers/cxl/core/memscrub.c
> 
> diff --git a/Documentation/scrub/scrub-configure.rst b/Documentation/scrub/scrub-configure.rst
> new file mode 100644
> index 000000000000..2275366b60d3
> --- /dev/null
> +++ b/Documentation/scrub/scrub-configure.rst
> @@ -0,0 +1,52 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +================
> +Scrub subsystem
> +================
> +
> +Copyright (c) 2024 HiSilicon Limited.
> +
> +:Author:   Shiju Jose <shiju.jose@huawei.com>
> +:License:  The GNU Free Documentation License, Version 1.2
> +          (dual licensed under the GPL v2)
> +:Original Reviewers:
> +
> +- Written for: 6.9
> +- Updated for:
> +
> +Introduction
> +------------
> +The scrub subsystem provides interface for controlling attributes
> +of memory scrubbers in the system. The scrub device drivers
> +in the system register with the scrub subsystem.The scrub subsystem
> +driver exposes the scrub controls to the user in the sysfs.
> +
> +The File System
> +---------------
> +
> +The control attributes of the registered scrubbers could be
> +accessed in the /sys/class/ras/rasX/scrub/
> +
> +sysfs
> +-----
> +
> +Sysfs files are documented in
> +`Documentation/ABI/testing/sysfs-class-scrub-configure`.
> +
> +Example
> +-------
> +
> +The usage takes the form shown in this example::
> +
> +1. CXL patrol scrubber
> +    # cat /sys/class/ras/ras0/scrub/rate_available
> +    # 0x1-0xff
> +    # echo 30 > /sys/class/ras/ras0/scrub/rate
> +    # cat /sys/class/ras/ras0/scrub/rate
> +    # 0x1e
> +    # echo 1 > /sys/class/ras/ras0/scrub/enable_background
> +    # cat /sys/class/ras/ras0/scrub/enable_background
> +    # 1
> +    # echo 0 > /sys/class/ras/ras0/scrub/enable_background
> +    # cat /sys/class/ras/ras0/scrub/enable_background
> +    # 0
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 5f3c9c5529b9..3621b9f27e80 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -144,4 +144,23 @@ config CXL_REGION_INVALIDATION_TEST
>  	  If unsure, or if this kernel is meant for production environments,
>  	  say N.
>  
> +config CXL_SCRUB
> +	bool "CXL: Memory scrub feature"
> +	depends on CXL_PCI
> +	depends on CXL_MEM
> +	depends on SCRUB
> +	help
> +	  The CXL memory scrub control is an optional feature allows host to
> +	  control the scrub configurations of CXL Type 3 devices, which
> +	  supports patrol scrubbing.
> +
> +	  Registers with the scrub subsystem to provide control attributes
> +	  of CXL memory device scrubber to the user.
> +	  Provides interface functions to support configuring the CXL memory
> +	  device patrol scrubber.
> +
> +	  Say 'y/n' to enable/disable control of memory scrub parameters for
> +	  CXL.mem devices. See section 8.2.9.9.11.1 of CXL 3.1 specification
> +	  for detailed description of CXL memory patrol scrub control feature.
> +
>  endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 9259bcc6773c..e0fc814c3983 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -16,3 +16,4 @@ cxl_core-y += pmu.o
>  cxl_core-y += cdat.o
>  cxl_core-$(CONFIG_TRACING) += trace.o
>  cxl_core-$(CONFIG_CXL_REGION) += region.o
> +cxl_core-$(CONFIG_CXL_SCRUB) += memscrub.o
> diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
> new file mode 100644
> index 000000000000..a50f6e384394
> --- /dev/null
> +++ b/drivers/cxl/core/memscrub.c
> @@ -0,0 +1,314 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * CXL memory scrub driver.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + *
> + *  - Provides functions to configure patrol scrub feature of the
> + *    CXL memory devices.
> + *  - Registers with the scrub subsystem driver to expose the sysfs attributes
> + *    to the user for configuring the CXL memory patrol scrub feature.
> + */
> +
> +#define pr_fmt(fmt)	"CXL_MEM_SCRUB: " fmt
> +
> +#include <cxlmem.h>
> +#include <linux/cleanup.h>
> +#include <linux/limits.h>
> +#include <linux/memory_scrub.h>
> +
> +static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const uuid_t *feat_uuid,
> +					       struct cxl_mbox_supp_feat_entry *feat_entry_out)
> +{
> +	struct cxl_mbox_supp_feat_entry *feat_entry;
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	int feat_index, feats_out_size;
> +	int nentries, count;
> +	int ret;
> +
> +	feat_index = 0;
> +	feats_out_size = sizeof(struct cxl_mbox_get_supp_feats_out) +
> +			  sizeof(struct cxl_mbox_supp_feat_entry);
> +	struct cxl_mbox_get_supp_feats_out *feats_out __free(kfree) =
> +					kmalloc(feats_out_size, GFP_KERNEL);
> +	if (!feats_out)
> +		return -ENOMEM;
> +
> +	while (true) {
> +		memset(feats_out, 0, feats_out_size);
> +		ret = cxl_get_supported_features(mds, feats_out_size,
> +						 feat_index, feats_out);
> +		if (ret)
> +			return ret;
> +
> +		nentries = feats_out->nr_entries;
> +		if (!nentries)
> +			return -EOPNOTSUPP;
> +
> +		/* Check CXL memdev supports the feature */
> +		feat_entry = feats_out->feat_entries;
> +		for (count = 0; count < nentries; count++, feat_entry++) {
> +			if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
> +				memcpy(feat_entry_out, feat_entry,
> +				       sizeof(*feat_entry_out));
> +				return 0;
> +			}
> +		}
> +		feat_index += nentries;
> +	}
> +}
> +
> +/* CXL memory patrol scrub control definitions */
> +#define CXL_MEMDEV_PS_GET_FEAT_VERSION	0x01
> +#define CXL_MEMDEV_PS_SET_FEAT_VERSION	0x01
> +
> +static const uuid_t cxl_patrol_scrub_uuid =
> +	UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e,     \
> +		  0x06, 0xdb, 0x8a);
> +
> +/* CXL memory patrol scrub control functions */
> +struct cxl_patrol_scrub_context {
> +	struct device *dev;
> +	u16 get_feat_size;
> +	u16 set_feat_size;
> +	bool scrub_cycle_changeable;
> +};
> +
> +/**
> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
> + * @enable:     [IN & OUT] enable(1)/disable(0) patrol scrub.
> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
> + * @rate:       [IN] Requested patrol scrub cycle in hours.
> + *              [OUT] Current patrol scrub cycle in hours.
> + * @min_rate:[OUT] minimum patrol scrub cycle, in hours, supported.
> + */
> +struct cxl_memdev_ps_params {
> +	bool enable;
> +	bool scrub_cycle_changeable;
> +	u16 rate;
> +	u16 min_rate;
> +};
> +
> +enum cxl_scrub_param {
> +	cxl_ps_param_enable,
> +	cxl_ps_param_rate,
> +};
> +
> +#define	CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK	BIT(0)
> +#define	CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK	BIT(1)
> +#define	CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK	GENMASK(7, 0)
> +#define	CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK	GENMASK(15, 8)
> +#define	CXL_MEMDEV_PS_FLAG_ENABLED_MASK	BIT(0)
> +
> +struct cxl_memdev_ps_rd_attrs {
> +	u8 scrub_cycle_cap;
> +	__le16 scrub_cycle;
> +	u8 scrub_flags;
> +}  __packed;
> +
> +struct cxl_memdev_ps_wr_attrs {
> +	u8 scrub_cycle_hr;
> +	u8 scrub_flags;
> +}  __packed;
> +

In this patch, "rate" is generally used for the scrub cycle in hours, but here
we use scrub_cycle_hr. I am not sure whether "rate" is the proper term for this
purpose; "interval" or "cycle" seems more straightforward to me.
But someone else may have a different opinion about it.
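
To make the units concrete: per the masks above, the 16-bit scrub_cycle field
carries the current cycle in hours in bits 7:0 and the minimum supported cycle
in bits 15:8. The following is only a userspace sketch of that decoding, not
part of the patch. The names ps_cycle_hours, ps_decode_cycle and ps_cycle_valid
are hypothetical, and the value is assumed to be already converted from
little-endian to CPU order.

```c
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the scrub_cycle layout in the patch:
 * bits 7:0 = current cycle in hours, bits 15:8 = minimum supported cycle.
 */
struct ps_cycle_hours {
	uint8_t current_hours;
	uint8_t min_hours;
};

/* Split a CPU-order scrub_cycle value into its two 8-bit fields. */
static struct ps_cycle_hours ps_decode_cycle(uint16_t scrub_cycle)
{
	struct ps_cycle_hours c = {
		.current_hours = (uint8_t)(scrub_cycle & 0xff),
		.min_hours = (uint8_t)((scrub_cycle >> 8) & 0xff),
	};

	return c;
}

/* A requested cycle is usable if it is >= the minimum and fits in 8 bits. */
static int ps_cycle_valid(uint64_t requested_hours, uint8_t min_hours)
{
	return requested_hours >= min_hours && requested_hours <= UINT8_MAX;
}
```

With the values from the documentation example (range 0x1-0xff, rate 0x1e),
ps_decode_cycle(0x011e) would give a current cycle of 30 hours and a minimum
of 1 hour.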

> +static int cxl_mem_ps_get_attrs(struct device *dev,
> +				struct cxl_memdev_ps_params *params)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
> +	size_t data_size;
> +
> +	if (!mds)
> +		return -EFAULT;
> +
> +	struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
> +						kmalloc(rd_data_size, GFP_KERNEL);
> +	if (!rd_attrs)
> +		return -ENOMEM;
> +
> +	data_size = cxl_get_feature(mds, cxl_patrol_scrub_uuid, rd_attrs,
> +				    rd_data_size, rd_data_size,
> +				    CXL_GET_FEAT_SEL_CURRENT_VALUE);
> +	if (!data_size)
> +		return -EIO;
> +
> +	params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
> +						   rd_attrs->scrub_cycle_cap);
> +	params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> +				   rd_attrs->scrub_flags);
> +	params->rate = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> +				 rd_attrs->scrub_cycle);
> +	params->min_rate = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
> +				      rd_attrs->scrub_cycle);
> +
> +	return 0;
> +}
> +
> +static int cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
> +				enum cxl_scrub_param param_type)
> +{
> +	struct cxl_memdev_ps_wr_attrs wr_attrs;
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	struct cxl_memdev_ps_params rd_params;
> +	int ret;
> +
> +	ret = cxl_mem_ps_get_attrs(dev, &rd_params);
> +	if (ret) {
> +		dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
> +			ret);
> +		return ret;
> +	}
> +
> +	switch (param_type) {
> +	case cxl_ps_param_enable:
> +		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> +						   params->enable);
> +		wr_attrs.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> +						      rd_params.rate);
> +		break;
> +	case cxl_ps_param_rate:
> +		if (params->rate < rd_params.min_rate) {
> +			dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
> +				params->rate);
> +			dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
> +			       params->min_rate);
> +			return -EINVAL;
> +		}
> +		wr_attrs.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> +						     params->rate);
> +		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> +						  rd_params.enable);
> +		break;
> +	}
> +
> +	ret = cxl_set_feature(mds, cxl_patrol_scrub_uuid, CXL_MEMDEV_PS_SET_FEAT_VERSION,
> +			      &wr_attrs, sizeof(wr_attrs),
> +			      CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
> +	if (ret)
> +		dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n",
> +			ret);
> +
> +	return ret;
> +}
> +
> +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, bool *enabled)
> +{
> +	struct cxl_memdev_ps_params params;
> +	int ret;
> +
> +	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
> +	if (ret)
> +		return ret;
> +
> +	*enabled = params.enable;
> +
> +	return 0;
> +}
> +
> +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, bool enable)
> +{
> +	struct cxl_memdev_ps_params params = {
> +		.enable = enable,
> +	};
> +
> +	return cxl_mem_ps_set_attrs(dev->parent, &params, cxl_ps_param_enable);
> +}
> +
> +static int cxl_patrol_scrub_get_name(struct device *dev, char *name)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
> +
> +	return sysfs_emit(name, "%s_%s\n", "cxl_patrol_scrub",
> +			  dev_name(&cxlmd->dev));
> +}
> +
> +static int cxl_patrol_scrub_write_rate(struct device *dev, u64 rate)
> +{
> +	struct cxl_memdev_ps_params params = {
> +		.rate = rate,
> +	};
> +
> +	return cxl_mem_ps_set_attrs(dev->parent, &params, cxl_ps_param_rate);
> +}
> +
> +static int cxl_patrol_scrub_read_rate(struct device *dev, u64 *rate)
> +{
> +	struct cxl_memdev_ps_params params;
> +	int ret;
> +
> +	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
> +	if (ret)
> +		return ret;
> +
> +	*rate = params.rate;
> +
> +	return 0;
> +}
> +
> +static int cxl_patrol_scrub_read_rate_avail(struct device *dev, u64 *min, u64 *max)
> +{
> +	struct cxl_memdev_ps_params params;
> +	int ret;
> +
> +	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
> +	if (ret)
> +		return ret;
> +	*min = params.min_rate;
> +	*max = U8_MAX; /* Max set by register size */
> +
> +	return 0;
> +}
> +
> +static const struct scrub_ops cxl_ps_scrub_ops = {
> +	.get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
> +	.set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
> +	.get_name = cxl_patrol_scrub_get_name,
> +	.rate_read = cxl_patrol_scrub_read_rate,
> +	.rate_write = cxl_patrol_scrub_write_rate,
> +	.rate_avail_range = cxl_patrol_scrub_read_rate_avail,
> +};
> +
> +int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_patrol_scrub_context *cxl_ps_ctx;
> +	struct cxl_mbox_supp_feat_entry feat_entry;
> +	struct cxl_memdev_ps_params params;
> +	struct device *cxl_scrub_dev;
> +	int ret;
> +
> +	ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_patrol_scrub_uuid,
> +						  &feat_entry);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> +		return -EOPNOTSUPP;
> +
> +	ret = cxl_mem_ps_get_attrs(&cxlmd->dev, &params);
> +	if (ret)
> +		return dev_err_probe(&cxlmd->dev, ret,
> +				     "Get CXL patrol scrub params failed\n");
> +
> +	cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> +	if (!cxl_ps_ctx)
> +		return -ENOMEM;
> +
> +	*cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
> +		.get_feat_size = feat_entry.get_size,
> +		.set_feat_size = feat_entry.set_size,
> +		.scrub_cycle_changeable =  params.scrub_cycle_changeable,
> +	};
> +
> +	cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, cxl_ps_ctx,
> +						   &cxl_ps_scrub_ops);
> +	if (IS_ERR(cxl_scrub_dev))
> +		return PTR_ERR(cxl_scrub_dev);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 1c50a3e2eced..f95e39febd73 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -956,6 +956,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
>  int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
>  int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>  
> +/* cxl memory scrub functions */
> +#ifdef CONFIG_CXL_SCRUB
> +int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
> +#else
> +static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
> +{ return 0; }
> +#endif
> +
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 0c79d9ce877c..399e43463626 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
>  	if (!cxlds->media_ready)
>  		return -EBUSY;
>  
> +	rc = cxl_mem_patrol_scrub_init(cxlmd);
> +	if (rc) {
> +		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
> +		return rc;
> +	}

If the device does not support the memory patrol scrub feature, the above
function will return -EOPNOTSUPP. Since the feature is optional, should we
just warn about it and let probe continue?

Fan
> +
>  	/*
>  	 * Someone is trying to reattach this device after it lost its port
>  	 * connection (an endpoint port previously registered by this memdev was
> -- 
> 2.34.1
> 


* RE: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
  2024-04-26 23:56   ` fan
@ 2024-04-29 11:20     ` Shiju Jose
  2024-04-29 12:21       ` Jonathan Cameron
  0 siblings, 1 reply; 22+ messages in thread
From: Shiju Jose @ 2024-04-29 11:20 UTC (permalink / raw)
  To: fan
  Cc: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com, dave@stgolabs.net,
	Jonathan Cameron, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org, david@redhat.com,
	Vilas.Sridharan@amd.com, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	rientjes@google.com, jiaqiyan@google.com, tony.luck@intel.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com, rafael@kernel.org,
	lenb@kernel.org, naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	mike.malvestuto@intel.com, gthelen@google.com,
	wschwartz@amperecomputing.com, dferguson@amperecomputing.com,
	wbs@os.amperecomputing.com, tanxiaofei, Zengtao (B),
	kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm


>-----Original Message-----
>From: fan <nifan.cxl@gmail.com>
>Sent: 27 April 2024 00:57
>Cc: linux-cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>mm@kvack.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; david@redhat.com;
>Vilas.Sridharan@amd.com; leo.duran@amd.com; Yazen.Ghannam@amd.com;
>rientjes@google.com; jiaqiyan@google.com; tony.luck@intel.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com; rafael@kernel.org;
>lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com;
>jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub
>control feature
>
>On Sat, Apr 20, 2024 at 12:47:14AM +0800, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub
>> control feature. The device patrol scrub proactively locates and makes
>> corrections to errors in regular cycle.
>>
>> Allow specifying the number of hours within which the patrol scrub
>> must be completed, subject to minimum and maximum limits reported by the
>> device.
>> Also allow disabling scrub allowing trade-off error rates against
>> performance.
>>
>> Register with scrub subsystem to provide scrub control attributes to
>> the user.
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  Documentation/scrub/scrub-configure.rst |  52 ++++
>>  drivers/cxl/Kconfig                     |  19 ++
>>  drivers/cxl/core/Makefile               |   1 +
>>  drivers/cxl/core/memscrub.c             | 314 ++++++++++++++++++++++++
>>  drivers/cxl/cxlmem.h                    |   8 +
>>  drivers/cxl/mem.c                       |   6 +
>>  6 files changed, 400 insertions(+)
>>  create mode 100644 Documentation/scrub/scrub-configure.rst
>>  create mode 100644 drivers/cxl/core/memscrub.c
>>
>> diff --git a/Documentation/scrub/scrub-configure.rst b/Documentation/scrub/scrub-configure.rst
>> new file mode 100644
>> index 000000000000..2275366b60d3
>> --- /dev/null
>> +++ b/Documentation/scrub/scrub-configure.rst
>> @@ -0,0 +1,52 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +================
>> +Scrub subsystem
>> +================
>> +
>> +Copyright (c) 2024 HiSilicon Limited.
>> +
>> +:Author:   Shiju Jose <shiju.jose@huawei.com>
>> +:License:  The GNU Free Documentation License, Version 1.2
>> +          (dual licensed under the GPL v2) :Original Reviewers:
>> +
>> +- Written for: 6.9
>> +- Updated for:
>> +
>> +Introduction
>> +------------
>> +The scrub subsystem provides interface for controlling attributes of
>> +memory scrubbers in the system. The scrub device drivers in the
>> +system register with the scrub subsystem.The scrub subsystem driver
>> +exposes the scrub controls to the user in the sysfs.
>> +
>> +The File System
>> +---------------
>> +
>> +The control attributes of the registered scrubbers could be accessed
>> +in the /sys/class/ras/rasX/scrub/
>> +
>> +sysfs
>> +-----
>> +
>> +Sysfs files are documented in
>> +`Documentation/ABI/testing/sysfs-class-scrub-configure`.
>> +
>> +Example
>> +-------
>> +
>> +The usage takes the form shown in this example::
>> +
>> +1. CXL patrol scrubber
>> +    # cat /sys/class/ras/ras0/scrub/rate_available
>> +    # 0x1-0xff
>> +    # echo 30 > /sys/class/ras/ras0/scrub/rate
>> +    # cat /sys/class/ras/ras0/scrub/rate
>> +    # 0x1e
>> +    # echo 1 > /sys/class/ras/ras0/scrub/enable_background
>> +    # cat /sys/class/ras/ras0/scrub/enable_background
>> +    # 1
>> +    # echo 0 > /sys/class/ras/ras0/scrub/enable_background
>> +    # cat /sys/class/ras/ras0/scrub/enable_background
>> +    # 0
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index 5f3c9c5529b9..3621b9f27e80 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -144,4 +144,23 @@ config CXL_REGION_INVALIDATION_TEST
>>  	  If unsure, or if this kernel is meant for production environments,
>>  	  say N.
>>
>> +config CXL_SCRUB
>> +	bool "CXL: Memory scrub feature"
>> +	depends on CXL_PCI
>> +	depends on CXL_MEM
>> +	depends on SCRUB
>> +	help
>> +	  The CXL memory scrub control is an optional feature allows host to
>> +	  control the scrub configurations of CXL Type 3 devices, which
>> +	  supports patrol scrubbing.
>> +
>> +	  Registers with the scrub subsystem to provide control attributes
>> +	  of CXL memory device scrubber to the user.
>> +	  Provides interface functions to support configuring the CXL memory
>> +	  device patrol scrubber.
>> +
>> +	  Say 'y/n' to enable/disable control of memory scrub parameters for
>> +	  CXL.mem devices. See section 8.2.9.9.11.1 of CXL 3.1 specification
>> +	  for detailed description of CXL memory patrol scrub control feature.
>> +
>>  endif
>> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
>> index 9259bcc6773c..e0fc814c3983 100644
>> --- a/drivers/cxl/core/Makefile
>> +++ b/drivers/cxl/core/Makefile
>> @@ -16,3 +16,4 @@ cxl_core-y += pmu.o
>>  cxl_core-y += cdat.o
>>  cxl_core-$(CONFIG_TRACING) += trace.o
>>  cxl_core-$(CONFIG_CXL_REGION) += region.o
>> +cxl_core-$(CONFIG_CXL_SCRUB) += memscrub.o
>> diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
>> new file mode 100644
>> index 000000000000..a50f6e384394
>> --- /dev/null
>> +++ b/drivers/cxl/core/memscrub.c
>> @@ -0,0 +1,314 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * CXL memory scrub driver.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + *
>> + *  - Provides functions to configure patrol scrub feature of the
>> + *    CXL memory devices.
>> + *  - Registers with the scrub subsystem driver to expose the sysfs attributes
>> + *    to the user for configuring the CXL memory patrol scrub feature.
>> + */
>> +
>> +#define pr_fmt(fmt)	"CXL_MEM_SCRUB: " fmt
>> +
>> +#include <cxlmem.h>
>> +#include <linux/cleanup.h>
>> +#include <linux/limits.h>
>> +#include <linux/memory_scrub.h>
>> +
>> +static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const uuid_t *feat_uuid,
>> +					       struct cxl_mbox_supp_feat_entry *feat_entry_out)
>> +{
>> +	struct cxl_mbox_supp_feat_entry *feat_entry;
>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	int feat_index, feats_out_size;
>> +	int nentries, count;
>> +	int ret;
>> +
>> +	feat_index = 0;
>> +	feats_out_size = sizeof(struct cxl_mbox_get_supp_feats_out) +
>> +			  sizeof(struct cxl_mbox_supp_feat_entry);
>> +	struct cxl_mbox_get_supp_feats_out *feats_out __free(kfree) =
>> +					kmalloc(feats_out_size, GFP_KERNEL);
>> +	if (!feats_out)
>> +		return -ENOMEM;
>> +
>> +	while (true) {
>> +		memset(feats_out, 0, feats_out_size);
>> +		ret = cxl_get_supported_features(mds, feats_out_size,
>> +						 feat_index, feats_out);
>> +		if (ret)
>> +			return ret;
>> +
>> +		nentries = feats_out->nr_entries;
>> +		if (!nentries)
>> +			return -EOPNOTSUPP;
>> +
>> +		/* Check CXL memdev supports the feature */
>> +		feat_entry = feats_out->feat_entries;
>> +		for (count = 0; count < nentries; count++, feat_entry++) {
>> +			if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
>> +				memcpy(feat_entry_out, feat_entry,
>> +				       sizeof(*feat_entry_out));
>> +				return 0;
>> +			}
>> +		}
>> +		feat_index += nentries;
>> +	}
>> +}
>> +
>> +/* CXL memory patrol scrub control definitions */
>> +#define CXL_MEMDEV_PS_GET_FEAT_VERSION	0x01
>> +#define CXL_MEMDEV_PS_SET_FEAT_VERSION	0x01
>> +
>> +static const uuid_t cxl_patrol_scrub_uuid =
>> +	UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e,     \
>> +		  0x06, 0xdb, 0x8a);
>> +
>> +/* CXL memory patrol scrub control functions */
>> +struct cxl_patrol_scrub_context {
>> +	struct device *dev;
>> +	u16 get_feat_size;
>> +	u16 set_feat_size;
>> +	bool scrub_cycle_changeable;
>> +};
>> +
>> +/**
>> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
>> + * @enable:     [IN & OUT] enable(1)/disable(0) patrol scrub.
>> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
>> + * @rate:       [IN] Requested patrol scrub cycle in hours.
>> + *              [OUT] Current patrol scrub cycle in hours.
>> + * @min_rate:[OUT] minimum patrol scrub cycle, in hours, supported.
>> + */
>> +struct cxl_memdev_ps_params {
>> +	bool enable;
>> +	bool scrub_cycle_changeable;
>> +	u16 rate;
>> +	u16 min_rate;
>> +};
>> +
>> +enum cxl_scrub_param {
>> +	cxl_ps_param_enable,
>> +	cxl_ps_param_rate,
>> +};
>> +
>> +#define	CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK	BIT(0)
>> +#define	CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK	BIT(1)
>> +#define	CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK	GENMASK(7, 0)
>> +#define	CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK	GENMASK(15, 8)
>> +#define	CXL_MEMDEV_PS_FLAG_ENABLED_MASK	BIT(0)
>> +
>> +struct cxl_memdev_ps_rd_attrs {
>> +	u8 scrub_cycle_cap;
>> +	__le16 scrub_cycle;
>> +	u8 scrub_flags;
>> +}  __packed;
>> +
>> +struct cxl_memdev_ps_wr_attrs {
>> +	u8 scrub_cycle_hr;
>> +	u8 scrub_flags;
>> +}  __packed;
>> +
>
>In this patch, generally "rate" is used for cycle in hour, here we use
>scrub_cycle_hr. I am not sure whether "rate" is the proper term for the purpose,
>"interval" or "cycle" seems more straightforward for me.
>But someone else may have a different thought about it.
"rate" is used in the scrub control subsystem as the common term, based on the
RAS2 definition, and is thus used in the callbacks here. Maybe change it to
"cycle" in all the related drivers?

>
>> +static int cxl_mem_ps_get_attrs(struct device *dev,
>> +				struct cxl_memdev_ps_params *params)
>> +{
>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
>> +	size_t data_size;
>> +
>> +	if (!mds)
>> +		return -EFAULT;
>> +
>> +	struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
>> +						kmalloc(rd_data_size, GFP_KERNEL);
>> +	if (!rd_attrs)
>> +		return -ENOMEM;
>> +
>> +	data_size = cxl_get_feature(mds, cxl_patrol_scrub_uuid, rd_attrs,
>> +				    rd_data_size, rd_data_size,
>> +				    CXL_GET_FEAT_SEL_CURRENT_VALUE);
>> +	if (!data_size)
>> +		return -EIO;
>> +
>> +	params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
>> +						   rd_attrs->scrub_cycle_cap);
>> +	params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>> +				   rd_attrs->scrub_flags);
>> +	params->rate = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>> +				 rd_attrs->scrub_cycle);
>> +	params->min_rate = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
>> +				      rd_attrs->scrub_cycle);
>> +
>> +	return 0;
>> +}
>> +
>> +static int cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
>> +				enum cxl_scrub_param param_type)
>> +{
>> +	struct cxl_memdev_ps_wr_attrs wr_attrs;
>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	struct cxl_memdev_ps_params rd_params;
>> +	int ret;
>> +
>> +	ret = cxl_mem_ps_get_attrs(dev, &rd_params);
>> +	if (ret) {
>> +		dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
>> +			ret);
>> +		return ret;
>> +	}
>> +
>> +	switch (param_type) {
>> +	case cxl_ps_param_enable:
>> +		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>> +						   params->enable);
>> +		wr_attrs.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>> +						      rd_params.rate);
>> +		break;
>> +	case cxl_ps_param_rate:
>> +		if (params->rate < rd_params.min_rate) {
>> +			dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
>> +				params->rate);
>> +			dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
>> +			       params->min_rate);
>> +			return -EINVAL;
>> +		}
>> +		wr_attrs.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>> +						     params->rate);
>> +		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>> +						  rd_params.enable);
>> +		break;
>> +	}
>> +
>> +	ret = cxl_set_feature(mds, cxl_patrol_scrub_uuid, CXL_MEMDEV_PS_SET_FEAT_VERSION,
>> +			      &wr_attrs, sizeof(wr_attrs),
>> +			      CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
>> +	if (ret)
>> +		dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n",
>> +			ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, bool *enabled)
>> +{
>> +	struct cxl_memdev_ps_params params;
>> +	int ret;
>> +
>> +	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
>> +	if (ret)
>> +		return ret;
>> +
>> +	*enabled = params.enable;
>> +
>> +	return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, bool enable)
>> +{
>> +	struct cxl_memdev_ps_params params = {
>> +		.enable = enable,
>> +	};
>> +
>> +	return cxl_mem_ps_set_attrs(dev->parent, &params, cxl_ps_param_enable);
>> +}
>> +
>> +static int cxl_patrol_scrub_get_name(struct device *dev, char *name)
>> +{
>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
>> +
>> +	return sysfs_emit(name, "%s_%s\n", "cxl_patrol_scrub",
>> +			  dev_name(&cxlmd->dev));
>> +}
>> +
>> +static int cxl_patrol_scrub_write_rate(struct device *dev, u64 rate)
>> +{
>> +	struct cxl_memdev_ps_params params = {
>> +		.rate = rate,
>> +	};
>> +
>> +	return cxl_mem_ps_set_attrs(dev->parent, &params, cxl_ps_param_rate);
>> +}
>> +
>> +static int cxl_patrol_scrub_read_rate(struct device *dev, u64 *rate)
>> +{
>> +	struct cxl_memdev_ps_params params;
>> +	int ret;
>> +
>> +	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
>> +	if (ret)
>> +		return ret;
>> +
>> +	*rate = params.rate;
>> +
>> +	return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_read_rate_avail(struct device *dev, u64 *min, u64 *max)
>> +{
>> +	struct cxl_memdev_ps_params params;
>> +	int ret;
>> +
>> +	ret = cxl_mem_ps_get_attrs(dev->parent, &params);
>> +	if (ret)
>> +		return ret;
>> +	*min = params.min_rate;
>> +	*max = U8_MAX; /* Max set by register size */
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct scrub_ops cxl_ps_scrub_ops = {
>> +	.get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
>> +	.set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
>> +	.get_name = cxl_patrol_scrub_get_name,
>> +	.rate_read = cxl_patrol_scrub_read_rate,
>> +	.rate_write = cxl_patrol_scrub_write_rate,
>> +	.rate_avail_range = cxl_patrol_scrub_read_rate_avail,
>> +};
>> +
>> +int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
>> +{
>> +	struct cxl_patrol_scrub_context *cxl_ps_ctx;
>> +	struct cxl_mbox_supp_feat_entry feat_entry;
>> +	struct cxl_memdev_ps_params params;
>> +	struct device *cxl_scrub_dev;
>> +	int ret;
>> +
>> +	ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_patrol_scrub_uuid,
>> +						  &feat_entry);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = cxl_mem_ps_get_attrs(&cxlmd->dev, &params);
>> +	if (ret)
>> +		return dev_err_probe(&cxlmd->dev, ret,
>> +				     "Get CXL patrol scrub params failed\n");
>> +
>> +	cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
>> +	if (!cxl_ps_ctx)
>> +		return -ENOMEM;
>> +
>> +	*cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
>> +		.get_feat_size = feat_entry.get_size,
>> +		.set_feat_size = feat_entry.set_size,
>> +		.scrub_cycle_changeable =  params.scrub_cycle_changeable,
>> +	};
>> +
>> +	cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, cxl_ps_ctx,
>> +						   &cxl_ps_scrub_ops);
>> +	if (IS_ERR(cxl_scrub_dev))
>> +		return PTR_ERR(cxl_scrub_dev);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 1c50a3e2eced..f95e39febd73 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -956,6 +956,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
>>  int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
>>  int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>>
>> +/* cxl memory scrub functions */
>> +#ifdef CONFIG_CXL_SCRUB
>> +int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
>> +#else
>> +static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
>> +{ return 0; }
>> +#endif
>> +
>>  #ifdef CONFIG_CXL_SUSPEND
>>  void cxl_mem_active_inc(void);
>>  void cxl_mem_active_dec(void);
>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>> index 0c79d9ce877c..399e43463626 100644
>> --- a/drivers/cxl/mem.c
>> +++ b/drivers/cxl/mem.c
>> @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
>>  	if (!cxlds->media_ready)
>>  		return -EBUSY;
>>
>> +	rc = cxl_mem_patrol_scrub_init(cxlmd);
>> +	if (rc) {
>> +		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
>> +		return rc;
>> +	}
>
>If the device does not support memory patrol scrub feature, the above function
>will return -EOPNOTSUPP. Since the feature is optional, should we just warn it
>and let it go through?
Feedback from Jonathan was that, if this feature is built in, probe should not
proceed if the patrol scrub init fails, even though it is an optional feature.
 
>
>Fan
>> +
>>  	/*
>>  	 * Someone is trying to reattach this device after it lost its port
>>  	 * connection (an endpoint port previously registered by this memdev was
>> --
>> 2.34.1
>>
Thanks,
Shiju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
  2024-04-29 11:20     ` Shiju Jose
@ 2024-04-29 12:21       ` Jonathan Cameron
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2024-04-29 12:21 UTC (permalink / raw)
  To: Shiju Jose
  Cc: fan, linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com, dave@stgolabs.net,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	david@redhat.com, Vilas.Sridharan@amd.com, leo.duran@amd.com,
	Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com,
	tony.luck@intel.com, Jon.Grimm@amd.com,
	dave.hansen@linux.intel.com, rafael@kernel.org, lenb@kernel.org,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	mike.malvestuto@intel.com, gthelen@google.com,
	wschwartz@amperecomputing.com, dferguson@amperecomputing.com,
	wbs@os.amperecomputing.com, tanxiaofei, Zengtao (B),
	kangkang.shen@futurewei.com, wanghuiqiang, Linuxarm


> >> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> >> index 0c79d9ce877c..399e43463626 100644
> >> --- a/drivers/cxl/mem.c
> >> +++ b/drivers/cxl/mem.c
> >> @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
> >>  	if (!cxlds->media_ready)
> >>  		return -EBUSY;
> >>
> >> +	rc = cxl_mem_patrol_scrub_init(cxlmd);
> >> +	if (rc) {
> >> +		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
> >> +		return rc;
> >> +	}  
> >
> >If the device does not support the memory patrol scrub feature, the above function
> >will return -EOPNOTSUPP. Since the feature is optional, should we just warn about
> >it and let probe continue?
> Feedback from Jonathan was that, if this feature is built in, then we should not
> proceed if the patrol scrub init fails, even though it is an optional feature.

Oops. That wasn't my intent.  If the feature is implemented by the hardware and
init fails, then I think we should fail probe.  Or maybe just print a very shouty
message about it being broken.  If the feature is simply not implemented we
should definitely not fail.
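
A minimal sketch of that policy, written as plain user-space C rather than
the actual driver change. scrub_init_policy() is a hypothetical helper name,
and the sketch assumes (per Fan's comment above) that
cxl_mem_patrol_scrub_init() returns -EOPNOTSUPP when the device does not
implement the feature. For any other error it takes the "fail probe" option
rather than the "shouty message" one:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: map the return code of the scrub init call to the
 * value probe should act on. Returns 0 if probe should continue, or a
 * negative errno if probe should fail.
 */
int scrub_init_policy(int rc)
{
	/* Feature not implemented by the device: optional, so carry on. */
	if (rc == -EOPNOTSUPP)
		return 0;

	/* Feature is implemented but init failed: report it and fail probe. */
	if (rc) {
		fprintf(stderr, "CXL patrol scrub init failed (%d)\n", rc);
		return rc;
	}

	/* Init succeeded. */
	return 0;
}
```

In cxl_mem_probe() this would amount to returning early only when the helper
yields a non-zero value.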

Jonathan

>  
> >
> >Fan  
> >> +
> >>  	/*
> >>  	 * Someone is trying to reattach this device after it lost its port
> >>  	 * connection (an endpoint port previously registered by this memdev was
> >> --
> >> 2.34.1
> >>  
> Thanks,
> Shiju


Thread overview: 22+ messages
2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
2024-04-24 20:25   ` fan
2024-04-25 10:38     ` Shiju Jose
2024-04-25 10:15   ` Borislav Petkov
2024-04-25 18:11     ` Shiju Jose
2024-04-19 16:47 ` [RFC PATCH v8 02/10] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE " shiju.jose
2024-04-24 23:19   ` fan
2024-04-25 10:38     ` Shiju Jose
2024-04-19 16:47 ` [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-04-25 17:26   ` fan
2024-04-19 16:47 ` [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
2024-04-26 23:56   ` fan
2024-04-29 11:20     ` Shiju Jose
2024-04-29 12:21       ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table shiju.jose
2024-04-19 18:06   ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 07/10] platform: Add __free() based cleanup function for platform_device_put shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 08/10] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 09/10] ras: scrub: Add scrub control attributes for ACPI RAS2 shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver shiju.jose