NVDIMM Device and Persistent Memory development
* [ndctl PATCH v3 0/5] Support poison list retrieval
@ 2023-11-17 22:35 alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: alison.schofield @ 2023-11-17 22:35 UTC
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Changes since v2:
- Adjust line break in snprintf (Jonathan, Vishal)
- Replace region|memdev context struct with optional func params (Vishal)
- Include CXL Spec version and update to 3.1 references (Vishal)
- Remove '_poison' descriptor from nested poison objects (Vishal)
- Use existing UTIL_JSON_MEDIA_ERRORS flag (Vishal)
- Replace lengthy if-else with switch-case (Vishal)
- Remove needless fprintf on sysfs fail (Vishal)
- Remove needless jobj inits to NULL (Vishal)
- s/jmedia/jpoison everywhere (Vishal)
- Replace hardcoded memdev w discovered memdev in unit test (Vishal)
- Use test/common define $CXL_TEST_BUS (Vishal)
- Reset rc=1 after setup in unit test (Vishal)
- Add debugfs helpers in unit test (Vishal)
- Syntax fixups in the unit test (Vishal)
- A few minor cleanups in unit test.
- Link to v2:
  https://lore.kernel.org/linux-cxl/cover.1696196382.git.alison.schofield@intel.com/

Changes since v1:
- Replace 'media-error' language with 'poison'.
  At v1 I was spec obsessed and following its language strictly. Jonathan
  questioned it at the time, and I've come around to simply say poison,
  since that is the language we've all been using for the past year+.
  It also aligns with the inject-poison and clear-poison options that
  have been posted on this list.
- Retrieve poison per region by iterating through the contributing memdevs.
  (The by region trigger was designed out of the driver implementation.)
- Add the HPA and region info to both the by region and by memdev cxl list
  json.
- Applied one review tag to the untouched pid patch. (Jonathan)
- Link to v1:
  https://lore.kernel.org/nvdimm/cover.1668133294.git.alison.schofield@intel.com/


Add the option to include a memory device poison list in cxl list json output.
Examples appended below: by memdev, by region, by memdev and coincidentally
in a region, and no poison found.

Example: By memdev
cxl list -m mem1 --poison -u
{
  "memdev":"mem1",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x1",
  "numa_node":1,
  "host":"cxl_mem.1",
  "poison":{
    "nr_records":4,
    "records":[
      {
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0x40001000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "dpa":"0x600",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}

Example: By region
cxl list -r region5 --poison -u
{
  "region":"region5",
  "resource":"0xf110000000",
  "size":"2.00 GiB (2.15 GB)",
  "type":"pmem",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "decode_state":"commit",
  "poison":{
    "nr_records":2,
    "records":[
      {
        "memdev":"mem1",
        "region":"region5",
        "hpa":"0xf110001000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      },
      {
        "memdev":"mem0",
        "region":"region5",
        "hpa":"0xf110000000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}

Example: By memdev and coincidentally in a region
cxl list -m mem0 --poison -u
{
  "memdev":"mem0",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0",
  "numa_node":0,
  "host":"cxl_mem.0",
  "poison":{
    "nr_records":1,
    "records":[
      {
        "region":"region5",
        "hpa":"0xf110000000",
        "dpa":"0x40000000",
        "dpa_length":64,
        "source":"Injected",
        "flags":""
      }
    ]
  }
}

Example: No poison found
cxl list -m mem9 --poison -u
{
  "memdev":"mem9",
  "pmem_size":"1024.00 MiB (1073.74 MB)",
  "ram_size":"1024.00 MiB (1073.74 MB)",
  "serial":"0x9",
  "numa_node":1,
  "host":"cxl_mem.9",
  "poison":{
    "nr_records":0
  }
}

Alison Schofield (5):
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl: add an optional pid check to event parsing
  cxl/list: collect and parse the poison list records
  cxl/list: add --poison option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  64 +++++++++++
 cxl/event_trace.c              |   5 +
 cxl/event_trace.h              |   1 +
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 201 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  47 ++++++++
 cxl/lib/libcxl.sym             |   6 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   2 +
 test/cxl-poison.sh             | 135 ++++++++++++++++++++++
 test/meson.build               |   2 +
 11 files changed, 468 insertions(+)
 create mode 100644 test/cxl-poison.sh


base-commit: a871e6153b11fe63780b37cdcb1eb347b296095c
-- 
2.37.3



* [ndctl PATCH v3 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands
  2023-11-17 22:35 [ndctl PATCH v3 0/5] Support poison list retrieval alison.schofield
@ 2023-11-17 22:35 ` alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 2/5] cxl: add an optional pid check to event parsing alison.schofield
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: alison.schofield @ 2023-11-17 22:35 UTC
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.

Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
List as a set of Media Error Records that include the source of the
error, the starting device physical address, and the length.

Trigger the retrieval of the poison list by writing to the memory
device sysfs attribute: trigger_poison_list. The CXL driver only
offers triggering per memdev, so the trigger by region interface
offered here is a convenience API that triggers a poison list
retrieval for each memdev contributing to a region.

int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
int cxl_region_trigger_poison_list(struct cxl_region *region);

The resulting poison records are logged as kernel trace events
named 'cxl_poison'.
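
As a rough illustration of what the per-memdev trigger amounts to, the
sketch below builds the attribute path and writes "1\n" to it. It is
not the libcxl implementation: poison_trigger_path() and
poison_trigger_write() are hypothetical helper names, and the devices
directory (for example /sys/bus/cxl/devices) is passed in by the caller
as an assumption about where the memdev lives.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<devices_dir>/<memdev>/trigger_poison_list" into buf.
 * Returns 0 on success, or -ENXIO when the buffer is too small (the
 * same truncation check the patch applies to its snprintf() call).
 * Hypothetical helper, not part of libcxl.
 */
static int poison_trigger_path(char *buf, size_t len, const char *devices_dir,
			       const char *memdev)
{
	int n;

	assert(buf && devices_dir && memdev);
	n = snprintf(buf, len, "%s/%s/trigger_poison_list", devices_dir,
		     memdev);
	if (n < 0 || (size_t)n >= len)
		return -ENXIO;
	return 0;
}

/*
 * Write "1\n" to the trigger attribute at path.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper, not part of libcxl.
 */
static int poison_trigger_write(const char *path)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -errno;
	if (fputs("1\n", f) == EOF)
		rc = -EIO;
	if (fclose(f) == EOF && rc == 0)
		rc = -errno;
	return rc;
}
```

A by-region trigger would then call the two helpers once for each
memdev that contributes to the region and stop at the first failure.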

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |  6 ++++++
 cxl/libcxl.h       |  2 ++
 3 files changed, 55 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index af4ca44eae19..cc95c2d7c94a 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -1647,6 +1647,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
 	return 0;
 }
 
+CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
+{
+	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
+	char *path = memdev->dev_buf;
+	int len = memdev->buf_len, rc;
+
+	if (snprintf(path, len, "%s/trigger_poison_list",
+		     memdev->dev_path) >= len) {
+		err(ctx, "%s: buffer too small\n",
+		    cxl_memdev_get_devname(memdev));
+		return -ENXIO;
+	}
+	rc = sysfs_write_attr(ctx, path, "1\n");
+	if (rc < 0) {
+		fprintf(stderr,
+			"%s: Failed write sysfs attr trigger_poison_list\n",
+			cxl_memdev_get_devname(memdev));
+		return rc;
+	}
+	return 0;
+}
+
+CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
+{
+	struct cxl_memdev_mapping *mapping;
+	int rc;
+
+	cxl_mapping_foreach(region, mapping) {
+		struct cxl_decoder *decoder;
+		struct cxl_memdev *memdev;
+
+		decoder = cxl_mapping_get_decoder(mapping);
+		if (!decoder)
+			continue;
+
+		memdev = cxl_decoder_get_memdev(decoder);
+		if (!memdev)
+			continue;
+
+		rc = cxl_memdev_trigger_poison_list(memdev);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
 {
 	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 8fa1cca3d0d7..277b7e21d6a6 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -264,3 +264,9 @@ global:
 	cxl_memdev_update_fw;
 	cxl_memdev_cancel_fw_update;
 } LIBCXL_5;
+
+LIBCXL_7 {
+global:
+	cxl_memdev_trigger_poison_list;
+	cxl_region_trigger_poison_list;
+} LIBCXL_6;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 0f4f4b2648fb..ecdffe36df2c 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -460,6 +460,8 @@ enum cxl_setpartition_mode {
 
 int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
 		enum cxl_setpartition_mode mode);
+int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
+int cxl_region_trigger_poison_list(struct cxl_region *region);
 
 #ifdef __cplusplus
 } /* extern "C" */
-- 
2.37.3



* [ndctl PATCH v3 2/5] cxl: add an optional pid check to event parsing
  2023-11-17 22:35 [ndctl PATCH v3 0/5] Support poison list retrieval alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
@ 2023-11-17 22:35 ` alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 3/5] cxl/list: collect and parse the poison list records alison.schofield
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: alison.schofield @ 2023-11-17 22:35 UTC
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

When parsing CXL events, callers may only be interested in events
that originate from the current process. Introduce an optional
argument to the event trace context: event_pid. When event_pid is
present, only include events with a matching pid in the returned
JSON list. It is not a failure to see other, non-matching results.
Simply skip those.

The initial use case for this is device poison listings where
only the poison error records requested by this process are wanted.
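
The filtering rule can be sketched in isolation as follows. This is a
minimal stand-in, not the patch itself: struct pid_filter and
pid_filter_match() are hypothetical names, and the sketch assumes, as
the patch does, that an event_pid of 0 means "no filter requested".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the patch's struct event_ctx. */
struct pid_filter {
	int event_pid; /* optional: 0 accepts events from any pid */
};

/* Return true when a record logged by record_pid should be kept. */
static bool pid_filter_match(const struct pid_filter *f, int record_pid)
{
	assert(f);
	if (f->event_pid == 0)
		return true;
	return f->event_pid == record_pid;
}
```

A caller that only wants its own records would set event_pid to the
value of getpid() before parsing; records from other pids are skipped
without being treated as errors.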

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 cxl/event_trace.c | 5 +++++
 cxl/event_trace.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/cxl/event_trace.c b/cxl/event_trace.c
index db8cc85f0b6f..269060898118 100644
--- a/cxl/event_trace.c
+++ b/cxl/event_trace.c
@@ -208,6 +208,11 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record,
 			return 0;
 	}
 
+	if (event_ctx->event_pid) {
+		if (event_ctx->event_pid != tep_data_pid(event->tep, record))
+			return 0;
+	}
+
 	if (event_ctx->parse_event)
 		return event_ctx->parse_event(event, record,
 					      &event_ctx->jlist_head);
diff --git a/cxl/event_trace.h b/cxl/event_trace.h
index ec6267202c8b..7f7773b2201f 100644
--- a/cxl/event_trace.h
+++ b/cxl/event_trace.h
@@ -15,6 +15,7 @@ struct event_ctx {
 	const char *system;
 	struct list_head jlist_head;
 	const char *event_name; /* optional */
+	int event_pid; /* optional */
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
 			   struct list_head *jlist_head); /* optional */
 };
-- 
2.37.3



* [ndctl PATCH v3 3/5] cxl/list: collect and parse the poison list records
  2023-11-17 22:35 [ndctl PATCH v3 0/5] Support poison list retrieval alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 2/5] cxl: add an optional pid check to event parsing alison.schofield
@ 2023-11-17 22:35 ` alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 4/5] cxl/list: add --poison option to cxl list alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 5/5] cxl/test: add cxl-poison.sh unit test alison.schofield
  4 siblings, 0 replies; 7+ messages in thread
From: alison.schofield @ 2023-11-17 22:35 UTC
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Poison list records are logged as events in the kernel tracing
subsystem. To prepare the poison list for cxl list, enable tracing,
trigger the poison list read, and parse the generated cxl_poison
events into a json representation.
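
Part of that parsing is turning the numeric source and flags fields of
each record into strings. The standalone sketch below shows one way to
do it, using the same values as the defines added by this patch. The
POISON_* macros and both function names are hypothetical, and unlike
the strcat() chain in the patch this variant emits no trailing comma,
so treat it as an alternative rather than a description of the
patch's output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values mirror the CXL_POISON_* defines added by the patch. */
#define POISON_SOURCE_UNKNOWN	0
#define POISON_SOURCE_EXTERNAL	1
#define POISON_SOURCE_INTERNAL	2
#define POISON_SOURCE_INJECTED	3
#define POISON_SOURCE_VENDOR	7

#define POISON_FLAG_MORE	(1u << 0)
#define POISON_FLAG_OVERFLOW	(1u << 1)
#define POISON_FLAG_SCANNING	(1u << 2)

/* Map a Media Error Record source value to a printable name. */
static const char *poison_source_name(int source)
{
	switch (source) {
	case POISON_SOURCE_UNKNOWN:
		return "Unknown";
	case POISON_SOURCE_EXTERNAL:
		return "External";
	case POISON_SOURCE_INTERNAL:
		return "Internal";
	case POISON_SOURCE_INJECTED:
		return "Injected";
	case POISON_SOURCE_VENDOR:
		return "Vendor";
	default:
		return "Reserved";
	}
}

/* Render the set flag bits into buf as a comma-separated list. */
static const char *poison_flags_str(unsigned int flags, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} tbl[] = {
		{ POISON_FLAG_MORE, "More" },
		{ POISON_FLAG_OVERFLOW, "Overflow" },
		{ POISON_FLAG_SCANNING, "Scanning" },
	};
	size_t used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		int n;

		if (!(flags & tbl[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", tbl[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0'; /* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return buf;
}
```

A 32-byte buffer, the size the patch uses for flag_str, is enough for
all three names joined by commas.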

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/json.c | 201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/cxl/json.c b/cxl/json.c
index 7678d02020b6..aaab2e3d9936 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -2,15 +2,19 @@
 // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
 #include <limits.h>
 #include <util/json.h>
+#include <util/bitmap.h>
 #include <uuid/uuid.h>
 #include <cxl/libcxl.h>
 #include <json-c/json.h>
 #include <json-c/printbuf.h>
 #include <ccan/short_types/short_types.h>
+#include <traceevent/event-parse.h>
+#include <tracefs/tracefs.h>
 
 #include "filter.h"
 #include "json.h"
 #include "../daxctl/json.h"
+#include "event_trace.h"
 
 #define CXL_FW_VERSION_STR_LEN	16
 #define CXL_FW_MAX_SLOTS	4
@@ -571,6 +575,191 @@ err_jobj:
 	return NULL;
 }
 
+/* CXL Spec 3.1 Table 8-140 Media Error Record */
+#define CXL_POISON_SOURCE_UNKNOWN 0
+#define CXL_POISON_SOURCE_EXTERNAL 1
+#define CXL_POISON_SOURCE_INTERNAL 2
+#define CXL_POISON_SOURCE_INJECTED 3
+#define CXL_POISON_SOURCE_VENDOR 7
+
+/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */
+#define CXL_POISON_FLAG_MORE BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW BIT(1)
+#define CXL_POISON_FLAG_SCANNING BIT(2)
+
+static struct json_object *
+util_cxl_poison_events_to_json(struct tracefs_instance *inst, bool is_region,
+			       unsigned long flags)
+{
+	struct json_object *jerrors, *jpoison, *jobj = NULL;
+	struct jlist_node *jnode, *next;
+	struct event_ctx ectx = {
+		.event_name = "cxl_poison",
+		.event_pid = getpid(),
+		.system = "cxl",
+	};
+	int rc, count = 0;
+
+	list_head_init(&ectx.jlist_head);
+	rc = cxl_parse_events(inst, &ectx);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to parse events: %d\n", rc);
+		return NULL;
+	}
+	/* Add nr_records:0 to json */
+	if (list_empty(&ectx.jlist_head))
+		goto out;
+
+	jerrors = json_object_new_array();
+	if (!jerrors)
+		return NULL;
+
+	list_for_each_safe(&ectx.jlist_head, jnode, next, list) {
+		struct json_object *jp, *jval;
+		int source, pflags;
+		u64 addr, len;
+
+		jp = json_object_new_object();
+		if (!jp)
+			return NULL;
+
+		if (is_region) {
+			/* Add the memdev name in a by region list */
+			if (json_object_object_get_ex(jnode->jobj, "memdev",
+						      &jval))
+				json_object_object_add(jp, "memdev", jval);
+		}
+
+		/*
+		 * When listing is by memdev, region names and valid HPAs
+		 * will appear if the poison address is part of a region.
+		 * Pick up those valid region names and HPAs but ignore the
+		 * empties and invalids.
+		 */
+
+		/* Only add non NULL region names */
+		if (json_object_object_get_ex(jnode->jobj, "region", &jval)) {
+			if (strlen(json_object_get_string(jval)) != 0)
+				json_object_object_add(jp, "region", jval);
+		}
+		/* Only display valid HPAs */
+		if (json_object_object_get_ex(jnode->jobj, "hpa", &jval)) {
+			addr = json_object_get_uint64(jval);
+			if (addr != ULLONG_MAX) {
+				jobj = util_json_object_hex(addr, flags);
+				json_object_object_add(jp, "hpa", jobj);
+			}
+		}
+		if (json_object_object_get_ex(jnode->jobj, "dpa", &jval)) {
+			addr = json_object_get_int64(jval);
+			jobj = util_json_object_hex(addr, flags);
+			json_object_object_add(jp, "dpa", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "dpa_length", &jval)) {
+			len = json_object_get_int64(jval);
+			jobj = util_json_object_size(len, flags);
+			json_object_object_add(jp, "dpa_length", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "source", &jval)) {
+			source = json_object_get_int(jval);
+			switch (source) {
+			case CXL_POISON_SOURCE_UNKNOWN:
+				jobj = json_object_new_string("Unknown");
+				break;
+			case CXL_POISON_SOURCE_EXTERNAL:
+				jobj = json_object_new_string("External");
+				break;
+			case CXL_POISON_SOURCE_INTERNAL:
+				jobj = json_object_new_string("Internal");
+				break;
+			case CXL_POISON_SOURCE_INJECTED:
+				jobj = json_object_new_string("Injected");
+				break;
+			case CXL_POISON_SOURCE_VENDOR:
+				jobj = json_object_new_string("Vendor");
+				break;
+			default:
+				jobj = json_object_new_string("Reserved");
+			}
+			json_object_object_add(jp, "source", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "flags", &jval)) {
+			char flag_str[32] = { '\0' };
+
+			pflags = json_object_get_int(jval);
+			if (pflags & CXL_POISON_FLAG_MORE)
+				strcat(flag_str, "More,");
+			if (pflags & CXL_POISON_FLAG_OVERFLOW)
+				strcat(flag_str, "Overflow,");
+			if (pflags & CXL_POISON_FLAG_SCANNING)
+				strcat(flag_str, "Scanning,");
+			jobj = json_object_new_string(flag_str);
+			if (jobj)
+				json_object_object_add(jp, "flags", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "overflow_t", &jval))
+			json_object_object_add(jp, "overflow_time", jval);
+
+		json_object_array_add(jerrors, jp);
+		count++;
+	} /* list_for_each_safe */
+
+out:
+	jpoison = json_object_new_object();
+	if (!jpoison)
+		return NULL;
+
+	/* Always include the count. If count is zero, no records follow. */
+	jobj = json_object_new_int(count);
+	if (jobj)
+		json_object_object_add(jpoison, "nr_records", jobj);
+	if (count)
+		json_object_object_add(jpoison, "records", jerrors);
+
+	return jpoison;
+}
+
+static struct json_object *
+util_cxl_poison_list_to_json(struct cxl_region *region,
+			     struct cxl_memdev *memdev,
+			     unsigned long flags)
+{
+	struct json_object *jpoison = NULL;
+	struct tracefs_instance *inst;
+	int rc;
+
+	inst = tracefs_instance_create("cxl list");
+	if (!inst) {
+		fprintf(stderr, "tracefs_instance_create() failed\n");
+		return NULL;
+	}
+
+	rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison");
+	if (rc < 0) {
+		fprintf(stderr, "Failed to enable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	if (region)
+		rc = cxl_region_trigger_poison_list(region);
+	else
+		rc = cxl_memdev_trigger_poison_list(memdev);
+	if (rc)
+		goto err_free;
+
+	rc = cxl_event_tracing_disable(inst);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to disable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	jpoison = util_cxl_poison_events_to_json(inst, region ? true : false,
+						 flags);
+err_free:
+	tracefs_instance_free(inst);
+	return jpoison;
+}
+
 struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 		unsigned long flags)
 {
@@ -649,6 +838,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 			json_object_object_add(jdev, "firmware", jobj);
 	}
 
+	if (flags & UTIL_JSON_MEDIA_ERRORS) {
+		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
+		if (jobj)
+			json_object_object_add(jdev, "poison", jobj);
+	}
+
 	json_object_set_userdata(jdev, memdev, NULL);
 	return jdev;
 }
@@ -987,6 +1182,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
 			json_object_object_add(jregion, "state", jobj);
 	}
 
+	if (flags & UTIL_JSON_MEDIA_ERRORS) {
+		jobj = util_cxl_poison_list_to_json(region, NULL, flags);
+		if (jobj)
+			json_object_object_add(jregion, "poison", jobj);
+	}
+
 	util_cxl_mappings_append_json(jregion, region, flags);
 
 	if (flags & UTIL_JSON_DAX) {
-- 
2.37.3



* [ndctl PATCH v3 4/5] cxl/list: add --poison option to cxl list
  2023-11-17 22:35 [ndctl PATCH v3 0/5] Support poison list retrieval alison.schofield
                   ` (2 preceding siblings ...)
  2023-11-17 22:35 ` [ndctl PATCH v3 3/5] cxl/list: collect and parse the poison list records alison.schofield
@ 2023-11-17 22:35 ` alison.schofield
  2023-11-17 22:35 ` [ndctl PATCH v3 5/5] cxl/test: add cxl-poison.sh unit test alison.schofield
  4 siblings, 0 replies; 7+ messages in thread
From: alison.schofield @ 2023-11-17 22:35 UTC
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The --poison option to 'cxl list' retrieves poison lists from
memory devices supporting the capability and displays the
returned poison records in the cxl list json. This option can
apply to memdevs or regions.

Example usage is in the Documentation/cxl/cxl-list.txt update.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/cxl/cxl-list.txt | 64 ++++++++++++++++++++++++++++++++++
 cxl/filter.h                   |  3 ++
 cxl/list.c                     |  2 ++
 3 files changed, 69 insertions(+)

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 838de4086678..a4110fff261d 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -415,6 +415,70 @@ OPTIONS
 --region::
 	Specify CXL region device name(s), or device id(s), to filter the listing.
 
+-L::
+--poison::
+	Include poison information. The poison list is retrieved from the
+	device(s) and poison records are added to the listing. Apply this
+	option to memdevs and regions where devices support the poison
+	list capability.
+
+----
+# cxl list -m mem11 --poison
+[
+  {
+    "memdev":"mem11",
+    "pmem_size":268435456,
+    "ram_size":0,
+    "serial":0,
+    "host":"0000:37:00.0",
+    "poison":{
+      "nr_records":1,
+      "records":[
+        {
+          "dpa":0,
+          "dpa_length":64,
+          "source":"Internal",
+          "flags":"",
+          "overflow_time":0
+        }
+      ]
+    }
+  }
+]
+# cxl list -r region5 --poison
+[
+  {
+    "region":"region5",
+    "resource":1035623989248,
+    "size":2147483648,
+    "interleave_ways":2,
+    "interleave_granularity":4096,
+    "decode_state":"commit",
+    "poison":{
+      "nr_records":2,
+      "records":[
+        {
+          "memdev":"mem2",
+          "dpa":0,
+          "dpa_length":64,
+          "source":"Internal",
+          "flags":"",
+          "overflow_time":0
+        },
+        {
+          "memdev":"mem5",
+          "dpa":0,
+          "dpa_length":512,
+          "source":"Vendor",
+          "flags":"",
+          "overflow_time":0
+        }
+      ]
+    }
+  }
+]
+----
+
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
diff --git a/cxl/filter.h b/cxl/filter.h
index 3f65990f835a..1241f72ccf62 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -30,6 +30,7 @@ struct cxl_filter_params {
 	bool fw;
 	bool alert_config;
 	bool dax;
+	bool poison;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_ALERT_CONFIG;
 	if (param->dax)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
+	if (param->poison)
+		flags |= UTIL_JSON_MEDIA_ERRORS;
 	return flags;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 93ba51ef895c..13fef8569340 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -57,6 +57,8 @@ static const struct option options[] = {
 		    "include memory device firmware information"),
 	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
 		    "include alert configuration information"),
+	OPT_BOOLEAN('L', "poison", &param.poison,
+		    "include poison information"),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
-- 
2.37.3



* [ndctl PATCH v3 5/5] cxl/test: add cxl-poison.sh unit test
  2023-11-17 22:35 [ndctl PATCH v3 0/5] Support poison list retrieval alison.schofield
                   ` (3 preceding siblings ...)
  2023-11-17 22:35 ` [ndctl PATCH v3 4/5] cxl/list: add --poison option to cxl list alison.schofield
@ 2023-11-17 22:35 ` alison.schofield
  2023-11-17 23:20   ` Verma, Vishal L
  4 siblings, 1 reply; 7+ messages in thread
From: alison.schofield @ 2023-11-17 22:35 UTC
  To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Exercise cxl list, libcxl, and driver pieces of the get poison list
pathway. Inject and clear poison using debugfs and use cxl-cli to
read the poison list by memdev and by region.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 test/cxl-poison.sh | 135 +++++++++++++++++++++++++++++++++++++++++++++
 test/meson.build   |   2 +
 2 files changed, 137 insertions(+)
 create mode 100644 test/cxl-poison.sh

diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
new file mode 100644
index 000000000000..a562153c8324
--- /dev/null
+++ b/test/cxl-poison.sh
@@ -0,0 +1,135 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 Intel Corporation. All rights reserved.
+
+. $(dirname $0)/common
+
+rc=77
+
+set -ex
+
+trap 'err $LINENO' ERR
+
+check_prereq "jq"
+
+modprobe -r cxl_test
+modprobe cxl_test
+
+rc=1
+
+# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to
+# inject, clear, and get the poison list. Do it by memdev and by region.
+# Based on current cxl-test topology.
+
+find_memdev()
+{
+	readarray -t capable_mems < <("$CXL" list -b "$CXL_TEST_BUS" -M |
+		jq -r ".[] | select(.pmem_size != null) |
+	       	select(.ram_size != null) | .memdev")
+
+	if [ ${#capable_mems[@]} == 0 ]; then
+		echo "no memdevs found for test"
+		err "$LINENO"
+	fi
+
+	memdev=${capable_mems[0]}
+}
+
+setup_x2_region()
+{
+        # Find an x2 decoder
+        decoder=$($CXL list -b "$CXL_TEST_BUS" -D -d root | jq -r ".[] |
+          select(.pmem_capable == true) |
+          select(.nr_targets == 2) |
+          .decoder")
+
+        # Find a memdev for each host-bridge interleave position
+        port_dev0=$($CXL list -T -d $decoder | jq -r ".[] |
+            .targets | .[] | select(.position == 0) | .target")
+        port_dev1=$($CXL list -T -d $decoder | jq -r ".[] |
+            .targets | .[] | select(.position == 1) | .target")
+        mem0=$($CXL list -M -p $port_dev0 | jq -r ".[0].memdev")
+        mem1=$($CXL list -M -p $port_dev1 | jq -r ".[0].memdev")
+        memdevs="$mem0 $mem1"
+}
+
+create_region()
+{
+	setup_x2_region
+	region=$($CXL create-region -d $decoder -m $memdevs | jq -r ".region")
+	if [[ ! $region ]]; then
+		echo "create-region failed for $decoder"
+		err "$LINENO"
+	fi
+}
+
+# When cxl-cli support for inject and clear arrives, replace
+# the writes to /sys/kernel/debug with the new cxl commands.
+
+inject_poison_sysfs()
+{
+	memdev="$1"
+	addr="$2"
+
+	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+}
+
+clear_poison_sysfs()
+{
+	memdev="$1"
+	addr="$2"
+
+	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+}
+
+find_media_errors()
+{
+	local json="$1"
+
+	nr="$(jq -r ".nr_records" <<< "$json")"
+	if [[ $nr != $NR_ERRS ]]; then
+		echo "$mem: $NR_ERRS poison records expected, $nr found"
+		err "$LINENO"
+	fi
+}
+
+# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.
+# Turning it on here allows the test user to also view inject and clear
+# trace events.
+echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
+
+# Poison by memdev
+# Inject then clear into cxl_test known pmem and ram partitions
+find_memdev
+inject_poison_sysfs "$memdev" "0x40000000"
+inject_poison_sysfs "$memdev" "0x40001000"
+inject_poison_sysfs "$memdev" "0x600"
+inject_poison_sysfs "$memdev" "0x0"
+NR_ERRS=4
+json=$("$CXL" list -m "$memdev" --poison | jq -r '.[].poison')
+find_media_errors "$json"
+clear_poison_sysfs "$memdev" "0x40000000"
+clear_poison_sysfs "$memdev" "0x40001000"
+clear_poison_sysfs "$memdev" "0x600"
+clear_poison_sysfs "$memdev" "0x0"
+NR_ERRS=0
+json=$("$CXL" list -m "$memdev" --poison | jq -r '.[].poison')
+find_media_errors "$json"
+
+# Poison by region
+# Inject then clear into cxl_test known pmem dpa mappings
+create_region
+inject_poison_sysfs "$mem0" "0x40000000"
+inject_poison_sysfs "$mem1" "0x40000000"
+NR_ERRS=2
+json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
+find_media_errors "$json"
+clear_poison_sysfs "$mem0" "0x40000000"
+clear_poison_sysfs "$mem1" "0x40000000"
+NR_ERRS=0
+json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
+find_media_errors "$json"
+
+check_dmesg "$LINENO"
+
+modprobe -r cxl-test
diff --git a/test/meson.build b/test/meson.build
index 224adaf41fcc..2706fa5d633c 100644
--- a/test/meson.build
+++ b/test/meson.build
@@ -157,6 +157,7 @@ cxl_create_region = find_program('cxl-create-region.sh')
 cxl_xor_region = find_program('cxl-xor-region.sh')
 cxl_update_firmware = find_program('cxl-update-firmware.sh')
 cxl_events = find_program('cxl-events.sh')
+cxl_poison = find_program('cxl-poison.sh')
 
 tests = [
   [ 'libndctl',               libndctl,		  'ndctl' ],
@@ -186,6 +187,7 @@ tests = [
   [ 'cxl-create-region.sh',   cxl_create_region,  'cxl'   ],
   [ 'cxl-xor-region.sh',      cxl_xor_region,     'cxl'   ],
   [ 'cxl-events.sh',          cxl_events,         'cxl'   ],
+  [ 'cxl-poison.sh',          cxl_poison,         'cxl'   ],
 ]
 
 if get_option('destructive').enabled()
-- 
2.37.3



* Re: [ndctl PATCH v3 5/5] cxl/test: add cxl-poison.sh unit test
  2023-11-17 22:35 ` [ndctl PATCH v3 5/5] cxl/test: add cxl-poison.sh unit test alison.schofield
@ 2023-11-17 23:20   ` Verma, Vishal L
  0 siblings, 0 replies; 7+ messages in thread
From: Verma, Vishal L @ 2023-11-17 23:20 UTC
  To: Schofield, Alison; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Fri, 2023-11-17 at 14:35 -0800, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
[..]

Rest of the series is looking good, just a few minor things below.

> 
> +
> +find_media_errors()
> +{
> +       local json="$1"
> +
> +       nr="$(jq -r ".nr_records" <<< "$json")"
> +       if [[ $nr != $NR_ERRS ]]; then

Minor shellcheck complaint, the right hand side of a [[ ]] check should
be quoted, so [[ $nr != "$NR_ERRS" ]]

> +               echo "$mem: $NR_ERRS poison records expected, $nr found"

$mem is never set, maybe it needs to be extracted from the json above?

> +               err "$LINENO"
> +       fi
> +}
> +
> +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.
> +# Turning it on here allows the test user to also view inject and clear
> +# trace events.
> +echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
> +
> +# Poison by memdev
> +# Inject then clear into cxl_test known pmem and ram partitions
> +find_memdev
> +inject_poison_sysfs "$memdev" "0x40000000"
> +inject_poison_sysfs "$memdev" "0x40001000"
> +inject_poison_sysfs "$memdev" "0x600"
> +inject_poison_sysfs "$memdev" "0x0"
> +NR_ERRS=4
> +json=$("$CXL" list -m "$memdev" --poison | jq -r '.[].poison')
> +find_media_errors "$json"

Instead of setting NR_ERRS 'globally', just pass it to the
find_media_errors function as well alongside $json, and maybe rename it
to validate_nr_records() or something. More generally, no need to
capitalize something like NR_ERRS - all caps is usually only for
variables coming from the env.

> +clear_poison_sysfs "$memdev" "0x40000000"
> +clear_poison_sysfs "$memdev" "0x40001000"
> +clear_poison_sysfs "$memdev" "0x600"
> +clear_poison_sysfs "$memdev" "0x0"
> +NR_ERRS=0
> +json=$("$CXL" list -m "$memdev" --poison | jq -r '.[].poison')

Fairly minor but shellcheck complains about quoting all the "$()"
command substitutions.

> +find_media_errors "$json"
> +
> +# Poison by region
> +# Inject then clear into cxl_test known pmem dpa mappings
> +create_region
> +inject_poison_sysfs "$mem0" "0x40000000"
> +inject_poison_sysfs "$mem1" "0x40000000"
> +NR_ERRS=2
> +json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
> +find_media_errors "$json"
> +clear_poison_sysfs "$mem0" "0x40000000"
> +clear_poison_sysfs "$mem1" "0x40000000"
> +NR_ERRS=0
> +json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
> +find_media_errors "$json"
> +
> +check_dmesg "$LINENO"
> +
> +modprobe -r cxl-test
> diff --git a/test/meson.build b/test/meson.build
> index 224adaf41fcc..2706fa5d633c 100644
> --- a/test/meson.build
> +++ b/test/meson.build
> @@ -157,6 +157,7 @@ cxl_create_region = find_program('cxl-create-region.sh')
>  cxl_xor_region = find_program('cxl-xor-region.sh')
>  cxl_update_firmware = find_program('cxl-update-firmware.sh')
>  cxl_events = find_program('cxl-events.sh')
> +cxl_poison = find_program('cxl-poison.sh')
>  
>  tests = [
>    [ 'libndctl',               libndctl,                  'ndctl' ],
> @@ -186,6 +187,7 @@ tests = [
>    [ 'cxl-create-region.sh',   cxl_create_region,  'cxl'   ],
>    [ 'cxl-xor-region.sh',      cxl_xor_region,     'cxl'   ],
>    [ 'cxl-events.sh',          cxl_events,         'cxl'   ],
> +  [ 'cxl-poison.sh',          cxl_poison,         'cxl'   ],
>  ]
>  
>  if get_option('destructive').enabled()


