All the mail mirrored from lore.kernel.org
* [PATCH 0/3] XOR Math Fixups: translation & position
@ 2024-04-26 19:51 alison.schofield
  2024-04-26 19:51 ` [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: alison.schofield @ 2024-04-26 19:51 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Rather than repeat the individual patch commit message content,
let me describe the flow of this set:

Patch 1: cxl/acpi: Restore XOR'd position bits during address translation
The problem fixed in this patch, bad HPA translations with XOR math,
came to my attention recently. Patch 1 can stand alone, but since that
discovery also shed light on how to repair an issue with calculating
positions in interleave sets (Patches 2 and 3), they are presented together.

Patch 2 and Patch 3 are paired. Patch 2 presents the new method for
verifying a target position in the list, and Patch 3 removes the
old method. These two could be squashed.

FYI: I don't present the code removal first because I think the diff
is easier to read if I leave in the old root decoder callback setup
for calc_hb, insert the new callback along the same path, and then rip
out the defunct calc_hb. That's the way I created the patchset, and it
may be an easier way for reviewers to follow along with the root
decoder callback setup.


Alison Schofield (3):
  cxl/acpi: Restore XOR'd position bits during address translation
  cxl/region: Verify target positions using the ordered target list
  cxl: Remove defunct code calculating host bridge target positions

 drivers/cxl/acpi.c        | 76 ++++++++++++++++-----------------------
 drivers/cxl/core/port.c   | 21 ++---------
 drivers/cxl/core/region.c |  9 ++++-
 drivers/cxl/core/trace.c  |  5 +++
 drivers/cxl/cxl.h         | 10 +++---
 5 files changed, 51 insertions(+), 70 deletions(-)


base-commit: 4cece764965020c22cff7665b18a012006359095
-- 
2.37.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation
  2024-04-26 19:51 [PATCH 0/3] XOR Math Fixups: translation & position alison.schofield
@ 2024-04-26 19:51 ` alison.schofield
  2024-04-27  1:07   ` Dan Williams
  2024-04-26 19:51 ` [PATCH 2/3] cxl/region: Verify target positions using the ordered target list alison.schofield
  2024-04-26 19:51 ` [PATCH 3/3] cxl: Remove defunct code calculating host bridge target positions alison.schofield
  2 siblings, 1 reply; 10+ messages in thread
From: alison.schofield @ 2024-04-26 19:51 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

When a CXL region is created in a CXL Window (CFMWS) that uses XOR
interleave arithmetic, XOR maps are applied during the HPA->DPA
translation. The XOR function changes the interleave selector
bit (aka position bit) in the HPA, thereby varying which host bridge
services an HPA. The purpose is to minimize hot spots, thereby
improving performance.
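
For illustration, the forward selection can be sketched in userspace C,
modeled on the power-of-2 path of cxl_xor_calc_n() (removed in patch 3):
each xormap contributes one bit of the host bridge index, equal to
XORALLBITS(hpa & map). This is a sketch, not driver code: uint64_t and
GCC/Clang's __builtin_popcountll() stand in for u64 and hweight64(), and
xor_select_hb() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* XORALLBITS(hpa & map): parity of the HPA bits selected by one xormap. */
static unsigned int xorallbits(uint64_t hpa, uint64_t map)
{
	return (unsigned int)(__builtin_popcountll(hpa & map) & 1);
}

/*
 * Hypothetical helper: host bridge index for a power-of-2 interleave.
 * Bit i of the index is XORALLBITS(hpa & xormaps[i]).
 */
static unsigned int xor_select_hb(uint64_t hpa, const uint64_t *xormaps,
				  int nr_maps)
{
	unsigned int n = 0;

	assert(nr_maps >= 0 && nr_maps < 32);
	for (int i = 0; i < nr_maps; i++)
		n |= xorallbits(hpa, xormaps[i]) << i;
	return n;
}
```

With an invented map of 0x1100 (bits 8 and 12), HPA 0x1000 selects host
bridge 1 even though bit 8 is clear, which is exactly where a modulo-only
view of the position bit goes wrong.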

When a device reports a DPA in events such as poison, general_media,
and dram, the driver translates that DPA back to an HPA. Presently,
the CXL driver translation only considers the modulo position and
will report the wrong HPA for CFMWS entries configured with XOR
arithmetic.

Add a helper function that restores the XOR'd bits during DPA->HPA
address translation. Plumb a root decoder callback to the new helper
when XOR interleave arithmetic is in use. For MODULO arithmetic, leave
the callback NULL, since no extra work is required.
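
A minimal round-trip sketch of what the helper aims for, assuming a
single xormap whose lowest set bit is the position bit (as the
restore_xor_pos() comment states), and assuming a modulo-only
translation writes the interleave position value straight into that
bit. Restoring then sets the bit to XORALLBITS(hpa & map).
insert_pos_bit() and restore_xor_bit() are hypothetical userspace
stand-ins, and the map value in the check is invented.

```c
#include <assert.h>
#include <stdint.h>

/* Position bit: index of the lowest set bit in the xormap. */
static int position_bit(uint64_t map)
{
	assert(map != 0);	/* __builtin_ctzll(0) is undefined */
	return __builtin_ctzll(map);
}

/* XORALLBITS(hpa & map), i.e. hweight64(hpa & map) & 1 in kernel terms. */
static uint64_t xorallbits(uint64_t hpa, uint64_t map)
{
	return (uint64_t)(__builtin_popcountll(hpa & map) & 1);
}

/*
 * Hypothetical modulo-only step: write the interleave position value
 * straight into the position bit.
 */
static uint64_t insert_pos_bit(uint64_t hpa, uint64_t map, unsigned int pos)
{
	int b = position_bit(map);

	return (hpa & ~(1ULL << b)) | ((uint64_t)(pos & 1) << b);
}

/* Restore step: set the position bit to XORALLBITS(hpa & map). */
static uint64_t restore_xor_bit(uint64_t hpa, uint64_t map)
{
	int b = position_bit(map);

	return (hpa & ~(1ULL << b)) | (xorallbits(hpa, map) << b);
}
```

For any true HPA h, computing pos = XORALLBITS(h & map), inserting it,
and then restoring gives back h under these assumptions.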

Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/acpi.c       | 49 +++++++++++++++++++++++++++++++++++++---
 drivers/cxl/core/port.c  |  5 +++-
 drivers/cxl/core/trace.c |  5 ++++
 drivers/cxl/cxl.h        |  6 ++++-
 4 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index af5cb818f84d..519e933b5a4b 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -74,6 +74,44 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
 	return cxlrd->cxlsd.target[n];
 }
 
+static u64 restore_xor_pos(u64 hpa, u64 map)
+{
+	int restore_value, restore_pos = 0;
+
+	/*
+	 * Restore the position bit to its value before the
+	 * xormap was applied at HPA->DPA translation.
+	 *
+	 * restore_pos is the lowest set bit in the map
+	 * restore_value is the XORALLBITS in (hpa AND map)
+	 */
+
+	while ((map & (1ULL << restore_pos)) == 0)
+		restore_pos++;
+
+	restore_value = (hweight64(hpa & map) & 1);
+	if (restore_value)
+		hpa |= (1ULL << restore_pos);
+	else
+		hpa &= ~(1ULL << restore_pos);
+
+	return hpa;
+}
+
+static u64 cxl_xor_trans(struct cxl_root_decoder *cxlrd, u64 hpa, int iw)
+{
+	struct cxl_cxims_data *cximsd = cxlrd->platform_data;
+
+	/* No xormaps for ways of 1 or 3 */
+	if (iw == 1 || iw == 3)
+		return hpa;
+
+	for (int i = 0; i < cximsd->nr_maps; i++)
+		hpa = restore_xor_pos(hpa, cximsd->xormaps[i]);
+
+	return hpa;
+}
+
 struct cxl_cxims_context {
 	struct device *dev;
 	struct cxl_root_decoder *cxlrd;
@@ -325,6 +363,7 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 	struct cxl_cxims_context cxims_ctx;
 	struct cxl_root_decoder *cxlrd;
 	struct device *dev = ctx->dev;
+	cxl_addr_trans_fn addr_trans;
 	cxl_calc_hb_fn cxl_calc_hb;
 	struct cxl_decoder *cxld;
 	unsigned int ways, i, ig;
@@ -365,12 +404,16 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 	if (rc)
 		goto err_insert;
 
-	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
+	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
 		cxl_calc_hb = cxl_hb_modulo;
-	else
+		addr_trans = NULL;
+
+	} else {
 		cxl_calc_hb = cxl_hb_xor;
+		addr_trans = cxl_xor_trans;
+	}
 
-	cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
+	cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, addr_trans);
 	if (IS_ERR(cxlrd))
 		return PTR_ERR(cxlrd);
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 2b0cab556072..cd4f004f5372 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1808,6 +1808,7 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
  * @port: owning CXL root of this decoder
  * @nr_targets: static number of downstream targets
  * @calc_hb: which host bridge covers the n'th position by granularity
+ * @addr_trans: address translation helper function
  *
  * Return: A new cxl decoder to be registered by cxl_decoder_add(). A
  * 'CXL root' decoder is one that decodes from a top-level / static platform
@@ -1816,7 +1817,8 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
  */
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 						unsigned int nr_targets,
-						cxl_calc_hb_fn calc_hb)
+						cxl_calc_hb_fn calc_hb,
+						cxl_addr_trans_fn addr_trans)
 {
 	struct cxl_root_decoder *cxlrd;
 	struct cxl_switch_decoder *cxlsd;
@@ -1839,6 +1841,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 	}
 
 	cxlrd->calc_hb = calc_hb;
+	cxlrd->addr_trans = addr_trans;
 	mutex_init(&cxlrd->range_lock);
 
 	cxld = &cxlsd->cxld;
diff --git a/drivers/cxl/core/trace.c b/drivers/cxl/core/trace.c
index d0403dc3c8ab..a7ea4a256036 100644
--- a/drivers/cxl/core/trace.c
+++ b/drivers/cxl/core/trace.c
@@ -36,6 +36,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
 static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
 			  struct cxl_endpoint_decoder *cxled)
 {
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
 	u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
 	struct cxl_region_params *p = &cxlr->params;
 	int pos = cxled->pos;
@@ -75,6 +76,10 @@ static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
 	/* Apply the hpa_offset to the region base address */
 	hpa = hpa_offset + p->res->start;
 
+	/* An addr_trans helper is defined for XOR math */
+	if (cxlrd->addr_trans)
+		hpa = cxlrd->addr_trans(cxlrd, hpa, p->interleave_ways);
+
 	if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
 		return ULLONG_MAX;
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 534e25e2f0a4..f0c3bd377259 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -432,12 +432,14 @@ struct cxl_switch_decoder {
 struct cxl_root_decoder;
 typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
 					    int pos);
+typedef u64 (*cxl_addr_trans_fn)(struct cxl_root_decoder *cxlrd, u64 hpa, int ways);
 
 /**
  * struct cxl_root_decoder - Static platform CXL address decoder
  * @res: host / parent resource for region allocations
  * @region_id: region id for next region provisioning event
  * @calc_hb: which host bridge covers the n'th position by granularity
+ * @addr_trans: dpa->hpa address translation helper
  * @platform_data: platform specific configuration data
  * @range_lock: sync region autodiscovery by address range
  * @qos_class: QoS performance class cookie
@@ -447,6 +449,7 @@ struct cxl_root_decoder {
 	struct resource *res;
 	atomic_t region_id;
 	cxl_calc_hb_fn calc_hb;
+	cxl_addr_trans_fn addr_trans;
 	void *platform_data;
 	struct mutex range_lock;
 	int qos_class;
@@ -773,7 +776,8 @@ bool is_switch_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 						unsigned int nr_targets,
-						cxl_calc_hb_fn calc_hb);
+						cxl_calc_hb_fn calc_hb,
+						cxl_addr_trans_fn addr_trans);
 struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
 struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
 						    unsigned int nr_targets);
-- 
2.37.3



* [PATCH 2/3] cxl/region: Verify target positions using the ordered target list
  2024-04-26 19:51 [PATCH 0/3] XOR Math Fixups: translation & position alison.schofield
  2024-04-26 19:51 ` [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
@ 2024-04-26 19:51 ` alison.schofield
  2024-04-30 22:59   ` Dan Williams
  2024-04-26 19:51 ` [PATCH 3/3] cxl: Remove defunct code calculating host bridge target positions alison.schofield
  2 siblings, 1 reply; 10+ messages in thread
From: alison.schofield @ 2024-04-26 19:51 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

When a root decoder is configured, the interleave target list is read
from the BIOS-populated CFMWS structure. Per CXL Spec 3.1 Table 9-22,
the target list is in interleave order. The CXL driver populates its
decoder target list in the same order and stores it in the 'struct
cxl_switch_decoder' field "@target: active ordered target list in
current decoder configuration".

Given the promise of an ordered list, the driver can stop duplicating
the work of BIOS and simply check target positions against the ordered
list during region configuration.
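
With an ordered list, the check reduces to an index lookup. A userspace
sketch, where struct dport and target_position_ok() are hypothetical
stand-ins for struct cxl_dport and the comparison this patch adds
(dport != cxlrd->cxlsd.target[pos % iw]):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cxl_dport; only its address matters. */
struct dport {
	int id;
};

/*
 * Mirror of the new check: with an ordered target list, the dport
 * for region position pos must be targets[pos % iw].
 */
static bool target_position_ok(const struct dport *const targets[], int iw,
			       int pos, const struct dport *dport)
{
	assert(iw > 0 && pos >= 0);
	return targets[pos % iw] == dport;
}
```

For a 2-way root decoder with targets {hb0, hb1}, region positions 0
and 2 must route through hb0, and positions 1 and 3 through hb1.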

The simplified check against the ordered list is presented here.
A follow-on patch will remove the unused code.

For Modulo arithmetic, this is not a fix, only a simplification.
For XOR arithmetic, this is a fix for host bridge interleave ways (HB IW) of 6 and 12.

Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/region.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 5c186e0a39b9..3c20f8364b26 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1559,10 +1559,17 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
 				      const struct cxl_dport *dport, int pos)
 {
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
+	struct cxl_decoder *cxld = &cxlsd->cxld;
+	int iw = cxld->interleave_ways;
 	struct cxl_port *iter;
 	int rc;
 
-	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+	if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
+			  "misconfigured root decoder\n"))
+		return -ENXIO;
+
+	if (dport != cxlrd->cxlsd.target[pos % iw]) {
 		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
 			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
 			dev_name(&cxlrd->cxlsd.cxld.dev));
-- 
2.37.3



* [PATCH 3/3] cxl: Remove defunct code calculating host bridge target positions
  2024-04-26 19:51 [PATCH 0/3] XOR Math Fixups: translation & position alison.schofield
  2024-04-26 19:51 ` [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
  2024-04-26 19:51 ` [PATCH 2/3] cxl/region: Verify target positions using the ordered target list alison.schofield
@ 2024-04-26 19:51 ` alison.schofield
  2024-04-30 23:04   ` Dan Williams
  2 siblings, 1 reply; 10+ messages in thread
From: alison.schofield @ 2024-04-26 19:51 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS
target list in interleave target order. This means that the calculations
the CXL driver added to determine positions when XOR math is in use,
along with the entire XOR vs Modulo callback setup, are not needed.

A prior patch added a common method to verify positions.

Remove the now unused code related to the cxl_calc_hb_fn.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/acpi.c      | 63 ++---------------------------------------
 drivers/cxl/core/port.c | 18 ------------
 drivers/cxl/cxl.h       |  6 ----
 3 files changed, 3 insertions(+), 84 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 519e933b5a4b..67ab3cd52ead 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -22,58 +22,6 @@ static const guid_t acpi_cxl_qtg_id_guid =
 	GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
 		  0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
 
-/*
- * Find a targets entry (n) in the host bridge interleave list.
- * CXL Specification 3.0 Table 9-22
- */
-static int cxl_xor_calc_n(u64 hpa, struct cxl_cxims_data *cximsd, int iw,
-			  int ig)
-{
-	int i = 0, n = 0;
-	u8 eiw;
-
-	/* IW: 2,4,6,8,12,16 begin building 'n' using xormaps */
-	if (iw != 3) {
-		for (i = 0; i < cximsd->nr_maps; i++)
-			n |= (hweight64(hpa & cximsd->xormaps[i]) & 1) << i;
-	}
-	/* IW: 3,6,12 add a modulo calculation to 'n' */
-	if (!is_power_of_2(iw)) {
-		if (ways_to_eiw(iw, &eiw))
-			return -1;
-		hpa &= GENMASK_ULL(51, eiw + ig);
-		n |= do_div(hpa, 3) << i;
-	}
-	return n;
-}
-
-static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
-{
-	struct cxl_cxims_data *cximsd = cxlrd->platform_data;
-	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
-	struct cxl_decoder *cxld = &cxlsd->cxld;
-	int ig = cxld->interleave_granularity;
-	int iw = cxld->interleave_ways;
-	int n = 0;
-	u64 hpa;
-
-	if (dev_WARN_ONCE(&cxld->dev,
-			  cxld->interleave_ways != cxlsd->nr_targets,
-			  "misconfigured root decoder\n"))
-		return NULL;
-
-	hpa = cxlrd->res->start + pos * ig;
-
-	/* Entry (n) is 0 for no interleave (iw == 1) */
-	if (iw != 1)
-		n = cxl_xor_calc_n(hpa, cximsd, iw, ig);
-
-	if (n < 0)
-		return NULL;
-
-	return cxlrd->cxlsd.target[n];
-}
-
 static u64 restore_xor_pos(u64 hpa, u64 map)
 {
 	int restore_value, restore_pos = 0;
@@ -364,7 +312,6 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 	struct cxl_root_decoder *cxlrd;
 	struct device *dev = ctx->dev;
 	cxl_addr_trans_fn addr_trans;
-	cxl_calc_hb_fn cxl_calc_hb;
 	struct cxl_decoder *cxld;
 	unsigned int ways, i, ig;
 	struct resource *res;
@@ -404,16 +351,12 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
 	if (rc)
 		goto err_insert;
 
-	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO) {
-		cxl_calc_hb = cxl_hb_modulo;
+	if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_MODULO)
 		addr_trans = NULL;
-
-	} else {
-		cxl_calc_hb = cxl_hb_xor;
+	else
 		addr_trans = cxl_xor_trans;
-	}
 
-	cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb, addr_trans);
+	cxlrd = cxl_root_decoder_alloc(root_port, ways, addr_trans);
 	if (IS_ERR(cxlrd))
 		return PTR_ERR(cxlrd);
 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index cd4f004f5372..93a3a3982a57 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1733,21 +1733,6 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
 	return 0;
 }
 
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
-{
-	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
-	struct cxl_decoder *cxld = &cxlsd->cxld;
-	int iw;
-
-	iw = cxld->interleave_ways;
-	if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
-			  "misconfigured root decoder\n"))
-		return NULL;
-
-	return cxlrd->cxlsd.target[pos % iw];
-}
-EXPORT_SYMBOL_NS_GPL(cxl_hb_modulo, CXL);
-
 static struct lock_class_key cxl_decoder_key;
 
 /**
@@ -1807,7 +1792,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
  * cxl_root_decoder_alloc - Allocate a root level decoder
  * @port: owning CXL root of this decoder
  * @nr_targets: static number of downstream targets
- * @calc_hb: which host bridge covers the n'th position by granularity
  * @addr_trans: address translation helper function
  *
  * Return: A new cxl decoder to be registered by cxl_decoder_add(). A
@@ -1817,7 +1801,6 @@ static int cxl_switch_decoder_init(struct cxl_port *port,
  */
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 						unsigned int nr_targets,
-						cxl_calc_hb_fn calc_hb,
 						cxl_addr_trans_fn addr_trans)
 {
 	struct cxl_root_decoder *cxlrd;
@@ -1840,7 +1823,6 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 		return ERR_PTR(rc);
 	}
 
-	cxlrd->calc_hb = calc_hb;
 	cxlrd->addr_trans = addr_trans;
 	mutex_init(&cxlrd->range_lock);
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f0c3bd377259..adc9f785f938 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -430,15 +430,12 @@ struct cxl_switch_decoder {
 };
 
 struct cxl_root_decoder;
-typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
-					    int pos);
 typedef u64 (*cxl_addr_trans_fn)(struct cxl_root_decoder *cxlrd, u64 hpa, int ways);
 
 /**
  * struct cxl_root_decoder - Static platform CXL address decoder
  * @res: host / parent resource for region allocations
  * @region_id: region id for next region provisioning event
- * @calc_hb: which host bridge covers the n'th position by granularity
  * @addr_trans: dpa->hpa address translation helper
  * @platform_data: platform specific configuration data
  * @range_lock: sync region autodiscovery by address range
@@ -448,7 +445,6 @@ typedef u64 (*cxl_addr_trans_fn)(struct cxl_root_decoder *cxlrd, u64 hpa, int wa
 struct cxl_root_decoder {
 	struct resource *res;
 	atomic_t region_id;
-	cxl_calc_hb_fn calc_hb;
 	cxl_addr_trans_fn addr_trans;
 	void *platform_data;
 	struct mutex range_lock;
@@ -776,9 +772,7 @@ bool is_switch_decoder(struct device *dev);
 bool is_endpoint_decoder(struct device *dev);
 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
 						unsigned int nr_targets,
-						cxl_calc_hb_fn calc_hb,
 						cxl_addr_trans_fn addr_trans);
-struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos);
 struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
 						    unsigned int nr_targets);
 int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
-- 
2.37.3



* Re: [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation
  2024-04-26 19:51 ` [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
@ 2024-04-27  1:07   ` Dan Williams
  2024-05-01  3:41     ` Alison Schofield
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2024-04-27  1:07 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> interleave arithmetic XOR maps are applied during the HPA->DPA
> translation. The XOR function changes the interleave selector
> bit (aka position bit) in the HPA thereby varying which host bridge
> services an HPA. The purpose is to minimize hot spots thereby
> improving performance.
> 
> When a device reports a DPA in events such as poison, general_media,
> and dram, the driver translates that DPA back to an HPA. Presently,
> the CXL driver translation only considers the modulo position and
> will report the wrong HPA for XOR configured CFMWS's.
> 
> Add a helper function that restores the XOR'd bits during DPA->HPA
> address translation. Plumb a root decoder callback to the new helper
> when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> let the callback be NULL - as in no extra work required.
> 
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/acpi.c       | 49 +++++++++++++++++++++++++++++++++++++---
>  drivers/cxl/core/port.c  |  5 +++-
>  drivers/cxl/core/trace.c |  5 ++++
>  drivers/cxl/cxl.h        |  6 ++++-
>  4 files changed, 60 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index af5cb818f84d..519e933b5a4b 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -74,6 +74,44 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
>  	return cxlrd->cxlsd.target[n];
>  }
>  
> +static u64 restore_xor_pos(u64 hpa, u64 map)
> +{
> +	int restore_value, restore_pos = 0;
> +
> +	/*
> +	 * Restore the position bit to its value before the
> +	 * xormap was applied at HPA->DPA translation.
> +	 *
> +	 * restore_pos is the lowest set bit in the map
> +	 * restore_value is the XORALLBITS in (hpa AND map)

Might be worth finally clarifying why there is no "XOR" operation in
this xor_pos routine, i.e. that XORALLBITS is identical to asking if the
hweight of the (hpa & map) is odd or even.
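
A quick userspace check of that equivalence: folding every bit together
with XOR gives the same answer as testing whether the population count
is odd. GCC/Clang's __builtin_popcountll() stands in for hweight64(),
and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* XORALLBITS written out literally: XOR every bit of x together. */
static unsigned int xorallbits_fold(uint64_t x)
{
	unsigned int r = 0;

	while (x) {
		r ^= (unsigned int)(x & 1);
		x >>= 1;
	}
	return r;
}

/* The same question asked as "is the population count odd?" */
static unsigned int xorallbits_hweight(uint64_t x)
{
	return (unsigned int)(__builtin_popcountll(x) & 1);
}
```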

> +	 */
> +
> +	while ((map & (1ULL << restore_pos)) == 0)
> +		restore_pos++;

This is just open coded ffs()?
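
The equivalence can be checked with a small userspace sketch that
compares the open-coded loop with a count-trailing-zeros builtin
(GCC/Clang's __builtin_ctzll() as the stand-in). One caveat: the
kernel's ffs() takes an int and is 1-based, while __ffs64() is 0-based
over a u64, so the 0-based variant matches the loop directly. Both forms
need a non-zero map.

```c
#include <assert.h>
#include <stdint.h>

/* The open-coded search from the patch (0-based bit index). */
static int lowest_set_bit_loop(uint64_t map)
{
	int pos = 0;

	assert(map != 0);	/* for map == 0 the loop would shift past bit 63 */
	while ((map & (1ULL << pos)) == 0)
		pos++;
	return pos;
}

/* Count trailing zeros gives the same 0-based index in one operation. */
static int lowest_set_bit_ctz(uint64_t map)
{
	assert(map != 0);	/* __builtin_ctzll(0) is undefined */
	return __builtin_ctzll(map);
}
```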

> +
> +	restore_value = (hweight64(hpa & map) & 1);
> +	if (restore_value)
> +		hpa |= (1ULL << restore_pos);
> +	else
> +		hpa &= ~(1ULL << restore_pos);

It feels like this conditional mask / set can just be an xor operation?

    hpa ^= ((hweight64(hpa & map) & 1) << restore_pos);

Otherwise I question the & and | operations relative to the HPA bit
position already being 0 or 1.
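
To compare the options concretely, here is a userspace sketch of four
forms: the quoted set/clear, a branchless assignment with identical
behaviour, the suggested toggle, and a toggle by the parity of the
*other* map bits. As the checks show, the plain toggle agrees with the
set/clear only when the bit at restore_pos starts out 0. Which behaviour
is wanted depends on what the earlier modulo step leaves in that bit.
Function names are illustrative, and __builtin_popcountll() stands in
for hweight64().

```c
#include <assert.h>
#include <stdint.h>

static uint64_t parity64(uint64_t x)
{
	return (uint64_t)(__builtin_popcountll(x) & 1);
}

/* The quoted code: set or clear bit pos to parity(hpa & map). */
static uint64_t restore_setclear(uint64_t hpa, uint64_t map, int pos)
{
	if (parity64(hpa & map))
		hpa |= 1ULL << pos;
	else
		hpa &= ~(1ULL << pos);
	return hpa;
}

/* Branchless form of the same assignment: clear the bit, OR in the parity. */
static uint64_t restore_assign(uint64_t hpa, uint64_t map, int pos)
{
	return (hpa & ~(1ULL << pos)) | (parity64(hpa & map) << pos);
}

/* The suggested one-liner: toggle bit pos by parity(hpa & map). */
static uint64_t restore_toggle(uint64_t hpa, uint64_t map, int pos)
{
	return hpa ^ (parity64(hpa & map) << pos);
}

/* A toggle that matches set/clear: XOR in the parity of the other map bits. */
static uint64_t restore_toggle_others(uint64_t hpa, uint64_t map, int pos)
{
	return hpa ^ (parity64(hpa & map & ~(1ULL << pos)) << pos);
}
```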

> +
> +	return hpa;
> +}
> +
> +static u64 cxl_xor_trans(struct cxl_root_decoder *cxlrd, u64 hpa, int iw)

Ok, so the driver has cxl_trace_hpa() and now cxl_xor_trans() and
"addr_trans". Can these all just be called "translate"? "trace" feels
like tracing, "trans" only saves 4 characters, and "addr" is redundant,
as nothing else needs translating besides addresses in CXL land.

[..]
> diff --git a/drivers/cxl/core/trace.c b/drivers/cxl/core/trace.c
> index d0403dc3c8ab..a7ea4a256036 100644
> --- a/drivers/cxl/core/trace.c
> +++ b/drivers/cxl/core/trace.c
> @@ -36,6 +36,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
>  static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
>  			  struct cxl_endpoint_decoder *cxled)
>  {
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>  	u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
>  	struct cxl_region_params *p = &cxlr->params;
>  	int pos = cxled->pos;
> @@ -75,6 +76,10 @@ static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
>  	/* Apply the hpa_offset to the region base address */
>  	hpa = hpa_offset + p->res->start;
>  
> +	/* An addr_trans helper is defined for XOR math */

Rather than calling out XOR math, since that is an ACPI-ism, I would just
say something like: "root decoder overrides typical modulo decode"


* Re: [PATCH 2/3] cxl/region: Verify target positions using the ordered target list
  2024-04-26 19:51 ` [PATCH 2/3] cxl/region: Verify target positions using the ordered target list alison.schofield
@ 2024-04-30 22:59   ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2024-04-30 22:59 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> When a root decoder is configured the interleave target list is read
> from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table
> 9-22 the target list is in interleave order. The CXL driver populates
> its decoder target list in the same order and stores it in 'struct
> cxl_switch_decoder' field "@target: active ordered target list in
> current decoder configuration"
> 
> Given the promise of an ordered list, the driver can stop duplicating
> the work of BIOS and simply check target positions against the ordered
> list during region configuration.
> 
> The simplified check against the ordered list is presented here.
> A follow-on patch will remove the unused code.
> 
> For Modulo arithmetic this is not a fix, only a simplification.
> For XOR arithmetic this is a fix for HB IW of 6,12.
> 
> Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/region.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 5c186e0a39b9..3c20f8364b26 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1559,10 +1559,17 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
>  				      const struct cxl_dport *dport, int pos)
>  {
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> +	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
> +	struct cxl_decoder *cxld = &cxlsd->cxld;
> +	int iw = cxld->interleave_ways;
>  	struct cxl_port *iter;
>  	int rc;
>  
> -	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
> +	if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
> +			  "misconfigured root decoder\n"))
> +		return -ENXIO;

This is a nop because root decoders by definition have all of their
targets covered in the interleave, and the driver passes the CFMWS
interleave_ways setting directly to the @nr_targets parameter of
cxl_switch_decoder_init().

So drop this warning, which in retrospect was never needed, and it all
gets cleaned up in your next patch.

> +
> +	if (dport != cxlrd->cxlsd.target[pos % iw]) {

Looks ok, but I don't understand why this patch is tagged as a fix?
There should be no end user visible change of this conversion, right?


* Re: [PATCH 3/3] cxl: Remove defunct code calculating host bridge target positions
  2024-04-26 19:51 ` [PATCH 3/3] cxl: Remove defunct code calculating host bridge target positions alison.schofield
@ 2024-04-30 23:04   ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2024-04-30 23:04 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS
> target list in interleave target order. This means that the calculations
> the CXL driver added to determine positions when XOR math is in use,
> along with the entire XOR vs Modulo call back setup is not needed.
> 
> A prior patch added a common method to verify positions.
> 
> Remove the now unused code related to the cxl_calc_hb_fn.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Looks good to me. It needs to be reflowed for the
s/addr_trans/translate/ comment on patch 1, but otherwise:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


* Re: [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation
  2024-04-27  1:07   ` Dan Williams
@ 2024-05-01  3:41     ` Alison Schofield
  2024-05-01  5:00       ` Alison Schofield
  2024-05-02  4:34       ` Alison Schofield
  0 siblings, 2 replies; 10+ messages in thread
From: Alison Schofield @ 2024-05-01  3:41 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, linux-cxl

On Fri, Apr 26, 2024 at 06:07:59PM -0700, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > When a CXL region is created in a CXL Window (CFMWS) that uses XOR
> > interleave arithmetic XOR maps are applied during the HPA->DPA
> > translation. The XOR function changes the interleave selector
> > bit (aka position bit) in the HPA thereby varying which host bridge
> > services an HPA. The purpose is to minimize hot spots thereby
> > improving performance.
> > 
> > When a device reports a DPA in events such as poison, general_media,
> > and dram, the driver translates that DPA back to an HPA. Presently,
> > the CXL driver translation only considers the modulo position and
> > will report the wrong HPA for XOR configured CFMWS's.
> > 
> > Add a helper function that restores the XOR'd bits during DPA->HPA
> > address translation. Plumb a root decoder callback to the new helper
> > when XOR interleave arithmetic is in use. For MODULO arithmetic, just
> > let the callback be NULL - as in no extra work required.
> > 
> > Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/acpi.c       | 49 +++++++++++++++++++++++++++++++++++++---
> >  drivers/cxl/core/port.c  |  5 +++-
> >  drivers/cxl/core/trace.c |  5 ++++
> >  drivers/cxl/cxl.h        |  6 ++++-
> >  4 files changed, 60 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index af5cb818f84d..519e933b5a4b 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -74,6 +74,44 @@ static struct cxl_dport *cxl_hb_xor(struct cxl_root_decoder *cxlrd, int pos)
> >  	return cxlrd->cxlsd.target[n];
> >  }
> >  
> > +static u64 restore_xor_pos(u64 hpa, u64 map)
> > +{
> > +	int restore_value, restore_pos = 0;
> > +
> > +	/*
> > +	 * Restore the position bit to its value before the
> > +	 * xormap was applied at HPA->DPA translation.
> > +	 *
> > +	 * restore_pos is the lowest set bit in the map
> > +	 * restore_value is the XORALLBITS in (hpa AND map)
> 
> Might be worth finally clarifying why there is no "XOR" operation in
> this xor_pos routine, i.e. that XORALLBITS is identical to asking if the
> hweight of the (hpa & map) is odd or even.
> 

Thanks for the review Dan!

Well, I would, except that a few lines below you suggest I use
an XOR operator ;) but I know what you mean. Edited in v2.

Actually, I've abandoned this separate function in v2. Since
the LOC have been whittled down, a separate helper seems unnecessary.
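To spell out the point about XORALLBITS: XORing a set of bits together gives 1 exactly when an odd number of them are 1, so XORALLBITS(hpa & map) is just the parity of the population count. Below is a minimal userspace sketch that compares the literal bit-by-bit XOR with the popcount form. `__builtin_popcountll()` (GCC/Clang) stands in for the kernel's `hweight64()`, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* XORALLBITS taken literally: XOR each bit of (hpa & map) in turn. */
static unsigned int xorallbits_loop(uint64_t hpa, uint64_t map)
{
	uint64_t v = hpa & map;
	unsigned int x = 0;

	while (v) {
		x ^= (unsigned int)(v & 1);
		v >>= 1;
	}
	return x;
}

/*
 * The same bit via population count: XORing n one-bits gives 1 when n is
 * odd and 0 when n is even. __builtin_popcountll() stands in for hweight64().
 */
static unsigned int xorallbits_parity(uint64_t hpa, uint64_t map)
{
	return (unsigned int)(__builtin_popcountll(hpa & map) & 1);
}

/* Spot-check that both forms agree for one map across many HPAs. */
static void check_xorallbits(uint64_t map)
{
	for (uint64_t i = 0; i < 4096; i++) {
		uint64_t hpa = i * 0x9e3779b97f4a7c15ULL;	/* spread bits over 64 */

		assert(xorallbits_loop(hpa, map) == xorallbits_parity(hpa, map));
	}
}
```

This is why the helper needs no explicit XOR operator to compute XORALLBITS.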

> > +	 */
> > +
> > +	while ((map & (1ULL << restore_pos)) == 0)
> > +		restore_pos++;
> 
> This is just open coded ffs()?
> 
So it is!  Using ffs() in v2.
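One detail when swapping in a library helper: the loop yields the 0-based index of the lowest set bit. The kernel's plain ffs() is 1-based and takes an int. For a u64 map, the 0-based __ffs() (unsigned long) or __ffs64() is therefore the closer drop-in. Both are undefined for 0, and for an empty map the open-coded loop would shift past bit 63. The sketch below is userspace and uses `__builtin_ctzll()` (GCC/Clang) in place of the kernel helpers. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* The open-coded search from the patch: 0-based index of the lowest set bit. */
static int lowest_set_bit_loop(uint64_t map)
{
	int pos = 0;

	assert(map != 0);	/* otherwise the loop shifts past bit 63 (UB) */
	while ((map & (1ULL << pos)) == 0)
		pos++;
	return pos;
}

/*
 * Library form: count trailing zeros. Like the kernel's __ffs(), it is
 * 0-based and undefined for 0. Plain ffs() would return pos + 1 instead.
 */
static int lowest_set_bit_ctz(uint64_t map)
{
	assert(map != 0);
	return __builtin_ctzll(map);
}
```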

> > +
> > +	restore_value = (hweight64(hpa & map) & 1);
> > +	if (restore_value)
> > +		hpa |= (1ULL << restore_pos);
> > +	else
> > +		hpa &= ~(1ULL << restore_pos);
> 
> It feels like this conditional mask / set can just be an xor operation?
> 
>     hpa ^= ((hweight64(hpa & map) & 1) << restore_pos);

I've taken the XOR operator piece, but didn't collapse it into the
one-liner you suggest. See what you think in v2, please.

> 
> Otherwise I question the & and | operations relative to the HPA bit
> position already being 0 or 1.
> 

They are gone in the next rev, but FWIW here's how those ops behave:

if restore_value == 1, using |= always sets the bit at restore_pos
like this:
	restore_value  current_value  |= value
	1		0		1
	1		1		1

if restore_value == 0, using &= always clears the bit at restore_pos
like this:
	restore_value  current_value  |= value
	0		0		0
	0		1		0

> > +
> > +	return hpa;
> > +}
> > +
> > +static u64 cxl_xor_trans(struct cxl_root_decoder *cxlrd, u64 hpa, int iw)
> 
> Ok, so the driver has cxl_trace_hpa() and now cxl_xor_trans() and
> "addr_trans". Can these all just be called "translate" because "trace"
> feels like tracing, "trans" is only saving 4 characters, and "addr" is
> redundant as nothing else needs translating besides addresses in CXL
> land.

I think I've got it:

Existing:
cxl_trace_hpa() -> cxl_translate()

Introduced in this set:
cxl_addr_trans_fn -> cxl_translate_fn
addr_trans -> translate
cxl_xor_trans() -> cxl_xor_translate()

> 
> [..]
> > diff --git a/drivers/cxl/core/trace.c b/drivers/cxl/core/trace.c
> > index d0403dc3c8ab..a7ea4a256036 100644
> > --- a/drivers/cxl/core/trace.c
> > +++ b/drivers/cxl/core/trace.c
> > @@ -36,6 +36,7 @@ static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> >  static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
> >  			  struct cxl_endpoint_decoder *cxled)
> >  {
> > +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> >  	u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
> >  	struct cxl_region_params *p = &cxlr->params;
> >  	int pos = cxled->pos;
> > @@ -75,6 +76,10 @@ static u64 cxl_dpa_to_hpa(u64 dpa,  struct cxl_region *cxlr,
> >  	/* Apply the hpa_offset to the region base address */
> >  	hpa = hpa_offset + p->res->start;
> >  
> > +	/* An addr_trans helper is defined for XOR math */
> 
> Rather than calling out XOR math, since that is an ACPI'ism, I would just
> say something like: "root decoder overrides typical modulo decode"

Got it.
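The plumbing in the commit message can be sketched end to end in userspace C, using the comment wording suggested above. Modulo math builds the HPA, and a root decoder callback, if one is set, fixes it up afterwards; for modulo the callback stays NULL. Everything here is a simplified, hypothetical model. struct root_decoder and translate_fn stand in for struct cxl_root_decoder and the proposed cxl_translate_fn. Only power-of-2 ways and a single XOR map are handled, and `__builtin_*` calls replace hweight64()/__ffs().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct root_decoder;

/* Hypothetical callback type, shaped like cxl_xor_trans(cxlrd, hpa, iw). */
typedef uint64_t (*translate_fn)(struct root_decoder *rd, uint64_t hpa, int iw);

/* Simplified, hypothetical stand-in for struct cxl_root_decoder. */
struct root_decoder {
	uint64_t xormap;	/* one XOR map (2-way host bridge interleave) */
	translate_fn translate;	/* NULL: typical modulo decode, nothing to undo */
};

/* XOR override: reassign the position bit to XORALLBITS(hpa & xormap). */
static uint64_t xor_translate(struct root_decoder *rd, uint64_t hpa, int iw)
{
	int pos;
	uint64_t val;

	(void)iw;	/* unused in this single-map sketch */
	assert(rd->xormap != 0);
	pos = __builtin_ctzll(rd->xormap);
	val = (uint64_t)(__builtin_popcountll(hpa & rd->xormap) & 1);
	return (hpa & ~(1ULL << pos)) | (val << pos);
}

/*
 * Modulo DPA->HPA for power-of-2 interleave: granularity 1 << gbits,
 * ways 1 << wbits, device at interleave position pos, region at base.
 */
static uint64_t dpa_to_hpa(struct root_decoder *rd, uint64_t dpa_offset,
			   uint64_t base, int gbits, int wbits, uint64_t pos)
{
	uint64_t lower = dpa_offset & ((1ULL << gbits) - 1);	/* within a granule */
	uint64_t upper = dpa_offset >> gbits;			/* granule index */
	uint64_t hpa;

	assert(rd != NULL);
	hpa = base + ((upper << (gbits + wbits)) | (pos << gbits) | lower);

	/* Root decoder overrides typical modulo decode */
	if (rd->translate)
		hpa = rd->translate(rd, hpa, 1 << wbits);
	return hpa;
}
```

For example, take device position 1 of a 2-way region with 256-byte granularity at base 0 and DPA offset 0x800. Modulo math alone lands this at 0x1100. With map 0x1100, the XOR decoder restores 0x1000, whose XORALLBITS over the map is 1 and which therefore routes to position 1.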

-- Alison


* Re: [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation
  2024-05-01  3:41     ` Alison Schofield
@ 2024-05-01  5:00       ` Alison Schofield
  2024-05-02  4:34       ` Alison Schofield
  1 sibling, 0 replies; 10+ messages in thread
From: Alison Schofield @ 2024-05-01  5:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, linux-cxl

On Tue, Apr 30, 2024 at 08:41:19PM -0700, Alison Schofield wrote:
> On Fri, Apr 26, 2024 at 06:07:59PM -0700, Dan Williams wrote:
> > alison.schofield@ wrote:
> [..]
> 
> They are gone in next rev, but FWIW here's the truth about those ops:
> 
> if restore_value == 1, using |= always sets the bit at restore_pos
> like this:
> 	restore_value  current_value  |= value
> 	1		0		1
> 	1		1		1
> 
> if restore_value == 0, using &= always clears the bit at restore_pos
> like this:
> 	restore_value  current_value  |= value
                                      ^^ should be &= 
> 	0		0		0
> 	0		1		0
>

I guess my own typo proves ^= is cleaner ;)


* Re: [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation
  2024-05-01  3:41     ` Alison Schofield
  2024-05-01  5:00       ` Alison Schofield
@ 2024-05-02  4:34       ` Alison Schofield
  1 sibling, 0 replies; 10+ messages in thread
From: Alison Schofield @ 2024-05-02  4:34 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, linux-cxl

On Tue, Apr 30, 2024 at 08:41:19PM -0700, Alison Schofield wrote:
> On Fri, Apr 26, 2024 at 06:07:59PM -0700, Dan Williams wrote:
> > alison.schofield@ wrote:
> > [..]
> > 
> > It feels like this conditional mask / set can just be an xor operation?
> > 
> >     hpa ^= ((hweight64(hpa & map) & 1) << restore_pos);
> 
> I've taken the XOR operator piece, but didn't collapse it into the
> one-liner you suggest. See what you think in v2, please.

I jumped on the Dan-Bandwagon too hastily. The hpa ^= doesn't
work because it toggles the bit rather than assigning it, i.e., it
won't clear hpa[restore_pos] when restore_value is 0, and it needs to.

This works and is more concise than the original if-else:
	hpa = (hpa & ~(1ULL << pos)) | (val << pos);

I've changed a few things around it, but didn't want to leave
this dangling out here for another reviewer to ponder.
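A userspace round trip for a single XOR map shows why the toggle fails and the mask-and-set form works. Several pieces are assumptions and do not come from the patch. modulo_hpa() is a hypothetical model: it only overwrites the position bit with the interleave selector, which is how modulo-only DPA->HPA math leaves that bit. The position bit is taken to be the lowest set bit of the map, as the patch comment states. `__builtin_popcountll()`/`__builtin_ctzll()` stand in for hweight64()/__ffs(). Finally, val is declared uint64_t so that val << pos stays well defined for positions of 32 and above.

```c
#include <assert.h>
#include <stdint.h>

/* XORALLBITS(hpa & map) as a 0/1 value; hweight64() parity in the kernel. */
static uint64_t xorallbits(uint64_t hpa, uint64_t map)
{
	return (uint64_t)(__builtin_popcountll(hpa & map) & 1);
}

/* Hypothetical model of modulo-only math: position bit = selector. */
static uint64_t modulo_hpa(uint64_t hpa, uint64_t map)
{
	int pos;
	uint64_t sel;

	assert(map != 0);
	pos = __builtin_ctzll(map);
	sel = xorallbits(hpa, map);	/* position the XOR decode selected */
	return (hpa & ~(1ULL << pos)) | (sel << pos);
}

/* The final proposal: clear the bit, then OR in val (assigns it). */
static uint64_t restore_mask_set(uint64_t hpa, uint64_t map)
{
	int pos;
	uint64_t val;

	assert(map != 0);
	pos = __builtin_ctzll(map);
	val = xorallbits(hpa, map);
	return (hpa & ~(1ULL << pos)) | (val << pos);
}

/* The rejected one-liner: flips the bit instead of assigning it. */
static uint64_t restore_toggle(uint64_t hpa, uint64_t map)
{
	assert(map != 0);
	return hpa ^ (xorallbits(hpa, map) << __builtin_ctzll(map));
}

/* Mask-and-set must recover every original HPA for the given map. */
static void check_round_trip(uint64_t map)
{
	for (uint64_t i = 0; i < 4096; i++) {
		uint64_t hpa = i * 0x9e3779b97f4a7c15ULL;	/* spread bits over 64 */

		assert(restore_mask_set(modulo_hpa(hpa, map), map) == hpa);
	}
}
```

Take map 0x1100 as an example. The original HPA 0x1000 selects position 1, and the modulo model yields 0x1100. Mask-and-set restores 0x1000. The toggle leaves 0x1100 unchanged, because XORALLBITS(0x1100 & 0x1100) is 0. That is exactly the "won't clear" case above.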



end of thread

Thread overview: 10+ messages
2024-04-26 19:51 [PATCH 0/3] XOR Math Fixups: translation & position alison.schofield
2024-04-26 19:51 ` [PATCH 1/3] cxl/acpi: Restore XOR'd position bits during address translation alison.schofield
2024-04-27  1:07   ` Dan Williams
2024-05-01  3:41     ` Alison Schofield
2024-05-01  5:00       ` Alison Schofield
2024-05-02  4:34       ` Alison Schofield
2024-04-26 19:51 ` [PATCH 2/3] cxl/region: Verify target positions using the ordered target list alison.schofield
2024-04-30 22:59   ` Dan Williams
2024-04-26 19:51 ` [PATCH 3/3] cxl: Remove defunct code calculating host bridge target positions alison.schofield
2024-04-30 23:04   ` Dan Williams
