* [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
@ 2025-09-12 14:45 Robert Richter
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
` (12 more replies)
0 siblings, 13 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
This patch set adds support for address translation using ACPI PRM and
enables this for AMD Zen5 platforms. This is another new appoach in
response to earlier attempts to implement CXL address translation:
* v1: [1] and the comments on it, esp. Dan's [2],
* v2: [3] and comments on [4], esp. Dave's [5]
This version 3 addresses the requests to reduce the number of patches
to a minimum and also to remove platform specific implementations
allowing the Documentation of CXL Address Translation Support in the
Kernel's "Compute Express Link: Linux Conventions" document and an
update of the CXL specification in the longterm. This patch submission
will be the base for a documention patch that describes CXL Address
Translation support accordingly. The documentation patch will be sent
in the very next step.
The CXL driver currently does not implement address translation which
assumes the host physical addresses (HPA) and system physical
addresses (SPA) are equal.
Systems with different HPA and SPA addresses need address translation.
If this is the case, the hardware addresses esp. used in the HDM
decoder configurations are different to the system's or parent port
address ranges. E.g. AMD Zen5 systems may be configured to use
'Normalized addresses'. Then, CXL endpoints have their own physical
address base which is not the same as the SPA used by the CXL host
bridge. Thus, addresses need to be translated from the endpoint's to
its CXL host bridge's address range.
To enable address translation, the endpoint's HPA range must be
translated to the CXL host bridge's address range. A callback is
introduced to translate a decoder's HPA to the next parent port's
address range. This allows the enablement of address translation for
individual ports as needed. The callback is then used to determine the
region parameters which includes the SPA translated address range of
the endpoint decoder and the interleaving configuration. This is
stored in struct cxl_region which allows an endpoint decoder to
determine that parameters based on its assigned region.
Note that only auto-discovery of decoders is supported. Thus, decoders
are locked and cannot be configured manually.
Finally, Zen5 address translation is enabled using ACPI PRMT.
This series bases on cxl/next.
V3:
* rebased onto cxl/next,
* complete rework to reduce number of required changes/patches and to
remove platform specific code (Dan and Dave),
* changed implementation allowing to add address translation to the
CXL specification (documention patch in preparation),
* simplified and generalized determination of interleaving
parameters using the address translation callback,
* depend only on the existence of the ACPI PRM GUID for CXL Address
Translation enablement, removed platform checks,
* small changes to region code only which does not require a full
rework and refactoring of the code, just separating region
parameter setup and region construction,
* moved code to new core/atl.c file,
* fixed subsys_initcall order dependency of EFI runtime services
(Gregory and Joshua),
V2:
* rebased onto cxl/next,
* split of v1 in two parts:
* removed cleanups and updates from this series to post them as a
separate series (Dave),
* this part 2 applies on top of part 1, v3,
* added tags to SOB chain,
* reworked architecture, vendor and platform setup (Jonathan):
* added patch "cxl/x86: Prepare for architectural platform setup",
* added function arch_cxl_port_platform_setup() plus a __weak
versions for archs other than x86,
* moved code to core/x86,
* added comment to cxl_to_hpa_fn (Ben),
* updated year in copyright statement (Ben),
* cxl_port_calc_hpa(): Removed HPA check for zero (Jonathan), return
1 if modified,
* cxl_port_calc_pos(): Updated description and wording (Ben),
* added sereral patches around interleaving and SPA calculation in
cxl_endpoint_decoder_initialize(),
* reworked iterator in cxl_endpoint_decoder_initialize() (Gregory),
* fixed region interleaving parameters() (Alison),
* fixed check in cxl_region_attach() (Alison),
* Clarified in coverletter that not all ports in a system must
implement the to_hpa() callback (Terry).
[1] https://lore.kernel.org/linux-cxl/20240701174754.967954-1-rrichter@amd.com/
[2] https://lore.kernel.org/linux-cxl/669086821f136_5fffa29473@dwillia2-xfh.jf.intel.com.notmuch/
[3] https://patchwork.kernel.org/project/cxl/cover/20250218132356.1809075-1-rrichter@amd.com/
[4] https://patchwork.kernel.org/project/cxl/cover/20250715191143.1023512-1-rrichter@amd.com/
[5] https://lore.kernel.org/all/78284b12-3e0b-4758-af18-397f32136c3f@intel.com/
Robert Richter (11):
cxl/region: Store root decoder in struct cxl_region
cxl/region: Store HPA range in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @range
cxl/region: Add @range argument to function cxl_find_root_decoder()
cxl/region: Add @range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Introduce callback to translate a decoder's HPA to the next
parent port
cxl/region: Implement endpoint decoder address translation
cxl/region: Lock decoders that need address translation
cxl/acpi: Prepare use of EFI runtime services
cxl: Enable AMD Zen5 address translation using ACPI PRMT
drivers/cxl/Kconfig | 4 +
drivers/cxl/acpi.c | 8 +-
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++
drivers/cxl/core/core.h | 1 +
drivers/cxl/core/port.c | 8 ++
drivers/cxl/core/region.c | 239 ++++++++++++++++++++++++++++++--------
drivers/cxl/cxl.h | 17 +++
8 files changed, 368 insertions(+), 48 deletions(-)
create mode 100644 drivers/cxl/core/atl.c
base-commit: 561c4e30bff93b3c33e694a459f8580f8a6b3c8c
--
2.39.5
^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-12 15:52 ` Dave Jiang
` (2 more replies)
2025-09-12 14:45 ` [PATCH v3 02/11] cxl/region: Store HPA range " Robert Richter
` (11 subsequent siblings)
12 siblings, 3 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
A region is always bound to a root decoder. The region's associated
root decoder is often needed. Add it to struct cxl_region.
This simplifies code by removing dynamic lookups and removing the root
decoder argument from the function argument list where possible.
Patch is a prerequisite to implement address translation which uses
struct cxl_region to store all relevant region and interleaving
parameters.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 37 +++++++++++++++++++------------------
drivers/cxl/cxl.h | 2 ++
2 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 29d3809ab2bb..2c37c060d983 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -489,9 +489,9 @@ static ssize_t interleave_ways_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
- struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
struct cxl_region *cxlr = to_cxl_region(dev);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
+ struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
struct cxl_region_params *p = &cxlr->params;
unsigned int val, save;
int rc;
@@ -552,9 +552,9 @@ static ssize_t interleave_granularity_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
- struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
struct cxl_region *cxlr = to_cxl_region(dev);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
+ struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
struct cxl_region_params *p = &cxlr->params;
int rc, val;
u16 ig;
@@ -628,7 +628,7 @@ static DEVICE_ATTR_RO(mode);
static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_region_params *p = &cxlr->params;
struct resource *res;
u64 remainder = 0;
@@ -1321,7 +1321,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
int parent_iw, parent_ig, ig, iw, rc, inc = 0, pos = cxled->pos;
struct cxl_port *parent_port = to_cxl_port(port->dev.parent);
struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr);
@@ -1678,10 +1678,10 @@ static int cxl_region_validate_position(struct cxl_region *cxlr,
}
static int cxl_region_attach_position(struct cxl_region *cxlr,
- struct cxl_root_decoder *cxlrd,
struct cxl_endpoint_decoder *cxled,
const struct cxl_dport *dport, int pos)
{
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
struct cxl_decoder *cxld = &cxlsd->cxld;
@@ -1918,7 +1918,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
static int cxl_region_attach(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled, int pos)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_region_params *p = &cxlr->params;
@@ -2023,8 +2023,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
ep_port = cxled_to_port(cxled);
dport = cxl_find_dport_by_dev(root_port,
ep_port->host_bridge);
- rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
- dport, i);
+ rc = cxl_region_attach_position(cxlr, cxled, dport, i);
if (rc)
return rc;
}
@@ -2047,7 +2046,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
if (rc)
return rc;
- rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
+ rc = cxl_region_attach_position(cxlr, cxled, dport, pos);
if (rc)
return rc;
@@ -2343,8 +2342,8 @@ static const struct attribute_group *region_groups[] = {
static void cxl_region_release(struct device *dev)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
struct cxl_region *cxlr = to_cxl_region(dev);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
int id = atomic_read(&cxlrd->region_id);
/*
@@ -2427,10 +2426,12 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
* region id allocations
*/
get_device(dev->parent);
+ cxlr->cxlrd = cxlrd;
+ cxlr->id = id;
+
device_set_pm_not_required(dev);
dev->bus = &cxl_bus_type;
dev->type = &cxl_region_type;
- cxlr->id = id;
return cxlr;
}
@@ -2931,7 +2932,7 @@ static bool has_spa_to_hpa(struct cxl_root_decoder *cxlrd)
u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
u64 dpa)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
struct cxl_region_params *p = &cxlr->params;
struct cxl_endpoint_decoder *cxled = NULL;
@@ -3007,7 +3008,7 @@ static int region_offset_to_dpa_result(struct cxl_region *cxlr, u64 offset,
struct dpa_result *result)
{
struct cxl_region_params *p = &cxlr->params;
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_endpoint_decoder *cxled;
u64 hpa, hpa_offset, dpa_offset;
u64 bits_upper, bits_lower;
@@ -3401,7 +3402,7 @@ static int match_region_by_range(struct device *dev, const void *data)
static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
struct resource *res)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_region_params *p = &cxlr->params;
resource_size_t size = resource_size(res);
resource_size_t cache_size, start;
@@ -3437,9 +3438,9 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
}
static int __construct_region(struct cxl_region *cxlr,
- struct cxl_root_decoder *cxlrd,
struct cxl_endpoint_decoder *cxled)
{
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct range *hpa = &cxled->cxld.hpa_range;
struct cxl_region_params *p;
@@ -3531,7 +3532,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
- rc = __construct_region(cxlr, cxlrd, cxled);
+ rc = __construct_region(cxlr, cxled);
if (rc) {
devm_release_action(port->uport_dev, unregister_region, cxlr);
return ERR_PTR(rc);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 4fe3df06f57a..350ccd6949b3 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -517,6 +517,7 @@ enum cxl_partition_mode {
* struct cxl_region - CXL region
* @dev: This region's device
* @id: This region's id. Id is globally unique across all regions
+ * @cxlrd: Region's root decoder
* @mode: Operational mode of the mapped capacity
* @type: Endpoint decoder target type
* @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
@@ -530,6 +531,7 @@ enum cxl_partition_mode {
struct cxl_region {
struct device dev;
int id;
+ struct cxl_root_decoder *cxlrd;
enum cxl_partition_mode mode;
enum cxl_decoder_type type;
struct cxl_nvdimm_bridge *cxl_nvb;
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-12 17:17 ` Dave Jiang
` (2 more replies)
2025-09-12 14:45 ` [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range Robert Richter
` (10 subsequent siblings)
12 siblings, 3 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
Each region has a known host physical address (HPA) range it is
assigned to. Endpoint decoders assigned to a region share the same HPA
range. The region's address range is the system's physical address
(SPA) range.
Endpoint decoders in systems that need address translation use HPAs
which are not SPAs. To make the SPA range accessible to the endpoint
decoders, store and track the region's SPA range in struct cxl_region.
Introduce the @hpa_range member to the struct. Now, the SPA range of
an endpoint decoder can be determined based on its assigned region.
Patch is a prerequisite to implement address translation which uses
struct cxl_region to store all relevant region and interleaving
parameters.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 17 +++++++++++++++++
drivers/cxl/cxl.h | 2 ++
2 files changed, 19 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 2c37c060d983..777d04870180 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -664,6 +664,11 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
return PTR_ERR(res);
}
+ cxlr->hpa_range = (struct range) {
+ .start = res->start,
+ .end = res->end,
+ };
+
p->res = res;
p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
@@ -700,8 +705,14 @@ static int free_hpa(struct cxl_region *cxlr)
if (p->state >= CXL_CONFIG_ACTIVE)
return -EBUSY;
+ cxlr->hpa_range = (struct range) {
+ .start = 0,
+ .end = -1,
+ };
+
cxl_region_iomem_release(cxlr);
p->state = CXL_CONFIG_IDLE;
+
return 0;
}
@@ -2400,6 +2411,11 @@ static void unregister_region(void *_cxlr)
for (i = 0; i < p->interleave_ways; i++)
detach_target(cxlr, i);
+ cxlr->hpa_range = (struct range) {
+ .start = 0,
+ .end = -1,
+ };
+
cxl_region_iomem_release(cxlr);
put_device(&cxlr->dev);
}
@@ -3458,6 +3474,7 @@ static int __construct_region(struct cxl_region *cxlr,
}
set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
+ cxlr->hpa_range = *hpa;
res = kmalloc(sizeof(*res), GFP_KERNEL);
if (!res)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 350ccd6949b3..f182982f1c14 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -518,6 +518,7 @@ enum cxl_partition_mode {
* @dev: This region's device
* @id: This region's id. Id is globally unique across all regions
* @cxlrd: Region's root decoder
+ * @hpa_range: Address range occupied by the region
* @mode: Operational mode of the mapped capacity
* @type: Endpoint decoder target type
* @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
@@ -532,6 +533,7 @@ struct cxl_region {
struct device dev;
int id;
struct cxl_root_decoder *cxlrd;
+ struct range hpa_range;
enum cxl_partition_mode mode;
enum cxl_decoder_type type;
struct cxl_nvdimm_bridge *cxl_nvb;
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
2025-09-12 14:45 ` [PATCH v3 02/11] cxl/region: Store HPA range " Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-12 17:33 ` Dave Jiang
2025-09-17 20:01 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 04/11] cxl/region: Add @range argument to function cxl_find_root_decoder() Robert Richter
` (9 subsequent siblings)
12 siblings, 2 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
@hpa is actually a @range, rename variables accordingly.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 777d04870180..13113920aba7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3367,9 +3367,9 @@ static int match_decoder_by_range(struct device *dev, const void *data)
}
static struct cxl_decoder *
-cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa)
+cxl_port_find_switch_decoder(struct cxl_port *port, struct range *range)
{
- struct device *cxld_dev = device_find_child(&port->dev, hpa,
+ struct device *cxld_dev = device_find_child(&port->dev, range,
match_decoder_by_range);
return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL;
@@ -3382,14 +3382,14 @@ cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled)
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
struct cxl_decoder *root, *cxld = &cxled->cxld;
- struct range *hpa = &cxld->hpa_range;
+ struct range *range = &cxld->hpa_range;
- root = cxl_port_find_switch_decoder(&cxl_root->port, hpa);
+ root = cxl_port_find_switch_decoder(&cxl_root->port, range);
if (!root) {
dev_err(cxlmd->dev.parent,
"%s:%s no CXL window for range %#llx:%#llx\n",
dev_name(&cxlmd->dev), dev_name(&cxld->dev),
- cxld->hpa_range.start, cxld->hpa_range.end);
+ range->start, range->end);
return NULL;
}
@@ -3458,7 +3458,7 @@ static int __construct_region(struct cxl_region *cxlr,
{
struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct range *hpa = &cxled->cxld.hpa_range;
+ struct range *range = &cxled->cxld.hpa_range;
struct cxl_region_params *p;
struct resource *res;
int rc;
@@ -3474,13 +3474,13 @@ static int __construct_region(struct cxl_region *cxlr,
}
set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
- cxlr->hpa_range = *hpa;
+ cxlr->hpa_range = *range;
res = kmalloc(sizeof(*res), GFP_KERNEL);
if (!res)
return -ENOMEM;
- *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
+ *res = DEFINE_RES_MEM_NAMED(range->start, range_len(range),
dev_name(&cxlr->dev));
rc = cxl_extended_linear_cache_resize(cxlr, res);
@@ -3559,11 +3559,11 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
}
static struct cxl_region *
-cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
+cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *range)
{
struct device *region_dev;
- region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
+ region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, range,
match_region_by_range);
if (!region_dev)
return NULL;
@@ -3573,7 +3573,7 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
{
- struct range *hpa = &cxled->cxld.hpa_range;
+ struct range *range = &cxled->cxld.hpa_range;
struct cxl_region_params *p;
bool attach = false;
int rc;
@@ -3584,12 +3584,13 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
return -ENXIO;
/*
- * Ensure that if multiple threads race to construct_region() for @hpa
- * one does the construction and the others add to that.
+ * Ensure that if multiple threads race to construct_region()
+ * for the HPA range one does the construction and the others
+ * add to that.
*/
mutex_lock(&cxlrd->range_lock);
struct cxl_region *cxlr __free(put_cxl_region) =
- cxl_find_region_by_range(cxlrd, hpa);
+ cxl_find_region_by_range(cxlrd, range);
if (!cxlr)
cxlr = construct_region(cxlrd, cxled);
mutex_unlock(&cxlrd->range_lock);
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 04/11] cxl/region: Add @range argument to function cxl_find_root_decoder()
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (2 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-17 20:05 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos() Robert Richter
` (8 subsequent siblings)
12 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
cxl_find_root_decoder() uses the endpoint decoder's HPA range to find
the root decoder. This requires endpoints and root decoders to be in
the same memory domain, which is not the case for systems that need
address translation.
Add a separate @range argument to function cxl_find_root_decoder() to
specify the root decoder's address range. Now it is possible to pass a
translated address range of an endpoint decoder to function
cxl_find_root_decoder().
Patch is a prerequisite to implement address translation.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 13113920aba7..8ccc171ac724 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3376,19 +3376,18 @@ cxl_port_find_switch_decoder(struct cxl_port *port, struct range *range)
}
static struct cxl_root_decoder *
-cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled)
+cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled, struct range *range)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_port *port = cxled_to_port(cxled);
struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
- struct cxl_decoder *root, *cxld = &cxled->cxld;
- struct range *range = &cxld->hpa_range;
+ struct cxl_decoder *root;
root = cxl_port_find_switch_decoder(&cxl_root->port, range);
if (!root) {
dev_err(cxlmd->dev.parent,
"%s:%s no CXL window for range %#llx:%#llx\n",
- dev_name(&cxlmd->dev), dev_name(&cxld->dev),
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
range->start, range->end);
return NULL;
}
@@ -3579,7 +3578,7 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
int rc;
struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
- cxl_find_root_decoder(cxled);
+ cxl_find_root_decoder(cxled, range);
if (!cxlrd)
return -ENXIO;
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos()
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (3 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 04/11] cxl/region: Add @range argument to function cxl_find_root_decoder() Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-17 20:09 ` Gregory Price
2025-09-23 21:52 ` Alison Schofield
2025-09-12 14:45 ` [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction Robert Richter
` (7 subsequent siblings)
12 siblings, 2 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
cxl_calc_interleave_pos() uses the endpoint decoder's HPA range to
determine its interleaving position. This requires the endpoint
decoders to be an SPA, which is not the case for systems that need
address translation.
Add a separate @range argument to function cxl_calc_interleave_pos()
to specify the address range. Now it is possible to pass the SPA
translated address range of an endpoint decoder to function
cxl_calc_interleave_pos().
Patch is a prerequisite to implement address translation.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 8ccc171ac724..106692f1e310 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1844,11 +1844,11 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
* Return: position >= 0 on success
* -ENXIO on failure
*/
-static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
+static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled,
+ struct range *range)
{
struct cxl_port *iter, *port = cxled_to_port(cxled);
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct range *range = &cxled->cxld.hpa_range;
int parent_ways = 0, parent_pos = 0, pos = 0;
int rc;
@@ -1909,7 +1909,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
for (i = 0; i < p->nr_targets; i++) {
struct cxl_endpoint_decoder *cxled = p->targets[i];
- cxled->pos = cxl_calc_interleave_pos(cxled);
+ cxled->pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range);
/*
* Record that sorting failed, but still continue to calc
* cxled->pos so that follow-on code paths can reliably
@@ -2093,7 +2093,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled = p->targets[i];
int test_pos;
- test_pos = cxl_calc_interleave_pos(cxled);
+ test_pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range);
dev_dbg(&cxled->cxld.dev,
"Test cxl_calc_interleave_pos(): %s test_pos:%d cxled->pos:%d\n",
(test_pos == cxled->pos) ? "success" : "fail",
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (4 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos() Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-12 21:10 ` Dave Jiang
2025-09-17 20:15 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port Robert Richter
` (6 subsequent siblings)
12 siblings, 2 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
To construct a region, the region parameters such as address range and
interleaving config need to be determined. This is done while
constructing the region by inspecting the endpoint decoder
configuration. The endpoint decoder is passed as a function argument.
With address translation the endpoint decoder data is no longer
sufficient to extract the region parameters as some of the information
is obtained using other methods such as using firmware calls.
In a first step, separate code to determine and setup the region
parameters from the region construction. Temporarily store all the
data to create the region in the new struct cxl_region_context. Add a
new function setup_region_parameters() to fill that struct and later
use it to construct the region. This simplifies the extension of the
function to support other methods needed, esp. to support address
translation.
Patch is a prerequisite to implement address translation.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 50 +++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 12 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 106692f1e310..57697504410b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3414,6 +3414,26 @@ static int match_region_by_range(struct device *dev, const void *data)
return 0;
}
+struct cxl_region_context {
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_memdev *cxlmd;
+ struct range hpa_range;
+ int interleave_ways;
+ int interleave_granularity;
+};
+
+static int setup_region_params(struct cxl_endpoint_decoder *cxled,
+ struct cxl_region_context *ctx)
+{
+ ctx->cxled = cxled;
+ ctx->cxlmd = cxled_to_memdev(cxled);
+ ctx->hpa_range = cxled->cxld.hpa_range;
+ ctx->interleave_ways = cxled->cxld.interleave_ways;
+ ctx->interleave_granularity = cxled->cxld.interleave_granularity;
+
+ return 0;
+}
+
static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
struct resource *res)
{
@@ -3453,11 +3473,12 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
}
static int __construct_region(struct cxl_region *cxlr,
- struct cxl_endpoint_decoder *cxled)
+ struct cxl_region_context *ctx)
{
+ struct cxl_endpoint_decoder *cxled = ctx->cxled;
struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
- struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct range *range = &cxled->cxld.hpa_range;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
+ struct range *range = &ctx->hpa_range;
struct cxl_region_params *p;
struct resource *res;
int rc;
@@ -3506,8 +3527,8 @@ static int __construct_region(struct cxl_region *cxlr,
}
p->res = res;
- p->interleave_ways = cxled->cxld.interleave_ways;
- p->interleave_granularity = cxled->cxld.interleave_granularity;
+ p->interleave_ways = ctx->interleave_ways;
+ p->interleave_granularity = ctx->interleave_granularity;
p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
@@ -3527,9 +3548,10 @@ static int __construct_region(struct cxl_region *cxlr,
/* Establish an empty region covering the given HPA range */
static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled)
+ struct cxl_region_context *ctx)
{
- struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_endpoint_decoder *cxled = ctx->cxled;
+ struct cxl_memdev *cxlmd = ctx->cxlmd;
struct cxl_port *port = cxlrd_to_port(cxlrd);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
int rc, part = READ_ONCE(cxled->part);
@@ -3548,7 +3570,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
- rc = __construct_region(cxlr, cxled);
+ rc = __construct_region(cxlr, ctx);
if (rc) {
devm_release_action(port->uport_dev, unregister_region, cxlr);
return ERR_PTR(rc);
@@ -3572,13 +3594,17 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *range)
int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
{
- struct range *range = &cxled->cxld.hpa_range;
+ struct cxl_region_context ctx;
struct cxl_region_params *p;
bool attach = false;
int rc;
+ rc = setup_region_params(cxled, &ctx);
+ if (rc)
+ return rc;
+
struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
- cxl_find_root_decoder(cxled, range);
+ cxl_find_root_decoder(cxled, &ctx.hpa_range);
if (!cxlrd)
return -ENXIO;
@@ -3589,9 +3615,9 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
*/
mutex_lock(&cxlrd->range_lock);
struct cxl_region *cxlr __free(put_cxl_region) =
- cxl_find_region_by_range(cxlrd, range);
+ cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
if (!cxlr)
- cxlr = construct_region(cxlrd, cxled);
+ cxlr = construct_region(cxlrd, &ctx);
mutex_unlock(&cxlrd->range_lock);
rc = PTR_ERR_OR_ZERO(cxlr);
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (5 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-12 21:21 ` Dave Jiang
2025-09-15 20:22 ` Dave Jiang
2025-09-12 14:45 ` [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation Robert Richter
` (5 subsequent siblings)
12 siblings, 2 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
To enable address translation, the endpoint decoder's HPA range must
be translated when crossing memory domains to the next parent port's
address ranges up to the root port. The root port's HPA range is
equivalent to the system's SPA range.
Introduce a callback to translate an address of the decoder's HPA
range to the address range of the parent port. The callback can be set
for ports that need to handle address translation.
Reviewed-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/cxl.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f182982f1c14..eb837867d932 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -429,6 +429,17 @@ struct cxl_rd_ops {
u64 (*spa_to_hpa)(struct cxl_root_decoder *cxlrd, u64 spa);
};
+/**
+ * cxl_to_hpa_fn - type of a callback function to translate an HPA
+ * @cxld: cxl_decoder to translate from
+ * @hpa: HPA of the @cxld decoder's address range
+ *
+ * The callback translates a decoder's HPA to the address range of the
+ * decoder's parent port. The return value is the translated HPA on
+ * success or ULLONG_MAX otherwise.
+ */
+typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
+
/**
* struct cxl_root_decoder - Static platform CXL address decoder
* @res: host / parent resource for region allocations
@@ -599,6 +610,7 @@ struct cxl_dax_region {
* @parent_dport: dport that points to this port in the parent
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
+ * @to_hpa: Callback to translate a child port's decoder address to the port's HPA address range
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
@@ -619,6 +631,7 @@ struct cxl_port {
struct cxl_dport *parent_dport;
struct ida decoder_ida;
struct cxl_register_map reg_map;
+ cxl_to_hpa_fn to_hpa;
int nr_dports;
int hdm_end;
int commit_end;
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (6 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-15 10:46 ` Jonathan Cameron
2025-09-12 14:45 ` [PATCH v3 09/11] cxl/region: Lock decoders that need " Robert Richter
` (4 subsequent siblings)
12 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
Systems that need address translation have the endpoint decoders
programmed for a different address space. Host physical addresses
(HPA) are different from their system physical addresses (SPA). The
decoder's address range and interleaving configuration of such
endpoints cannot be used to determine the region parameters. The
region's address range must be SPA which the decoder does not
provide. In addition, an endpoint's incoming HPA is already converted
to the devices physical address (DPA). Thus it has interleaving
disabled.
Address translation may provide different ways to determine an
endpoint's SPA, e.g. it may support a firmware call. This allows the
determination of the region's parameters without inspecting the
endpoint decoders.
Implement the setup of address translation given there is a function
to convert an endpoint's HPA (which is identical to its DPA) to an
SPA. Use the previously introduced cxl_to_hpa_fn callback for this.
Convert the decoder's address range and ensure it is 256MB aligned.
Identify the region's interleaving ways by inspecting the address
ranges. Also determine the interleaving granularity using the address
translation callback. Note that the position of the chunk from one
interleaving block to the next may vary and thus cannot be considered
constant. Address offsets larger than the interleaving block size
cannot be used to calculate the granularity. Thus, probe the
granularity using address translation for various HPAs in the same
interleaving block.
Note that this patch does not yet enable address translation as
callbacks have not been initialized.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 95 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 94 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 57697504410b..9fb1e9508213 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3422,16 +3422,109 @@ struct cxl_region_context {
int interleave_granularity;
};
+static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
+ struct cxl_region_context *ctx)
+{
+ struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent->parent);
+ struct cxl_decoder *cxld = &cxled->cxld;
+ struct range range = ctx->hpa_range;
+ u64 spa_len, len = range_len(&range);
+ u64 addr, base = range.start;
+ int ways, gran;
+
+ if (!len || !port->to_hpa)
+ return 0;
+
+ if (!IS_ALIGNED(range.start, SZ_256M) ||
+ !IS_ALIGNED(range.end + 1, SZ_256M)) {
+ dev_warn(&port->dev,
+ "CXL address translation: Unaligned decoder HPA range: %#llx-%#llx(%s)\n",
+ range.start, range.end, dev_name(&cxld->dev));
+ return -ENXIO;
+ }
+
+ /* Translate HPA range to SPA. */
+ range.start = port->to_hpa(cxld, range.start);
+ range.end = port->to_hpa(cxld, range.end);
+
+ if (range.start == ULLONG_MAX || range.end == ULLONG_MAX) {
+ dev_warn(&port->dev,
+ "CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n",
+ range.start, range.end, ctx->hpa_range.start,
+ ctx->hpa_range.end, dev_name(&cxld->dev));
+ return -ENXIO;
+ }
+
+ /*
+ * Since translated addresses include the interleaving
+ * offsets, align the range to 256 MB.
+ */
+ range.start = ALIGN_DOWN(range.start, SZ_256M);
+ range.end = ALIGN(range.end, SZ_256M) - 1;
+
+ spa_len = range_len(&range);
+ if (!len || !spa_len || spa_len % len) {
+ dev_warn(&port->dev,
+ "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
+ range.start, range.end, ctx->hpa_range.start,
+ ctx->hpa_range.end, dev_name(&cxld->dev));
+ return -ENXIO;
+ }
+
+ ways = spa_len / len;
+ gran = SZ_256;
+
+ /*
+ * Determine interleave granularity
+ *
+ * Note: The position of the chunk from one interleaving block
+ * to the next may vary and thus cannot be considered
+ * constant. Address offsets larger than the interleaving
+ * block size cannot be used to calculate the granularity.
+ */
+ while (ways > 1 && gran <= SZ_16M) {
+ addr = port->to_hpa(cxld, base + gran);
+ if (addr != base + gran)
+ break;
+ gran <<= 1;
+ }
+
+ if (gran > SZ_16M) {
+ dev_warn(&port->dev,
+ "CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n",
+ range.start, range.end, ctx->hpa_range.start,
+ ctx->hpa_range.end, dev_name(&cxld->dev));
+ return -ENXIO;
+ }
+
+ ctx->hpa_range = range;
+ ctx->interleave_ways = ways;
+ ctx->interleave_granularity = gran;
+
+ dev_dbg(&cxld->dev,
+ "address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n",
+ dev_name(ctx->cxlmd->dev.parent), base, len, range.start,
+ spa_len, ways, gran);
+
+ return 0;
+}
+
static int setup_region_params(struct cxl_endpoint_decoder *cxled,
struct cxl_region_context *ctx)
{
+ int rc;
+
ctx->cxled = cxled;
ctx->cxlmd = cxled_to_memdev(cxled);
ctx->hpa_range = cxled->cxld.hpa_range;
ctx->interleave_ways = cxled->cxld.interleave_ways;
ctx->interleave_granularity = cxled->cxld.interleave_granularity;
- return 0;
+ rc = setup_address_translation(cxled, ctx);
+ if (rc)
+ return rc;
+
+ return rc;
}
static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 09/11] cxl/region: Lock decoders that need address translation
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (7 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-17 20:52 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 10/11] cxl/acpi: Prepare use of EFI runtime services Robert Richter
` (3 subsequent siblings)
12 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
There is only support to translate addresses from an endpoint to its
parent port, but not in the opposite direction from the parent to the
endpoint. Thus, the endpoint address range cannot be determined and
setup manually for a given SPA range of a region. If the parent
implements the ->to_hpa() callback, address translation is
needed. Then, forbid reprogramming of the decoders and lock them.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/core/region.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 9fb1e9508213..44ea59252ff0 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3497,6 +3497,16 @@ static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
return -ENXIO;
}
+ /*
+ * There is only support to translate from the endpoint to its
+ * parent port, but not in the opposite direction from the
+ * parent to the endpoint. Thus, the endpoint address range
+ * cannot be determined and setup manually. If the address range
+ * was translated and modified, forbid reprogramming of the
+ * decoders and lock them.
+ */
+ cxld->flags |= CXL_DECODER_F_LOCK;
+
ctx->hpa_range = range;
ctx->interleave_ways = ways;
ctx->interleave_granularity = gran;
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 10/11] cxl/acpi: Prepare use of EFI runtime services
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (8 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 09/11] cxl/region: Lock decoders that need " Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-17 20:54 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
` (2 subsequent siblings)
12 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
In order to use EFI runtime services, esp. ACPI PRM which uses the
efi_rts_wq workqueue, initialize EFI before CXL ACPI.
There is a subsys_initcall order dependency if driver is builtin:
subsys_initcall(cxl_acpi_init);
subsys_initcall(efisubsys_init);
Prevent the efi_rts_wq workqueue being used by cxl_acpi_init() before
its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run
efisubsys_init() first.
Reported-by: Gregory Price <gourry@gourry.net>
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/acpi.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 26c494704437..95a5ba395c1a 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -1012,8 +1012,12 @@ static void __exit cxl_acpi_exit(void)
cxl_bus_drain();
}
-/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
-subsys_initcall(cxl_acpi_init);
+/*
+ * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use
+ * subsys_initcall_sync() since there is an order dependency with
+ * subsys_initcall(efisubsys_init), which must run first.
+ */
+subsys_initcall_sync(cxl_acpi_init);
/*
* Arrange for host-bridge ports to be active synchronous with
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (9 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 10/11] cxl/acpi: Prepare use of EFI runtime services Robert Richter
@ 2025-09-12 14:45 ` Robert Richter
2025-09-12 23:46 ` Dave Jiang
` (3 more replies)
2025-09-12 15:45 ` [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Dave Jiang
2025-09-23 3:26 ` Gregory Price
12 siblings, 4 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-12 14:45 UTC (permalink / raw)
To: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn, Robert Richter
Add AMD Zen5 support for address translation.
Zen5 systems may be configured to use 'Normalized addresses'. Then,
CXL endpoints use their own physical address space and are programmed
passthrough (DPA == HPA), the number of interleaving ways for the
endpoint is set to one. The Host Physical Addresses (HPAs) need to be
translated from the endpoint to its CXL host bridge. The HPA of a CXL
host bridge is equivalent to the System Physical Address (SPA).
ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
Device Physical Address (DPA) to its System Physical Address. This is
documented in:
AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
ACPI v6.5 Porting Guide, Publication # 58088
https://www.amd.com/en/search/documentation/hub.html
To implement AMD Zen5 address translation the following steps are
needed:
AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
call (Address Translation - CXL DPA to System Physical Address, see
ACPI v6.5 Porting Guide above) when address translation is enabled.
The existence of the callback can be identified using a specific GUID
as documented. The initialization code checks firmware and kernel
support of ACPI PRM.
Introduce a new file core/atl.c to handle ACPI PRM specific address
translation code. Naming is loosely related to the kernel's AMD
Address Translation Library (CONFIG_AMD_ATL) but implementation does
not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
options respectively to enable the code depending on architecture and
platform options.
Implement an ACPI PRM firmware call for CXL address translation in the
new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
callback for applicable CXL host bridges using the new cxl_atl_init()
function.
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/Kconfig | 4 ++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/core.h | 1 +
drivers/cxl/core/port.c | 8 +++
5 files changed, 152 insertions(+)
create mode 100644 drivers/cxl/core/atl.c
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 48b7314afdb8..31f9c96ef908 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -233,4 +233,8 @@ config CXL_MCE
def_bool y
depends on X86_MCE && MEMORY_FAILURE
+config CXL_ATL
+ def_bool y
+ depends on ACPI_PRMT
+
endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 5ad8fef210b5..11fe272a6e29 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_REGION) += region.o
cxl_core-$(CONFIG_CXL_MCE) += mce.o
cxl_core-$(CONFIG_CXL_FEATURES) += features.o
cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
+cxl_core-$(CONFIG_CXL_ATL) += atl.o
diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
new file mode 100644
index 000000000000..5fc21eddaade
--- /dev/null
+++ b/drivers/cxl/core/atl.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/prmt.h>
+#include <linux/pci.h>
+
+#include <cxlmem.h>
+#include "core.h"
+
+static bool check_prm_address_translation(struct cxl_port *port)
+{
+ /* Applies to CXL host bridges only */
+ return !is_cxl_root(port) && port->host_bridge &&
+ is_cxl_root(to_cxl_port(port->dev.parent));
+}
+
+/*
+ * PRM Address Translation - CXL DPA to System Physical Address
+ *
+ * Reference:
+ *
+ * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
+ * ACPI v6.5 Porting Guide, Publication # 58088
+ */
+
+static const guid_t prm_cxl_dpa_spa_guid =
+ GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
+ 0x48, 0x0b, 0x94);
+
+struct prm_cxl_dpa_spa_data {
+ u64 dpa;
+ u8 reserved;
+ u8 devfn;
+ u8 bus;
+ u8 segment;
+ void *out;
+} __packed;
+
+static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
+{
+ struct prm_cxl_dpa_spa_data data;
+ u64 spa;
+ int rc;
+
+ data = (struct prm_cxl_dpa_spa_data) {
+ .dpa = dpa,
+ .devfn = pci_dev->devfn,
+ .bus = pci_dev->bus->number,
+ .segment = pci_domain_nr(pci_dev->bus),
+ .out = &spa,
+ };
+
+ rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+ if (rc) {
+ pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
+ return ULLONG_MAX;
+ }
+
+ pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
+
+ return spa;
+}
+
+static u64 cxl_prm_to_hpa(struct cxl_decoder *cxld, u64 hpa)
+{
+ struct cxl_memdev *cxlmd;
+ struct pci_dev *pci_dev;
+ struct cxl_port *port;
+ struct cxl_endpoint_decoder *cxled;
+
+ /* Only translate from endpoint to its parent port. */
+ if (!is_endpoint_decoder(&cxld->dev))
+ return hpa;
+
+ cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
+ /*
+ * Nothing to do if base is non-zero and Normalized Addressing
+ * is disabled.
+ */
+ if (cxld->hpa_range.start != cxled->dpa_res->start)
+ return hpa;
+
+ /*
+ * Endpoints are programmed passthrough in Normalized
+ * Addressing mode.
+ */
+ if (cxld->interleave_ways != 1) {
+ dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
+ cxld->interleave_ways, cxld->interleave_granularity);
+ return ULLONG_MAX;
+ }
+
+ if (hpa < cxld->hpa_range.start || hpa > cxld->hpa_range.end) {
+ dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
+ hpa, cxld->hpa_range.start, cxld->hpa_range.end);
+ return ULLONG_MAX;
+ }
+
+ port = to_cxl_port(cxld->dev.parent);
+ cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
+ if (!port || !dev_is_pci(cxlmd->dev.parent)) {
+ dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
+ dev_name(cxld->dev.parent), cxld->hpa_range.start,
+ cxld->hpa_range.end);
+ return ULLONG_MAX;
+ }
+ pci_dev = to_pci_dev(cxlmd->dev.parent);
+
+ return prm_cxl_dpa_spa(pci_dev, hpa);
+}
+
+static void cxl_prm_init(struct cxl_port *port)
+{
+ u64 spa;
+ struct prm_cxl_dpa_spa_data data = { .out = &spa, };
+ int rc;
+
+ if (!check_prm_address_translation(port))
+ return;
+
+ /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
+ rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
+ if (rc == -EOPNOTSUPP || rc == -ENODEV)
+ return;
+
+ port->to_hpa = cxl_prm_to_hpa;
+
+ dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
+ dev_name(&port->dev));
+}
+
+void cxl_atl_init(struct cxl_port *port)
+{
+ cxl_prm_init(port);
+}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index eac8cc1bdaa0..624e438d052a 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -150,6 +150,7 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
int cxl_ras_init(void);
void cxl_ras_exit(void);
int cxl_gpf_port_setup(struct cxl_dport *dport);
+void cxl_atl_init(struct cxl_port *port);
#ifdef CONFIG_CXL_FEATURES
struct cxl_feat_entry *
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8f36ff413f5d..8007e002888e 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -831,6 +831,12 @@ static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
&cxl_einj_inject_fops);
}
+static void setup_address_translation(struct cxl_port *port)
+{
+ if (IS_ENABLED(CONFIG_CXL_ATL))
+ cxl_atl_init(port);
+}
+
static int cxl_port_add(struct cxl_port *port,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport)
@@ -868,6 +874,8 @@ static int cxl_port_add(struct cxl_port *port,
return rc;
}
+ setup_address_translation(port);
+
rc = device_add(dev);
if (rc)
return rc;
--
2.39.5
^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (10 preceding siblings ...)
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
@ 2025-09-12 15:45 ` Dave Jiang
2025-09-15 8:42 ` Robert Richter
2025-09-23 3:26 ` Gregory Price
12 siblings, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 15:45 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> This patch set adds support for address translation using ACPI PRM and
> enables this for AMD Zen5 platforms. This is another new appoach in
> response to earlier attempts to implement CXL address translation:
>
> * v1: [1] and the comments on it, esp. Dan's [2],
> * v2: [3] and comments on [4], esp. Dave's [5]
>
> This version 3 addresses the requests to reduce the number of patches
> to a minimum and also to remove platform specific implementations
> allowing the Documentation of CXL Address Translation Support in the
> Kernel's "Compute Express Link: Linux Conventions" document and an
> update of the CXL specification in the longterm. This patch submission
> will be the base for a documention patch that describes CXL Address
> Translation support accordingly. The documentation patch will be sent
> in the very next step.
>
> The CXL driver currently does not implement address translation which
> assumes the host physical addresses (HPA) and system physical
> addresses (SPA) are equal.
>
> Systems with different HPA and SPA addresses need address translation.
> If this is the case, the hardware addresses esp. used in the HDM
> decoder configurations are different to the system's or parent port
> address ranges. E.g. AMD Zen5 systems may be configured to use
> 'Normalized addresses'. Then, CXL endpoints have their own physical
> address base which is not the same as the SPA used by the CXL host
> bridge. Thus, addresses need to be translated from the endpoint's to
> its CXL host bridge's address range.
>
> To enable address translation, the endpoint's HPA range must be
> translated to the CXL host bridge's address range. A callback is
> introduced to translate a decoder's HPA to the next parent port's
> address range. This allows the enablement of address translation for
> individual ports as needed. The callback is then used to determine the
> region parameters which includes the SPA translated address range of
> the endpoint decoder and the interleaving configuration. This is
> stored in struct cxl_region which allows an endpoint decoder to
> determine that parameters based on its assigned region.
>
> Note that only auto-discovery of decoders is supported. Thus, decoders
> are locked and cannot be configured manually.
Hi Robert, thanks for reworking this.
What happens with the manual configured path if only auto-discovery is supported? Things don't work? It works with no translation needed? Platform will lock all decoders and not allow manual configuration for CXL devices?
DJ
>
> Finally, Zen5 address translation is enabled using ACPI PRMT.
>
> This series bases on cxl/next.
>
> V3:
> * rebased onto cxl/next,
> * complete rework to reduce number of required changes/patches and to
> remove platform specific code (Dan and Dave),
> * changed implementation allowing to add address translation to the
> CXL specification (documention patch in preparation),
> * simplified and generalized determination of interleaving
> parameters using the address translation callback,
> * depend only on the existence of the ACPI PRM GUID for CXL Address
> Translation enablement, removed platform checks,
> * small changes to region code only which does not require a full
> rework and refactoring of the code, just separating region
> parameter setup and region construction,
> * moved code to new core/atl.c file,
> * fixed subsys_initcall order dependency of EFI runtime services
> (Gregory and Joshua),
>
> V2:
> * rebased onto cxl/next,
> * split of v1 in two parts:
> * removed cleanups and updates from this series to post them as a
> separate series (Dave),
> * this part 2 applies on top of part 1, v3,
> * added tags to SOB chain,
> * reworked architecture, vendor and platform setup (Jonathan):
> * added patch "cxl/x86: Prepare for architectural platform setup",
> * added function arch_cxl_port_platform_setup() plus a __weak
> versions for archs other than x86,
> * moved code to core/x86,
> * added comment to cxl_to_hpa_fn (Ben),
> * updated year in copyright statement (Ben),
> * cxl_port_calc_hpa(): Removed HPA check for zero (Jonathan), return
> 1 if modified,
> * cxl_port_calc_pos(): Updated description and wording (Ben),
> * added sereral patches around interleaving and SPA calculation in
> cxl_endpoint_decoder_initialize(),
> * reworked iterator in cxl_endpoint_decoder_initialize() (Gregory),
> * fixed region interleaving parameters() (Alison),
> * fixed check in cxl_region_attach() (Alison),
> * Clarified in coverletter that not all ports in a system must
> implement the to_hpa() callback (Terry).
>
> [1] https://lore.kernel.org/linux-cxl/20240701174754.967954-1-rrichter@amd.com/
> [2] https://lore.kernel.org/linux-cxl/669086821f136_5fffa29473@dwillia2-xfh.jf.intel.com.notmuch/
> [3] https://patchwork.kernel.org/project/cxl/cover/20250218132356.1809075-1-rrichter@amd.com/
> [4] https://patchwork.kernel.org/project/cxl/cover/20250715191143.1023512-1-rrichter@amd.com/
> [5] https://lore.kernel.org/all/78284b12-3e0b-4758-af18-397f32136c3f@intel.com/
>
> Robert Richter (11):
> cxl/region: Store root decoder in struct cxl_region
> cxl/region: Store HPA range in struct cxl_region
> cxl/region: Rename misleading variable name @hpa to @range
> cxl/region: Add @range argument to function cxl_find_root_decoder()
> cxl/region: Add @range argument to function cxl_calc_interleave_pos()
> cxl/region: Separate region parameter setup and region construction
> cxl: Introduce callback to translate a decoder's HPA to the next
> parent port
> cxl/region: Implement endpoint decoder address translation
> cxl/region: Lock decoders that need address translation
> cxl/acpi: Prepare use of EFI runtime services
> cxl: Enable AMD Zen5 address translation using ACPI PRMT
>
> drivers/cxl/Kconfig | 4 +
> drivers/cxl/acpi.c | 8 +-
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++
> drivers/cxl/core/core.h | 1 +
> drivers/cxl/core/port.c | 8 ++
> drivers/cxl/core/region.c | 239 ++++++++++++++++++++++++++++++--------
> drivers/cxl/cxl.h | 17 +++
> 8 files changed, 368 insertions(+), 48 deletions(-)
> create mode 100644 drivers/cxl/core/atl.c
>
>
> base-commit: 561c4e30bff93b3c33e694a459f8580f8a6b3c8c
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
@ 2025-09-12 15:52 ` Dave Jiang
2025-09-15 10:14 ` Jonathan Cameron
2025-09-17 19:56 ` Gregory Price
2025-09-23 21:40 ` Alison Schofield
2 siblings, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 15:52 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> A region is always bound to a root decoder. The region's associated
> root decoder is often needed. Add it to struct cxl_region.
>
> This simplifies code by removing dynamic lookups and removing the root
> decoder argument from the function argument list where possible.
>
> Patch is a prerequisite to implement address translation which uses
> struct cxl_region to store all relevant region and interleaving
> parameters.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 37 +++++++++++++++++++------------------
> drivers/cxl/cxl.h | 2 ++
> 2 files changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 29d3809ab2bb..2c37c060d983 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -489,9 +489,9 @@ static ssize_t interleave_ways_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> - struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> struct cxl_region *cxlr = to_cxl_region(dev);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> struct cxl_region_params *p = &cxlr->params;
> unsigned int val, save;
> int rc;
> @@ -552,9 +552,9 @@ static ssize_t interleave_granularity_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> - struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> struct cxl_region *cxlr = to_cxl_region(dev);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> struct cxl_region_params *p = &cxlr->params;
> int rc, val;
> u16 ig;
> @@ -628,7 +628,7 @@ static DEVICE_ATTR_RO(mode);
>
> static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_region_params *p = &cxlr->params;
> struct resource *res;
> u64 remainder = 0;
> @@ -1321,7 +1321,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
> struct cxl_region *cxlr,
> struct cxl_endpoint_decoder *cxled)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> int parent_iw, parent_ig, ig, iw, rc, inc = 0, pos = cxled->pos;
> struct cxl_port *parent_port = to_cxl_port(port->dev.parent);
> struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr);
> @@ -1678,10 +1678,10 @@ static int cxl_region_validate_position(struct cxl_region *cxlr,
> }
>
> static int cxl_region_attach_position(struct cxl_region *cxlr,
> - struct cxl_root_decoder *cxlrd,
> struct cxl_endpoint_decoder *cxled,
> const struct cxl_dport *dport, int pos)
> {
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
> struct cxl_decoder *cxld = &cxlsd->cxld;
> @@ -1918,7 +1918,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
> static int cxl_region_attach(struct cxl_region *cxlr,
> struct cxl_endpoint_decoder *cxled, int pos)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_region_params *p = &cxlr->params;
> @@ -2023,8 +2023,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> ep_port = cxled_to_port(cxled);
> dport = cxl_find_dport_by_dev(root_port,
> ep_port->host_bridge);
> - rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
> - dport, i);
> + rc = cxl_region_attach_position(cxlr, cxled, dport, i);
> if (rc)
> return rc;
> }
> @@ -2047,7 +2046,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> if (rc)
> return rc;
>
> - rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
> + rc = cxl_region_attach_position(cxlr, cxled, dport, pos);
> if (rc)
> return rc;
>
> @@ -2343,8 +2342,8 @@ static const struct attribute_group *region_groups[] = {
>
> static void cxl_region_release(struct device *dev)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> struct cxl_region *cxlr = to_cxl_region(dev);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> int id = atomic_read(&cxlrd->region_id);
>
> /*
> @@ -2427,10 +2426,12 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
> * region id allocations
> */
> get_device(dev->parent);
> + cxlr->cxlrd = cxlrd;
> + cxlr->id = id;
> +
> device_set_pm_not_required(dev);
> dev->bus = &cxl_bus_type;
> dev->type = &cxl_region_type;
> - cxlr->id = id;
>
> return cxlr;
> }
> @@ -2931,7 +2932,7 @@ static bool has_spa_to_hpa(struct cxl_root_decoder *cxlrd)
> u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> u64 dpa)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_endpoint_decoder *cxled = NULL;
> @@ -3007,7 +3008,7 @@ static int region_offset_to_dpa_result(struct cxl_region *cxlr, u64 offset,
> struct dpa_result *result)
> {
> struct cxl_region_params *p = &cxlr->params;
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_endpoint_decoder *cxled;
> u64 hpa, hpa_offset, dpa_offset;
> u64 bits_upper, bits_lower;
> @@ -3401,7 +3402,7 @@ static int match_region_by_range(struct device *dev, const void *data)
> static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> struct resource *res)
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_region_params *p = &cxlr->params;
> resource_size_t size = resource_size(res);
> resource_size_t cache_size, start;
> @@ -3437,9 +3438,9 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> }
>
> static int __construct_region(struct cxl_region *cxlr,
> - struct cxl_root_decoder *cxlrd,
> struct cxl_endpoint_decoder *cxled)
> {
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct range *hpa = &cxled->cxld.hpa_range;
> struct cxl_region_params *p;
> @@ -3531,7 +3532,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> return cxlr;
> }
>
> - rc = __construct_region(cxlr, cxlrd, cxled);
> + rc = __construct_region(cxlr, cxled);
> if (rc) {
> devm_release_action(port->uport_dev, unregister_region, cxlr);
> return ERR_PTR(rc);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 4fe3df06f57a..350ccd6949b3 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -517,6 +517,7 @@ enum cxl_partition_mode {
> * struct cxl_region - CXL region
> * @dev: This region's device
> * @id: This region's id. Id is globally unique across all regions
> + * @cxlrd: Region's root decoder
> * @mode: Operational mode of the mapped capacity
> * @type: Endpoint decoder target type
> * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
> @@ -530,6 +531,7 @@ enum cxl_partition_mode {
> struct cxl_region {
> struct device dev;
> int id;
> + struct cxl_root_decoder *cxlrd;
> enum cxl_partition_mode mode;
> enum cxl_decoder_type type;
> struct cxl_nvdimm_bridge *cxl_nvb;
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-12 14:45 ` [PATCH v3 02/11] cxl/region: Store HPA range " Robert Richter
@ 2025-09-12 17:17 ` Dave Jiang
2025-09-15 7:19 ` Robert Richter
2025-09-15 10:23 ` Jonathan Cameron
2025-09-17 19:58 ` Gregory Price
2 siblings, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 17:17 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> Each region has a known host physical address (HPA) range it is
> assigned to. Endpoint decoders assigned to a region share the same HPA
> range. The region's address range is the system's physical address
> (SPA) range.
>
> Endpoint decoders in systems that need address translation use HPAs
> which are not SPAs. To make the SPA range accessible to the endpoint
> decoders, store and track the region's SPA range in struct cxl_region.
> Introduce the @hpa_range member to the struct. Now, the SPA range of
> an endpoint decoder can be determined based on its assigned region.
>
> Patch is a prerequisite to implement address translation which uses
> struct cxl_region to store all relevant region and interleaving
> parameters.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Just a nit below. Otherwise looks ok
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 17 +++++++++++++++++
> drivers/cxl/cxl.h | 2 ++
> 2 files changed, 19 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 2c37c060d983..777d04870180 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -664,6 +664,11 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
> return PTR_ERR(res);
> }
>
> + cxlr->hpa_range = (struct range) {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> p->res = res;
> p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
>
> @@ -700,8 +705,14 @@ static int free_hpa(struct cxl_region *cxlr)
> if (p->state >= CXL_CONFIG_ACTIVE)
> return -EBUSY;
>
> + cxlr->hpa_range = (struct range) {
> + .start = 0,
> + .end = -1,
> + };
> +
> cxl_region_iomem_release(cxlr);
> p->state = CXL_CONFIG_IDLE;
> +
stray blank line
> return 0;
> }
>
> @@ -2400,6 +2411,11 @@ static void unregister_region(void *_cxlr)
> for (i = 0; i < p->interleave_ways; i++)
> detach_target(cxlr, i);
>
> + cxlr->hpa_range = (struct range) {
> + .start = 0,
> + .end = -1,
> + };
> +
> cxl_region_iomem_release(cxlr);
> put_device(&cxlr->dev);
> }
> @@ -3458,6 +3474,7 @@ static int __construct_region(struct cxl_region *cxlr,
> }
>
> set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> + cxlr->hpa_range = *hpa;
>
> res = kmalloc(sizeof(*res), GFP_KERNEL);
> if (!res)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 350ccd6949b3..f182982f1c14 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -518,6 +518,7 @@ enum cxl_partition_mode {
> * @dev: This region's device
> * @id: This region's id. Id is globally unique across all regions
> * @cxlrd: Region's root decoder
> + * @hpa_range: Address range occupied by the region
> * @mode: Operational mode of the mapped capacity
> * @type: Endpoint decoder target type
> * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
> @@ -532,6 +533,7 @@ struct cxl_region {
> struct device dev;
> int id;
> struct cxl_root_decoder *cxlrd;
> + struct range hpa_range;
> enum cxl_partition_mode mode;
> enum cxl_decoder_type type;
> struct cxl_nvdimm_bridge *cxl_nvb;
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range
2025-09-12 14:45 ` [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range Robert Richter
@ 2025-09-12 17:33 ` Dave Jiang
2025-09-15 7:27 ` Robert Richter
2025-09-17 20:01 ` Gregory Price
1 sibling, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 17:33 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> @hpa is actually a @range, rename variables accordingly.
it's a range of HPA right? May as well call it 'hpa_range' to distinguish from 'dpa_range' or 'spa_range'
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/core/region.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 777d04870180..13113920aba7 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3367,9 +3367,9 @@ static int match_decoder_by_range(struct device *dev, const void *data)
> }
>
> static struct cxl_decoder *
> -cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa)
> +cxl_port_find_switch_decoder(struct cxl_port *port, struct range *range)
> {
> - struct device *cxld_dev = device_find_child(&port->dev, hpa,
> + struct device *cxld_dev = device_find_child(&port->dev, range,
> match_decoder_by_range);
>
> return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL;
> @@ -3382,14 +3382,14 @@ cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled)
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
> struct cxl_decoder *root, *cxld = &cxled->cxld;
> - struct range *hpa = &cxld->hpa_range;
> + struct range *range = &cxld->hpa_range;
>
> - root = cxl_port_find_switch_decoder(&cxl_root->port, hpa);
> + root = cxl_port_find_switch_decoder(&cxl_root->port, range);
> if (!root) {
> dev_err(cxlmd->dev.parent,
> "%s:%s no CXL window for range %#llx:%#llx\n",
> dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> - cxld->hpa_range.start, cxld->hpa_range.end);
> + range->start, range->end);
> return NULL;
> }
>
> @@ -3458,7 +3458,7 @@ static int __construct_region(struct cxl_region *cxlr,
> {
> struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct range *hpa = &cxled->cxld.hpa_range;
> + struct range *range = &cxled->cxld.hpa_range;
> struct cxl_region_params *p;
> struct resource *res;
> int rc;
> @@ -3474,13 +3474,13 @@ static int __construct_region(struct cxl_region *cxlr,
> }
>
> set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> - cxlr->hpa_range = *hpa;
> + cxlr->hpa_range = *range;
>
> res = kmalloc(sizeof(*res), GFP_KERNEL);
> if (!res)
> return -ENOMEM;
>
> - *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
> + *res = DEFINE_RES_MEM_NAMED(range->start, range_len(range),
> dev_name(&cxlr->dev));
>
> rc = cxl_extended_linear_cache_resize(cxlr, res);
> @@ -3559,11 +3559,11 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> }
>
> static struct cxl_region *
> -cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
> +cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *range)
> {
> struct device *region_dev;
>
> - region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
> + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, range,
> match_region_by_range);
> if (!region_dev)
> return NULL;
> @@ -3573,7 +3573,7 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
>
> int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> {
> - struct range *hpa = &cxled->cxld.hpa_range;
> + struct range *range = &cxled->cxld.hpa_range;
> struct cxl_region_params *p;
> bool attach = false;
> int rc;
> @@ -3584,12 +3584,13 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> return -ENXIO;
>
> /*
> - * Ensure that if multiple threads race to construct_region() for @hpa
> - * one does the construction and the others add to that.
> + * Ensure that if multiple threads race to construct_region()
> + * for the HPA range one does the construction and the others
> + * add to that.
> */
> mutex_lock(&cxlrd->range_lock);
> struct cxl_region *cxlr __free(put_cxl_region) =
> - cxl_find_region_by_range(cxlrd, hpa);
> + cxl_find_region_by_range(cxlrd, range);
> if (!cxlr)
> cxlr = construct_region(cxlrd, cxled);
> mutex_unlock(&cxlrd->range_lock);
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction
2025-09-12 14:45 ` [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction Robert Richter
@ 2025-09-12 21:10 ` Dave Jiang
2025-09-15 7:31 ` Robert Richter
2025-09-17 20:15 ` Gregory Price
1 sibling, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 21:10 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> To construct a region, the region parameters such as address range and
> interleaving config need to be determined. This is done while
> constructing the region by inspecting the endpoint decoder
> configuration. The endpoint decoder is passed as a function argument.
>
> With address translation the endpoint decoder data is no longer
> sufficient to extract the region parameters as some of the information
> is obtained using other methods such as using firmware calls.
>
> In a first step, separate code to determine and setup the region
> parameters from the region construction. Temporarily store all the
> data to create the region in the new struct cxl_region_context. Add a
> new function setup_region_parameters() to fill that struct and later
> use it to construct the region. This simplifies the extension of the
> function to support other methods needed, esp. to support address
> translation.
>
> Patch is a prerequisite to implement address translation.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/core/region.c | 50 +++++++++++++++++++++++++++++----------
> 1 file changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 106692f1e310..57697504410b 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3414,6 +3414,26 @@ static int match_region_by_range(struct device *dev, const void *data)
> return 0;
> }
>
> +struct cxl_region_context {
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_memdev *cxlmd;
> + struct range hpa_range;
> + int interleave_ways;
> + int interleave_granularity;
> +};
> +
> +static int setup_region_params(struct cxl_endpoint_decoder *cxled,
> + struct cxl_region_context *ctx)
> +{
> + ctx->cxled = cxled;
> + ctx->cxlmd = cxled_to_memdev(cxled);
> + ctx->hpa_range = cxled->cxld.hpa_range;
> + ctx->interleave_ways = cxled->cxld.interleave_ways;
> + ctx->interleave_granularity = cxled->cxld.interleave_granularity;
You can init like this:
*ctx = (struct cxl_region_context) {
.cxled = cxled,
.cxlmd = cxled_to_memdev(cxled),
.hpa_range = cxled->cxld.hpa_range,
.interleave_ways = cxled->cxld.interleave_ways,
.interleave_granularity = cxled->cxld.interleave_granularity,
};
> +
> + return 0;
Can probably make this function void if no expected errors and only assignments.
DJ
> +}
> +
> static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> struct resource *res)
> {
> @@ -3453,11 +3473,12 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> }
>
> static int __construct_region(struct cxl_region *cxlr,
> - struct cxl_endpoint_decoder *cxled)
> + struct cxl_region_context *ctx)
> {
> + struct cxl_endpoint_decoder *cxled = ctx->cxled;
> struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct range *range = &cxled->cxld.hpa_range;
> + struct cxl_memdev *cxlmd = ctx->cxlmd;
> + struct range *range = &ctx->hpa_range;
> struct cxl_region_params *p;
> struct resource *res;
> int rc;
> @@ -3506,8 +3527,8 @@ static int __construct_region(struct cxl_region *cxlr,
> }
>
> p->res = res;
> - p->interleave_ways = cxled->cxld.interleave_ways;
> - p->interleave_granularity = cxled->cxld.interleave_granularity;
> + p->interleave_ways = ctx->interleave_ways;
> + p->interleave_granularity = ctx->interleave_granularity;
> p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
>
> rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
> @@ -3527,9 +3548,10 @@ static int __construct_region(struct cxl_region *cxlr,
>
> /* Establish an empty region covering the given HPA range */
> static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> - struct cxl_endpoint_decoder *cxled)
> + struct cxl_region_context *ctx)
> {
> - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_endpoint_decoder *cxled = ctx->cxled;
> + struct cxl_memdev *cxlmd = ctx->cxlmd;
> struct cxl_port *port = cxlrd_to_port(cxlrd);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> int rc, part = READ_ONCE(cxled->part);
> @@ -3548,7 +3570,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> return cxlr;
> }
>
> - rc = __construct_region(cxlr, cxled);
> + rc = __construct_region(cxlr, ctx);
> if (rc) {
> devm_release_action(port->uport_dev, unregister_region, cxlr);
> return ERR_PTR(rc);
> @@ -3572,13 +3594,17 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *range)
>
> int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> {
> - struct range *range = &cxled->cxld.hpa_range;
> + struct cxl_region_context ctx;
> struct cxl_region_params *p;
> bool attach = false;
> int rc;
>
> + rc = setup_region_params(cxled, &ctx);
> + if (rc)
> + return rc;
> +
> struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
> - cxl_find_root_decoder(cxled, range);
> + cxl_find_root_decoder(cxled, &ctx.hpa_range);
> if (!cxlrd)
> return -ENXIO;
>
> @@ -3589,9 +3615,9 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> */
> mutex_lock(&cxlrd->range_lock);
> struct cxl_region *cxlr __free(put_cxl_region) =
> - cxl_find_region_by_range(cxlrd, range);
> + cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
> if (!cxlr)
> - cxlr = construct_region(cxlrd, cxled);
> + cxlr = construct_region(cxlrd, &ctx);
> mutex_unlock(&cxlrd->range_lock);
>
> rc = PTR_ERR_OR_ZERO(cxlr);
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port
2025-09-12 14:45 ` [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port Robert Richter
@ 2025-09-12 21:21 ` Dave Jiang
2025-09-15 7:55 ` Robert Richter
2025-09-15 20:22 ` Dave Jiang
1 sibling, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 21:21 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> To enable address translation, the endpoint decoder's HPA range must
> be translated when crossing memory domains to the next parent port's
> address ranges up to the root port. The root port's HPA range is
> equivalent to the system's SPA range.
>
> Introduce a callback to translate an address of the decoder's HPA
> range to the address range of the parent port. The callback can be set
> for ports that need to handle address translation.
>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/cxl.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f182982f1c14..eb837867d932 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -429,6 +429,17 @@ struct cxl_rd_ops {
> u64 (*spa_to_hpa)(struct cxl_root_decoder *cxlrd, u64 spa);
> };
>
> +/**
> + * cxl_to_hpa_fn - type of a callback function to translate an HPA
> + * @cxld: cxl_decoder to translate from
> + * @hpa: HPA of the @cxld decoder's address range
> + *
> + * The callback translates a decoder's HPA to the address range of the
> + * decoder's parent port. The return value is the translated HPA on
> + * success or ULLONG_MAX otherwise.
> + */
> +typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
cxl_to_parent_hpa_fn()?
DJ
> +
> /**
> * struct cxl_root_decoder - Static platform CXL address decoder
> * @res: host / parent resource for region allocations
> @@ -599,6 +610,7 @@ struct cxl_dax_region {
> * @parent_dport: dport that points to this port in the parent
> * @decoder_ida: allocator for decoder ids
> * @reg_map: component and ras register mapping parameters
> + * @to_hpa: Callback to translate a child port's decoder address to the port's HPA address range
> * @nr_dports: number of entries in @dports
> * @hdm_end: track last allocated HDM decoder instance for allocation ordering
> * @commit_end: cursor to track highest committed decoder for commit ordering
> @@ -619,6 +631,7 @@ struct cxl_port {
> struct cxl_dport *parent_dport;
> struct ida decoder_ida;
> struct cxl_register_map reg_map;
> + cxl_to_hpa_fn to_hpa;
> int nr_dports;
> int hdm_end;
> int commit_end;
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
@ 2025-09-12 23:46 ` Dave Jiang
2025-09-15 8:34 ` Robert Richter
2025-09-15 10:59 ` Jonathan Cameron
` (2 subsequent siblings)
3 siblings, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-12 23:46 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> Add AMD Zen5 support for address translation.
>
> Zen5 systems may be configured to use 'Normalized addresses'. Then,
> CXL endpoints use their own physical address space and are programmed
> passthrough (DPA == HPA), the number of interleaving ways for the
> endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> translated from the endpoint to its CXL host bridge. The HPA of a CXL
> host bridge is equivalent to the System Physical Address (SPA).
>
> ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> Device Physical Address (DPA) to its System Physical Address. This is
> documented in:
>
> AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> ACPI v6.5 Porting Guide, Publication # 58088
> https://www.amd.com/en/search/documentation/hub.html
>
> To implement AMD Zen5 address translation the following steps are
> needed:
>
> AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> call (Address Translation - CXL DPA to System Physical Address, see
> ACPI v6.5 Porting Guide above) when address translation is enabled.
> The existence of the callback can be identified using a specific GUID
> as documented. The initialization code checks firmware and kernel
> support of ACPI PRM.
>
> Introduce a new file core/atl.c to handle ACPI PRM specific address
> translation code. Naming is loosely related to the kernel's AMD
> Address Translation Library (CONFIG_AMD_ATL) but implementation does
> not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
> options respectively to enable the code depending on architecture and
> platform options.
>
> Implement an ACPI PRM firmware call for CXL address translation in the
> new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
> callback for applicable CXL host bridges using the new cxl_atl_init()
> function.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
I'm still trying to digest the series. Couple things below.
> ---
> drivers/cxl/Kconfig | 4 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/core.h | 1 +
> drivers/cxl/core/port.c | 8 +++
> 5 files changed, 152 insertions(+)
> create mode 100644 drivers/cxl/core/atl.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 48b7314afdb8..31f9c96ef908 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -233,4 +233,8 @@ config CXL_MCE
> def_bool y
> depends on X86_MCE && MEMORY_FAILURE
>
> +config CXL_ATL
> + def_bool y
> + depends on ACPI_PRMT
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 5ad8fef210b5..11fe272a6e29 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_REGION) += region.o
> cxl_core-$(CONFIG_CXL_MCE) += mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> +cxl_core-$(CONFIG_CXL_ATL) += atl.o
> diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
> new file mode 100644
> index 000000000000..5fc21eddaade
> --- /dev/null
> +++ b/drivers/cxl/core/atl.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#include <linux/prmt.h>
> +#include <linux/pci.h>
> +
> +#include <cxlmem.h>
> +#include "core.h"
> +
> +static bool check_prm_address_translation(struct cxl_port *port)
> +{
> + /* Applies to CXL host bridges only */
> + return !is_cxl_root(port) && port->host_bridge &&
> + is_cxl_root(to_cxl_port(port->dev.parent));
> +}
> +
> +/*
> + * PRM Address Translation - CXL DPA to System Physical Address
> + *
> + * Reference:
> + *
> + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> + * ACPI v6.5 Porting Guide, Publication # 58088
> + */
> +
> +static const guid_t prm_cxl_dpa_spa_guid =
> + GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
> + 0x48, 0x0b, 0x94);
> +
> +struct prm_cxl_dpa_spa_data {
> + u64 dpa;
> + u8 reserved;
> + u8 devfn;
> + u8 bus;
> + u8 segment;
> + void *out;
> +} __packed;
> +
> +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> +{
> + struct prm_cxl_dpa_spa_data data;
> + u64 spa;
> + int rc;
> +
> + data = (struct prm_cxl_dpa_spa_data) {
> + .dpa = dpa,
> + .devfn = pci_dev->devfn,
> + .bus = pci_dev->bus->number,
> + .segment = pci_domain_nr(pci_dev->bus),
> + .out = &spa,
> + };
> +
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc) {
> + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> + return ULLONG_MAX;
> + }
> +
> + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> +
> + return spa;
> +}
> +
> +static u64 cxl_prm_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{
> + struct cxl_memdev *cxlmd;
> + struct pci_dev *pci_dev;
> + struct cxl_port *port;
> + struct cxl_endpoint_decoder *cxled;
> +
> + /* Only translate from endpoint to its parent port. */
> + if (!is_endpoint_decoder(&cxld->dev))
> + return hpa;
> +
> + cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +
> + /*
> + * Nothing to do if base is non-zero and Normalized Addressing
> + * is disabled.
> + */
Not sure if this comment matches the code below
> + if (cxld->hpa_range.start != cxled->dpa_res->start)
> + return hpa;
> +
> + /*
> + * Endpoints are programmed passthrough in Normalized
> + * Addressing mode.
> + */
Not sure if the comment here matches the conditional check.
> + if (cxld->interleave_ways != 1) {
> + dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
> + cxld->interleave_ways, cxld->interleave_granularity);
> + return ULLONG_MAX;
> + }
> +
> + if (hpa < cxld->hpa_range.start || hpa > cxld->hpa_range.end) {
> + dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
Suggest use %pr for range printing to avoid 0-day complaints on 32bit compilers.
> + hpa, cxld->hpa_range.start, cxld->hpa_range.end);
> + return ULLONG_MAX;
> + }
> +
> + port = to_cxl_port(cxld->dev.parent);
> + cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> + if (!port || !dev_is_pci(cxlmd->dev.parent)) {
> + dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> + dev_name(cxld->dev.parent), cxld->hpa_range.start,
> + cxld->hpa_range.end);
> + return ULLONG_MAX;
> + }
> + pci_dev = to_pci_dev(cxlmd->dev.parent);
> +
> + return prm_cxl_dpa_spa(pci_dev, hpa);
> +}
> +
> +static void cxl_prm_init(struct cxl_port *port)
> +{
> + u64 spa;
> + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> + int rc;
> +
> + if (!check_prm_address_translation(port))
> + return;
> +
> + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> + return;
> +
> + port->to_hpa = cxl_prm_to_hpa;
> +
> + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> + dev_name(&port->dev));
> +}
> +
> +void cxl_atl_init(struct cxl_port *port)
> +{
> + cxl_prm_init(port);
> +}
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index eac8cc1bdaa0..624e438d052a 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -150,6 +150,7 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
> int cxl_ras_init(void);
> void cxl_ras_exit(void);
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> +void cxl_atl_init(struct cxl_port *port);
>
> #ifdef CONFIG_CXL_FEATURES
> struct cxl_feat_entry *
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8f36ff413f5d..8007e002888e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -831,6 +831,12 @@ static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> &cxl_einj_inject_fops);
> }
>
> +static void setup_address_translation(struct cxl_port *port)
> +{
> + if (IS_ENABLED(CONFIG_CXL_ATL))
> + cxl_atl_init(port);
> +}
> +
> static int cxl_port_add(struct cxl_port *port,
> resource_size_t component_reg_phys,
> struct cxl_dport *parent_dport)
> @@ -868,6 +874,8 @@ static int cxl_port_add(struct cxl_port *port,
> return rc;
> }
>
> + setup_address_translation(port);
Given that the address translation callback only is needed for the host bridge, should this be called from acpi_probe() when the host bridge is being setup rather than going through every port add and checking if the port is a host bridge?
DJ
> +
> rc = device_add(dev);
> if (rc)
> return rc;
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-12 17:17 ` Dave Jiang
@ 2025-09-15 7:19 ` Robert Richter
2025-09-15 16:24 ` Dave Jiang
0 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-15 7:19 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 12.09.25 10:17:14, Dave Jiang wrote:
>
>
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > Each region has a known host physical address (HPA) range it is
> > assigned to. Endpoint decoders assigned to a region share the same HPA
> > range. The region's address range is the system's physical address
> > (SPA) range.
> >
> > Endpoint decoders in systems that need address translation use HPAs
> > which are not SPAs. To make the SPA range accessible to the endpoint
> > decoders, store and track the region's SPA range in struct cxl_region.
> > Introduce the @hpa_range member to the struct. Now, the SPA range of
> > an endpoint decoder can be determined based on its assigned region.
> >
> > Patch is a prerequisite to implement address translation which uses
> > struct cxl_region to store all relevant region and interleaving
> > parameters.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
>
> Just a nit below. Otherwise looks ok
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>
> > ---
> > drivers/cxl/core/region.c | 17 +++++++++++++++++
> > drivers/cxl/cxl.h | 2 ++
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 2c37c060d983..777d04870180 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -664,6 +664,11 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
> > return PTR_ERR(res);
> > }
> >
> > + cxlr->hpa_range = (struct range) {
> > + .start = res->start,
> > + .end = res->end,
> > + };
> > +
> > p->res = res;
> > p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
> >
> > @@ -700,8 +705,14 @@ static int free_hpa(struct cxl_region *cxlr)
> > if (p->state >= CXL_CONFIG_ACTIVE)
> > return -EBUSY;
> >
> > + cxlr->hpa_range = (struct range) {
> > + .start = 0,
> > + .end = -1,
> > + };
> > +
> > cxl_region_iomem_release(cxlr);
> > p->state = CXL_CONFIG_IDLE;
> > +
>
> stray blank line
> > return 0;
> > }
This small cleanup was intended and separates the return from other
statements to better group the code in (sort of) blocks. It is not
worth separate patch and it is common practice to have small cleanups
in the area of code that is changed. That allows small style fixes to
the code while reworking it, but avoids separate code cleanups causing
extra efforts, conflicts and the risk of changing stable code.
Anyway, let me know if you want me remove the change.
Thanks,
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range
2025-09-12 17:33 ` Dave Jiang
@ 2025-09-15 7:27 ` Robert Richter
2025-09-15 10:25 ` Jonathan Cameron
0 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-15 7:27 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 12.09.25 10:33:30, Dave Jiang wrote:
>
>
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > @hpa is actually a @range, rename variables accordingly.
>
> it's a range of HPA right? May as well call it 'hpa_range' to distinguish from 'dpa_range' or 'spa_range'
To me this is ok as it is locally used only in the functions. I used
the short version to not hit the 80 char line limit in some of the
statements for readability. Range is most of the time HPA unless for
special cases, which use a prefix then. It also becomes clear viewed
in context, e.g.
range = &cxld->hpa_range;
Thus, I rather would like to not change that.
-Robert
>
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/cxl/core/region.c | 29 +++++++++++++++--------------
> > 1 file changed, 15 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 777d04870180..13113920aba7 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -3367,9 +3367,9 @@ static int match_decoder_by_range(struct device *dev, const void *data)
> > }
> >
> > static struct cxl_decoder *
> > -cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa)
> > +cxl_port_find_switch_decoder(struct cxl_port *port, struct range *range)
> > {
> > - struct device *cxld_dev = device_find_child(&port->dev, hpa,
> > + struct device *cxld_dev = device_find_child(&port->dev, range,
> > match_decoder_by_range);
> >
> > return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL;
> > @@ -3382,14 +3382,14 @@ cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled)
> > struct cxl_port *port = cxled_to_port(cxled);
> > struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
> > struct cxl_decoder *root, *cxld = &cxled->cxld;
> > - struct range *hpa = &cxld->hpa_range;
> > + struct range *range = &cxld->hpa_range;
> >
> > - root = cxl_port_find_switch_decoder(&cxl_root->port, hpa);
> > + root = cxl_port_find_switch_decoder(&cxl_root->port, range);
> > if (!root) {
> > dev_err(cxlmd->dev.parent,
> > "%s:%s no CXL window for range %#llx:%#llx\n",
> > dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> > - cxld->hpa_range.start, cxld->hpa_range.end);
> > + range->start, range->end);
> > return NULL;
> > }
> >
> > @@ -3458,7 +3458,7 @@ static int __construct_region(struct cxl_region *cxlr,
> > {
> > struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > - struct range *hpa = &cxled->cxld.hpa_range;
> > + struct range *range = &cxled->cxld.hpa_range;
> > struct cxl_region_params *p;
> > struct resource *res;
> > int rc;
> > @@ -3474,13 +3474,13 @@ static int __construct_region(struct cxl_region *cxlr,
> > }
> >
> > set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> > - cxlr->hpa_range = *hpa;
> > + cxlr->hpa_range = *range;
> >
> > res = kmalloc(sizeof(*res), GFP_KERNEL);
> > if (!res)
> > return -ENOMEM;
> >
> > - *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
> > + *res = DEFINE_RES_MEM_NAMED(range->start, range_len(range),
> > dev_name(&cxlr->dev));
> >
> > rc = cxl_extended_linear_cache_resize(cxlr, res);
> > @@ -3559,11 +3559,11 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> > }
> >
> > static struct cxl_region *
> > -cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
> > +cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *range)
> > {
> > struct device *region_dev;
> >
> > - region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
> > + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, range,
> > match_region_by_range);
> > if (!region_dev)
> > return NULL;
> > @@ -3573,7 +3573,7 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
> >
> > int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> > {
> > - struct range *hpa = &cxled->cxld.hpa_range;
> > + struct range *range = &cxled->cxld.hpa_range;
> > struct cxl_region_params *p;
> > bool attach = false;
> > int rc;
> > @@ -3584,12 +3584,13 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> > return -ENXIO;
> >
> > /*
> > - * Ensure that if multiple threads race to construct_region() for @hpa
> > - * one does the construction and the others add to that.
> > + * Ensure that if multiple threads race to construct_region()
> > + * for the HPA range one does the construction and the others
> > + * add to that.
> > */
> > mutex_lock(&cxlrd->range_lock);
> > struct cxl_region *cxlr __free(put_cxl_region) =
> > - cxl_find_region_by_range(cxlrd, hpa);
> > + cxl_find_region_by_range(cxlrd, range);
> > if (!cxlr)
> > cxlr = construct_region(cxlrd, cxled);
> > mutex_unlock(&cxlrd->range_lock);
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction
2025-09-12 21:10 ` Dave Jiang
@ 2025-09-15 7:31 ` Robert Richter
2025-09-15 16:26 ` Dave Jiang
0 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-15 7:31 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 12.09.25 14:10:06, Dave Jiang wrote:
>
>
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > To construct a region, the region parameters such as address range and
> > interleaving config need to be determined. This is done while
> > constructing the region by inspecting the endpoint decoder
> > configuration. The endpoint decoder is passed as a function argument.
> >
> > With address translation the endpoint decoder data is no longer
> > sufficient to extract the region parameters as some of the information
> > is obtained using other methods such as using firmware calls.
> >
> > In a first step, separate code to determine and setup the region
> > parameters from the region construction. Temporarily store all the
> > data to create the region in the new struct cxl_region_context. Add a
> > new function setup_region_parameters() to fill that struct and later
> > use it to construct the region. This simplifies the extension of the
> > function to support other methods needed, esp. to support address
> > translation.
> >
> > Patch is a prerequisite to implement address translation.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/cxl/core/region.c | 50 +++++++++++++++++++++++++++++----------
> > 1 file changed, 38 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 106692f1e310..57697504410b 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -3414,6 +3414,26 @@ static int match_region_by_range(struct device *dev, const void *data)
> > return 0;
> > }
> >
> > +struct cxl_region_context {
> > + struct cxl_endpoint_decoder *cxled;
> > + struct cxl_memdev *cxlmd;
> > + struct range hpa_range;
> > + int interleave_ways;
> > + int interleave_granularity;
> > +};
> > +
> > +static int setup_region_params(struct cxl_endpoint_decoder *cxled,
> > + struct cxl_region_context *ctx)
> > +{
> > + ctx->cxled = cxled;
> > + ctx->cxlmd = cxled_to_memdev(cxled);
> > + ctx->hpa_range = cxled->cxld.hpa_range;
> > + ctx->interleave_ways = cxled->cxld.interleave_ways;
> > + ctx->interleave_granularity = cxled->cxld.interleave_granularity;
>
> You can init like this:
>
> *ctx = (struct cxl_region_context) {
> .cxled = cxled,
> .cxlmd = cxled_to_memdev(cxled),
> .hpa_range = cxled->cxld.hpa_range,
> .interleave_ways = cxled->cxld.interleave_ways,
> .interleave_granularity = cxled->cxld.interleave_granularity,
> };
Will change that for readability and to zero-init possibly missing
members.
>
>
> > +
> > + return 0;
>
> Can probably make this function void if no expected errors and only assignments.
A later extension to the code may return an error code, so I prepared
the function interface already for this.
-Robert
>
> DJ
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port
2025-09-12 21:21 ` Dave Jiang
@ 2025-09-15 7:55 ` Robert Richter
2025-09-15 16:32 ` Dave Jiang
0 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-15 7:55 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 12.09.25 14:21:16, Dave Jiang wrote:
>
>
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > To enable address translation, the endpoint decoder's HPA range must
> > be translated when crossing memory domains to the next parent port's
> > address ranges up to the root port. The root port's HPA range is
> > equivalent to the system's SPA range.
> >
> > Introduce a callback to translate an address of the decoder's HPA
> > range to the address range of the parent port. The callback can be set
> > for ports that need to handle address translation.
> >
> > Reviewed-by: Gregory Price <gourry@gourry.net>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/cxl/cxl.h | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index f182982f1c14..eb837867d932 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -429,6 +429,17 @@ struct cxl_rd_ops {
> > u64 (*spa_to_hpa)(struct cxl_root_decoder *cxlrd, u64 spa);
> > };
> >
> > +/**
> > + * cxl_to_hpa_fn - type of a callback function to translate an HPA
> > + * @cxld: cxl_decoder to translate from
> > + * @hpa: HPA of the @cxld decoder's address range
> > + *
> > + * The callback translates a decoder's HPA to the address range of the
> > + * decoder's parent port. The return value is the translated HPA on
> > + * success or ULLONG_MAX otherwise.
> > + */
> > +typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
>
> cxl_to_parent_hpa_fn()?
It is actually a translation from the decoder's address space to the
address space of the port this callback is attached to. Parent port
might be confusing here and maybe the comment needs a rework too.
Though, finally it is used in the context where the decoder's port is
child to the port with this callback.
Comment should be more precise, ok?
-Robert
>
> DJ
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-12 23:46 ` Dave Jiang
@ 2025-09-15 8:34 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-15 8:34 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 12.09.25 16:46:23, Dave Jiang wrote:
>
>
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > Add AMD Zen5 support for address translation.
> >
> > Zen5 systems may be configured to use 'Normalized addresses'. Then,
> > CXL endpoints use their own physical address space and are programmed
> > passthrough (DPA == HPA), the number of interleaving ways for the
> > endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> > translated from the endpoint to its CXL host bridge. The HPA of a CXL
> > host bridge is equivalent to the System Physical Address (SPA).
> >
> > ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> > Device Physical Address (DPA) to its System Physical Address. This is
> > documented in:
> >
> > AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> > ACPI v6.5 Porting Guide, Publication # 58088
> > https://www.amd.com/en/search/documentation/hub.html
> >
> > To implement AMD Zen5 address translation the following steps are
> > needed:
> >
> > AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> > call (Address Translation - CXL DPA to System Physical Address, see
> > ACPI v6.5 Porting Guide above) when address translation is enabled.
> > The existence of the callback can be identified using a specific GUID
> > as documented. The initialization code checks firmware and kernel
> > support of ACPI PRM.
> >
> > Introduce a new file core/atl.c to handle ACPI PRM specific address
> > translation code. Naming is loosely related to the kernel's AMD
> > Address Translation Library (CONFIG_AMD_ATL) but implementation does
> > not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
> > options respectively to enable the code depending on architecture and
> > platform options.
> >
> > Implement an ACPI PRM firmware call for CXL address translation in the
> > new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
> > callback for applicable CXL host bridges using the new cxl_atl_init()
> > function.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
>
> I'm still trying to digest the series. Couple things below.
Thank you for review, I appreciate that.
>
> > ---
> > drivers/cxl/Kconfig | 4 ++
> > drivers/cxl/core/Makefile | 1 +
> > drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
> > drivers/cxl/core/core.h | 1 +
> > drivers/cxl/core/port.c | 8 +++
> > 5 files changed, 152 insertions(+)
> > create mode 100644 drivers/cxl/core/atl.c
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 48b7314afdb8..31f9c96ef908 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -233,4 +233,8 @@ config CXL_MCE
> > def_bool y
> > depends on X86_MCE && MEMORY_FAILURE
> >
> > +config CXL_ATL
> > + def_bool y
> > + depends on ACPI_PRMT
> > +
> > endif
> > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> > index 5ad8fef210b5..11fe272a6e29 100644
> > --- a/drivers/cxl/core/Makefile
> > +++ b/drivers/cxl/core/Makefile
> > @@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_REGION) += region.o
> > cxl_core-$(CONFIG_CXL_MCE) += mce.o
> > cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> > cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> > +cxl_core-$(CONFIG_CXL_ATL) += atl.o
> > diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
> > new file mode 100644
> > index 000000000000..5fc21eddaade
> > --- /dev/null
> > +++ b/drivers/cxl/core/atl.c
> > @@ -0,0 +1,138 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> > + */
> > +
> > +#include <linux/prmt.h>
> > +#include <linux/pci.h>
> > +
> > +#include <cxlmem.h>
> > +#include "core.h"
> > +
> > +static bool check_prm_address_translation(struct cxl_port *port)
> > +{
> > + /* Applies to CXL host bridges only */
> > + return !is_cxl_root(port) && port->host_bridge &&
> > + is_cxl_root(to_cxl_port(port->dev.parent));
> > +}
> > +
> > +/*
> > + * PRM Address Translation - CXL DPA to System Physical Address
> > + *
> > + * Reference:
> > + *
> > + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> > + * ACPI v6.5 Porting Guide, Publication # 58088
> > + */
> > +
> > +static const guid_t prm_cxl_dpa_spa_guid =
> > + GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
> > + 0x48, 0x0b, 0x94);
> > +
> > +struct prm_cxl_dpa_spa_data {
> > + u64 dpa;
> > + u8 reserved;
> > + u8 devfn;
> > + u8 bus;
> > + u8 segment;
> > + void *out;
> > +} __packed;
> > +
> > +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> > +{
> > + struct prm_cxl_dpa_spa_data data;
> > + u64 spa;
> > + int rc;
> > +
> > + data = (struct prm_cxl_dpa_spa_data) {
> > + .dpa = dpa,
> > + .devfn = pci_dev->devfn,
> > + .bus = pci_dev->bus->number,
> > + .segment = pci_domain_nr(pci_dev->bus),
> > + .out = &spa,
> > + };
> > +
> > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > + if (rc) {
> > + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> > + return ULLONG_MAX;
> > + }
> > +
> > + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> > +
> > + return spa;
> > +}
> > +
> > +static u64 cxl_prm_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> > +{
> > + struct cxl_memdev *cxlmd;
> > + struct pci_dev *pci_dev;
> > + struct cxl_port *port;
> > + struct cxl_endpoint_decoder *cxled;
> > +
> > + /* Only translate from endpoint to its parent port. */
> > + if (!is_endpoint_decoder(&cxld->dev))
> > + return hpa;
> > +
> > + cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > +
> > + /*
> > + * Nothing to do if base is non-zero and Normalized Addressing
> > + * is disabled.
> > + */
>
> Not sure if this comment matches the code below
Non-zero is wrong, I meant it must be zero-based (means DPA == HPA).
Otherwise, if not zero-based, Normalized Addressing is disabled and
translation is not needed.
> > + if (cxld->hpa_range.start != cxled->dpa_res->start)
> > + return hpa;
> > +
> > + /*
> > + * Endpoints are programmed passthrough in Normalized
> > + * Addressing mode.
> > + */
> Not sure if the comment here matches the conditional check.
Passthrough implies interleaving is disabled and thus ways must be 1,
will update comment.
> > + if (cxld->interleave_ways != 1) {
> > + dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
> > + cxld->interleave_ways, cxld->interleave_granularity);
> > + return ULLONG_MAX;
> > + }
> > +
> > + if (hpa < cxld->hpa_range.start || hpa > cxld->hpa_range.end) {
> > + dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
>
> Suggest use %pr for range printing to avoid 0-day complaints on 32bit compilers.
Ok.
>
> > + hpa, cxld->hpa_range.start, cxld->hpa_range.end);
> > + return ULLONG_MAX;
> > + }
> > +
> > + port = to_cxl_port(cxld->dev.parent);
> > + cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> > + if (!port || !dev_is_pci(cxlmd->dev.parent)) {
> > + dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> > + dev_name(cxld->dev.parent), cxld->hpa_range.start,
> > + cxld->hpa_range.end);
> > + return ULLONG_MAX;
> > + }
> > + pci_dev = to_pci_dev(cxlmd->dev.parent);
> > +
> > + return prm_cxl_dpa_spa(pci_dev, hpa);
> > +}
> > +
> > +static void cxl_prm_init(struct cxl_port *port)
> > +{
> > + u64 spa;
> > + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> > + int rc;
> > +
> > + if (!check_prm_address_translation(port))
> > + return;
> > +
> > + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> > + return;
> > +
> > + port->to_hpa = cxl_prm_to_hpa;
> > +
> > + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> > + dev_name(&port->dev));
> > +}
> > +
> > +void cxl_atl_init(struct cxl_port *port)
> > +{
> > + cxl_prm_init(port);
> > +}
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index eac8cc1bdaa0..624e438d052a 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -150,6 +150,7 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
> > int cxl_ras_init(void);
> > void cxl_ras_exit(void);
> > int cxl_gpf_port_setup(struct cxl_dport *dport);
> > +void cxl_atl_init(struct cxl_port *port);
> >
> > #ifdef CONFIG_CXL_FEATURES
> > struct cxl_feat_entry *
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 8f36ff413f5d..8007e002888e 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -831,6 +831,12 @@ static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> > &cxl_einj_inject_fops);
> > }
> >
> > +static void setup_address_translation(struct cxl_port *port)
> > +{
> > + if (IS_ENABLED(CONFIG_CXL_ATL))
> > + cxl_atl_init(port);
> > +}
> > +
> > static int cxl_port_add(struct cxl_port *port,
> > resource_size_t component_reg_phys,
> > struct cxl_dport *parent_dport)
> > @@ -868,6 +874,8 @@ static int cxl_port_add(struct cxl_port *port,
> > return rc;
> > }
> >
> > + setup_address_translation(port);
>
> Given that the address translation callback only is needed for the
> host bridge, should this be called from acpi_probe() when the host
> bridge is being setup rather than going through every port add and
> checking if the port is a host bridge?
Will check if that is feasible.
Thanks,
-Robert
> DJ
>
> > +
> > rc = device_add(dev);
> > if (rc)
> > return rc;
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
2025-09-12 15:45 ` [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Dave Jiang
@ 2025-09-15 8:42 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-15 8:42 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 12.09.25 08:45:37, Dave Jiang wrote:
>
>
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > This patch set adds support for address translation using ACPI PRM and
> > enables this for AMD Zen5 platforms. This is another new appoach in
> > response to earlier attempts to implement CXL address translation:
> >
> > * v1: [1] and the comments on it, esp. Dan's [2],
> > * v2: [3] and comments on [4], esp. Dave's [5]
> >
> > This version 3 addresses the requests to reduce the number of patches
> > to a minimum and also to remove platform specific implementations
> > allowing the Documentation of CXL Address Translation Support in the
> > Kernel's "Compute Express Link: Linux Conventions" document and an
> > update of the CXL specification in the longterm. This patch submission
> > will be the base for a documention patch that describes CXL Address
> > Translation support accordingly. The documentation patch will be sent
> > in the very next step.
> >
> > The CXL driver currently does not implement address translation which
> > assumes the host physical addresses (HPA) and system physical
> > addresses (SPA) are equal.
> >
> > Systems with different HPA and SPA addresses need address translation.
> > If this is the case, the hardware addresses esp. used in the HDM
> > decoder configurations are different to the system's or parent port
> > address ranges. E.g. AMD Zen5 systems may be configured to use
> > 'Normalized addresses'. Then, CXL endpoints have their own physical
> > address base which is not the same as the SPA used by the CXL host
> > bridge. Thus, addresses need to be translated from the endpoint's to
> > its CXL host bridge's address range.
> >
> > To enable address translation, the endpoint's HPA range must be
> > translated to the CXL host bridge's address range. A callback is
> > introduced to translate a decoder's HPA to the next parent port's
> > address range. This allows the enablement of address translation for
> > individual ports as needed. The callback is then used to determine the
> > region parameters which includes the SPA translated address range of
> > the endpoint decoder and the interleaving configuration. This is
> > stored in struct cxl_region which allows an endpoint decoder to
> > determine that parameters based on its assigned region.
> >
> > Note that only auto-discovery of decoders is supported. Thus, decoders
> > are locked and cannot be configured manually.
>
> Hi Robert, thanks for reworking this.
>
> What happens with the manual configured path if only auto-discovery
> is supported? Things don't work? It works with no translation
> needed? Platform will lock all decoders and not allow manual
> configuration for CXL devices?
Endpoints, root ports and bridges are always pre-configured by
firmware in this case. A manual setup is not supported. That is why
the endpoints are locked to prevent the kernel from reconfiguring the
decoders, see:
[PATCH v3 09/11] cxl/region: Lock decoders that need address translation
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region
2025-09-12 15:52 ` Dave Jiang
@ 2025-09-15 10:14 ` Jonathan Cameron
0 siblings, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2025-09-15 10:14 UTC (permalink / raw)
To: Dave Jiang
Cc: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, 12 Sep 2025 08:52:06 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > A region is always bound to a root decoder. The region's associated
> > root decoder is often needed. Add it to struct cxl_region.
> >
> > This simplifies code by removing dynamic lookups and removing the root
> > decoder argument from the function argument list where possible.
> >
> > Patch is a prerequisite to implement address translation which uses
> > struct cxl_region to store all relevant region and interleaving
> > parameters.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-12 14:45 ` [PATCH v3 02/11] cxl/region: Store HPA range " Robert Richter
2025-09-12 17:17 ` Dave Jiang
@ 2025-09-15 10:23 ` Jonathan Cameron
2025-09-17 8:15 ` Robert Richter
2025-09-17 19:58 ` Gregory Price
2 siblings, 1 reply; 62+ messages in thread
From: Jonathan Cameron @ 2025-09-15 10:23 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, 12 Sep 2025 16:45:04 +0200
Robert Richter <rrichter@amd.com> wrote:
> Each region has a known host physical address (HPA) range it is
> assigned to. Endpoint decoders assigned to a region share the same HPA
> range. The region's address range is the system's physical address
> (SPA) range.
>
> Endpoint decoders in systems that need address translation use HPAs
> which are not SPAs. To make the SPA range accessible to the endpoint
> decoders, store and track the region's SPA range in struct cxl_region.
> Introduce the @hpa_range member to the struct. Now, the SPA range of
> an endpoint decoder can be determined based on its assigned region.
>
> Patch is a prerequisite to implement address translation which uses
> struct cxl_region to store all relevant region and interleaving
> parameters.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Trivial comment inline.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 17 +++++++++++++++++
> drivers/cxl/cxl.h | 2 ++
> 2 files changed, 19 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 2c37c060d983..777d04870180 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -664,6 +664,11 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
> return PTR_ERR(res);
> }
>
> + cxlr->hpa_range = (struct range) {
> + .start = res->start,
> + .end = res->end,
> + };
> +
> p->res = res;
> p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
>
> @@ -700,8 +705,14 @@ static int free_hpa(struct cxl_region *cxlr)
> if (p->state >= CXL_CONFIG_ACTIVE)
> return -EBUSY;
>
> + cxlr->hpa_range = (struct range) {
> + .start = 0,
> + .end = -1,
> + };
There is DEFINE_RANGE() which you could use here
cxlr->hpa_range = DEFINE_RANGE(0, -1);
Not sure if it is worth bothering though.
> +
> cxl_region_iomem_release(cxlr);
> p->state = CXL_CONFIG_IDLE;
> +
> return 0;
> }
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range
2025-09-15 7:27 ` Robert Richter
@ 2025-09-15 10:25 ` Jonathan Cameron
2025-09-17 8:10 ` Robert Richter
0 siblings, 1 reply; 62+ messages in thread
From: Jonathan Cameron @ 2025-09-15 10:25 UTC (permalink / raw)
To: Robert Richter
Cc: Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Mon, 15 Sep 2025 09:27:36 +0200
Robert Richter <rrichter@amd.com> wrote:
> On 12.09.25 10:33:30, Dave Jiang wrote:
> >
> >
> > On 9/12/25 7:45 AM, Robert Richter wrote:
> > > @hpa is actually a @range, rename variables accordingly.
> >
> > it's a range of HPA right? May as well call it 'hpa_range' to distinguish from 'dpa_range' or 'spa_range'
>
> To me this is ok as it is locally used only in the functions. I used
> the short version to not hit the 80 char line limit in some of the
> statements for readability. Range is most of the time HPA unless for
> special cases, which use a prefix then. It also becomes clear viewed
> in context, e.g.
>
> range = &cxld->hpa_range;
>
> Thus, I rather would like to not change that.
I'm with Dave on this one. Better to have some slightly long lines
and avoid any potential confusion.
Jonathan
>
> -Robert
>
> >
> > >
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > ---
> > > drivers/cxl/core/region.c | 29 +++++++++++++++--------------
> > > 1 file changed, 15 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index 777d04870180..13113920aba7 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -3367,9 +3367,9 @@ static int match_decoder_by_range(struct device *dev, const void *data)
> > > }
> > >
> > > static struct cxl_decoder *
> > > -cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa)
> > > +cxl_port_find_switch_decoder(struct cxl_port *port, struct range *range)
> > > {
> > > - struct device *cxld_dev = device_find_child(&port->dev, hpa,
> > > + struct device *cxld_dev = device_find_child(&port->dev, range,
> > > match_decoder_by_range);
> > >
> > > return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL;
> > > @@ -3382,14 +3382,14 @@ cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled)
> > > struct cxl_port *port = cxled_to_port(cxled);
> > > struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
> > > struct cxl_decoder *root, *cxld = &cxled->cxld;
> > > - struct range *hpa = &cxld->hpa_range;
> > > + struct range *range = &cxld->hpa_range;
> > >
> > > - root = cxl_port_find_switch_decoder(&cxl_root->port, hpa);
> > > + root = cxl_port_find_switch_decoder(&cxl_root->port, range);
> > > if (!root) {
> > > dev_err(cxlmd->dev.parent,
> > > "%s:%s no CXL window for range %#llx:%#llx\n",
> > > dev_name(&cxlmd->dev), dev_name(&cxld->dev),
> > > - cxld->hpa_range.start, cxld->hpa_range.end);
> > > + range->start, range->end);
> > > return NULL;
> > > }
> > >
> > > @@ -3458,7 +3458,7 @@ static int __construct_region(struct cxl_region *cxlr,
> > > {
> > > struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> > > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > > - struct range *hpa = &cxled->cxld.hpa_range;
> > > + struct range *range = &cxled->cxld.hpa_range;
> > > struct cxl_region_params *p;
> > > struct resource *res;
> > > int rc;
> > > @@ -3474,13 +3474,13 @@ static int __construct_region(struct cxl_region *cxlr,
> > > }
> > >
> > > set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> > > - cxlr->hpa_range = *hpa;
> > > + cxlr->hpa_range = *range;
> > >
> > > res = kmalloc(sizeof(*res), GFP_KERNEL);
> > > if (!res)
> > > return -ENOMEM;
> > >
> > > - *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
> > > + *res = DEFINE_RES_MEM_NAMED(range->start, range_len(range),
> > > dev_name(&cxlr->dev));
> > >
> > > rc = cxl_extended_linear_cache_resize(cxlr, res);
> > > @@ -3559,11 +3559,11 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> > > }
> > >
> > > static struct cxl_region *
> > > -cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
> > > +cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *range)
> > > {
> > > struct device *region_dev;
> > >
> > > - region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
> > > + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, range,
> > > match_region_by_range);
> > > if (!region_dev)
> > > return NULL;
> > > @@ -3573,7 +3573,7 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
> > >
> > > int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> > > {
> > > - struct range *hpa = &cxled->cxld.hpa_range;
> > > + struct range *range = &cxled->cxld.hpa_range;
> > > struct cxl_region_params *p;
> > > bool attach = false;
> > > int rc;
> > > @@ -3584,12 +3584,13 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
> > > return -ENXIO;
> > >
> > > /*
> > > - * Ensure that if multiple threads race to construct_region() for @hpa
> > > - * one does the construction and the others add to that.
> > > + * Ensure that if multiple threads race to construct_region()
> > > + * for the HPA range one does the construction and the others
> > > + * add to that.
> > > */
> > > mutex_lock(&cxlrd->range_lock);
> > > struct cxl_region *cxlr __free(put_cxl_region) =
> > > - cxl_find_region_by_range(cxlrd, hpa);
> > > + cxl_find_region_by_range(cxlrd, range);
> > > if (!cxlr)
> > > cxlr = construct_region(cxlrd, cxled);
> > > mutex_unlock(&cxlrd->range_lock);
> >
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-12 14:45 ` [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation Robert Richter
@ 2025-09-15 10:46 ` Jonathan Cameron
2025-09-15 16:35 ` Dave Jiang
` (2 more replies)
0 siblings, 3 replies; 62+ messages in thread
From: Jonathan Cameron @ 2025-09-15 10:46 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, 12 Sep 2025 16:45:10 +0200
Robert Richter <rrichter@amd.com> wrote:
> Systems that need address translation have the endpoint decoders
> programmed for a different address space. Host physical addresses
> (HPA) are different from their system physical addresses (SPA). The
> decoder's address range and interleaving configuration of such
> endpoints cannot be used to determine the region parameters. The
> region's address range must be SPA which the decoder does not
> provide. In addition, an endpoint's incoming HPA is already converted
> to the devices physical address (DPA). Thus it has interleaving
> disabled.
>
> Address translation may provide different ways to determine an
> endpoint's SPA, e.g. it may support a firmware call. This allows the
> determination of the region's parameters without inspecting the
> endpoint decoders.
>
> Implement the setup of address translation given there is a function
> to convert an endpoint's HPA (which is identical to its DPA) to an
> SPA. Use the previously introduced cxl_to_hpa_fn callback for this.
> Convert the decoder's address range and ensure it is 256MB aligned.
>
> Identify the region's interleaving ways by inspecting the address
> ranges. Also determine the interleaving granularity using the address
> translation callback. Note that the position of the chunk from one
> interleaving block to the next may vary and thus cannot be considered
> constant. Address offsets larger than the interleaving block size
> cannot be used to calculate the granularity. Thus, probe the
> granularity using address translation for various HPAs in the same
> interleaving block.
>
> Note that this patch does not yet enable address translation as
> callbacks have not been initialized.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/core/region.c | 95 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 94 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 57697504410b..9fb1e9508213 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3422,16 +3422,109 @@ struct cxl_region_context {
> int interleave_granularity;
> };
>
> +static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
> + struct cxl_region_context *ctx)
> +{
> + struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent->parent);
When there is a parent->parent it always makes me nervous that I haven't
reasoned out what port this actually is. A comment would help or
a more specific macro where the name lets us know what we are getting.
> + struct cxl_decoder *cxld = &cxled->cxld;
> + struct range range = ctx->hpa_range;
> + u64 spa_len, len = range_len(&range);
> + u64 addr, base = range.start;
> + int ways, gran;
> +
> + if (!len || !port->to_hpa)
> + return 0;
> +
> + if (!IS_ALIGNED(range.start, SZ_256M) ||
> + !IS_ALIGNED(range.end + 1, SZ_256M)) {
> + dev_warn(&port->dev,
> + "CXL address translation: Unaligned decoder HPA range: %#llx-%#llx(%s)\n",
> + range.start, range.end, dev_name(&cxld->dev));
> + return -ENXIO;
> + }
> +
> + /* Translate HPA range to SPA. */
> + range.start = port->to_hpa(cxld, range.start);
This is where the generic naming as 'range' gets really confusing.
hpa_range etc with separate struct range for each would definitely help
For the checks and inputs maybe just use ctx->hpa_range directly.
> + range.end = port->to_hpa(cxld, range.end);
Perhaps use the DEFINE_RANGE macro or
range = (struct range) {
.start = ...
style as per earlier patches.
> +
> + if (range.start == ULLONG_MAX || range.end == ULLONG_MAX) {
> + dev_warn(&port->dev,
> + "CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n",
> + range.start, range.end, ctx->hpa_range.start,
> + ctx->hpa_range.end, dev_name(&cxld->dev));
> + return -ENXIO;
> + }
> +
> + /*
> + * Since translated addresses include the interleaving
> + * offsets, align the range to 256 MB.
So we pass in an HPA range without interleaving offsets and get back
one with them? Is that unavoidable, or can we potentially push
this bit into the callback? Probably with separate callbacks to
get the interleave details.
Overall I'm not really following what is going on here. Maybe
some ascii art would help?
> + */
> + range.start = ALIGN_DOWN(range.start, SZ_256M);
> + range.end = ALIGN(range.end, SZ_256M) - 1;
> +
> + spa_len = range_len(&range);
> + if (!len || !spa_len || spa_len % len) {
> + dev_warn(&port->dev,
> + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
> + range.start, range.end, ctx->hpa_range.start,
> + ctx->hpa_range.end, dev_name(&cxld->dev));
> + return -ENXIO;
> + }
> +
> + ways = spa_len / len;
> + gran = SZ_256;
> +
> + /*
> + * Determine interleave granularity
> + *
> + * Note: The position of the chunk from one interleaving block
> + * to the next may vary and thus cannot be considered
> + * constant. Address offsets larger than the interleaving
> + * block size cannot be used to calculate the granularity.
> + */
> + while (ways > 1 && gran <= SZ_16M) {
> + addr = port->to_hpa(cxld, base + gran);
> + if (addr != base + gran)
> + break;
> + gran <<= 1;
> + }
> +
> + if (gran > SZ_16M) {
> + dev_warn(&port->dev,
> + "CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n",
> + range.start, range.end, ctx->hpa_range.start,
> + ctx->hpa_range.end, dev_name(&cxld->dev));
> + return -ENXIO;
> + }
> +
> + ctx->hpa_range = range;
> + ctx->interleave_ways = ways;
> + ctx->interleave_granularity = gran;
> +
> + dev_dbg(&cxld->dev,
> + "address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n",
> + dev_name(ctx->cxlmd->dev.parent), base, len, range.start,
> + spa_len, ways, gran);
> +
> + return 0;
> +}
> +
> static int setup_region_params(struct cxl_endpoint_decoder *cxled,
> struct cxl_region_context *ctx)
> {
> + int rc;
> +
> ctx->cxled = cxled;
> ctx->cxlmd = cxled_to_memdev(cxled);
> ctx->hpa_range = cxled->cxld.hpa_range;
> ctx->interleave_ways = cxled->cxld.interleave_ways;
> ctx->interleave_granularity = cxled->cxld.interleave_granularity;
>
> - return 0;
> + rc = setup_address_translation(cxled, ctx);
A quick search suggested nothing new gets added after this. As such
return setup_address_translation(...);
is probably appropriate here.
> + if (rc)
> + return rc;
> +
> + return rc;
> }
>
> static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
2025-09-12 23:46 ` Dave Jiang
@ 2025-09-15 10:59 ` Jonathan Cameron
2025-09-17 9:43 ` Robert Richter
2025-09-17 21:01 ` Dave Jiang
2025-09-24 17:09 ` Gregory Price
3 siblings, 1 reply; 62+ messages in thread
From: Jonathan Cameron @ 2025-09-15 10:59 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, 12 Sep 2025 16:45:13 +0200
Robert Richter <rrichter@amd.com> wrote:
> Add AMD Zen5 support for address translation.
>
> Zen5 systems may be configured to use 'Normalized addresses'. Then,
> CXL endpoints use their own physical address space and are programmed
> passthrough (DPA == HPA), the number of interleaving ways for the
> endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> translated from the endpoint to its CXL host bridge. The HPA of a CXL
> host bridge is equivalent to the System Physical Address (SPA).
>
> ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> Device Physical Address (DPA) to its System Physical Address. This is
> documented in:
>
> AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> ACPI v6.5 Porting Guide, Publication # 58088
> https://www.amd.com/en/search/documentation/hub.html
>
> To implement AMD Zen5 address translation the following steps are
> needed:
>
> AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> call (Address Translation - CXL DPA to System Physical Address, see
> ACPI v6.5 Porting Guide above) when address translation is enabled.
> The existence of the callback can be identified using a specific GUID
> as documented. The initialization code checks firmware and kernel
> support of ACPI PRM.
>
> Introduce a new file core/atl.c to handle ACPI PRM specific address
> translation code. Naming is loosely related to the kernel's AMD
> Address Translation Library (CONFIG_AMD_ATL) but implementation does
> not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
> options respectively to enable the code depending on architecture and
> platform options.
>
> Implement an ACPI PRM firmware call for CXL address translation in the
> new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
> callback for applicable CXL host bridges using the new cxl_atl_init()
> function.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
A few minor additions inline.
J
> ---
> drivers/cxl/Kconfig | 4 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/core.h | 1 +
> drivers/cxl/core/port.c | 8 +++
> 5 files changed, 152 insertions(+)
> create mode 100644 drivers/cxl/core/atl.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 48b7314afdb8..31f9c96ef908 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -233,4 +233,8 @@ config CXL_MCE
> def_bool y
> depends on X86_MCE && MEMORY_FAILURE
>
> +config CXL_ATL
> + def_bool y
Given no help we can't turn this off manually and it's down to
whether ACPI_PRMT is configured or not.
To me this feels like something we should be able to control.
Not a huge amount of code, but none the less 'so far' it only
applies to particular AMD platforms yet ACPI_PRMT gets built
on ARM platforms and other stuff even on AMD (CONFIG_AMD_ATL_PRM)
> + depends on ACPI_PRMT
> +
> endif
> diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
> new file mode 100644
> index 000000000000..5fc21eddaade
> --- /dev/null
> +++ b/drivers/cxl/core/atl.c
> +struct prm_cxl_dpa_spa_data {
> + u64 dpa;
> + u8 reserved;
> + u8 devfn;
> + u8 bus;
> + u8 segment;
> + void *out;
If reality is out is always a u64 * maybe just give it that type.
> +} __packed;
> +
> +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> +{
> + struct prm_cxl_dpa_spa_data data;
> + u64 spa;
> + int rc;
> +
> + data = (struct prm_cxl_dpa_spa_data) {
> + .dpa = dpa,
> + .devfn = pci_dev->devfn,
> + .bus = pci_dev->bus->number,
> + .segment = pci_domain_nr(pci_dev->bus),
> + .out = &spa,
> + };
> +
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc) {
> + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> + return ULLONG_MAX;
> + }
> +
> + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> +
> + return spa;
> +}
> +
> +static u64 cxl_prm_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{
> + pci_dev = to_pci_dev(cxlmd->dev.parent);
return prm_cxl_dpa_spa(to_pci_dev(cxlmd->dev.parent), hpa);
seem fine to me and shortens things a little.
> +
> + return prm_cxl_dpa_spa(pci_dev, hpa);
> +}
> +
> +static void cxl_prm_init(struct cxl_port *port)
> +{
> + u64 spa;
> + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> + int rc;
> +
> + if (!check_prm_address_translation(port))
> + return;
> +
> + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> + return;
So other error values are fine? IF they don't occur no need to be explicit
just check rc < 0 and return.
> +
> + port->to_hpa = cxl_prm_to_hpa;
> +
> + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> + dev_name(&port->dev));
> +}
> +
> +void cxl_atl_init(struct cxl_port *port)
> +{
> + cxl_prm_init(port);
Why not just rename cxl_prm_init() to cxl_atl_init() and get rid of this wrapper?
> +}
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-15 7:19 ` Robert Richter
@ 2025-09-15 16:24 ` Dave Jiang
0 siblings, 0 replies; 62+ messages in thread
From: Dave Jiang @ 2025-09-15 16:24 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 9/15/25 12:19 AM, Robert Richter wrote:
> On 12.09.25 10:17:14, Dave Jiang wrote:
>>
>>
>> On 9/12/25 7:45 AM, Robert Richter wrote:
>>> Each region has a known host physical address (HPA) range it is
>>> assigned to. Endpoint decoders assigned to a region share the same HPA
>>> range. The region's address range is the system's physical address
>>> (SPA) range.
>>>
>>> Endpoint decoders in systems that need address translation use HPAs
>>> which are not SPAs. To make the SPA range accessible to the endpoint
>>> decoders, store and track the region's SPA range in struct cxl_region.
>>> Introduce the @hpa_range member to the struct. Now, the SPA range of
>>> an endpoint decoder can be determined based on its assigned region.
>>>
>>> Patch is a prerequisite to implement address translation which uses
>>> struct cxl_region to store all relevant region and interleaving
>>> parameters.
>>>
>>> Signed-off-by: Robert Richter <rrichter@amd.com>
>>
>> Just a nit below. Otherwise looks ok
>>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>
>>> ---
>>> drivers/cxl/core/region.c | 17 +++++++++++++++++
>>> drivers/cxl/cxl.h | 2 ++
>>> 2 files changed, 19 insertions(+)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 2c37c060d983..777d04870180 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -664,6 +664,11 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
>>> return PTR_ERR(res);
>>> }
>>>
>>> + cxlr->hpa_range = (struct range) {
>>> + .start = res->start,
>>> + .end = res->end,
>>> + };
>>> +
>>> p->res = res;
>>> p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
>>>
>>> @@ -700,8 +705,14 @@ static int free_hpa(struct cxl_region *cxlr)
>>> if (p->state >= CXL_CONFIG_ACTIVE)
>>> return -EBUSY;
>>>
>>> + cxlr->hpa_range = (struct range) {
>>> + .start = 0,
>>> + .end = -1,
>>> + };
>>> +
>>> cxl_region_iomem_release(cxlr);
>>> p->state = CXL_CONFIG_IDLE;
>>> +
>>
>> stray blank line
>>> return 0;
>>> }
>
> This small cleanup was intended and separates the return from other
> statements to better group the code in (sort of) blocks. It is not
> worth separate patch and it is common practice to have small cleanups
> in the area of code that is changed. That allows small style fixes to
> the code while reworking it, but avoids separate code cleanups causing
> extra efforts, conflicts and the risk of changing stable code.
>
> Anyway, let me know if you want me remove the change.
Yeah please just drop the change. While it's nice to have, it may potentially cause backport issues generally speaking.
>
> Thanks,
>
> -Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction
2025-09-15 7:31 ` Robert Richter
@ 2025-09-15 16:26 ` Dave Jiang
0 siblings, 0 replies; 62+ messages in thread
From: Dave Jiang @ 2025-09-15 16:26 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 9/15/25 12:31 AM, Robert Richter wrote:
> On 12.09.25 14:10:06, Dave Jiang wrote:
>>
>>
>> On 9/12/25 7:45 AM, Robert Richter wrote:
>>> To construct a region, the region parameters such as address range and
>>> interleaving config need to be determined. This is done while
>>> constructing the region by inspecting the endpoint decoder
>>> configuration. The endpoint decoder is passed as a function argument.
>>>
>>> With address translation the endpoint decoder data is no longer
>>> sufficient to extract the region parameters as some of the information
>>> is obtained using other methods such as using firmware calls.
>>>
>>> In a first step, separate code to determine and setup the region
>>> parameters from the region construction. Temporarily store all the
>>> data to create the region in the new struct cxl_region_context. Add a
>>> new function setup_region_parameters() to fill that struct and later
>>> use it to construct the region. This simplifies the extension of the
>>> function to support other methods needed, esp. to support address
>>> translation.
>>>
>>> Patch is a prerequisite to implement address translation.
>>>
>>> Signed-off-by: Robert Richter <rrichter@amd.com>
>>> ---
>>> drivers/cxl/core/region.c | 50 +++++++++++++++++++++++++++++----------
>>> 1 file changed, 38 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 106692f1e310..57697504410b 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -3414,6 +3414,26 @@ static int match_region_by_range(struct device *dev, const void *data)
>>> return 0;
>>> }
>>>
>>> +struct cxl_region_context {
>>> + struct cxl_endpoint_decoder *cxled;
>>> + struct cxl_memdev *cxlmd;
>>> + struct range hpa_range;
>>> + int interleave_ways;
>>> + int interleave_granularity;
>>> +};
>>> +
>>> +static int setup_region_params(struct cxl_endpoint_decoder *cxled,
>>> + struct cxl_region_context *ctx)
>>> +{
>>> + ctx->cxled = cxled;
>>> + ctx->cxlmd = cxled_to_memdev(cxled);
>>> + ctx->hpa_range = cxled->cxld.hpa_range;
>>> + ctx->interleave_ways = cxled->cxld.interleave_ways;
>>> + ctx->interleave_granularity = cxled->cxld.interleave_granularity;
>>
>> You can init like this:
>>
>> *ctx = (struct cxl_region_context) {
>> .cxled = cxled,
>> .cxlmd = cxled_to_memdev(cxled),
>> .hpa_range = cxled->cxld.hpa_range,
>> .interleave_ways = cxled->cxld.interleave_ways,
>> .interleave_granularity = cxled->cxld.interleave_granularity,
>> };
>
> Will change that for readability and to zero-init possibly missing
> members.
>
>>
>>
>>> +
>>> + return 0;
>>
>> Can probably make this function void if no expected errors and only assignments.
>
> A later extension to the code may return an error code, so I prepared
> the function interface already for this.
I realized that when I saw it in the later patch. So please ignore the comment.
>
> -Robert
>
>>
>> DJ
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port
2025-09-15 7:55 ` Robert Richter
@ 2025-09-15 16:32 ` Dave Jiang
0 siblings, 0 replies; 62+ messages in thread
From: Dave Jiang @ 2025-09-15 16:32 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 9/15/25 12:55 AM, Robert Richter wrote:
> On 12.09.25 14:21:16, Dave Jiang wrote:
>>
>>
>> On 9/12/25 7:45 AM, Robert Richter wrote:
>>> To enable address translation, the endpoint decoder's HPA range must
>>> be translated when crossing memory domains to the next parent port's
>>> address ranges up to the root port. The root port's HPA range is
>>> equivalent to the system's SPA range.
>>>
>>> Introduce a callback to translate an address of the decoder's HPA
>>> range to the address range of the parent port. The callback can be set
>>> for ports that need to handle address translation.
>>>
>>> Reviewed-by: Gregory Price <gourry@gourry.net>
>>> Signed-off-by: Robert Richter <rrichter@amd.com>
>>> ---
>>> drivers/cxl/cxl.h | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>>> index f182982f1c14..eb837867d932 100644
>>> --- a/drivers/cxl/cxl.h
>>> +++ b/drivers/cxl/cxl.h
>>> @@ -429,6 +429,17 @@ struct cxl_rd_ops {
>>> u64 (*spa_to_hpa)(struct cxl_root_decoder *cxlrd, u64 spa);
>>> };
>>>
>>> +/**
>>> + * cxl_to_hpa_fn - type of a callback function to translate an HPA
>>> + * @cxld: cxl_decoder to translate from
>>> + * @hpa: HPA of the @cxld decoder's address range
>>> + *
>>> + * The callback translates a decoder's HPA to the address range of the
>>> + * decoder's parent port. The return value is the translated HPA on
>>> + * success or ULLONG_MAX otherwise.
>>> + */
>>> +typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
>>
>> cxl_to_parent_hpa_fn()?
>
> It is actually a translation from the decoder's address space to the
> address space of the port this callback is attached to. Parent port
> might be confusing here and maybe the comment needs a rework too.
> Though, finally it is used in the context where the decoder's port is
> child to the port with this callback.
>
> Comment should be more precise, ok?
All I can say is naming is hard. But cxl_to_hpa_fn() doesn't exactly convey what it does, and we need something that provides a clearer intent preferably. Maybe someone else can have a better suggestion.
DJ
>
> -Robert
>
>>
>> DJ
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-15 10:46 ` Jonathan Cameron
@ 2025-09-15 16:35 ` Dave Jiang
2025-09-17 9:04 ` Robert Richter
2025-09-17 20:51 ` Gregory Price
2 siblings, 0 replies; 62+ messages in thread
From: Dave Jiang @ 2025-09-15 16:35 UTC (permalink / raw)
To: Jonathan Cameron, Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Davidlohr Bueso, linux-cxl, linux-kernel, Gregory Price,
Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 9/15/25 3:46 AM, Jonathan Cameron wrote:
> On Fri, 12 Sep 2025 16:45:10 +0200
> Robert Richter <rrichter@amd.com> wrote:
>
>> Systems that need address translation have the endpoint decoders
>> programmed for a different address space. Host physical addresses
>> (HPA) are different from their system physical addresses (SPA). The
>> decoder's address range and interleaving configuration of such
>> endpoints cannot be used to determine the region parameters. The
>> region's address range must be SPA which the decoder does not
>> provide. In addition, an endpoint's incoming HPA is already converted
>> to the devices physical address (DPA). Thus it has interleaving
>> disabled.
>>
>> Address translation may provide different ways to determine an
>> endpoint's SPA, e.g. it may support a firmware call. This allows the
>> determination of the region's parameters without inspecting the
>> endpoint decoders.
>>
>> Implement the setup of address translation given there is a function
>> to convert an endpoint's HPA (which is identical to its DPA) to an
>> SPA. Use the previously introduced cxl_to_hpa_fn callback for this.
>> Convert the decoder's address range and ensure it is 256MB aligned.
>>
>> Identify the region's interleaving ways by inspecting the address
>> ranges. Also determine the interleaving granularity using the address
>> translation callback. Note that the position of the chunk from one
>> interleaving block to the next may vary and thus cannot be considered
>> constant. Address offsets larger than the interleaving block size
>> cannot be used to calculate the granularity. Thus, probe the
>> granularity using address translation for various HPAs in the same
>> interleaving block.
>>
>> Note that this patch does not yet enable address translation as
>> callbacks have not been initialized.
>>
>> Signed-off-by: Robert Richter <rrichter@amd.com>
>> ---
>> drivers/cxl/core/region.c | 95 ++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 94 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 57697504410b..9fb1e9508213 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -3422,16 +3422,109 @@ struct cxl_region_context {
>> int interleave_granularity;
>> };
>>
>> +static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
>> + struct cxl_region_context *ctx)
>> +{
>> + struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent->parent);
>
> When there is a parent->parent it always makes me nervous that I haven't
> reasoned out what port this actually is. A comment would help or
> a more specific macro where the name lets us know what we are getting.
I was also going to suggest that name 'port' to 'parent_port' as well to make it clear what it is.
DJ
>
>> + struct cxl_decoder *cxld = &cxled->cxld;
>> + struct range range = ctx->hpa_range;
>> + u64 spa_len, len = range_len(&range);
>> + u64 addr, base = range.start;
>> + int ways, gran;
>> +
>> + if (!len || !port->to_hpa)
>> + return 0;
>> +
>> + if (!IS_ALIGNED(range.start, SZ_256M) ||
>> + !IS_ALIGNED(range.end + 1, SZ_256M)) {
>> + dev_warn(&port->dev,
>> + "CXL address translation: Unaligned decoder HPA range: %#llx-%#llx(%s)\n",
>> + range.start, range.end, dev_name(&cxld->dev));
>> + return -ENXIO;
>> + }
>> +
>> + /* Translate HPA range to SPA. */
>> + range.start = port->to_hpa(cxld, range.start);
>
> This is where the generic naming as 'range' gets really confusing.
> hpa_range etc with separate struct range for each would definitely help
>
> For the checks and inputs maybe just use ctx->hpa_range directly.
>
>
>> + range.end = port->to_hpa(cxld, range.end);
> Perhaps use the DEFINE_RANGE macro or
> range = (struct range) {
> .start = ...
> style as per earlier patches.
>
>> +
>> + if (range.start == ULLONG_MAX || range.end == ULLONG_MAX) {
>> + dev_warn(&port->dev,
>> + "CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n",
>> + range.start, range.end, ctx->hpa_range.start,
>> + ctx->hpa_range.end, dev_name(&cxld->dev));
>> + return -ENXIO;
>> + }
>> +
>> + /*
>> + * Since translated addresses include the interleaving
>> + * offsets, align the range to 256 MB.
>
> So we pass in an HPA range without interleaving offsets and get back
> one with them? Is that unavoidable, or can we potentially push
> this bit into the callback? Probably with separate callbacks to
> get the interleave details.
>
> Overall I'm not really following what is going on here. Maybe
> some ascii art would help?
>
>> + */
>> + range.start = ALIGN_DOWN(range.start, SZ_256M);
>> + range.end = ALIGN(range.end, SZ_256M) - 1;
>> +
>> + spa_len = range_len(&range);
>> + if (!len || !spa_len || spa_len % len) {
>> + dev_warn(&port->dev,
>> + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
>> + range.start, range.end, ctx->hpa_range.start,
>> + ctx->hpa_range.end, dev_name(&cxld->dev));
>> + return -ENXIO;
>> + }
>> +
>> + ways = spa_len / len;
>> + gran = SZ_256;
>> +
>> + /*
>> + * Determine interleave granularity
>> + *
>> + * Note: The position of the chunk from one interleaving block
>> + * to the next may vary and thus cannot be considered
>> + * constant. Address offsets larger than the interleaving
>> + * block size cannot be used to calculate the granularity.
>> + */
>> + while (ways > 1 && gran <= SZ_16M) {
>> + addr = port->to_hpa(cxld, base + gran);
>> + if (addr != base + gran)
>> + break;
>> + gran <<= 1;
>> + }
>> +
>> + if (gran > SZ_16M) {
>> + dev_warn(&port->dev,
>> + "CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n",
>> + range.start, range.end, ctx->hpa_range.start,
>> + ctx->hpa_range.end, dev_name(&cxld->dev));
>> + return -ENXIO;
>> + }
>> +
>> + ctx->hpa_range = range;
>> + ctx->interleave_ways = ways;
>> + ctx->interleave_granularity = gran;
>> +
>> + dev_dbg(&cxld->dev,
>> + "address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n",
>> + dev_name(ctx->cxlmd->dev.parent), base, len, range.start,
>> + spa_len, ways, gran);
>> +
>> + return 0;
>> +}
>> +
>> static int setup_region_params(struct cxl_endpoint_decoder *cxled,
>> struct cxl_region_context *ctx)
>> {
>> + int rc;
>> +
>> ctx->cxled = cxled;
>> ctx->cxlmd = cxled_to_memdev(cxled);
>> ctx->hpa_range = cxled->cxld.hpa_range;
>> ctx->interleave_ways = cxled->cxld.interleave_ways;
>> ctx->interleave_granularity = cxled->cxld.interleave_granularity;
>>
>> - return 0;
>> + rc = setup_address_translation(cxled, ctx);
>
> A quick search suggested nothing new gets added after this. As such
> return setup_address_translation(...);
> is probably appropriate here.
>
>
>> + if (rc)
>> + return rc;
>> +
>> + return rc;
>> }
>>
>> static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
>
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port
2025-09-12 14:45 ` [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port Robert Richter
2025-09-12 21:21 ` Dave Jiang
@ 2025-09-15 20:22 ` Dave Jiang
2025-09-17 8:20 ` Robert Richter
1 sibling, 1 reply; 62+ messages in thread
From: Dave Jiang @ 2025-09-15 20:22 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> To enable address translation, the endpoint decoder's HPA range must
> be translated when crossing memory domains to the next parent port's
> address ranges up to the root port. The root port's HPA range is
> equivalent to the system's SPA range.
>
> Introduce a callback to translate an address of the decoder's HPA
> range to the address range of the parent port. The callback can be set
> for ports that need to handle address translation.
>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/cxl.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f182982f1c14..eb837867d932 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -429,6 +429,17 @@ struct cxl_rd_ops {
> u64 (*spa_to_hpa)(struct cxl_root_decoder *cxlrd, u64 spa);
> };
>
> +/**
> + * cxl_to_hpa_fn - type of a callback function to translate an HPA
> + * @cxld: cxl_decoder to translate from
> + * @hpa: HPA of the @cxld decoder's address range
> + *
> + * The callback translates a decoder's HPA to the address range of the
> + * decoder's parent port. The return value is the translated HPA on
> + * success or ULLONG_MAX otherwise.
> + */
> +typedef u64 (*cxl_to_hpa_fn)(struct cxl_decoder *cxld, u64 hpa);
> +
> /**
> * struct cxl_root_decoder - Static platform CXL address decoder
> * @res: host / parent resource for region allocations
> @@ -599,6 +610,7 @@ struct cxl_dax_region {
> * @parent_dport: dport that points to this port in the parent
> * @decoder_ida: allocator for decoder ids
> * @reg_map: component and ras register mapping parameters
> + * @to_hpa: Callback to translate a child port's decoder address to the port's HPA address range
> * @nr_dports: number of entries in @dports
> * @hdm_end: track last allocated HDM decoder instance for allocation ordering
> * @commit_end: cursor to track highest committed decoder for commit ordering
> @@ -619,6 +631,7 @@ struct cxl_port {
> struct cxl_dport *parent_dport;
> struct ida decoder_ida;
> struct cxl_register_map reg_map;
> + cxl_to_hpa_fn to_hpa;
The more I look at this, the more I feel the callback should be part of 'struct cxl_rd_ops' and not under each port. While this provides flexibility in a general case if there is a need to translate at each level, the actual use case in the field today only requires translation at the top as far as I can tell. And the translate functionality should be part of a decoder and not a port. And in this case, the root decoder would suffice.
DJ
> int nr_dports;
> int hdm_end;
> int commit_end;
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range
2025-09-15 10:25 ` Jonathan Cameron
@ 2025-09-17 8:10 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-17 8:10 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 15.09.25 11:25:55, Jonathan Cameron wrote:
> On Mon, 15 Sep 2025 09:27:36 +0200
> Robert Richter <rrichter@amd.com> wrote:
>
> > On 12.09.25 10:33:30, Dave Jiang wrote:
> > >
> > >
> > > On 9/12/25 7:45 AM, Robert Richter wrote:
> > > > @hpa is actually a @range, rename variables accordingly.
> > >
> > > it's a range of HPA right? May as well call it 'hpa_range' to distinguish from 'dpa_range' or 'spa_range'
> >
> > To me this is ok as it is locally used only in the functions. I used
> > the short version to not hit the 80 char line limit in some of the
> > statements for readability. Range is most of the time HPA unless for
> > special cases, which use a prefix then. It also becomes clear viewed
> > in context, e.g.
> >
> > range = &cxld->hpa_range;
> >
> > Thus, I rather would like to not change that.
>
> I'm with Dave on this one. Better to have some slightly long lines
> and avoid any potential confusion.
Ok, thanks.
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-15 10:23 ` Jonathan Cameron
@ 2025-09-17 8:15 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-17 8:15 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 15.09.25 11:23:41, Jonathan Cameron wrote:
> On Fri, 12 Sep 2025 16:45:04 +0200
> Robert Richter <rrichter@amd.com> wrote:
>
> > Each region has a known host physical address (HPA) range it is
> > assigned to. Endpoint decoders assigned to a region share the same HPA
> > range. The region's address range is the system's physical address
> > (SPA) range.
> >
> > Endpoint decoders in systems that need address translation use HPAs
> > which are not SPAs. To make the SPA range accessible to the endpoint
> > decoders, store and track the region's SPA range in struct cxl_region.
> > Introduce the @hpa_range member to the struct. Now, the SPA range of
> > an endpoint decoder can be determined based on its assigned region.
> >
> > Patch is a prerequisite to implement address translation which uses
> > struct cxl_region to store all relevant region and interleaving
> > parameters.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> Trivial comment inline.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> > ---
> > drivers/cxl/core/region.c | 17 +++++++++++++++++
> > drivers/cxl/cxl.h | 2 ++
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 2c37c060d983..777d04870180 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -664,6 +664,11 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
> > return PTR_ERR(res);
> > }
> >
> > + cxlr->hpa_range = (struct range) {
> > + .start = res->start,
> > + .end = res->end,
> > + };
> > +
> > p->res = res;
> > p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
> >
> > @@ -700,8 +705,14 @@ static int free_hpa(struct cxl_region *cxlr)
> > if (p->state >= CXL_CONFIG_ACTIVE)
> > return -EBUSY;
> >
> > + cxlr->hpa_range = (struct range) {
> > + .start = 0,
> > + .end = -1,
> > + };
>
> There is DEFINE_RANGE() which you could use here
> cxlr->hpa_range = DEFINE_RANGE(0, -1);
>
> Not sure if it is worth bothering though.
It is pretty new. Given the wide range of users of DEFINE_RES_*, let's
start using this one too.
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port
2025-09-15 20:22 ` Dave Jiang
@ 2025-09-17 8:20 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-17 8:20 UTC (permalink / raw)
To: Dave Jiang
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 15.09.25 13:22:14, Dave Jiang wrote:
> On 9/12/25 7:45 AM, Robert Richter wrote:
> > @@ -619,6 +631,7 @@ struct cxl_port {
> > struct cxl_dport *parent_dport;
> > struct ida decoder_ida;
> > struct cxl_register_map reg_map;
> > + cxl_to_hpa_fn to_hpa;
>
> The more I look at this, the more I feel the callback should be part
> of 'struct cxl_rd_ops' and not under each port. While this provides
> flexibility in a general case if there is a need to translate at
> each level, the actual use case in the field today only requires
> translation at the top as far as I can tell. And the translate
> functionality should be part of a decoder and not a port. And in
> this case, the root decoder would suffice.
Ok, I see the tendence to handle this in a more specific use case. A
change of the implementation should be possible, will change that.
Thanks,
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-15 10:46 ` Jonathan Cameron
2025-09-15 16:35 ` Dave Jiang
@ 2025-09-17 9:04 ` Robert Richter
2025-09-17 20:51 ` Gregory Price
2 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-17 9:04 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 15.09.25 11:46:14, Jonathan Cameron wrote:
> On Fri, 12 Sep 2025 16:45:10 +0200
> Robert Richter <rrichter@amd.com> wrote:
>
> > Systems that need address translation have the endpoint decoders
> > programmed for a different address space. Host physical addresses
> > (HPA) are different from their system physical addresses (SPA). The
> > decoder's address range and interleaving configuration of such
> > endpoints cannot be used to determine the region parameters. The
> > region's address range must be SPA which the decoder does not
> > provide. In addition, an endpoint's incoming HPA is already converted
> > to the devices physical address (DPA). Thus it has interleaving
> > disabled.
> >
> > Address translation may provide different ways to determine an
> > endpoint's SPA, e.g. it may support a firmware call. This allows the
> > determination of the region's parameters without inspecting the
> > endpoint decoders.
> >
> > Implement the setup of address translation given there is a function
> > to convert an endpoint's HPA (which is identical to its DPA) to an
> > SPA. Use the previously introduced cxl_to_hpa_fn callback for this.
> > Convert the decoder's address range and ensure it is 256MB aligned.
> >
> > Identify the region's interleaving ways by inspecting the address
> > ranges. Also determine the interleaving granularity using the address
> > translation callback. Note that the position of the chunk from one
> > interleaving block to the next may vary and thus cannot be considered
> > constant. Address offsets larger than the interleaving block size
> > cannot be used to calculate the granularity. Thus, probe the
> > granularity using address translation for various HPAs in the same
> > interleaving block.
> >
> > Note that this patch does not yet enable address translation as
> > callbacks have not been initialized.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/cxl/core/region.c | 95 ++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 94 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 57697504410b..9fb1e9508213 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -3422,16 +3422,109 @@ struct cxl_region_context {
> > int interleave_granularity;
> > };
> >
> > +static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
> > + struct cxl_region_context *ctx)
> > +{
> > + struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent->parent);
>
> When there is a parent->parent it always makes me nervous that I haven't
> reasoned out what port this actually is. A comment would help or
> a more specific macro where the name lets us know what we are getting.
Yes, will improve that. Since the implemenatation will be changed to
be more specific to only translate cxled -> host_bridge, this section
will become more readable as well.
>
> > + struct cxl_decoder *cxld = &cxled->cxld;
> > + struct range range = ctx->hpa_range;
> > + u64 spa_len, len = range_len(&range);
> > + u64 addr, base = range.start;
> > + int ways, gran;
> > +
> > + if (!len || !port->to_hpa)
> > + return 0;
> > +
> > + if (!IS_ALIGNED(range.start, SZ_256M) ||
> > + !IS_ALIGNED(range.end + 1, SZ_256M)) {
> > + dev_warn(&port->dev,
> > + "CXL address translation: Unaligned decoder HPA range: %#llx-%#llx(%s)\n",
> > + range.start, range.end, dev_name(&cxld->dev));
> > + return -ENXIO;
> > + }
> > +
> > + /* Translate HPA range to SPA. */
> > + range.start = port->to_hpa(cxld, range.start);
>
> This is where the generic naming as 'range' gets really confusing.
> hpa_range etc with separate struct range for each would definitely help
>
> For the checks and inputs maybe just use ctx->hpa_range directly.
>
>
> > + range.end = port->to_hpa(cxld, range.end);
> Perhaps use the DEFINE_RANGE macro or
> range = (struct range) {
> .start = ...
> style as per earlier patches.
Ok.
>
> > +
> > + if (range.start == ULLONG_MAX || range.end == ULLONG_MAX) {
> > + dev_warn(&port->dev,
> > + "CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n",
> > + range.start, range.end, ctx->hpa_range.start,
> > + ctx->hpa_range.end, dev_name(&cxld->dev));
> > + return -ENXIO;
> > + }
> > +
> > + /*
> > + * Since translated addresses include the interleaving
> > + * offsets, align the range to 256 MB.
>
> So we pass in an HPA range without interleaving offsets and get back
> one with them? Is that unavoidable, or can we potentially push
> this bit into the callback? Probably with separate callbacks to
> get the interleave details.
While the translation is used here to get the HPA range for whole
region regardless of the specific endpoint, the call should also
provide translated addresses of the endpoint, esp. for a later use in
tracing and error reporting. As this function extracts the range, do
the alignment here too and not in the callback.
>
> Overall I'm not really following what is going on here. Maybe
> some ascii art would help?
Uh, how about this:
___ Start of region
/ End of region ___
/ \
|----------------------------------------------------------------------|
| chunk 1 | chunk 2 | ... | ................ | chunk1 | chunk 2 | ... |
|----------------------------------------------------------------------|
\ \ / /
\ \___ Start HPA EP2 / /
\___ Start HPA EP1 / /
End HPA EP1 ____/ /
End HPA EP2 ____/
As regions are aligned 256MB, use that instead of the gran * ways
blocksize.
>
> > + */
> > + range.start = ALIGN_DOWN(range.start, SZ_256M);
> > + range.end = ALIGN(range.end, SZ_256M) - 1;
> > +
> > + spa_len = range_len(&range);
> > + if (!len || !spa_len || spa_len % len) {
> > + dev_warn(&port->dev,
> > + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
> > + range.start, range.end, ctx->hpa_range.start,
> > + ctx->hpa_range.end, dev_name(&cxld->dev));
> > + return -ENXIO;
> > + }
> > +
> > + ways = spa_len / len;
> > + gran = SZ_256;
> > +
> > + /*
> > + * Determine interleave granularity
> > + *
> > + * Note: The position of the chunk from one interleaving block
> > + * to the next may vary and thus cannot be considered
> > + * constant. Address offsets larger than the interleaving
> > + * block size cannot be used to calculate the granularity.
> > + */
> > + while (ways > 1 && gran <= SZ_16M) {
> > + addr = port->to_hpa(cxld, base + gran);
> > + if (addr != base + gran)
> > + break;
> > + gran <<= 1;
> > + }
> > +
> > + if (gran > SZ_16M) {
> > + dev_warn(&port->dev,
> > + "CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n",
> > + range.start, range.end, ctx->hpa_range.start,
> > + ctx->hpa_range.end, dev_name(&cxld->dev));
> > + return -ENXIO;
> > + }
> > +
> > + ctx->hpa_range = range;
> > + ctx->interleave_ways = ways;
> > + ctx->interleave_granularity = gran;
> > +
> > + dev_dbg(&cxld->dev,
> > + "address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n",
> > + dev_name(ctx->cxlmd->dev.parent), base, len, range.start,
> > + spa_len, ways, gran);
> > +
> > + return 0;
> > +}
> > +
> > static int setup_region_params(struct cxl_endpoint_decoder *cxled,
> > struct cxl_region_context *ctx)
> > {
> > + int rc;
> > +
> > ctx->cxled = cxled;
> > ctx->cxlmd = cxled_to_memdev(cxled);
> > ctx->hpa_range = cxled->cxld.hpa_range;
> > ctx->interleave_ways = cxled->cxld.interleave_ways;
> > ctx->interleave_granularity = cxled->cxld.interleave_granularity;
> >
> > - return 0;
> > + rc = setup_address_translation(cxled, ctx);
>
> A quick search suggested nothing new gets added after this. As such
> return setup_address_translation(...);
> is probably appropriate here.
>
>
> > + if (rc)
> > + return rc;
> > +
> > + return rc;
Considered a tail call here too but dropped that idea. That would
suggest the function should not be extended. But function is open to
extend the setup, maybe Low Memory Hole or so. However, there are
advantages for both and can change that.
Thanks for review,
-Robert
> > }
> >
> > static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-15 10:59 ` Jonathan Cameron
@ 2025-09-17 9:43 ` Robert Richter
2025-09-17 13:50 ` Jonathan Cameron
0 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-17 9:43 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 15.09.25 11:59:48, Jonathan Cameron wrote:
> On Fri, 12 Sep 2025 16:45:13 +0200
> Robert Richter <rrichter@amd.com> wrote:
>
> > Add AMD Zen5 support for address translation.
> >
> > Zen5 systems may be configured to use 'Normalized addresses'. Then,
> > CXL endpoints use their own physical address space and are programmed
> > passthrough (DPA == HPA), the number of interleaving ways for the
> > endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> > translated from the endpoint to its CXL host bridge. The HPA of a CXL
> > host bridge is equivalent to the System Physical Address (SPA).
> >
> > ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> > Device Physical Address (DPA) to its System Physical Address. This is
> > documented in:
> >
> > AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> > ACPI v6.5 Porting Guide, Publication # 58088
> > https://www.amd.com/en/search/documentation/hub.html
> >
> > To implement AMD Zen5 address translation the following steps are
> > needed:
> >
> > AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> > call (Address Translation - CXL DPA to System Physical Address, see
> > ACPI v6.5 Porting Guide above) when address translation is enabled.
> > The existence of the callback can be identified using a specific GUID
> > as documented. The initialization code checks firmware and kernel
> > support of ACPI PRM.
> >
> > Introduce a new file core/atl.c to handle ACPI PRM specific address
> > translation code. Naming is loosely related to the kernel's AMD
> > Address Translation Library (CONFIG_AMD_ATL) but implementation does
> > not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
> > options respectively to enable the code depending on architecture and
> > platform options.
> >
> > Implement an ACPI PRM firmware call for CXL address translation in the
> > new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
> > callback for applicable CXL host bridges using the new cxl_atl_init()
> > function.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> A few minor additions inline.
>
> J
> > ---
> > drivers/cxl/Kconfig | 4 ++
> > drivers/cxl/core/Makefile | 1 +
> > drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
> > drivers/cxl/core/core.h | 1 +
> > drivers/cxl/core/port.c | 8 +++
> > 5 files changed, 152 insertions(+)
> > create mode 100644 drivers/cxl/core/atl.c
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 48b7314afdb8..31f9c96ef908 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -233,4 +233,8 @@ config CXL_MCE
> > def_bool y
> > depends on X86_MCE && MEMORY_FAILURE
> >
> > +config CXL_ATL
> > + def_bool y
>
> Given no help we can't turn this off manually and it's down to
> whether ACPI_PRMT is configured or not.
>
> To me this feels like something we should be able to control.
> Not a huge amount of code, but none the less 'so far' it only
> applies to particular AMD platforms yet ACPI_PRMT gets built
> on ARM platforms and other stuff even on AMD (CONFIG_AMD_ATL_PRM)
How about default y where possible but have a menu entry to disable
address translation?
config CXL_ATL
bool "CXL Address Translation support"
default y
depends on ACPI_PRMT
I don't want to make it specific to AMD.
>
>
>
> > + depends on ACPI_PRMT
> > +
> > endif
>
> > diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
> > new file mode 100644
> > index 000000000000..5fc21eddaade
> > --- /dev/null
> > +++ b/drivers/cxl/core/atl.c
>
> > +struct prm_cxl_dpa_spa_data {
> > + u64 dpa;
> > + u8 reserved;
> > + u8 devfn;
> > + u8 bus;
> > + u8 segment;
> > + void *out;
>
> If reality is out is always a u64 * maybe just give it that type.
Will check that.
>
> > +} __packed;
> > +
> > +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> > +{
> > + struct prm_cxl_dpa_spa_data data;
> > + u64 spa;
> > + int rc;
> > +
> > + data = (struct prm_cxl_dpa_spa_data) {
> > + .dpa = dpa,
> > + .devfn = pci_dev->devfn,
> > + .bus = pci_dev->bus->number,
> > + .segment = pci_domain_nr(pci_dev->bus),
> > + .out = &spa,
> > + };
> > +
> > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > + if (rc) {
> > + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> > + return ULLONG_MAX;
> > + }
> > +
> > + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> > +
> > + return spa;
> > +}
> > +
> > +static u64 cxl_prm_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> > +{
>
> > + pci_dev = to_pci_dev(cxlmd->dev.parent);
>
>
> return prm_cxl_dpa_spa(to_pci_dev(cxlmd->dev.parent), hpa);
> seem fine to me and shortens things a little.
Ok.
>
> > +
> > + return prm_cxl_dpa_spa(pci_dev, hpa);
> > +}
> > +
> > +static void cxl_prm_init(struct cxl_port *port)
> > +{
> > + u64 spa;
> > + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> > + int rc;
> > +
> > + if (!check_prm_address_translation(port))
> > + return;
> > +
> > + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> > + return;
>
> So other error values are fine? IF they don't occur no need to be explicit
> just check rc < 0 and return.
This is just to check the existence of the PRM, but it will fail (if
exists) here as parameters are a stub only. Both error codes are
reserved for firmware or kernel support respectively. Else, it returns
the PRM's error code, which is ignored here.
>
> > +
> > + port->to_hpa = cxl_prm_to_hpa;
> > +
> > + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> > + dev_name(&port->dev));
> > +}
> > +
> > +void cxl_atl_init(struct cxl_port *port)
> > +{
> > + cxl_prm_init(port);
> Why not just rename cxl_prm_init() to cxl_atl_init() and get rid of this wrapper?
cxl_prm_init() handles the PRM specifics, while cxl_atl_init() is used
as an entry for the core module to enable address translation. I
thought it would be misleading to name cxl_prm_init() different. The
compiler result should be the same for both.
-Robert
>
> > +}
>
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-17 9:43 ` Robert Richter
@ 2025-09-17 13:50 ` Jonathan Cameron
0 siblings, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2025-09-17 13:50 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Wed, 17 Sep 2025 11:43:39 +0200
Robert Richter <rrichter@amd.com> wrote:
> On 15.09.25 11:59:48, Jonathan Cameron wrote:
> > On Fri, 12 Sep 2025 16:45:13 +0200
> > Robert Richter <rrichter@amd.com> wrote:
> >
> > > Add AMD Zen5 support for address translation.
> > >
> > > Zen5 systems may be configured to use 'Normalized addresses'. Then,
> > > CXL endpoints use their own physical address space and are programmed
> > > passthrough (DPA == HPA), the number of interleaving ways for the
> > > endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> > > translated from the endpoint to its CXL host bridge. The HPA of a CXL
> > > host bridge is equivalent to the System Physical Address (SPA).
> > >
> > > ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> > > Device Physical Address (DPA) to its System Physical Address. This is
> > > documented in:
> > >
> > > AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> > > ACPI v6.5 Porting Guide, Publication # 58088
> > > https://www.amd.com/en/search/documentation/hub.html
> > >
> > > To implement AMD Zen5 address translation the following steps are
> > > needed:
> > >
> > > AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> > > call (Address Translation - CXL DPA to System Physical Address, see
> > > ACPI v6.5 Porting Guide above) when address translation is enabled.
> > > The existence of the callback can be identified using a specific GUID
> > > as documented. The initialization code checks firmware and kernel
> > > support of ACPI PRM.
> > >
> > > Introduce a new file core/atl.c to handle ACPI PRM specific address
> > > translation code. Naming is loosely related to the kernel's AMD
> > > Address Translation Library (CONFIG_AMD_ATL) but implementation does
> > > not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
> > > options respectively to enable the code depending on architecture and
> > > platform options.
> > >
> > > Implement an ACPI PRM firmware call for CXL address translation in the
> > > new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
> > > callback for applicable CXL host bridges using the new cxl_atl_init()
> > > function.
> > >
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > A few minor additions inline.
> >
> > J
> > > ---
> > > drivers/cxl/Kconfig | 4 ++
> > > drivers/cxl/core/Makefile | 1 +
> > > drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
> > > drivers/cxl/core/core.h | 1 +
> > > drivers/cxl/core/port.c | 8 +++
> > > 5 files changed, 152 insertions(+)
> > > create mode 100644 drivers/cxl/core/atl.c
> > >
> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > index 48b7314afdb8..31f9c96ef908 100644
> > > --- a/drivers/cxl/Kconfig
> > > +++ b/drivers/cxl/Kconfig
> > > @@ -233,4 +233,8 @@ config CXL_MCE
> > > def_bool y
> > > depends on X86_MCE && MEMORY_FAILURE
> > >
> > > +config CXL_ATL
> > > + def_bool y
> >
> > Given no help we can't turn this off manually and it's down to
> > whether ACPI_PRMT is configured or not.
> >
> > To me this feels like something we should be able to control.
> > Not a huge amount of code, but none the less 'so far' it only
> > applies to particular AMD platforms yet ACPI_PRMT gets built
> > on ARM platforms and other stuff even on AMD (CONFIG_AMD_ATL_PRM)
>
> How about default y where possible but have a menu entry to disable
> address translation?
>
> config CXL_ATL
> bool "CXL Address Translation support"
> default y
> depends on ACPI_PRMT
>
> I don't want to make it specific to AMD.
Can we drop the default y?
That feels to me like a defconfig thing rather than a thing we should
apply generally.
Also remember to add some detailed help text given it's now visible in
menuconfig etc.
>
> >
> >
> >
> > > + depends on ACPI_PRMT
> > > +
> > > endif
> >
> > > +static void cxl_prm_init(struct cxl_port *port)
> > > +{
> > > + u64 spa;
> > > + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> > > + int rc;
> > > +
> > > + if (!check_prm_address_translation(port))
> > > + return;
> > > +
> > > + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> > > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > > + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> > > + return;
> >
> > So other error values are fine? IF they don't occur no need to be explicit
> > just check rc < 0 and return.
>
> This is just to check the existence of the PRM, but it will fail (if
> exists) here as parameters are a stub only. Both error codes are
> reserved for firmware or kernel support respectively. Else, it returns
> the PRM's error code, which is ignored here.
Why is the PRM error code not something we want to pay attention to?
Perhaps that's the comment that is missing. I guess because we will
see any consistent failure later?
>
> >
> > > +
> > > + port->to_hpa = cxl_prm_to_hpa;
> > > +
> > > + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> > > + dev_name(&port->dev));
> > > +}
> > > +
> > > +void cxl_atl_init(struct cxl_port *port)
> > > +{
> > > + cxl_prm_init(port);
> > Why not just rename cxl_prm_init() to cxl_atl_init() and get rid of this wrapper?
>
> cxl_prm_init() handles the PRM specifics, while cxl_atl_init() is used
> as an entry for the core module to enable address translation. I
> thought it would be misleading to name cxl_prm_init() different. The
> compiler result should be the same for both.
Ok. I'm not sure it's worth the separation but as you say it'll almost
certainly get flattened anyway!
Jonathan
>
> -Robert
>
> >
> > > +}
> >
> >
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
2025-09-12 15:52 ` Dave Jiang
@ 2025-09-17 19:56 ` Gregory Price
2025-09-23 21:40 ` Alison Schofield
2 siblings, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 19:56 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:03PM +0200, Robert Richter wrote:
> A region is always bound to a root decoder. The region's associated
> root decoder is often needed. Add it to struct cxl_region.
>
> This simplifies code by removing dynamic lookups and removing the root
> decoder argument from the function argument list where possible.
>
> Patch is a prerequisite to implement address translation which uses
> struct cxl_region to store all relevant region and interleaving
> parameters.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Nice
Reviewed-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 02/11] cxl/region: Store HPA range in struct cxl_region
2025-09-12 14:45 ` [PATCH v3 02/11] cxl/region: Store HPA range " Robert Richter
2025-09-12 17:17 ` Dave Jiang
2025-09-15 10:23 ` Jonathan Cameron
@ 2025-09-17 19:58 ` Gregory Price
2 siblings, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 19:58 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:04PM +0200, Robert Richter wrote:
> Each region has a known host physical address (HPA) range it is
> assigned to. Endpoint decoders assigned to a region share the same HPA
> range. The region's address range is the system's physical address
> (SPA) range.
>
> Endpoint decoders in systems that need address translation use HPAs
> which are not SPAs. To make the SPA range accessible to the endpoint
> decoders, store and track the region's SPA range in struct cxl_region.
> Introduce the @hpa_range member to the struct. Now, the SPA range of
> an endpoint decoder can be determined based on its assigned region.
>
> Patch is a prerequisite to implement address translation which uses
> struct cxl_region to store all relevant region and interleaving
> parameters.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range
2025-09-12 14:45 ` [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range Robert Richter
2025-09-12 17:33 ` Dave Jiang
@ 2025-09-17 20:01 ` Gregory Price
1 sibling, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:01 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:05PM +0200, Robert Richter wrote:
> @hpa is actually a @range, rename variables accordingly.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Nothing major to add on naming feedback so otherwise this is fine.
Reviewed-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 04/11] cxl/region: Add @range argument to function cxl_find_root_decoder()
2025-09-12 14:45 ` [PATCH v3 04/11] cxl/region: Add @range argument to function cxl_find_root_decoder() Robert Richter
@ 2025-09-17 20:05 ` Gregory Price
0 siblings, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:05 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:06PM +0200, Robert Richter wrote:
> cxl_find_root_decoder() uses the endpoint decoder's HPA range to find
> the root decoder. This requires endpoints and root decoders to be in
> the same memory domain, which is not the case for systems that need
> address translation.
>
> Add a separate @range argument to function cxl_find_root_decoder() to
> specify the root decoder's address range. Now it is possible to pass a
> translated address range of an endpoint decoder to function
> cxl_find_root_decoder().
>
> Patch is a prerequisite to implement address translation.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Maybe worth noting that this patch is just a refactor (i.e. no-op).
Reviewed-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos()
2025-09-12 14:45 ` [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos() Robert Richter
@ 2025-09-17 20:09 ` Gregory Price
2025-09-23 21:52 ` Alison Schofield
1 sibling, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:09 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:07PM +0200, Robert Richter wrote:
> cxl_calc_interleave_pos() uses the endpoint decoder's HPA range to
> determine its interleaving position. This requires the endpoint
> decoders to be an SPA, which is not the case for systems that need
> address translation.
>
> Add a separate @range argument to function cxl_calc_interleave_pos()
> to specify the address range. Now it is possible to pass the SPA
> translated address range of an endpoint decoder to function
> cxl_calc_interleave_pos().
>
> Patch is a prerequisite to implement address translation.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
As with patch 4, maybe worth noting that this is simply a refactor, but
otherwise
Reviewed-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction
2025-09-12 14:45 ` [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction Robert Richter
2025-09-12 21:10 ` Dave Jiang
@ 2025-09-17 20:15 ` Gregory Price
1 sibling, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:15 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:08PM +0200, Robert Richter wrote:
> To construct a region, the region parameters such as address range and
> interleaving config need to be determined. This is done while
> constructing the region by inspecting the endpoint decoder
> configuration. The endpoint decoder is passed as a function argument.
>
> With address translation the endpoint decoder data is no longer
> sufficient to extract the region parameters as some of the information
> is obtained using other methods such as using firmware calls.
>
> In a first step, separate code to determine and setup the region
> parameters from the region construction. Temporarily store all the
> data to create the region in the new struct cxl_region_context. Add a
> new function setup_region_parameters() to fill that struct and later
> use it to construct the region. This simplifies the extension of the
> function to support other methods needed, esp. to support address
> translation.
>
> Patch is a prerequisite to implement address translation.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Dave already got the init code, so with that change feel free to add
Reviewed-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-15 10:46 ` Jonathan Cameron
2025-09-15 16:35 ` Dave Jiang
2025-09-17 9:04 ` Robert Richter
@ 2025-09-17 20:51 ` Gregory Price
2025-09-17 20:57 ` Dave Jiang
2025-09-18 13:59 ` Gregory Price
2 siblings, 2 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:51 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Mon, Sep 15, 2025 at 11:46:14AM +0100, Jonathan Cameron wrote:
> > + /*
> > + * Since translated addresses include the interleaving
> > + * offsets, align the range to 256 MB.
>
> So we pass in an HPA range without interleaving offsets and get back
> one with them? Is that unavoidable, or can we potentially push
> this bit into the callback? Probably with separate callbacks to
> get the interleave details.
>
> Overall I'm not really following what is going on here. Maybe
> some ascii art would help?
>
The endpoints in this case are encoded with "normalized" (base-0) with
a size of only the memory they provide. As a result, the decoder
interleave settings will always be passthrough (iw=1, ig=ignored).
This chunk translates the normalized address region to the relevant SPA
region, and translates the IW/IG to what it actually is (i.e. what it
*would have* been on a "normal" system).
Took me a while when i originally reviewed and tested this set.
Example - this is how you'd expect a real system supported by this code
to be programmed:
region {
.start = 0x20000000
.end = 0x3fffffff
.iw = 2
.ig = 256
}
endpoint1_decoder {
.start = 0x0
.end = 0xfffffff
.iw = 1
.ig = 256
}
endpoint2_decoder {
.start = 0x0
.end = 0xfffffff
.iw = 1
.ig = 256
}
when you do the translation from either decoder's hpa start/end,
you want the following output:
range {
.start = 0x20000000
.end = 0x3fffffff
.iw = 2
.ig = 256
}
If you assume a "normal" system - this is the settings the decoders
would have been programmed with in the first place.
You have to do the alignment because the translation function (may)
only work on granularity alignment.
Example:
endpoint1->to_hpa(0) => 0x0
endpoint1->to_hpa(0xfffffff) => 0xffffe00
endpoint2->to_hpa(0) => 0x100
endpoint2->to_hpa(0xfffffff) => 0xfffff00
So this code applies the appropriate alignment and returns the
translated iw/ig for use elsewhere in the stack when validating the rest
of the decoders.
(haven't gotten to later commits, but iirc it was eventually used)
~Gregory
> > + */
> > + range.start = ALIGN_DOWN(range.start, SZ_256M);
> > + range.end = ALIGN(range.end, SZ_256M) - 1;
> > +
> > + spa_len = range_len(&range);
> > + if (!len || !spa_len || spa_len % len) {
> > + dev_warn(&port->dev,
> > + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
> > + range.start, range.end, ctx->hpa_range.start,
> > + ctx->hpa_range.end, dev_name(&cxld->dev));
> > + return -ENXIO;
> > + }
> > +
> > + ways = spa_len / len;
> > + gran = SZ_256;
> > +
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 09/11] cxl/region: Lock decoders that need address translation
2025-09-12 14:45 ` [PATCH v3 09/11] cxl/region: Lock decoders that need " Robert Richter
@ 2025-09-17 20:52 ` Gregory Price
0 siblings, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:52 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:11PM +0200, Robert Richter wrote:
> There is only support to translate addresses from an endpoint to its
> parent port, but not in the opposite direction from the parent to the
> endpoint. Thus, the endpoint address range cannot be determined and
> setup manually for a given SPA range of a region. If the parent
> implements the ->to_hpa() callback, address translation is
> needed. Then, forbid reprogramming of the decoders and lock them.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
> drivers/cxl/core/region.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 9fb1e9508213..44ea59252ff0 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3497,6 +3497,16 @@ static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
> return -ENXIO;
> }
>
> + /*
> + * There is only support to translate from the endpoint to its
> + * parent port, but not in the opposite direction from the
> + * parent to the endpoint. Thus, the endpoint address range
> + * cannot be determined and setup manually. If the address range
> + * was translated and modified, forbid reprogramming of the
> + * decoders and lock them.
> + */
> + cxld->flags |= CXL_DECODER_F_LOCK;
> +
> ctx->hpa_range = range;
> ctx->interleave_ways = ways;
> ctx->interleave_granularity = gran;
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 10/11] cxl/acpi: Prepare use of EFI runtime services
2025-09-12 14:45 ` [PATCH v3 10/11] cxl/acpi: Prepare use of EFI runtime services Robert Richter
@ 2025-09-17 20:54 ` Gregory Price
0 siblings, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-17 20:54 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:12PM +0200, Robert Richter wrote:
> In order to use EFI runtime services, esp. ACPI PRM which uses the
> efi_rts_wq workqueue, initialize EFI before CXL ACPI.
>
> There is a subsys_initcall order dependency if driver is builtin:
>
> subsys_initcall(cxl_acpi_init);
> subsys_initcall(efisubsys_init);
>
> Prevent the efi_rts_wq workqueue being used by cxl_acpi_init() before
> its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run
> efisubsys_init() first.
>
> Reported-by: Gregory Price <gourry@gourry.net>
> Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/acpi.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 26c494704437..95a5ba395c1a 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -1012,8 +1012,12 @@ static void __exit cxl_acpi_exit(void)
> cxl_bus_drain();
> }
>
> -/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
> -subsys_initcall(cxl_acpi_init);
> +/*
> + * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use
> + * subsys_initcall_sync() since there is an order dependency with
> + * subsys_initcall(efisubsys_init), which must run first.
> + */
> +subsys_initcall_sync(cxl_acpi_init);
>
> /*
> * Arrange for host-bridge ports to be active synchronous with
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-17 20:51 ` Gregory Price
@ 2025-09-17 20:57 ` Dave Jiang
2025-09-18 13:59 ` Gregory Price
1 sibling, 0 replies; 62+ messages in thread
From: Dave Jiang @ 2025-09-17 20:57 UTC (permalink / raw)
To: Gregory Price, Jonathan Cameron
Cc: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Davidlohr Bueso, linux-cxl, linux-kernel,
Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 9/17/25 1:51 PM, Gregory Price wrote:
> On Mon, Sep 15, 2025 at 11:46:14AM +0100, Jonathan Cameron wrote:
>>> + /*
>>> + * Since translated addresses include the interleaving
>>> + * offsets, align the range to 256 MB.
>>
>> So we pass in an HPA range without interleaving offsets and get back
>> one with them? Is that unavoidable, or can we potentially push
>> this bit into the callback? Probably with separate callbacks to
>> get the interleave details.
>>
>> Overall I'm not really following what is going on here. Maybe
>> some ascii art would help?
>>
>
> The endpoints in this case are encoded with "normalized" (base-0) with
> a size of only the memory they provide. As a result, the decoder
> interleave settings will always be passthrough (iw=1, ig=ignored).
>
> This chunk translates the normalized address region to the relevant SPA
> region, and translates the IW/IG to what it actually is (i.e. what it
> *would have* been on a "normal" system).
>
> Took me a while when i originally reviewed and tested this set.
>
> Example - this is how you'd expect a real system supported by this code
> to be programmed:
>
> region {
> .start = 0x20000000
> .end = 0x3fffffff
> .iw = 2
> .ig = 256
> }
>
> endpoint1_decoder {
> .start = 0x0
> .end = 0xfffffff
> .iw = 1
> .ig = 256
> }
>
> endpoint2_decoder {
> .start = 0x0
> .end = 0xfffffff
> .iw = 1
> .ig = 256
> }
>
> when you do the translation from either decoder's hpa start/end,
> you want the following output:
>
> range {
> .start = 0x20000000
> .end = 0x3fffffff
> .iw = 2
> .ig = 256
> }
>
> If you assume a "normal" system - this is the settings the decoders
> would have been programmed with in the first place.
>
> You have to do the alignment because the translation function (may)
> only work on granularity alignment.
>
> Example:
> endpoint1->to_hpa(0) => 0x0
> endpoint1->to_hpa(0xfffffff) => 0xffffe00
> endpoint2->to_hpa(0) => 0x100
> endpoint2->to_hpa(0xfffffff) => 0xfffff00
>
> So this code applies the appropriate alignment and returns the
> translated iw/ig for use elsewhere in the stack when validating the rest
> of the decoders.
Having this explanation added to the Conventions document would be good to have.
>
> (haven't gotten to later commits, but iirc it was eventually used)
>
> ~Gregory
>
>>> + */
>>> + range.start = ALIGN_DOWN(range.start, SZ_256M);
>>> + range.end = ALIGN(range.end, SZ_256M) - 1;
>>> +
>>> + spa_len = range_len(&range);
>>> + if (!len || !spa_len || spa_len % len) {
>>> + dev_warn(&port->dev,
>>> + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n",
>>> + range.start, range.end, ctx->hpa_range.start,
>>> + ctx->hpa_range.end, dev_name(&cxld->dev));
>>> + return -ENXIO;
>>> + }
>>> +
>>> + ways = spa_len / len;
>>> + gran = SZ_256;
>>> +
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
2025-09-12 23:46 ` Dave Jiang
2025-09-15 10:59 ` Jonathan Cameron
@ 2025-09-17 21:01 ` Dave Jiang
2025-09-24 17:09 ` Gregory Price
3 siblings, 0 replies; 62+ messages in thread
From: Dave Jiang @ 2025-09-17 21:01 UTC (permalink / raw)
To: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Davidlohr Bueso
Cc: linux-cxl, linux-kernel, Gregory Price, Fabio M. De Francesco,
Terry Bowman, Joshua Hahn
On 9/12/25 7:45 AM, Robert Richter wrote:
> Add AMD Zen5 support for address translation.
>
> Zen5 systems may be configured to use 'Normalized addresses'. Then,
> CXL endpoints use their own physical address space and are programmed
> passthrough (DPA == HPA), the number of interleaving ways for the
> endpoint is set to one. The Host Physical Addresses (HPAs) need to be
> translated from the endpoint to its CXL host bridge. The HPA of a CXL
> host bridge is equivalent to the System Physical Address (SPA).
I think moving this patch to the first patch of the series may help set the right frame of mind for reviewing this series by explaining what's going on.
DJ
>
> ACPI Platform Runtime Mechanism (PRM) is used to translate the CXL
> Device Physical Address (DPA) to its System Physical Address. This is
> documented in:
>
> AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> ACPI v6.5 Porting Guide, Publication # 58088
> https://www.amd.com/en/search/documentation/hub.html
>
> To implement AMD Zen5 address translation the following steps are
> needed:
>
> AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
> call (Address Translation - CXL DPA to System Physical Address, see
> ACPI v6.5 Porting Guide above) when address translation is enabled.
> The existence of the callback can be identified using a specific GUID
> as documented. The initialization code checks firmware and kernel
> support of ACPI PRM.
>
> Introduce a new file core/atl.c to handle ACPI PRM specific address
> translation code. Naming is loosely related to the kernel's AMD
> Address Translation Library (CONFIG_AMD_ATL) but implementation does
> not dependent on it, nor it is vendor specific. Use Kbuild and Kconfig
> options respectively to enable the code depending on architecture and
> platform options.
>
> Implement an ACPI PRM firmware call for CXL address translation in the
> new function cxl_prm_to_hpa(). This includes sanity checks. Enable the
> callback for applicable CXL host bridges using the new cxl_atl_init()
> function.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/Kconfig | 4 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/atl.c | 138 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/core.h | 1 +
> drivers/cxl/core/port.c | 8 +++
> 5 files changed, 152 insertions(+)
> create mode 100644 drivers/cxl/core/atl.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 48b7314afdb8..31f9c96ef908 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -233,4 +233,8 @@ config CXL_MCE
> def_bool y
> depends on X86_MCE && MEMORY_FAILURE
>
> +config CXL_ATL
> + def_bool y
> + depends on ACPI_PRMT
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 5ad8fef210b5..11fe272a6e29 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_REGION) += region.o
> cxl_core-$(CONFIG_CXL_MCE) += mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> +cxl_core-$(CONFIG_CXL_ATL) += atl.o
> diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c
> new file mode 100644
> index 000000000000..5fc21eddaade
> --- /dev/null
> +++ b/drivers/cxl/core/atl.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#include <linux/prmt.h>
> +#include <linux/pci.h>
> +
> +#include <cxlmem.h>
> +#include "core.h"
> +
> +static bool check_prm_address_translation(struct cxl_port *port)
> +{
> + /* Applies to CXL host bridges only */
> + return !is_cxl_root(port) && port->host_bridge &&
> + is_cxl_root(to_cxl_port(port->dev.parent));
> +}
> +
> +/*
> + * PRM Address Translation - CXL DPA to System Physical Address
> + *
> + * Reference:
> + *
> + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> + * ACPI v6.5 Porting Guide, Publication # 58088
> + */
> +
> +static const guid_t prm_cxl_dpa_spa_guid =
> + GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3,
> + 0x48, 0x0b, 0x94);
> +
> +struct prm_cxl_dpa_spa_data {
> + u64 dpa;
> + u8 reserved;
> + u8 devfn;
> + u8 bus;
> + u8 segment;
> + void *out;
> +} __packed;
> +
> +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa)
> +{
> + struct prm_cxl_dpa_spa_data data;
> + u64 spa;
> + int rc;
> +
> + data = (struct prm_cxl_dpa_spa_data) {
> + .dpa = dpa,
> + .devfn = pci_dev->devfn,
> + .bus = pci_dev->bus->number,
> + .segment = pci_domain_nr(pci_dev->bus),
> + .out = &spa,
> + };
> +
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc) {
> + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc);
> + return ULLONG_MAX;
> + }
> +
> + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa);
> +
> + return spa;
> +}
> +
> +static u64 cxl_prm_to_hpa(struct cxl_decoder *cxld, u64 hpa)
> +{
> + struct cxl_memdev *cxlmd;
> + struct pci_dev *pci_dev;
> + struct cxl_port *port;
> + struct cxl_endpoint_decoder *cxled;
> +
> + /* Only translate from endpoint to its parent port. */
> + if (!is_endpoint_decoder(&cxld->dev))
> + return hpa;
> +
> + cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +
> + /*
> + * Nothing to do if base is non-zero and Normalized Addressing
> + * is disabled.
> + */
> + if (cxld->hpa_range.start != cxled->dpa_res->start)
> + return hpa;
> +
> + /*
> + * Endpoints are programmed passthrough in Normalized
> + * Addressing mode.
> + */
> + if (cxld->interleave_ways != 1) {
> + dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n",
> + cxld->interleave_ways, cxld->interleave_granularity);
> + return ULLONG_MAX;
> + }
> +
> + if (hpa < cxld->hpa_range.start || hpa > cxld->hpa_range.end) {
> + dev_dbg(&cxld->dev, "hpa addr %#llx out of range %#llx-%#llx\n",
> + hpa, cxld->hpa_range.start, cxld->hpa_range.end);
> + return ULLONG_MAX;
> + }
> +
> + port = to_cxl_port(cxld->dev.parent);
> + cxlmd = port ? to_cxl_memdev(port->uport_dev) : NULL;
> + if (!port || !dev_is_pci(cxlmd->dev.parent)) {
> + dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n",
> + dev_name(cxld->dev.parent), cxld->hpa_range.start,
> + cxld->hpa_range.end);
> + return ULLONG_MAX;
> + }
> + pci_dev = to_pci_dev(cxlmd->dev.parent);
> +
> + return prm_cxl_dpa_spa(pci_dev, hpa);
> +}
> +
> +static void cxl_prm_init(struct cxl_port *port)
> +{
> + u64 spa;
> + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> + int rc;
> +
> + if (!check_prm_address_translation(port))
> + return;
> +
> + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> + return;
> +
> + port->to_hpa = cxl_prm_to_hpa;
> +
> + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> + dev_name(&port->dev));
> +}
> +
> +void cxl_atl_init(struct cxl_port *port)
> +{
> + cxl_prm_init(port);
> +}
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index eac8cc1bdaa0..624e438d052a 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -150,6 +150,7 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
> int cxl_ras_init(void);
> void cxl_ras_exit(void);
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> +void cxl_atl_init(struct cxl_port *port);
>
> #ifdef CONFIG_CXL_FEATURES
> struct cxl_feat_entry *
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8f36ff413f5d..8007e002888e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -831,6 +831,12 @@ static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> &cxl_einj_inject_fops);
> }
>
> +static void setup_address_translation(struct cxl_port *port)
> +{
> + if (IS_ENABLED(CONFIG_CXL_ATL))
> + cxl_atl_init(port);
> +}
> +
> static int cxl_port_add(struct cxl_port *port,
> resource_size_t component_reg_phys,
> struct cxl_dport *parent_dport)
> @@ -868,6 +874,8 @@ static int cxl_port_add(struct cxl_port *port,
> return rc;
> }
>
> + setup_address_translation(port);
> +
> rc = device_add(dev);
> if (rc)
> return rc;
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation
2025-09-17 20:51 ` Gregory Price
2025-09-17 20:57 ` Dave Jiang
@ 2025-09-18 13:59 ` Gregory Price
1 sibling, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-18 13:59 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Robert Richter, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Wed, Sep 17, 2025 at 04:51:37PM -0400, Gregory Price wrote:
> You have to do the alignment because the translation function (may)
> only work on granularity alignment.
>
> Example:
> endpoint1->to_hpa(0) => 0x0
> endpoint1->to_hpa(0xfffffff) => 0xffffe00
0x3ffffe00
> endpoint2->to_hpa(0) => 0x100
> endpoint2->to_hpa(0xfffffff) => 0xfffff00
0x3fffff00
minor corrections above if intending to use for documentation
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
` (11 preceding siblings ...)
2025-09-12 15:45 ` [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Dave Jiang
@ 2025-09-23 3:26 ` Gregory Price
12 siblings, 0 replies; 62+ messages in thread
From: Gregory Price @ 2025-09-23 3:26 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:02PM +0200, Robert Richter wrote:
> This patch set adds support for address translation using ACPI PRM and
> enables this for AMD Zen5 platforms. This is another new appoach in
> response to earlier attempts to implement CXL address translation:
>
For the series:
Tested-by: Gregory Price <gourry@gourry.net>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
2025-09-12 15:52 ` Dave Jiang
2025-09-17 19:56 ` Gregory Price
@ 2025-09-23 21:40 ` Alison Schofield
2025-09-26 17:52 ` Robert Richter
2 siblings, 1 reply; 62+ messages in thread
From: Alison Schofield @ 2025-09-23 21:40 UTC (permalink / raw)
To: Robert Richter
Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:03PM +0200, Robert Richter wrote:
> A region is always bound to a root decoder. The region's associated
> root decoder is often needed. Add it to struct cxl_region.
>
> This simplifies code by removing dynamic lookups and removing the root
> decoder argument from the function argument list where possible.
>
> Patch is a prerequisite to implement address translation which uses
> struct cxl_region to store all relevant region and interleaving
> parameters.
Where is the follow on patch that makes this a prerequisite?
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos()
2025-09-12 14:45 ` [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos() Robert Richter
2025-09-17 20:09 ` Gregory Price
@ 2025-09-23 21:52 ` Alison Schofield
2025-09-26 18:22 ` Robert Richter
1 sibling, 1 reply; 62+ messages in thread
From: Alison Schofield @ 2025-09-23 21:52 UTC (permalink / raw)
To: Robert Richter
Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:07PM +0200, Robert Richter wrote:
> cxl_calc_interleave_pos() uses the endpoint decoder's HPA range to
> determine its interleaving position. This requires the endpoint
> decoders to be an SPA, which is not the case for systems that need
> address translation.
>
> Add a separate @range argument to function cxl_calc_interleave_pos()
> to specify the address range. Now it is possible to pass the SPA
> translated address range of an endpoint decoder to function
> cxl_calc_interleave_pos().
>
> Patch is a prerequisite to implement address translation.
>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/core/region.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 8ccc171ac724..106692f1e310 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1844,11 +1844,11 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
> * Return: position >= 0 on success
> * -ENXIO on failure
> */
> -static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled,
> + struct range *range)
> {
> struct cxl_port *iter, *port = cxled_to_port(cxled);
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct range *range = &cxled->cxld.hpa_range;
> int parent_ways = 0, parent_pos = 0, pos = 0;
> int rc;
Will this work? Change the assignment rather than adding a parameter:
- struct range *range = &cxled->cxld.hpa_range;
+ struct range *range = &cxled->cxld.region->hpa_range;
>
> @@ -1909,7 +1909,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
> for (i = 0; i < p->nr_targets; i++) {
> struct cxl_endpoint_decoder *cxled = p->targets[i];
>
> - cxled->pos = cxl_calc_interleave_pos(cxled);
> + cxled->pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range);
> /*
> * Record that sorting failed, but still continue to calc
> * cxled->pos so that follow-on code paths can reliably
> @@ -2093,7 +2093,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> struct cxl_endpoint_decoder *cxled = p->targets[i];
> int test_pos;
>
> - test_pos = cxl_calc_interleave_pos(cxled);
> + test_pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range);
> dev_dbg(&cxled->cxld.dev,
> "Test cxl_calc_interleave_pos(): %s test_pos:%d cxled->pos:%d\n",
> (test_pos == cxled->pos) ? "success" : "fail",
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
` (2 preceding siblings ...)
2025-09-17 21:01 ` Dave Jiang
@ 2025-09-24 17:09 ` Gregory Price
2025-09-26 16:59 ` Robert Richter
3 siblings, 1 reply; 62+ messages in thread
From: Gregory Price @ 2025-09-24 17:09 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 12, 2025 at 04:45:13PM +0200, Robert Richter wrote:
> +static void cxl_prm_init(struct cxl_port *port)
> +{
> + u64 spa;
> + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> + int rc;
> +
> + if (!check_prm_address_translation(port))
> + return;
> +
> + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> + return;
> +
> + port->to_hpa = cxl_prm_to_hpa;
> +
> + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> + dev_name(&port->dev));
> +}
Is it possible that the PRMT function is present but uninitialize?
For example if expanders are not in a normalized address mode.
This code would likely still add the to_hpa() function reference even
if the underlying PRMT function hasn't been set up for translation.
~Gregory
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-24 17:09 ` Gregory Price
@ 2025-09-26 16:59 ` Robert Richter
2025-09-27 1:44 ` Gregory Price
0 siblings, 1 reply; 62+ messages in thread
From: Robert Richter @ 2025-09-26 16:59 UTC (permalink / raw)
To: Gregory Price
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 24.09.25 13:09:46, Gregory Price wrote:
> On Fri, Sep 12, 2025 at 04:45:13PM +0200, Robert Richter wrote:
> > +static void cxl_prm_init(struct cxl_port *port)
> > +{
> > + u64 spa;
> > + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> > + int rc;
> > +
> > + if (!check_prm_address_translation(port))
> > + return;
> > +
> > + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> > + return;
> > +
> > + port->to_hpa = cxl_prm_to_hpa;
> > +
> > + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> > + dev_name(&port->dev));
> > +}
>
> Is it possible that the PRMT function is present but uninitialize?
> For example if expanders are not in a normalized address mode.
>
> This code would likely still add the to_hpa() function reference even
> if the underlying PRMT function hasn't been set up for translation.
At this point during init, it is not yet possible to determine
normalized address mode. Endpoint and hdm decoders are still unknown.
Thus, PRM is enabled for the port. Later, during region setup, there is
a check for that while determining the region's HPA range. Addresses
are translated and ranges adjusted only if needed. Once the ranges are
set up, no further PRM handler calls are executed.
Thanks for review and testing.
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region
2025-09-23 21:40 ` Alison Schofield
@ 2025-09-26 17:52 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-26 17:52 UTC (permalink / raw)
To: Alison Schofield
Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 23.09.25 14:40:31, Alison Schofield wrote:
> On Fri, Sep 12, 2025 at 04:45:03PM +0200, Robert Richter wrote:
> > A region is always bound to a root decoder. The region's associated
> > root decoder is often needed. Add it to struct cxl_region.
> >
> > This simplifies code by removing dynamic lookups and removing the root
> > decoder argument from the function argument list where possible.
> >
> > Patch is a prerequisite to implement address translation which uses
> > struct cxl_region to store all relevant region and interleaving
> > parameters.
>
> Where is the follow on patch that makes this a prerequisite?
I changed this to simplify and rework the argument list of
__construct_region(cxlr, cxled). Later, this changes further to use a
context argument: __construct_region(cxlr, ctx).
The patch additionally simplifies the arg list of
cxl_region_attach_position() and it removes the use of
to_cxl_root_decoder() which always reconstructs and checks the
pointer. The pointer never changes and is frequently used. Also, code
becomes more readable as this amphazises the binding between both
objects. Overall, I don't see a reason to use to_cxl_root_decoder() as
an alternative to avoid the the pointer, which would generally be
possible.
I will update the description accordingly.
Thanks,
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos()
2025-09-23 21:52 ` Alison Schofield
@ 2025-09-26 18:22 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-26 18:22 UTC (permalink / raw)
To: Alison Schofield
Cc: Vishal Verma, Ira Weiny, Dan Williams, Jonathan Cameron,
Dave Jiang, Davidlohr Bueso, linux-cxl, linux-kernel,
Gregory Price, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 23.09.25 14:52:04, Alison Schofield wrote:
> On Fri, Sep 12, 2025 at 04:45:07PM +0200, Robert Richter wrote:
> > cxl_calc_interleave_pos() uses the endpoint decoder's HPA range to
> > determine its interleaving position. This requires the endpoint
> > decoders to be an SPA, which is not the case for systems that need
> > address translation.
> >
> > Add a separate @range argument to function cxl_calc_interleave_pos()
> > to specify the address range. Now it is possible to pass the SPA
> > translated address range of an endpoint decoder to function
> > cxl_calc_interleave_pos().
> >
> > Patch is a prerequisite to implement address translation.
> >
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/cxl/core/region.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 8ccc171ac724..106692f1e310 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -1844,11 +1844,11 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
> > * Return: position >= 0 on success
> > * -ENXIO on failure
> > */
> > -static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> > +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled,
> > + struct range *range)
> > {
> > struct cxl_port *iter, *port = cxled_to_port(cxled);
> > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> > - struct range *range = &cxled->cxld.hpa_range;
> > int parent_ways = 0, parent_pos = 0, pos = 0;
> > int rc;
>
> Will this work? Change the assignment rather than adding a parameter:
>
> - struct range *range = &cxled->cxld.hpa_range;
> + struct range *range = &cxled->cxld.region->hpa_range;
For cxl_region_sort_targets() it doesn't as the region is not yet
attached. This is the calling order:
cxl_region_sort_targets
cxl_calc_interleave_pos
# needs cxlr->hpa_range
cxl_region_attach_position
...
cxld->region = cxlr;
It would be nice if we can make this work and drop the argument. Will
take a look.
Thanks,
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-26 16:59 ` Robert Richter
@ 2025-09-27 1:44 ` Gregory Price
2025-09-29 12:39 ` Robert Richter
0 siblings, 1 reply; 62+ messages in thread
From: Gregory Price @ 2025-09-27 1:44 UTC (permalink / raw)
To: Robert Richter
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On Fri, Sep 26, 2025 at 06:59:40PM +0200, Robert Richter wrote:
> On 24.09.25 13:09:46, Gregory Price wrote:
> > On Fri, Sep 12, 2025 at 04:45:13PM +0200, Robert Richter wrote:
> > > +static void cxl_prm_init(struct cxl_port *port)
> > > +{
> > > + u64 spa;
> > > + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> > > + int rc;
> > > +
> > > + if (!check_prm_address_translation(port))
> > > + return;
> > > +
> > > + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> > > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > > + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> > > + return;
> > > +
> > > + port->to_hpa = cxl_prm_to_hpa;
> > > +
> > > + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> > > + dev_name(&port->dev));
> > > +}
> >
> > Is it possible that the PRMT function is present but uninitialize?
> > For example if expanders are not in a normalized address mode.
> >
> > This code would likely still add the to_hpa() function reference even
> > if the underlying PRMT function hasn't been set up for translation.
>
> At this point during init, it is not yet possible to determine
> normalized address mode. Endpoint and hdm decoders are still unknown.
> Thus, PRM is enabled for the port. Later, during region setup, there is
> a check for that while determining the region's HPA range. Addresses
> are translated and ranges adjusted only if needed. Once the ranges are
> set up, no further PRM handler calls are executed.
>
Right, but that errors out if address translation fails (below).
Is this code missing a check for the endpoint_decoder.start==0?
static int setup_address_translation(struct cxl_endpoint_decoder *cxled,
struct cxl_region_context *ctx)
{
...
/* Translate HPA range to SPA. */
range.start = port->to_hpa(cxld, range.start);
range.end = port->to_hpa(cxld, range.end);
if (range.start == ULLONG_MAX || range.end == ULLONG_MAX) {
dev_warn(&port->dev,
"CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n",
range.start, range.end, ctx->hpa_range.start,
ctx->hpa_range.end, dev_name(&cxld->dev));
return -ENXIO;
}
...
}
static int setup_region_params(struct cxl_endpoint_decoder *cxled,
struct cxl_region_context *ctx)
{
...
rc = setup_address_translation(cxled, ctx);
...
}
int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
{
...
rc = setup_region_params(cxled, &ctx);
if (rc)
return rc;
}
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT
2025-09-27 1:44 ` Gregory Price
@ 2025-09-29 12:39 ` Robert Richter
0 siblings, 0 replies; 62+ messages in thread
From: Robert Richter @ 2025-09-29 12:39 UTC (permalink / raw)
To: Gregory Price
Cc: Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Dave Jiang, Davidlohr Bueso, linux-cxl,
linux-kernel, Fabio M. De Francesco, Terry Bowman, Joshua Hahn
On 26.09.25 21:44:43, Gregory Price wrote:
> On Fri, Sep 26, 2025 at 06:59:40PM +0200, Robert Richter wrote:
> > On 24.09.25 13:09:46, Gregory Price wrote:
> > > On Fri, Sep 12, 2025 at 04:45:13PM +0200, Robert Richter wrote:
> > > > +static void cxl_prm_init(struct cxl_port *port)
> > > > +{
> > > > + u64 spa;
> > > > + struct prm_cxl_dpa_spa_data data = { .out = &spa, };
> > > > + int rc;
> > > > +
> > > > + if (!check_prm_address_translation(port))
> > > > + return;
> > > > +
> > > > + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */
> > > > + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data);
> > > > + if (rc == -EOPNOTSUPP || rc == -ENODEV)
> > > > + return;
> > > > +
> > > > + port->to_hpa = cxl_prm_to_hpa;
> > > > +
> > > > + dev_dbg(port->host_bridge, "PRM address translation enabled for %s.\n",
> > > > + dev_name(&port->dev));
> > > > +}
> > >
> > > Is it possible that the PRMT function is present but uninitialize?
> > > For example if expanders are not in a normalized address mode.
> > >
> > > This code would likely still add the to_hpa() function reference even
> > > if the underlying PRMT function hasn't been set up for translation.
> >
> > At this point during init, it is not yet possible to determine
> > normalized address mode. Endpoint and hdm decoders are still unknown.
> > Thus, PRM is enabled for the port. Later, during region setup, there is
> > a check for that while determining the region's HPA range. Addresses
> > are translated and ranges adjusted only if needed. Once the ranges are
> > set up, no further PRM handler calls are executed.
> >
>
> Right, but that errors out if address translation fails (below).
>
> Is this code missing a check for the endpoint_decoder.start==0?
Right, it looks like the code does some unnecessary recalculation for
the SPA case. There is the following check for Normalized Addressing
in cxl_prm_to_hpa() which already bypasses the prm call:
if (cxld->hpa_range.start != cxled->dpa_res->start)
return hpa;
But setup_address_translation() continues and recalculates ways and
gran. Finally this should result in the same parameters. I will do
some testing for SPA mode and expect the addr translation debug
messages to show up in this mode too. As a side effect, decoders in
SPA mode will be locked. Need to confirm this and will fix that.
Thanks for catching this,
-Robert
^ permalink raw reply [flat|nested] 62+ messages in thread
end of thread, other threads:[~2025-09-29 12:39 UTC | newest]
Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12 14:45 [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Robert Richter
2025-09-12 14:45 ` [PATCH v3 01/11] cxl/region: Store root decoder in struct cxl_region Robert Richter
2025-09-12 15:52 ` Dave Jiang
2025-09-15 10:14 ` Jonathan Cameron
2025-09-17 19:56 ` Gregory Price
2025-09-23 21:40 ` Alison Schofield
2025-09-26 17:52 ` Robert Richter
2025-09-12 14:45 ` [PATCH v3 02/11] cxl/region: Store HPA range " Robert Richter
2025-09-12 17:17 ` Dave Jiang
2025-09-15 7:19 ` Robert Richter
2025-09-15 16:24 ` Dave Jiang
2025-09-15 10:23 ` Jonathan Cameron
2025-09-17 8:15 ` Robert Richter
2025-09-17 19:58 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 03/11] cxl/region: Rename misleading variable name @hpa to @range Robert Richter
2025-09-12 17:33 ` Dave Jiang
2025-09-15 7:27 ` Robert Richter
2025-09-15 10:25 ` Jonathan Cameron
2025-09-17 8:10 ` Robert Richter
2025-09-17 20:01 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 04/11] cxl/region: Add @range argument to function cxl_find_root_decoder() Robert Richter
2025-09-17 20:05 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 05/11] cxl/region: Add @range argument to function cxl_calc_interleave_pos() Robert Richter
2025-09-17 20:09 ` Gregory Price
2025-09-23 21:52 ` Alison Schofield
2025-09-26 18:22 ` Robert Richter
2025-09-12 14:45 ` [PATCH v3 06/11] cxl/region: Separate region parameter setup and region construction Robert Richter
2025-09-12 21:10 ` Dave Jiang
2025-09-15 7:31 ` Robert Richter
2025-09-15 16:26 ` Dave Jiang
2025-09-17 20:15 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 07/11] cxl: Introduce callback to translate a decoder's HPA to the next parent port Robert Richter
2025-09-12 21:21 ` Dave Jiang
2025-09-15 7:55 ` Robert Richter
2025-09-15 16:32 ` Dave Jiang
2025-09-15 20:22 ` Dave Jiang
2025-09-17 8:20 ` Robert Richter
2025-09-12 14:45 ` [PATCH v3 08/11] cxl/region: Implement endpoint decoder address translation Robert Richter
2025-09-15 10:46 ` Jonathan Cameron
2025-09-15 16:35 ` Dave Jiang
2025-09-17 9:04 ` Robert Richter
2025-09-17 20:51 ` Gregory Price
2025-09-17 20:57 ` Dave Jiang
2025-09-18 13:59 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 09/11] cxl/region: Lock decoders that need " Robert Richter
2025-09-17 20:52 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 10/11] cxl/acpi: Prepare use of EFI runtime services Robert Richter
2025-09-17 20:54 ` Gregory Price
2025-09-12 14:45 ` [PATCH v3 11/11] cxl: Enable AMD Zen5 address translation using ACPI PRMT Robert Richter
2025-09-12 23:46 ` Dave Jiang
2025-09-15 8:34 ` Robert Richter
2025-09-15 10:59 ` Jonathan Cameron
2025-09-17 9:43 ` Robert Richter
2025-09-17 13:50 ` Jonathan Cameron
2025-09-17 21:01 ` Dave Jiang
2025-09-24 17:09 ` Gregory Price
2025-09-26 16:59 ` Robert Richter
2025-09-27 1:44 ` Gregory Price
2025-09-29 12:39 ` Robert Richter
2025-09-12 15:45 ` [PATCH v3 00/11] cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Dave Jiang
2025-09-15 8:42 ` Robert Richter
2025-09-23 3:26 ` Gregory Price
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).