Linux-CXL Archive mirror
* [PATCH v3 0/9] Enabling DCD emulation support in Qemu
@ 2023-11-07 18:07 nifan.cxl
  2023-11-07 18:07 ` [PATCH v3 1/9] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
                   ` (10 more replies)
  0 siblings, 11 replies; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <nifan.cxl@gmail.com>


This patch series is based on Jonathan's branch cxl-2023-09-26.

The main changes include:
1. Update cxl_find_dc_region to detect the case where the range of an extent
    crosses multiple DC regions.
2. Add comments to explain the checks performed in function
    cxl_detect_malformed_extent_list. (Jonathan)
3. Minimize the checks in cmd_dcd_add_dyn_cap_rsp. (Jonathan)
4. Update total_extent_count in the add/release dynamic capacity response
    functions. (Ira and Jorgen Hansen)
5. Fix the logic issue in test_bits and rename it to test_any_bits_set to
    clarify its function.
6. Add a pending extent list for DC extent add events.
7. When an add extent response is received, use the pending-to-add list to
    verify that the extents are valid.
8. Add test_any_bits_set and cxl_insert_extent_to_extent_list declarations to
    cxl_device.h so they can be used in different files.
9. Update ct3d_qmp_cxl_event_log_enc to include the dynamic capacity event
    log type.
10. Extract the logic that deletes an extent from the extent list into a
    helper function.
11. Move the update of the bitmap that tracks which blocks are backed by DC
    extents from the moment a DC extent is offered to the moment it is
    accepted by the host.
12. Free dc_name after calling address_space_init to avoid memory leak when
    returning early. (Nathan)
13. Add code to detect and reject QMP requests without any extents. (Jonathan)
14. Add code to detect and reject QMP requests where the extent len is 0.
15. Change the QMP interface to move the region-id out of the extents, so
    that each command handles extent add/release requests for a single
    region only. (Jonathan)
16. Change the region bitmap length from decode_len to len.
17. Rename "dpa" to "offset" in the add/release dc extent qmp interface.
    (Jonathan)
18. Block any dc extent release command if the exact extent is not already in
    the extent list of the device.
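
As an illustration of item 1, the check that an extent stays inside one DC
region might look like the sketch below. This is not the code from the
series: struct dc_region and find_dc_region are hypothetical, simplified
stand-ins for CXLDCDRegion and cxl_find_dc_region, and the -1 return
convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DC region covering [base, base + len). */
struct dc_region {
    uint64_t base;
    uint64_t len;
};

/*
 * Return the index of the region that fully contains [dpa, dpa + len).
 * Return -1 if the range is empty, starts outside every region, or
 * crosses a region boundary.
 */
static int find_dc_region(const struct dc_region *regions, size_t n,
                          uint64_t dpa, uint64_t len)
{
    size_t i;

    assert(regions != NULL || n == 0);
    if (len == 0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        const struct dc_region *r = &regions[i];

        if (dpa < r->base || dpa - r->base >= r->len) {
            continue; /* the range does not start in this region */
        }
        /* The start is inside region i, so the end must be as well. */
        if (len > r->len - (dpa - r->base)) {
            return -1; /* the extent crosses into another region */
        }
        return (int)i;
    }
    return -1;
}
```

The comparisons are written as subtractions so that dpa + len is never
computed and cannot wrap around.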

The code is tested together with Ira's kernel DCD support:
https://github.com/weiny2/linux-kernel/tree/dcd-v3-2023-10-30

Cover letter from v2 is here:
https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#m63039621087023691c9749a0af1212deb5549ddf

Last version (v2) is here:
https://lore.kernel.org/linux-cxl/20230725183939.2741025-1-fan.ni@samsung.com/

More DCD related discussions are here:
https://lore.kernel.org/linux-cxl/650cc29ab3f64_50d07294e7@iweiny-mobl.notmuch/



Fan Ni (9):
  hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
    payload of identify memory device command
  hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
    and mailbox command support
  include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
    type3 memory devices
  hw/mem/cxl_type3: Add support to create DC regions to type3 memory
    devices
  hw/mem/cxl_type3: Add host backend and address space handling for DC
    regions
  hw/mem/cxl_type3: Add DC extent list representative and get DC extent
    list mailbox support
  hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
    dynamic capacity response
  hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
    extents
  hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions

 hw/cxl/cxl-mailbox-utils.c  | 469 +++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          | 548 +++++++++++++++++++++++++++++++++---
 hw/mem/cxl_type3_stubs.c    |  14 +
 include/hw/cxl/cxl_device.h |  64 ++++-
 include/hw/cxl/cxl_events.h |  15 +
 qapi/cxl.json               |  60 +++-
 6 files changed, 1123 insertions(+), 47 deletions(-)

-- 
2.42.0



* [PATCH v3 1/9] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2023-11-07 18:07 ` [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Based on CXL spec 3.0 Table 8-94 (Identify Memory Device Output
Payload), the dynamic capacity event log size should be part of the
output of the Identify command.
Add dc_event_log_size to the output payload so the host can retrieve it.
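
The size change can be illustrated with a small standalone sketch. It
assumes GCC or Clang (for the packed attribute) and C11 (for
_Static_assert). The struct identify_out below is a hypothetical stand-in
in which the first 0x43 bytes of the real payload are treated as opaque;
store_le16 and fill_identify are illustrative helpers, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DC_EVENT_LOG_SIZE 8 /* same value as CXL_DC_EVENT_LOG_SIZE */

/* Stand-in payload: existing fields are opaque, new field is appended. */
struct identify_out {
    uint8_t earlier_fields[0x43];
    uint16_t dc_event_log_size; /* stored little-endian */
} __attribute__((packed));

/* Appending a 2-byte field grows the payload from 0x43 to 0x45 bytes. */
_Static_assert(sizeof(struct identify_out) == 0x45, "unexpected size");
_Static_assert(offsetof(struct identify_out, dc_event_log_size) == 0x43,
               "unexpected offset");

/* Store a 16-bit value in little-endian order on any host. */
static void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Fill the payload and return its length in bytes. */
static size_t fill_identify(struct identify_out *id)
{
    assert(id != NULL);
    memset(id, 0, sizeof(*id));
    store_le16((uint8_t *)id +
               offsetof(struct identify_out, dc_event_log_size),
               DC_EVENT_LOG_SIZE);
    return sizeof(*id);
}
```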

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index f1145e9671..8eceedfa87 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,7 @@
 #include "sysemu/hostmem.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+#define CXL_DC_EVENT_LOG_SIZE 8
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -753,8 +754,9 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
         uint16_t inject_poison_limit;
         uint8_t poison_caps;
         uint8_t qos_telemetry_caps;
+        uint16_t dc_event_log_size;
     } QEMU_PACKED *id;
-    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
     CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
@@ -780,6 +782,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     st24_le_p(id->poison_list_max_mer, 256);
     /* No limit - so limited by main poison record limit */
     stw_le_p(&id->inject_poison_limit, 0);
+    stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
 
     *len_out = sizeof(*id);
     return CXL_MBOX_SUCCESS;
-- 
2.42.0



* [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
  2023-11-07 18:07 ` [PATCH v3 1/9] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 14:51   ` Jonathan Cameron
  2024-01-24 15:48   ` Jonathan Cameron
  2023-11-07 18:07 ` [PATCH v3 3/9] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Per CXL spec 3.0, add a dynamic capacity region representation based on
Table 8-126 and extend the CXL type3 device definition to include DC region
information. Also, based on section 8.2.9.8.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.

Note: decode_len of a DC region is aligned to 256 MiB and needs to be divided
by 256 MiB before being returned to the host for the "Get Dynamic Capacity
Configuration" mailbox command.
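
The arithmetic behind the command can be sketched on its own. The helper
names below are hypothetical. The header and record sizes (8 and 40 bytes)
are derived from the packed structs in this patch. Returning 0 records for
an out-of-range start region is a simplification of this sketch; the patch
returns CXL_MBOX_INVALID_INPUT in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define CAPACITY_MULTIPLIER (256 * MIB) /* mirrors CXL_CAPACITY_MULTIPLIER */

#define DC_CONFIG_HDR_LEN 8    /* num_regions + 7 reserved bytes */
#define DC_CONFIG_RECORD_LEN 40 /* 4 x uint64_t + uint32_t + 4 x uint8_t */

/* Number of region records to return; 0 if start_id is out of range. */
static unsigned dc_record_count(unsigned num_regions, unsigned start_id,
                                unsigned requested)
{
    unsigned avail;

    if (start_id >= num_regions) {
        return 0;
    }
    avail = num_regions - start_id;
    return requested < avail ? requested : avail;
}

/* Output payload length for a given number of records. */
static size_t dc_config_payload_len(unsigned record_count)
{
    return DC_CONFIG_HDR_LEN + (size_t)record_count * DC_CONFIG_RECORD_LEN;
}

/* Convert a 256 MiB aligned byte length into 256 MiB units for the host. */
static uint64_t decode_len_to_units(uint64_t decode_len_bytes)
{
    assert(decode_len_bytes % CAPACITY_MULTIPLIER == 0);
    return decode_len_bytes / CAPACITY_MULTIPLIER;
}
```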

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 80 +++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |  6 +++
 include/hw/cxl/cxl_device.h | 17 ++++++++
 3 files changed, 103 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 8eceedfa87..f80dd6474f 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -80,6 +80,8 @@ enum {
         #define GET_POISON_LIST        0x0
         #define INJECT_POISON          0x1
         #define CLEAR_POISON           0x2
+    DCD_CONFIG  = 0x48,
+        #define GET_DC_CONFIG          0x0
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1210,6 +1212,74 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.0 section 8.2.9.8.9.1: Get Dynamic Capacity Configuration
+ * (Opcode: 4800h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
+                                             uint8_t *payload_in,
+                                             size_t len_in,
+                                             uint8_t *payload_out,
+                                             size_t *len_out,
+                                             CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct get_dyn_cap_config_in_pl {
+        uint8_t region_cnt;
+        uint8_t start_region_id;
+    } QEMU_PACKED;
+
+    struct get_dyn_cap_config_out_pl {
+        uint8_t num_regions;
+        uint8_t rsvd1[7];
+        struct {
+            uint64_t base;
+            uint64_t decode_len;
+            uint64_t region_len;
+            uint64_t block_size;
+            uint32_t dsmadhandle;
+            uint8_t flags;
+            uint8_t rsvd2[3];
+        } QEMU_PACKED records[];
+    } QEMU_PACKED;
+
+    struct get_dyn_cap_config_in_pl *in = (void *)payload_in;
+    struct get_dyn_cap_config_out_pl *out = (void *)payload_out;
+    uint16_t record_count = 0, i;
+    uint16_t out_pl_len;
+    uint8_t start_region_id = in->start_region_id;
+
+    if (start_region_id >= ct3d->dc.num_regions) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
+            in->region_cnt);
+
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+    memset(out, 0, out_pl_len);
+    out->num_regions = record_count;
+    for (i = 0; i < record_count; i++) {
+        stq_le_p(&out->records[i].base,
+                ct3d->dc.regions[start_region_id + i].base);
+        stq_le_p(&out->records[i].decode_len,
+                ct3d->dc.regions[start_region_id + i].decode_len /
+                CXL_CAPACITY_MULTIPLIER);
+        stq_le_p(&out->records[i].region_len,
+                ct3d->dc.regions[start_region_id + i].len);
+        stq_le_p(&out->records[i].block_size,
+                ct3d->dc.regions[start_region_id + i].block_size);
+        stl_le_p(&out->records[i].dsmadhandle,
+                ct3d->dc.regions[start_region_id + i].dsmadhandle);
+        out->records[i].flags = ct3d->dc.regions[start_region_id + i].flags;
+    }
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1254,6 +1324,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_clear_poison, 72, 0 },
 };
 
+static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
+    [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
+        cmd_dcd_get_dyn_cap_config, 2, 0 },
+};
+
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
     [INFOSTAT][IS_IDENTIFY] = { "IDENTIFY", cmd_infostat_identify, 0, 18 },
     [INFOSTAT][BACKGROUND_OPERATION_STATUS] = { "BACKGROUND_OPERATION_STATUS",
@@ -1465,7 +1540,12 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
+    CXLType3Dev *ct3d = CXL_TYPE3(d);
+
     cxl_copy_cci_commands(cci, cxl_cmd_set);
+    if (ct3d->dc.num_regions) {
+        cxl_copy_cci_commands(cci, cxl_cmd_set_dcd);
+    }
     cci->d = d;
 
     /* No separation for PCI MB as protocol handled in PCI device */
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 7b4d1ee774..6c1ccda159 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1075,6 +1075,12 @@ static void ct3d_reset(DeviceState *dev)
     uint32_t *reg_state = ct3d->cxl_cstate.crb.cache_mem_registers;
     uint32_t *write_msk = ct3d->cxl_cstate.crb.cache_mem_regs_write_mask;
 
+    if (ct3d->dc.num_regions) {
+        ct3d->cxl_dstate.is_dcd = true;
+    } else {
+        ct3d->cxl_dstate.is_dcd = false;
+    }
+
     cxl_component_register_init_common(reg_state, write_msk, CXL2_TYPE3_DEVICE);
     cxl_device_register_init_t3(ct3d);
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 4f2ef0b899..334c51fddb 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -235,6 +235,7 @@ typedef struct cxl_device_state {
     uint64_t mem_size;
     uint64_t pmem_size;
     uint64_t vmem_size;
+    bool is_dcd;
 
     const struct cxl_cmd (*cxl_cmd_set)[256];
     CPMUState cpmu[CXL_NUM_CPMU_INSTANCES];
@@ -417,6 +418,17 @@ typedef struct CXLPoison {
 typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 #define CXL_POISON_LIST_LIMIT 256
 
+#define DCD_MAX_REGION_NUM 8
+
+typedef struct CXLDCDRegion {
+    uint64_t base;
+    uint64_t decode_len; /* aligned to 256*MiB */
+    uint64_t len;
+    uint64_t block_size;
+    uint32_t dsmadhandle;
+    uint8_t flags;
+} CXLDCDRegion;
+
 struct CXLType3Dev {
     /* Private */
     PCIDevice parent_obj;
@@ -453,6 +465,11 @@ struct CXLType3Dev {
     unsigned int poison_list_cnt;
     bool poison_list_overflowed;
     uint64_t poison_list_overflow_ts;
+
+    struct dynamic_capacity {
+        uint8_t num_regions; /* 0-8 regions */
+        CXLDCDRegion regions[DCD_MAX_REGION_NUM];
+    } dc;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-- 
2.42.0



* [PATCH v3 3/9] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
  2023-11-07 18:07 ` [PATCH v3 1/9] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
  2023-11-07 18:07 ` [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 14:54   ` Jonathan Cameron
  2023-11-07 18:07 ` [PATCH v3 4/9] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Rename mem_size to static_mem_size for type3 memdev to cover static RAM and
pmem capacity, in preparation for the introduction of dynamic capacity to
support dynamic capacity devices.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 4 ++--
 hw/mem/cxl_type3.c          | 8 ++++----
 include/hw/cxl/cxl_device.h | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index f80dd6474f..707fd9fe7f 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -774,7 +774,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
     stq_le_p(&id->total_capacity,
-             cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
+            cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->persistent_capacity,
              cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->volatile_capacity,
@@ -1149,7 +1149,7 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 6c1ccda159..754c885cd1 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -762,7 +762,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostvmem_as, vmr, v_name);
         ct3d->cxl_dstate.vmem_size = memory_region_size(vmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(vmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(vmr);
         g_free(v_name);
     }
 
@@ -785,7 +785,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostpmem_as, pmr, p_name);
         ct3d->cxl_dstate.pmem_size = memory_region_size(pmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(pmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(pmr);
         g_free(p_name);
     }
 
@@ -1008,7 +1008,7 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.mem_size) {
+    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
         return -EINVAL;
     }
 
@@ -1188,7 +1188,7 @@ static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
         return false;
     }
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 334c51fddb..de6469eef7 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -232,7 +232,7 @@ typedef struct cxl_device_state {
     } timestamp;
 
     /* memory region size, HDM */
-    uint64_t mem_size;
+    uint64_t static_mem_size;
     uint64_t pmem_size;
     uint64_t vmem_size;
     bool is_dcd;
-- 
2.42.0



* [PATCH v3 4/9] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (2 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 3/9] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 15:23   ` Jonathan Cameron
  2023-11-07 18:07 ` [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

With this change, DC regions can be created when setting up memory for a
type3 memory device.
A property 'num-dc-regions' is added to ct3_props to allow users to pass the
number of DC regions to create. To keep things simple, other region parameters
such as region base, length, and block size are hard coded. If needed,
these parameters can easily be made configurable.

With this change and proper kernel-side support, DC regions can be created
as follows:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways

echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size

echo 0x40000000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind
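
The hard-coded layout described above (DC regions placed back to back after
the static volatile and persistent capacity, 2 GiB per region, 2 MiB blocks)
can be sketched as follows. struct dc_region and layout_dc_regions are
hypothetical, reduced stand-ins for CXLDCDRegion and cxl_create_dc_regions.
The upper-bound check on the region count is an addition of this sketch and
is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)
#define MAX_DC_REGIONS 8 /* mirrors DCD_MAX_REGION_NUM */

/* Hypothetical, reduced view of a DC region. */
struct dc_region {
    uint64_t base;       /* DPA base of the region */
    uint64_t len;        /* region length in bytes */
    uint64_t block_size; /* allocation granularity in bytes */
};

/*
 * Lay out num_regions DC regions directly after static capacity.
 * Returns 0 on success, -1 if more regions are requested than supported
 * (this check is an assumption of the sketch).
 */
static int layout_dc_regions(struct dc_region *regions, unsigned num_regions,
                             uint64_t vmem_size, uint64_t pmem_size)
{
    uint64_t base = vmem_size + pmem_size;
    unsigned i;

    assert(regions != NULL);
    if (num_regions > MAX_DC_REGIONS) {
        return -1;
    }
    for (i = 0; i < num_regions; i++) {
        regions[i].base = base;
        regions[i].len = 2 * GIB;
        regions[i].block_size = 2 * MIB;
        base += regions[i].len; /* next region starts where this one ends */
    }
    return 0;
}
```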

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 754c885cd1..2d67d2015c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -721,6 +721,36 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
     }
 }
 
+static int cxl_create_dc_regions(CXLType3Dev *ct3d)
+{
+    int i;
+    uint64_t region_base = 0;
+    uint64_t region_len =  2 * GiB;
+    uint64_t decode_len = 8; /* 8*256MB */
+    uint64_t blk_size = 2 * MiB;
+    CXLDCDRegion *region;
+
+    if (ct3d->hostvmem) {
+        region_base += ct3d->hostvmem->size;
+    }
+    if (ct3d->hostpmem) {
+        region_base += ct3d->hostpmem->size;
+    }
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        region->base = region_base;
+        region->decode_len = decode_len;
+        region->len = region_len;
+        region->block_size = blk_size;
+        /* dsmad_handle is set when creating cdat table entries */
+        region->flags = 0;
+
+        region_base += region->len;
+    }
+
+    return 0;
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -789,6 +819,10 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+    if (cxl_create_dc_regions(ct3d)) {
+        return false;
+    }
+
     return true;
 }
 
@@ -1108,6 +1142,7 @@ static Property ct3_props[] = {
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
+    DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.42.0



* [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (3 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 4/9] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 15:47   ` Jonathan Cameron
  2023-11-07 18:07 ` [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Add a (file/memory backed) host backend. All the dynamic capacity regions
share a single, large enough host backend. Set up the address space for
DC regions to support read/write operations to dynamic capacity for DCD.

With this change, the following support is added:
1. Add a new property to the type3 device, "nonvolatile-dc-memdev", to point
   to the host memory backend for dynamic capacity. Currently, all DC regions
   share one host backend.
2. Add an address space for dynamic capacity for read/write support;
3. Create CDAT entries for each dynamic capacity region;
4. Fix DVSEC range registers to include DC regions.
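
For item 3, the way DSMAD handles and DPA bases are assigned can be
sketched independently of QEMU. struct dsmas_plan and plan_dsmas are
hypothetical names used only for illustration. The ordering (volatile,
then persistent, then one entry per DC region) and the flags follow the
diff below, which marks DC regions as both non-volatile and dynamic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one DSMAS entry. */
struct dsmas_plan {
    int handle;        /* DSMAD handle, assigned sequentially from 0 */
    uint64_t dpa_base;
    uint64_t dpa_len;
    bool nonvolatile;
    bool dynamic;
};

/*
 * Plan DSMAS entries for static volatile memory, static persistent memory,
 * and each DC region. A size of 0 means the range is absent.
 * Returns the number of entries written, or -1 if out is too small.
 */
static int plan_dsmas(struct dsmas_plan *out, size_t max_out,
                      uint64_t vmem_size, uint64_t pmem_size,
                      const uint64_t *dc_len, size_t num_dc)
{
    size_t needed = (vmem_size ? 1 : 0) + (pmem_size ? 1 : 0) + num_dc;
    uint64_t base = vmem_size + pmem_size;
    size_t n = 0, i;

    assert(out != NULL);
    if (needed > max_out) {
        return -1;
    }
    if (vmem_size) {
        out[n] = (struct dsmas_plan){ .handle = (int)n, .dpa_base = 0,
                                      .dpa_len = vmem_size };
        n++;
    }
    if (pmem_size) {
        out[n] = (struct dsmas_plan){ .handle = (int)n,
                                      .dpa_base = vmem_size,
                                      .dpa_len = pmem_size,
                                      .nonvolatile = true };
        n++;
    }
    for (i = 0; i < num_dc; i++) {
        out[n] = (struct dsmas_plan){ .handle = (int)n, .dpa_base = base,
                                      .dpa_len = dc_len[i],
                                      .nonvolatile = true, .dynamic = true };
        n++;
        base += dc_len[i]; /* DC regions are contiguous in DPA space */
    }
    return (int)n;
}
```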

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  16 ++-
 hw/mem/cxl_type3.c          | 198 +++++++++++++++++++++++++++++-------
 include/hw/cxl/cxl_device.h |   4 +
 3 files changed, 179 insertions(+), 39 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 707fd9fe7f..1f512b3e6b 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -596,7 +596,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
                                                size_t *len_out,
                                                CXLCCI *cci)
 {
-    CXLDeviceState *cxl_dstate = &CXL_TYPE3(cci->d)->cxl_dstate;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
     struct {
         uint8_t slots_supported;
         uint8_t slot_info;
@@ -610,7 +611,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
     if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
-        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+        (ct3d->dc.total_capacity < CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -764,7 +766,8 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -805,9 +808,11 @@ static CXLRetCode cmd_ccls_get_partition_info(const struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)payload_out;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -1149,7 +1154,8 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size +
+            ct3d->dc.total_capacity) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2d67d2015c..152a51306d 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -31,6 +31,7 @@
 #include "hw/pci/spdm.h"
 
 #define DWORD_BYTE 4
+#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 
 /* Default CDAT entries for a memory region */
 enum {
@@ -44,8 +45,9 @@ enum {
 };
 
 static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
-                                         int dsmad_handle, MemoryRegion *mr,
-                                         bool is_pmem, uint64_t dpa_base)
+                                         int dsmad_handle, uint64_t size,
+                                         bool is_pmem, bool is_dynamic,
+                                         uint64_t dpa_base)
 {
     g_autofree CDATDsmas *dsmas = NULL;
     g_autofree CDATDslbis *dslbis0 = NULL;
@@ -64,9 +66,10 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
             .length = sizeof(*dsmas),
         },
         .DSMADhandle = dsmad_handle,
-        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
+            (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
         .DPA_base = dpa_base,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* For now, no memory side cache, plausiblish numbers */
@@ -150,7 +153,7 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
          */
         .EFI_memory_type_attr = is_pmem ? 2 : 1,
         .DPA_offset = 0,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* Header always at start of structure */
@@ -169,21 +172,28 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    MemoryRegion *dc_mr = NULL;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
     int rc, i;
+    uint64_t vmr_size = 0, pmr_size = 0;
 
-    if (!ct3d->hostpmem && !ct3d->hostvmem) {
+    if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
         return 0;
     }
 
+    if (ct3d->hostpmem && ct3d->hostvmem && ct3d->dc.host_dc) {
+        warn_report("The device has static ram and pmem and dynamic capacity");
+    }
+
     if (ct3d->hostvmem) {
         volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
         if (!volatile_mr) {
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        vmr_size = memory_region_size(volatile_mr);
     }
 
     if (ct3d->hostpmem) {
@@ -192,6 +202,19 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        pmr_size = memory_region_size(nonvolatile_mr);
+    }
+
+    if (ct3d->dc.num_regions) {
+        if (ct3d->dc.host_dc) {
+            dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+            if (!dc_mr) {
+                return -EINVAL;
+            }
+            len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
+        } else {
+            return -EINVAL;
+        }
     }
 
     table = g_malloc0(len * sizeof(*table));
@@ -201,8 +224,8 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
 
     /* Now fill them in */
     if (volatile_mr) {
-        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
-                                           false, 0);
+        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
+                                           false, false, 0);
         if (rc < 0) {
             return rc;
         }
@@ -210,14 +233,38 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     }
 
     if (nonvolatile_mr) {
-        uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
         rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                           nonvolatile_mr, true, base);
+                                           pmr_size, true, false, vmr_size);
         if (rc < 0) {
             goto error_cleanup;
         }
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
+
+    if (dc_mr) {
+        uint64_t region_base = vmr_size + pmr_size;
+
+        /*
+         * Currently we create cdat entries for each region, should we only
+         * create dsmas table instead??
+         * We assume all dc regions are non-volatile for now.
+         *
+         */
+        for (i = 0; i < ct3d->dc.num_regions; i++) {
+            rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
+                                               dsmad_handle++,
+                                               ct3d->dc.regions[i].len,
+                                               true, true, region_base);
+            if (rc < 0) {
+                goto error_cleanup;
+            }
+            ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
+
+            cur_ent += CT3_CDAT_NUM_ENTRIES;
+            region_base += ct3d->dc.regions[i].len;
+        }
+    }
+
     assert(len == cur_ent);
 
     *cdat_table = g_steal_pointer(&table);
@@ -445,11 +492,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
             range2_size_hi = ct3d->hostpmem->size >> 32;
             range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                              (ct3d->hostpmem->size & 0xF0000000);
+        } else if (ct3d->dc.host_dc) {
+            range2_size_hi = ct3d->dc.host_dc->size >> 32;
+            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                             (ct3d->dc.host_dc->size & 0xF0000000);
         }
-    } else {
+    } else if (ct3d->hostpmem) {
         range1_size_hi = ct3d->hostpmem->size >> 32;
         range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                          (ct3d->hostpmem->size & 0xF0000000);
+        if (ct3d->dc.host_dc) {
+            range2_size_hi = ct3d->dc.host_dc->size >> 32;
+            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                             (ct3d->dc.host_dc->size & 0xF0000000);
+        }
+    } else {
+        range1_size_hi = ct3d->dc.host_dc->size >> 32;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+            (ct3d->dc.host_dc->size & 0xF0000000);
     }
 
     dvsec = (uint8_t *)&(CXLDVSECDevice){
@@ -721,6 +781,9 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
     }
 }
 
+/*
+ * TODO: region parameters are hard coded, may need to change in the future.
+ */
 static int cxl_create_dc_regions(CXLType3Dev *ct3d)
 {
     int i;
@@ -736,6 +799,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
     if (ct3d->hostpmem) {
         region_base += ct3d->hostpmem->size;
     }
+
     for (i = 0; i < ct3d->dc.num_regions; i++) {
         region = &ct3d->dc.regions[i];
         region->base = region_base;
@@ -755,7 +819,8 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
 
-    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
+            && !ct3d->dc.num_regions) {
         error_setg(errp, "at least one memdev property must be set");
         return false;
     } else if (ct3d->hostmem && ct3d->hostpmem) {
@@ -823,6 +888,50 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         return false;
     }
 
+    ct3d->dc.total_capacity = 0;
+    if (ct3d->dc.host_dc) {
+        MemoryRegion *dc_mr;
+        char *dc_name;
+        uint64_t total_region_size = 0;
+        int i;
+
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            error_setg(errp, "dynamic capacity must have backing device");
+            return false;
+        }
+        /* FIXME: set dc as nonvolatile for now */
+        memory_region_set_nonvolatile(dc_mr, true);
+        memory_region_set_enabled(dc_mr, true);
+        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
+        if (ds->id) {
+            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
+        } else {
+            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
+        }
+        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
+        g_free(dc_name);
+
+        for (i = 0; i < ct3d->dc.num_regions; i++) {
+            total_region_size += ct3d->dc.regions[i].len;
+        }
+        /* Make sure the host backend is large enough to cover all dc range */
+        if (total_region_size > memory_region_size(dc_mr)) {
+            error_setg(errp,
+                "too small host backend size, increase to %lu MiB or more",
+                total_region_size / MiB);
+            return false;
+        }
+
+        if (dc_mr->size % CXL_CAPACITY_MULTIPLIER != 0) {
+            error_setg(errp, "DC region size is unaligned to %lx",
+                    CXL_CAPACITY_MULTIPLIER);
+            return false;
+        }
+
+        ct3d->dc.total_capacity = total_region_size;
+    }
+
     return true;
 }
 
@@ -933,6 +1042,9 @@ err_release_cdat:
 err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -952,6 +1064,9 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     spdm_sock_fini(ct3d->doe_spdm.socket);
     g_free(regs->special_ops);
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -1025,16 +1140,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
                                        AddressSpace **as,
                                        uint64_t *dpa_offset)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
+    }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        /* Do we want dc_size to be dc_mr->size or not?? */
+        dc_size = ct3d->dc.total_capacity;
     }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return -ENODEV;
     }
 
@@ -1042,19 +1165,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
+    if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
         return -EINVAL;
     }
 
-    if (vmr) {
-        if (*dpa_offset < memory_region_size(vmr)) {
-            *as = &ct3d->hostvmem_as;
-        } else {
-            *as = &ct3d->hostpmem_as;
-            *dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (*dpa_offset < vmr_size) {
+        *as = &ct3d->hostvmem_as;
+    } else if (*dpa_offset < vmr_size + pmr_size) {
         *as = &ct3d->hostpmem_as;
+        *dpa_offset -= vmr_size;
+    } else {
+        *as = &ct3d->dc.host_dc_as;
+        *dpa_offset -= (vmr_size + pmr_size);
     }
 
     return 0;
@@ -1143,6 +1265,8 @@ static Property ct3_props[] = {
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
     DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
+    DEFINE_PROP_LINK("nonvolatile-dc-memdev", CXLType3Dev, dc.host_dc,
+                    TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1209,33 +1333,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 
 static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
     AddressSpace *as;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
     }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = ct3d->dc.total_capacity;
+    }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
         return false;
     }
 
-    if (vmr) {
-        if (dpa_offset < memory_region_size(vmr)) {
-            as = &ct3d->hostvmem_as;
-        } else {
-            as = &ct3d->hostpmem_as;
-            dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (dpa_offset < vmr_size) {
+        as = &ct3d->hostvmem_as;
+    } else if (dpa_offset < vmr_size + pmr_size) {
         as = &ct3d->hostpmem_as;
+        dpa_offset -= vmr_size;
+    } else {
+        as = &ct3d->dc.host_dc_as;
+        dpa_offset -= (vmr_size + pmr_size);
     }
 
     address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index de6469eef7..3dc6928bc5 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -467,6 +467,10 @@ struct CXLType3Dev {
     uint64_t poison_list_overflow_ts;
 
     struct dynamic_capacity {
+        HostMemoryBackend *host_dc;
+        AddressSpace host_dc_as;
+        uint64_t total_capacity; /* 256M aligned */
+
         uint8_t num_regions; /* 0-8 regions */
         CXLDCDRegion regions[DCD_MAX_REGION_NUM];
     } dc;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (4 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 15:56   ` Jonathan Cameron
  2024-02-23  7:10   ` Wonjae Lee
  2023-11-07 18:07 ` [PATCH v3 7/9] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC (permalink / raw
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Add a dynamic capacity extent list representation to the definition of
CXLType3Dev and add the Get Dynamic Capacity Extent List mailbox command
per CXL r3.0 section 8.2.9.8.9.2.
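
The command returns the extent list in pages: the host passes a starting
extent index and a maximum record count, and the device returns as many
records as remain from that index. A minimal sketch of that calculation,
mirroring the MIN() used in cmd_dcd_get_dyn_cap_ext_list below; the helper
name dc_ext_list_record_count is hypothetical and does not exist in QEMU:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): number of extent records
 * one Get Dynamic Capacity Extent List call can return, given the record
 * count requested by the host, the starting extent index, and the total
 * number of extents tracked by the device.
 */
static uint32_t dc_ext_list_record_count(uint32_t requested,
                                         uint32_t start_index,
                                         uint32_t total)
{
    uint32_t remaining;

    /* The caller is expected to reject start_index > total beforehand. */
    assert(start_index <= total);

    remaining = total - start_index;
    return requested < remaining ? requested : remaining;
}
```

A host that wants the full list can keep issuing the command, advancing the
starting index by the returned count until it reaches the reported total.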

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 73 +++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |  1 +
 include/hw/cxl/cxl_device.h | 23 ++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 1f512b3e6b..56f4aa237a 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -82,6 +82,7 @@ enum {
         #define CLEAR_POISON           0x2
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
+        #define GET_DYN_CAP_EXT_LIST   0x1
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1286,6 +1287,75 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.0 section 8.2.9.8.9.2:
+ * Get Dynamic Capacity Extent List (Opcode 4801h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
+                                               uint8_t *payload_in,
+                                               size_t len_in,
+                                               uint8_t *payload_out,
+                                               size_t *len_out,
+                                               CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct get_dyn_cap_ext_list_in_pl {
+        uint32_t extent_cnt;
+        uint32_t start_extent_id;
+    } QEMU_PACKED;
+
+    struct get_dyn_cap_ext_list_out_pl {
+        uint32_t count;
+        uint32_t total_extents;
+        uint32_t generation_num;
+        uint8_t rsvd[4];
+        CXLDCExtentRaw records[];
+    } QEMU_PACKED;
+
+    struct get_dyn_cap_ext_list_in_pl *in = (void *)payload_in;
+    struct get_dyn_cap_ext_list_out_pl *out = (void *)payload_out;
+    uint16_t record_count = 0, i = 0, record_done = 0;
+    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
+    CXLDCDExtent *ent;
+    uint16_t out_pl_len;
+    uint32_t start_extent_id = in->start_extent_id;
+
+    if (start_extent_id > ct3d->dc.total_extent_count) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(in->extent_cnt,
+                       ct3d->dc.total_extent_count - start_extent_id);
+
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+    /* May need more processing here in the future */
+    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+    memset(out, 0, out_pl_len);
+    stl_le_p(&out->count, record_count);
+    stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
+    stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
+
+    if (record_count > 0) {
+        QTAILQ_FOREACH(ent, extent_list, node) {
+            if (i++ < start_extent_id) {
+                continue;
+            }
+            stq_le_p(&out->records[record_done].start_dpa, ent->start_dpa);
+            stq_le_p(&out->records[record_done].len, ent->len);
+            memcpy(&out->records[record_done].tag, ent->tag, 0x10);
+            stw_le_p(&out->records[record_done].shared_seq, ent->shared_seq);
+            record_done++;
+            if (record_done == record_count) {
+                break;
+            }
+        }
+    }
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1333,6 +1403,9 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
         cmd_dcd_get_dyn_cap_config, 2, 0 },
+    [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+        "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
+        8, 0 },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 152a51306d..c9d792a725 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -811,6 +811,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
 
         region_base += region->len;
     }
+    QTAILQ_INIT(&ct3d->dc.extents);
 
     return 0;
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 3dc6928bc5..5738c6f434 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -420,6 +420,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 
 #define DCD_MAX_REGION_NUM 8
 
+typedef struct CXLDCDExtentRaw {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+} QEMU_PACKED CXLDCExtentRaw;
+
+typedef struct CXLDCDExtent {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+
+    QTAILQ_ENTRY(CXLDCDExtent) node;
+} CXLDCDExtent;
+typedef QTAILQ_HEAD(, CXLDCDExtent) CXLDCDExtentList;
+
 typedef struct CXLDCDRegion {
     uint64_t base;
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -470,6 +489,10 @@ struct CXLType3Dev {
         HostMemoryBackend *host_dc;
         AddressSpace host_dc_as;
         uint64_t total_capacity; /* 256M aligned */
+        CXLDCDExtentList extents;
+
+        uint32_t total_extent_count;
+        uint32_t ext_list_gen_seq;
 
         uint8_t num_regions; /* 0-8 regions */
         CXLDCDRegion regions[DCD_MAX_REGION_NUM];
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 7/9] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (5 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 16:23   ` Jonathan Cameron
  2023-11-07 18:07 ` [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC (permalink / raw
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Per CXL spec 3.0, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.8.9.3, and
Release Dynamic Capacity (Opcode 4803h) 8.2.9.8.9.4.
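
When a released range lies strictly inside an accepted extent, the release
handler in this patch keeps the leftover head and tail as new extents (len1
and len2 in cmd_dcd_release_dyn_cap). A minimal standalone sketch of that
split, assuming half-open DPA ranges that do not wrap around; the DpaRange
type and dpa_range_split name are hypothetical and used only for
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: a half-open DPA range [start, start + len). */
typedef struct DpaRange {
    uint64_t start;
    uint64_t len;
} DpaRange;

/*
 * Split 'ext' around the released range 'rel'.
 * Returns false if 'rel' is not fully contained in 'ext'.
 * On success, 'head' and 'tail' describe the leftover pieces; a length
 * of 0 means nothing is left over on that side.
 * Assumes start + len does not overflow for either range.
 */
static bool dpa_range_split(DpaRange ext, DpaRange rel,
                            DpaRange *head, DpaRange *tail)
{
    assert(head && tail);

    if (rel.start < ext.start ||
        rel.start + rel.len > ext.start + ext.len) {
        return false;
    }

    head->start = ext.start;
    head->len = rel.start - ext.start;
    tail->start = rel.start + rel.len;
    tail->len = ext.start + ext.len - tail->start;
    return true;
}
```

Releasing the whole extent yields two zero-length leftovers, so nothing is
reinserted in that case.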

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 271 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |   3 +-
 include/hw/cxl/cxl_device.h |   5 +-
 3 files changed, 277 insertions(+), 2 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 56f4aa237a..9f788b03b6 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -83,6 +83,8 @@ enum {
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
         #define GET_DYN_CAP_EXT_LIST   0x1
+        #define ADD_DYN_CAP_RSP        0x2
+        #define RELEASE_DYN_CAP        0x3
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1356,6 +1358,269 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * Check whether any bit between addr[nr, nr+size) is set,
+ * return true if any bit is set, otherwise return false
+ */
+static bool test_any_bits_set(const unsigned long *addr, int nr, int size)
+{
+    unsigned long res = find_next_bit(addr, size + nr, nr);
+
+    return res < nr + size;
+}
+
+CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
+{
+    CXLDCDRegion *region = &ct3d->dc.regions[0];
+    int i;
+
+    if (dpa < region->base ||
+        dpa >= region->base + ct3d->dc.total_capacity) {
+        return NULL;
+    }
+
+    /*
+     * CXL r3.0 section 9.13.3: Dynamic Capacity Device (DCD)
+     *
+     * Regions are used in increasing-DPA order, with Region 0 being used for
+     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
+     * So check from the last region to find where the dpa belongs. Extents that
+     * cross multiple regions are not allowed.
+     */
+    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+        region = &ct3d->dc.regions[i];
+        if (dpa >= region->base) {
+            /* Should we compare with decode_len or len of the region? */
+            if (dpa + len > region->base +
+                    region->decode_len * CXL_CAPACITY_MULTIPLIER)
+                return NULL;
+            return region;
+        }
+    }
+    return NULL;
+}
+
+static void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
+                                             uint64_t dpa,
+                                             uint64_t len,
+                                             uint8_t *tag,
+                                             uint16_t shared_seq)
+{
+    CXLDCDExtent *extent;
+
+    extent = g_new0(CXLDCDExtent, 1);
+    extent->start_dpa = dpa;
+    extent->len = len;
+    if (tag) {
+        memcpy(extent->tag, tag, 0x10);
+    } else {
+        memset(extent->tag, 0, 0x10);
+    }
+    extent->shared_seq = shared_seq;
+
+    QTAILQ_INSERT_TAIL(list, extent, node);
+}
+
+/*
+ * CXL r3.0 Table 8-129: Add Dynamic Capacity Response Input Payload
+ * CXL r3.0 Table 8-131: Release Dynamic Capacity Input Payload
+ */
+typedef struct updated_dc_extent_list_in_pl {
+    uint32_t num_entries_updated;
+    uint8_t rsvd[4];
+    /* CXL r3.0 Table 8-130: Updated Extent List */
+    struct {
+        uint64_t start_dpa;
+        uint64_t len;
+        uint8_t rsvd[8];
+    } QEMU_PACKED updated_entries[];
+} QEMU_PACKED updated_dc_extent_list_in_pl;
+
+/*
+ * For the extents in the extent list to operate, check whether they are valid
+ * 1. The extent should be in the range of a valid DC region;
+ * 2. The extent should not cross multiple regions;
+ * 3. The start DPA and the length of the extent should align with the block
+ * size of the region;
+ * 4. The address range of multiple extents in the list should not overlap.
+ */
+static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
+        const updated_dc_extent_list_in_pl *in)
+{
+    uint64_t min_block_size = UINT64_MAX;
+    CXLDCDRegion *region = &ct3d->dc.regions[0];
+    CXLDCDRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
+    g_autofree unsigned long *blk_bitmap = NULL;
+    uint64_t dpa, len;
+    uint32_t i;
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        min_block_size = MIN(min_block_size, region->block_size);
+    }
+
+    blk_bitmap = bitmap_new((lastregion->len + lastregion->base -
+                             ct3d->dc.regions[0].base) / min_block_size);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        region = cxl_find_dc_region(ct3d, dpa, len);
+        if (!region) {
+            return CXL_MBOX_INVALID_PA;
+        }
+
+        dpa -= ct3d->dc.regions[0].base;
+        if (dpa % region->block_size || len % region->block_size) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        /* the dpa range already covered by some other extents in the list */
+        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
+            len / min_block_size)) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.0 section 8.2.9.8.9.3: Add Dynamic Capacity Response (opcode 4802h)
+ *
+ * Assume an extent is added only after the response is processed successfully
+ * TODO: for better extent list validation, a better solution would be
+ * maintaining a pending extent list and use it to verify the extent list in
+ * the response.
+ */
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    updated_dc_extent_list_in_pl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
+    CXLDCDExtent *ent;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_SUCCESS;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        /*
+         * Check if the DPA range of the to-be-added extent overlaps with
+         * existing extent list maintained by the device.
+         */
+        QTAILQ_FOREACH(ent, extent_list, node) {
+            if (ent->start_dpa <= dpa &&
+                    dpa + len <= ent->start_dpa + ent->len) {
+                return CXL_MBOX_INVALID_PA;
+            /* Overlapping one end of the other */
+            } else if ((dpa < ent->start_dpa + ent->len &&
+                        dpa + len > ent->start_dpa + ent->len) ||
+                       (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
+                return CXL_MBOX_INVALID_PA;
+            }
+        }
+
+        /*
+         * TODO: add a pending extent list based on event log record and
+         * verify the input response
+         */
+
+        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
+        ct3d->dc.total_extent_count += 1;
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.0 section 8.2.9.8.9.4: Release Dynamic Capacity (opcode 4803h)
+ */
+static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    updated_dc_extent_list_in_pl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
+    CXLDCDExtent *ent;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        QTAILQ_FOREACH(ent, extent_list, node) {
+            if (ent->start_dpa <= dpa &&
+                dpa + len <= ent->start_dpa + ent->len) {
+                /* Remove any partial extents */
+                uint64_t len1 = dpa - ent->start_dpa;
+                uint64_t len2 = ent->start_dpa + ent->len - dpa - len;
+
+                if (len1) {
+                    cxl_insert_extent_to_extent_list(extent_list,
+                                                     ent->start_dpa, len1,
+                                                     NULL, 0);
+                    ct3d->dc.total_extent_count += 1;
+                }
+                if (len2) {
+                    cxl_insert_extent_to_extent_list(extent_list, dpa + len,
+                                                     len2, NULL, 0);
+                    ct3d->dc.total_extent_count += 1;
+                }
+                break;
+                /* Currently we reject the attempt to remove a superset */
+            } else if ((dpa < ent->start_dpa + ent->len &&
+                        dpa + len > ent->start_dpa + ent->len) ||
+                       (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
+                return CXL_MBOX_INVALID_EXTENT_LIST;
+            }
+        }
+
+        if (ent) {
+            QTAILQ_REMOVE(extent_list, ent, node);
+            g_free(ent);
+            ct3d->dc.total_extent_count -= 1;
+        } else {
+            /* Try to remove a non-existing extent */
+            return CXL_MBOX_INVALID_PA;
+        }
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1406,6 +1671,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
         "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
         8, 0 },
+    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+        "ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+        ~0, IMMEDIATE_DATA_CHANGE },
+    [DCD_CONFIG][RELEASE_DYN_CAP] = {
+        "RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
+        ~0, IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c9d792a725..482329a499 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -789,7 +789,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
     int i;
     uint64_t region_base = 0;
     uint64_t region_len =  2 * GiB;
-    uint64_t decode_len = 8; /* 8*256MB */
+    uint64_t decode_len = 2 * GiB;
     uint64_t blk_size = 2 * MiB;
     CXLDCDRegion *region;
 
@@ -803,6 +803,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
     for (i = 0; i < ct3d->dc.num_regions; i++) {
         region = &ct3d->dc.regions[i];
         region->base = region_base;
+        /* NOTE: divide by 256 * MiB before returning it to the host */
         region->decode_len = decode_len;
         region->len = region_len;
         region->block_size = blk_size;
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 5738c6f434..b3d35fe000 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -130,7 +130,8 @@ typedef enum {
     CXL_MBOX_INCORRECT_PASSPHRASE = 0x14,
     CXL_MBOX_UNSUPPORTED_MAILBOX = 0x15,
     CXL_MBOX_INVALID_PAYLOAD_LENGTH = 0x16,
-    CXL_MBOX_MAX = 0x17
+    CXL_MBOX_INVALID_EXTENT_LIST = 0x1E, /* CXL r3.0: Table 8-34 */
+    CXL_MBOX_MAX = 0x1F
 } CXLRetCode;
 
 typedef struct CXLCCI CXLCCI;
@@ -548,4 +549,6 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
 
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
+CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
+
 #endif
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (6 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 7/9] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 16:50   ` Jonathan Cameron
  2024-02-13 17:44   ` Jonathan Cameron
  2023-11-07 18:07 ` [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions nifan.cxl
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC (permalink / raw
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

Since fabric manager emulation is not supported yet, the change implements
the functions to add/release dynamic capacity extents as QMP interfaces.

Note: we block any FM-issued extent release request if the exact extent
does not exist in the extent list of the device. We will loosen the
restriction later once we have partial release support in the kernel.
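
A minimal sketch of the exact-match rule described in the note, assuming
extents are compared by start DPA and length only; the Extent type and
extent_exists_exactly name are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record: start DPA and length in bytes. */
typedef struct Extent {
    uint64_t start_dpa;
    uint64_t len;
} Extent;

/*
 * Return true only if an extent with exactly this start DPA and length
 * is present. A request that covers part of an extent, or spans several
 * extents, does not match and would be blocked.
 */
static bool extent_exists_exactly(const Extent *list, size_t count,
                                  uint64_t dpa, uint64_t len)
{
    size_t i;

    assert(list || count == 0);

    for (i = 0; i < count; i++) {
        if (list[i].start_dpa == dpa && list[i].len == len) {
            return true;
        }
    }
    return false;
}
```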

1. Add dynamic capacity extents:

For example, the command to add two contiguous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like this:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "extents": [
      {
          "dpa": 0,
          "len": 128
      },
      {
          "dpa": 128,
          "len": 128
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like this:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "region-id": 0,
      "extents": [
      {
          "dpa": 128,
          "len": 128
      }
      ]
  }
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  25 +++-
 hw/mem/cxl_type3.c          | 225 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  14 +++
 include/hw/cxl/cxl_device.h |   8 +-
 include/hw/cxl/cxl_events.h |  15 +++
 qapi/cxl.json               |  60 +++++++++-
 6 files changed, 338 insertions(+), 9 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9f788b03b6..8e6a98753a 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1362,7 +1362,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, int nr, int size)
+bool test_any_bits_set(const unsigned long *addr, int nr, int size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
 
@@ -1400,7 +1400,7 @@ CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1538,15 +1538,28 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
             }
         }
 
-        /*
-         * TODO: add a pending extent list based on event log record and
-         * verify the input response
-         */
+        QTAILQ_FOREACH(ent, &ct3d->dc.extents_pending_to_add, node) {
+            if (ent->start_dpa <= dpa &&
+                dpa + len <= ent->start_dpa + ent->len) {
+                break;
+            }
+        }
+        if (ent) {
+            QTAILQ_REMOVE(&ct3d->dc.extents_pending_to_add, ent, node);
+            g_free(ent);
+        } else {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
     }
 
+    /*
+     * TODO: extents_pending_to_add needs to be cleared so the extents not
+     * accepted can be reclaimed based on spec r3.0: 8.2.9.8.9.3
+     */
+
     return CXL_MBOX_SUCCESS;
 }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 482329a499..43cea3d818 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -813,6 +813,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
         region_base += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
 
     return 0;
 }
@@ -1616,7 +1617,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
+    case CXL_EVENT_LOG_DYNCAP:
+        return CXL_EVENT_TYPE_DYNAMIC_CAP;
     default:
         return -EINVAL;
     }
@@ -1867,6 +1869,227 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.0 Table 8-47: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+    DC_EVENT_NUM
+} CXLDCEventType;
+
+/*
+ * Check whether the exact extent exists in the list
+ * Return value: true if exists, otherwise false
+ */
+static bool cxl_dc_extent_exists(CXLDCDExtentList *list, CXLDCExtentRaw *ext)
+{
+    CXLDCDExtent *ent;
+
+    if (!ext || !list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(ent, list, node) {
+        if (ent->start_dpa != ext->start_dpa) {
+            continue;
+        }
+
+        /* Found exact extent */
+        if (ent->len == ext->len) {
+            return true;
+        } else {
+            return false;
+        }
+    }
+    return false;
+}
+
+static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
+                                             CXLDCEventType type, uint16_t hid,
+                                             uint8_t rid,
+                                             CXLDCExtentRecordList *records,
+                                             Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDCExtentRecordList *list;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    CXLDCDExtentList *extent_list = NULL;
+    uint8_t enc_log;
+    uint64_t offset, len, block_size;
+    int i;
+    int rc;
+    g_autofree unsigned long *blk_bitmap = NULL;
+
+    obj = object_resolve_path(path, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve path");
+        return;
+    }
+    if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+        error_setg(errp, "Path does not point to a valid CXL type3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+    rc = ct3d_qmp_cxl_event_log_enc(log);
+    if (rc < 0) {
+        error_setg(errp, "Unhandled error log type");
+        return;
+    }
+    enc_log = rc;
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset * MiB;
+        len = list->value->len * MiB;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        num_extents++;
+        list = list->next;
+    }
+    if (num_extents == 0) {
+        error_setg(errp, "No extents found in the command");
+        return;
+    }
+
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Create Extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset * MiB;
+        len = list->value->len * MiB;
+
+        extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+
+        /*
+         * We block the release request from FM if the exact extent has
+         * not been accepted by the host yet
+         * TODO: We can loosen the restriction by skipping the check if desired
+         */
+        if (type == DC_EVENT_RELEASE_CAPACITY ||
+            type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
+            if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
+                error_setg(errp, "No exact extent found in the extent list");
+                return;
+            }
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        list = list->next;
+        i++;
+    }
+
+    switch (type) {
+    case DC_EVENT_ADD_CAPACITY:
+        extent_list = &dcd->dc.extents_pending_to_add;
+        break;
+    default:
+        break;
+    }
+    /*
+     * CXL r3.0 section 8.2.9.1.5: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (extent_list) {
+            cxl_insert_extent_to_extent_list(extent_list,
+                                             extents[i].start_dpa,
+                                             extents[i].len,
+                                             extents[i].tag,
+                                             extents[i].shared_seq);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+                                     DC_EVENT_ADD_CAPACITY, 0,
+                                     region_id, records, errp);
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
+                                     DC_EVENT_RELEASE_CAPACITY, 0,
+                                     region_id, records, errp);
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..d913b11b4d 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,17 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
+                                  CXLDCExtentRecordList  *records,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
+                                      CXLDCExtentRecordList  *records,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index b3d35fe000..ca4f824b11 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -491,6 +491,7 @@ struct CXLType3Dev {
         AddressSpace host_dc_as;
         uint64_t total_capacity; /* 256M aligned */
         CXLDCDExtentList extents;
+        CXLDCDExtentList extents_pending_to_add;
 
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
@@ -550,5 +551,10 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
 CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
-
+void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
+                                             uint64_t dpa,
+                                             uint64_t len,
+                                             uint8_t *tag,
+                                             uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, int nr, int size);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index d778487b7e..4f8cb3215d 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,19 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.0 Table 8-47: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t reserved1;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t reserved2[3];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x20];
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8cc4c72fa9..6b631f64f1 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -25,7 +25,8 @@
   'data': ['informational',
            'warning',
            'failure',
-           'fatal']
+           'fatal',
+           'dyncap']
  }
 
 ##
@@ -361,3 +362,60 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDCExtentRecord:
+#
+# Record of a single extent to add/release
+#
+# @offset: extent start offset from the region base address (in MiB)
+# @len: extent size (in MiB)
+#
+# Since: 8.2
+##
+{ 'struct': 'CXLDCExtentRecord',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to start the add dynamic capacity extents flow. The host will
+# need to respond to indicate it accepts the capacity before it becomes
+# available for read and write.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region in which to add the extents
+# @extents: Extents to add
+#
+# Since: 8.2
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'region-id': 'uint8',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to start the release dynamic capacity extents flow. The host will
+# need to respond to indicate that it has released the capacity before it
+# is made unavailable for read and write and can be re-added.
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: id of the region from which to release the extents
+# @extents: Extents to release
+#
+# Since: 8.2
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'region-id': 'uint8',
+            'extents': [ 'CXLDCExtentRecord' ]
+           }
+}
-- 
2.42.0



* [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (7 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2023-11-07 18:07 ` nifan.cxl
  2024-01-24 16:58   ` Jonathan Cameron
  2023-11-17  0:09 ` [PATCH v3 0/9] Enabling DCD emulation support in Qemu Ira Weiny
  2024-02-13 18:18 ` fan
  10 siblings, 1 reply; 37+ messages in thread
From: nifan.cxl @ 2023-11-07 18:07 UTC (permalink / raw
  To: qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

From: Fan Ni <fan.ni@samsung.com>

A DPA range in a DC region is not valid to access until an extent
covering the range has been added. Add a bitmap for each region to
record whether each DC block in the region is backed by a DC extent.
Each bit in the bitmap represents one DC block. When a DC extent is
added, the bits of all blocks in the extent are set; they are cleared
when the extent is released.
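
The per-region bookkeeping described above can be sketched roughly as
follows. This is a standalone illustration rather than the patch code: it
uses a plain calloc-backed bitmap instead of QEMU's bitmap helpers, and
SketchRegion, sketch_region_init, sketch_mark and sketch_range_backed are
hypothetical names. Unlike the patch, the access check here derives the
last block from the end address of the access.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a DC region with one bit per DC block */
typedef struct SketchRegion {
    uint64_t base;
    uint64_t len;
    uint64_t block_size;
    unsigned long *blk_bitmap;
} SketchRegion;

/* Allocate an all-zero bitmap: no block is backed until an extent is added */
static bool sketch_region_init(SketchRegion *r, uint64_t base, uint64_t len,
                               uint64_t block_size)
{
    uint64_t nblocks = len / block_size;
    size_t nlongs = (nblocks + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    r->base = base;
    r->len = len;
    r->block_size = block_size;
    r->blk_bitmap = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
    return r->blk_bitmap != NULL;
}

/* Set (extent accepted) or clear (extent released) the bits of a range */
static void sketch_mark(SketchRegion *r, uint64_t dpa, uint64_t len,
                        bool backed)
{
    if (dpa < r->base || dpa - r->base + len > r->len) {
        return; /* range is not inside this region */
    }
    uint64_t first = (dpa - r->base) / r->block_size;
    uint64_t count = len / r->block_size;

    for (uint64_t b = first; b < first + count; b++) {
        unsigned long mask = 1UL << (b % SKETCH_BITS_PER_LONG);

        if (backed) {
            r->blk_bitmap[b / SKETCH_BITS_PER_LONG] |= mask;
        } else {
            r->blk_bitmap[b / SKETCH_BITS_PER_LONG] &= ~mask;
        }
    }
}

/* An access is valid only if every block it touches is backed */
static bool sketch_range_backed(const SketchRegion *r, uint64_t dpa,
                                uint64_t len)
{
    if (len == 0 || dpa < r->base || dpa - r->base + len > r->len) {
        return false;
    }
    uint64_t first = (dpa - r->base) / r->block_size;
    uint64_t last = (dpa - r->base + len - 1) / r->block_size;

    for (uint64_t b = first; b <= last; b++) {
        unsigned long mask = 1UL << (b % SKETCH_BITS_PER_LONG);

        if (!(r->blk_bitmap[b / SKETCH_BITS_PER_LONG] & mask)) {
            return false;
        }
    }
    return true;
}
```

An access that starts in a backed block but ends in an unbacked one fails
the check, so a read or write cannot spill past the end of an extent.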

Signed-off-by: Fan Ni <fan.ni@samsung.com>

--
JC changes:
- Rebase on what will be next gitlab.com/jic23/qemu CXL staging tree.
- Drop unnecessary handling of failed bitmap allocations. In common with
  most QEMU allocations they fail hard anyway.
- Use previously factored out cxl_find_region() helper
- Minor editorial stuff in comments such as spec version references
  according to the standard form I'm trying to push through the code.
Picked up Jørgen's fix:
https://lore.kernel.org/qemu-devel/d0d7ca1d-81bc-19b3-4904-d60046ded844@wdc.com/T/#u
---
 hw/cxl/cxl-mailbox-utils.c  | 31 +++++++++------
 hw/mem/cxl_type3.c          | 78 +++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h | 15 +++++--
 3 files changed, 109 insertions(+), 15 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 8e6a98753a..6be92fb5ba 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1401,10 +1401,9 @@ CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
 }
 
 void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
-                                             uint64_t dpa,
-                                             uint64_t len,
-                                             uint8_t *tag,
-                                             uint16_t shared_seq)
+                                      uint64_t dpa, uint64_t len,
+                                      uint8_t *tag,
+                                      uint16_t shared_seq)
 {
     CXLDCDExtent *extent;
 
@@ -1421,6 +1420,13 @@ void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
     QTAILQ_INSERT_TAIL(list, extent, node);
 }
 
+static void cxl_remove_extent_to_extent_list(CXLDCDExtentList *list,
+                                             CXLDCDExtent *ent)
+{
+    QTAILQ_REMOVE(list, ent, node);
+    g_free(ent);
+}
+
 /*
  * CXL r3.0 Table 8-129: Add Dynamic Capacity Response Input Payload
  * CXL r3.0 Table 8-131: Release Dynamic Capacity Input Payload
@@ -1545,14 +1551,15 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
             }
         }
         if (ent) {
-            QTAILQ_REMOVE(&ct3d->dc.extents_pending_to_add, ent, node);
-            g_free(ent);
+            cxl_remove_extent_to_extent_list(&ct3d->dc.extents_pending_to_add,
+                                             ent);
         } else {
             return CXL_MBOX_INVALID_PA;
         }
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
+        ct3_set_region_block_backed(ct3d, dpa, len);
     }
 
     /*
@@ -1601,16 +1608,22 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
                 uint64_t len1 = dpa - ent->start_dpa;
                 uint64_t len2 = ent->start_dpa + ent->len - dpa - len;
 
                 if (len1) {
                     cxl_insert_extent_to_extent_list(extent_list,
                                                      ent->start_dpa, len1,
                                                      NULL, 0);
                     ct3d->dc.total_extent_count += 1;
+                    ct3_set_region_block_backed(ct3d, ent->start_dpa, len1);
                 }
                 if (len2) {
                     cxl_insert_extent_to_extent_list(extent_list, dpa + len,
                                                      len2, NULL, 0);
                     ct3d->dc.total_extent_count += 1;
+                    ct3_set_region_block_backed(ct3d, dpa + len, len2);
                 }
+                /* Remove (and free) ent only after its last use above */
+                cxl_remove_extent_to_extent_list(extent_list, ent);
+                ct3d->dc.total_extent_count -= 1;
+                ct3_clear_region_block_backed(ct3d, dpa, len);
                 break;
                 /*Currently we reject the attempt to remove a superset*/
@@ -1621,11 +1634,7 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
             }
         }
 
-        if (ent) {
-            QTAILQ_REMOVE(extent_list, ent, node);
-            g_free(ent);
-            ct3d->dc.total_extent_count -= 1;
-        } else {
+        if (!ent) {
             /* Try to remove a non-existing extent */
             return CXL_MBOX_INVALID_PA;
         }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 43cea3d818..4ec65a751a 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -810,6 +810,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
         /* dsmad_handle is set when creating cdat table entries */
         region->flags = 0;
 
+        region->blk_bitmap = bitmap_new(region->len / region->block_size);
         region_base += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
@@ -818,6 +819,17 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
     return 0;
 }
 
+static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
+{
+    int i;
+    struct CXLDCDRegion *region;
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        g_free(region->blk_bitmap);
+    }
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -1046,6 +1058,7 @@ err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
@@ -1068,6 +1081,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     spdm_sock_fini(ct3d->doe_spdm.socket);
     g_free(regs->special_ops);
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
@@ -1078,6 +1092,66 @@ static void ct3_exit(PCIDevice *pci_dev)
     }
 }
 
+/*
+ * Mark the DPA range [dpa, dpa + len) to be backed and accessible. This
+ * happens when a DC extent is added and accepted by the host.
+ */
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len)
+{
+    CXLDCDRegion *region;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    bitmap_set(region->blk_bitmap, (dpa - region->base) / region->block_size,
+               len / region->block_size);
+}
+
+/*
+ * Check whether a DPA range [dpa, dpa + len) has been backed with DC extents.
+ * Used when validating read/write to dc regions
+ */
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len)
+{
+    CXLDCDRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return false;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = DIV_ROUND_UP(len, region->block_size);
+    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
+}
+
+/*
+ * Mark the DPA range [dpa, dpa + len) to be unbacked and inaccessible. This
+ * happens when a DC extent is returned by the host.
+ */
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len)
+{
+    CXLDCDRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = len / region->block_size;
+    bitmap_clear(region->blk_bitmap, nr, nbits);
+}
+
 static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
 {
     int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
@@ -1178,6 +1252,10 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         *as = &ct3d->hostpmem_as;
         *dpa_offset -= vmr_size;
     } else {
+        if (!ct3_test_region_block_backed(ct3d, *dpa_offset, size)) {
+            return -ENODEV;
+        }
+
         *as = &ct3d->dc.host_dc_as;
         *dpa_offset -= (vmr_size + pmr_size);
     }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index ca4f824b11..b71b09700a 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -447,6 +447,7 @@ typedef struct CXLDCDRegion {
     uint64_t block_size;
     uint32_t dsmadhandle;
     uint8_t flags;
+    unsigned long *blk_bitmap;
 } CXLDCDRegion;
 
 struct CXLType3Dev {
@@ -552,9 +553,15 @@ void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
 CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
-                                             uint64_t dpa,
-                                             uint64_t len,
-                                             uint8_t *tag,
-                                             uint16_t shared_seq);
+                                      uint64_t dpa,
+                                      uint64_t len,
+                                      uint8_t *tag,
+                                      uint16_t shared_seq);
 bool test_any_bits_set(const unsigned long *addr, int nr, int size);
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len);
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len);
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len);
 #endif
-- 
2.42.0



* Re: [PATCH v3 0/9] Enabling DCD emulation support in Qemu
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (8 preceding siblings ...)
  2023-11-07 18:07 ` [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions nifan.cxl
@ 2023-11-17  0:09 ` Ira Weiny
  2024-01-26 15:21   ` Jonathan Cameron
  2024-02-13 18:18 ` fan
  10 siblings, 1 reply; 37+ messages in thread
From: Ira Weiny @ 2023-11-17  0:09 UTC (permalink / raw
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

nifan.cxl@ wrote:
> From: Fan Ni <nifan.cxl@gmail.com>
> 
> 
> The patch series are based on Jonathan's branch cxl-2023-09-26.

Finally getting around to trying this new series and the patch series does not
seem to apply on top of this branch?

Just to verify is this the top commit this work was based on?

   d4edf131bbac [jonathan/cxl-2023-09-26] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

I seem to have found some issue with CDAT checksumming[1] which I'm not quite
sure about.

I went ahead and pulled your latest work from:

    https://github.com/moking/qemu-jic-clone.git dcd-dev

    abe893944bb3  hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions

It still has this same problem.

Before I dig into this, is this the latest dcd branch?

Has anything changed in how you specify DCD devices on the qemu command line
with this latest work?  Here is what I have:

...
-device cxl-type3,bus=hb0rp0,memdev=cxl-mem0,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem0,id=cxl-dev0,lsa=cxl-lsa0,sn=0
-device cxl-type3,bus=hb0rp1,memdev=cxl-mem1,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem1,id=cxl-dev1,lsa=cxl-lsa1,sn=1
-device cxl-type3,bus=hb1rp0,memdev=cxl-mem2,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem2,id=cxl-dev2,lsa=cxl-lsa2,sn=2
-device cxl-type3,bus=hb1rp1,memdev=cxl-mem3,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem3,id=cxl-dev3,lsa=cxl-lsa3,sn=3
...


Ira

[1] https://lore.kernel.org/all/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com/

 
> The main changes include,
> 1. Update cxl_find_dc_region to detect the case the range of the extent cross
>     multiple DC regions.
> 2. Add comments to explain the checks performed in function
>     cxl_detect_malformed_extent_list. (Jonathan)
> 3. Minimize the checks in cmd_dcd_add_dyn_cap_rsp.(Jonathan)
> 4. Update total_extent_count in add/release dynamic capacity response function.
>     (Ira and Jorgen Hansen).
> 5. Fix the logic issue in test_bits and renamed it to
>     test_any_bits_set to clear its function.
> 6. Add pending extent list for dc extent add event.
> 7. When add extent response is received, use the pending-to-add list to
>     verify the extents are valid.
> 8. Add test_any_bits_set and cxl_insert_extent_to_extent_list declaration to
>     cxl_device.h so it can be used in different files.
> 9. Updated ct3d_qmp_cxl_event_log_enc to include dynamic capacity event
>     log type.
> 10. Extract the functionality to delete extent from extent list to a helper
>     function.
> 11. Move the update of the bitmap which reflects which blocks are backed with
> dc extents from the moment when a dc extent is offered to the moment when it
> is accepted from the host.
> 12. Free dc_name after calling address_space_init to avoid memory leak when
>     returning early. (Nathan)
> 13. Add code to detect and reject QMP requests without any extents. (Jonathan)
> 14. Add code to detect and reject QMP requests where the extent len is 0.
> 15. Change the QMP interface and move the region-id out of extents and now
>     each command only takes care of extent add/release request in a single
>     region. (Jonathan)
> 16. Change the region bitmap length from decode_len to len.
> 17. Rename "dpa" to "offset" in the add/release dc extent qmp interface.
>     (Jonathan)
> 18. Block any dc extent release command if the exact extent is not already in
>     the extent list of the device.
> 
> The code is tested together with Ira's kernel DCD support:
> https://github.com/weiny2/linux-kernel/tree/dcd-v3-2023-10-30
> 
> Cover letter from v2 is here:
> https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#m63039621087023691c9749a0af1212deb5549ddf
> 
> Last version (v2) is here:
> https://lore.kernel.org/linux-cxl/20230725183939.2741025-1-fan.ni@samsung.com/
> 
> More DCD related discussions are here:
> https://lore.kernel.org/linux-cxl/650cc29ab3f64_50d07294e7@iweiny-mobl.notmuch/
> 
> 
> 
> Fan Ni (9):
>   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>     payload of identify memory device command
>   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>     and mailbox command support
>   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>     type3 memory devices
>   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>     devices
>   hw/mem/cxl_type3: Add host backend and address space handling for DC
>     regions
>   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>     list mailbox support
>   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>     dynamic capacity response
>   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>     extents
>   hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
> 
>  hw/cxl/cxl-mailbox-utils.c  | 469 +++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          | 548 +++++++++++++++++++++++++++++++++---
>  hw/mem/cxl_type3_stubs.c    |  14 +
>  include/hw/cxl/cxl_device.h |  64 ++++-
>  include/hw/cxl/cxl_events.h |  15 +
>  qapi/cxl.json               |  60 +++-
>  6 files changed, 1123 insertions(+), 47 deletions(-)
> 
> -- 
> 2.42.0
> 




* Re: [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2023-11-07 18:07 ` [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
@ 2024-01-24 14:51   ` Jonathan Cameron
  2024-01-29 17:32     ` fan
  2024-02-01 19:58     ` fan
  2024-01-24 15:48   ` Jonathan Cameron
  1 sibling, 2 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 14:51 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:06 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Per cxl spec 3.0, add dynamic capacity region representative based on
> Table 8-126 and extend the cxl type3 device definition to include dc region
> information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
> 
> Note: decode_len of a dc region is aligned to 256*MiB, need to be divided by
> 256 * MiB before returned to the host for "Get Dynamic Capacity Configuration"
> mailbox command.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Hi Fan,

I'm looking at how to move these much earlier in my tree on the basis that
they should be our main focus for merging in this QEMU cycle.

Whilst I do that rebase, I'm taking a closer look at the code.
I'm targeting rebasing on upstream QEMU + the two patch sets I just
sent out:
[PATCH 00/12 qemu] CXL emulation fixes and minor cleanup. 
[PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1

It would be good to document why these commands should be optional (which I think
comes down to the annoying fact that Get Dynamic Capacity Configuration isn't
allowed to return 0 regions, but instead should not be available as a command
if DCD isn't supported).

Note this requires us to carry Gregory's patches, which construct the CCI
command list at runtime rather than baking it in, ahead of this set.

So another question is should we jump directly to the r3.1 version of DCD?
I think we probably should as it includes some additions that are necessary
for a bunch of the potential use cases.


> ---
>  hw/cxl/cxl-mailbox-utils.c  | 80 +++++++++++++++++++++++++++++++++++++
>  hw/mem/cxl_type3.c          |  6 +++
>  include/hw/cxl/cxl_device.h | 17 ++++++++
>  3 files changed, 103 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 8eceedfa87..f80dd6474f 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -80,6 +80,8 @@ enum {
>          #define GET_POISON_LIST        0x0
>          #define INJECT_POISON          0x1
>          #define CLEAR_POISON           0x2
> +    DCD_CONFIG  = 0x48,
> +        #define GET_DC_CONFIG          0x0
>      PHYSICAL_SWITCH = 0x51,
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>          #define GET_PHYSICAL_PORT_STATE     0x1
> @@ -1210,6 +1212,74 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * CXL r3.0 section 8.2.9.8.9.1: Get Dynamic Capacity Configuration

As per the patch set I just sent out, I want to standardize on references
to r3.1 because it's all that is easy to get.  However, if we decide to do r3.0
DCD first and then upgrade it later, then clearly these need to stick to r3.0
for now.

> + * (Opcode: 4800h)
> + */
> +static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
> +                                             uint8_t *payload_in,
> +                                             size_t len_in,
> +                                             uint8_t *payload_out,
> +                                             size_t *len_out,
> +                                             CXLCCI *cci)
> +{
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    struct get_dyn_cap_config_in_pl {
> +        uint8_t region_cnt;
> +        uint8_t start_region_id;
> +    } QEMU_PACKED;
> +
> +    struct get_dyn_cap_config_out_pl {
> +        uint8_t num_regions;
> +        uint8_t rsvd1[7];

This changed in r3.1 (errata? - I haven't checked)
Should be 'regions returned' in first byte.

> +        struct {
> +            uint64_t base;
> +            uint64_t decode_len;
> +            uint64_t region_len;
> +            uint64_t block_size;
> +            uint32_t dsmadhandle;

> +            uint8_t flags;
> +            uint8_t rsvd2[3];
> +        } QEMU_PACKED records[];

There are two fields after this as well.
Total number of supported extents and number of available extents.

That annoyingly means we can't use the structure to tell us where
to find all the fields...
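
The layout problem above can be sketched as follows. This is a minimal,
self-contained illustration rather than QEMU code: the struct and function
names are hypothetical, the two trailing counts are assumed to be 32 bits
each (8 bytes in total), and the packed attribute is the GCC/Clang spelling.
Because the region records are a flexible array, the trailing fields have no
fixed offset and have to be located arithmetically from the number of
records returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-region record, mirroring the fields in the quoted patch. */
struct dc_region_record {
    uint64_t base;
    uint64_t decode_len;
    uint64_t region_len;
    uint64_t block_size;
    uint32_t dsmadhandle;
    uint8_t flags;
    uint8_t rsvd2[3];
} __attribute__((packed));

/* Fixed header followed by a variable number of records. */
struct dc_config_header {
    uint8_t num_regions;
    uint8_t rsvd1[7];
    struct dc_region_record records[];
} __attribute__((packed));

/* Assumed size of the two trailing counts: 2 x 32 bits. */
#define DC_CONFIG_TRAILER_LEN 8

/* Byte offset of the trailing fields for a given number of records. */
static size_t dc_config_trailer_offset(uint8_t record_count)
{
    return sizeof(struct dc_config_header) +
           (size_t)record_count * sizeof(struct dc_region_record);
}

/* Total output payload length, including the trailing fields. */
static size_t dc_config_total_len(uint8_t record_count)
{
    return dc_config_trailer_offset(record_count) + DC_CONFIG_TRAILER_LEN;
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Write both trailing counts directly after the last record. */
static void dc_config_write_trailer(uint8_t *payload, uint8_t record_count,
                                    uint32_t extents_supported,
                                    uint32_t extents_available)
{
    uint8_t *trailer = payload + dc_config_trailer_offset(record_count);

    put_le32(trailer, extents_supported);
    put_le32(trailer + 4, extents_available);
}
```

With this shape, the length computation in the command handler would use
something like dc_config_total_len(record_count) instead of relying on
sizeof() of the output structure alone.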


> +    } QEMU_PACKED;
> +
> +    struct get_dyn_cap_config_in_pl *in = (void *)payload_in;
> +    struct get_dyn_cap_config_out_pl *out = (void *)payload_out;
> +    uint16_t record_count = 0, i;

Better to split that on to 2 lines. Never hide setting a value
in the middle of a set of declarations.

> +    uint16_t out_pl_len;
> +    uint8_t start_region_id = in->start_region_id;
> +
> +    if (start_region_id >= ct3d->dc.num_regions) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
> +            in->region_cnt);
> +
> +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);

For r3.1, add 8 for the two trailing fields.

> +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +    memset(out, 0, out_pl_len);

As part of the cci rework we started zeroing the whole mailbox payload space
after copying out the input payload.
https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-device-utils.c#L204

So shouldn't need this (unless we have a bug)

Jonathan


* Re: [PATCH v3 3/9] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
  2023-11-07 18:07 ` [PATCH v3 3/9] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
@ 2024-01-24 14:54   ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 14:54 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:07 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
> pmem capacity, preparing for the introduction of dynamic capacity to support
> dynamic capacity devices.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Hi Fan,

One trivial comment inline.

> ---
>  hw/cxl/cxl-mailbox-utils.c  | 4 ++--
>  hw/mem/cxl_type3.c          | 8 ++++----
>  include/hw/cxl/cxl_device.h | 2 +-
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index f80dd6474f..707fd9fe7f 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -774,7 +774,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
>      snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
>  
>      stq_le_p(&id->total_capacity,
> -             cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
> +            cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
Indent ended up one space short.

>      stq_le_p(&id->persistent_capacity,
>               cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
>      stq_le_p(&id->volatile_capacity,
> @@ -1149,7 +1149,7 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
>      struct clear_poison_pl *in = (void *)payload_in;
>  
>      dpa = ldq_le_p(&in->dpa);
> -    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->mem_size) {
> +    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
>          return CXL_MBOX_INVALID_PA;
>      }
>  



* Re: [PATCH v3 4/9] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2023-11-07 18:07 ` [PATCH v3 4/9] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
@ 2024-01-24 15:23   ` Jonathan Cameron
  2024-01-26 13:00     ` Jonathan Cameron
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 15:23 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:08 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> With the change, when setting up memory for type3 memory device, we can
> create DC regions.
> A property 'num-dc-regions' is added to ct3_props to allow users to pass the
> number of DC regions to create. To make it easier, other region parameters
> like region base, length, and block size are hard coded. If needed,
> these parameters can be added easily.
> 
> With the change, we can create DC regions with proper kernel side
> support as below:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> 
> echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> 
> echo 0x40000000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Hi Fan, a few comments inline.

Jonathan

> ---
>  hw/mem/cxl_type3.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 754c885cd1..2d67d2015c 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -721,6 +721,36 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
>      }
>  }
>  
> +static int cxl_create_dc_regions(CXLType3Dev *ct3d)
> +{
> +    int i;
> +    uint64_t region_base = 0;
> +    uint64_t region_len =  2 * GiB;
> +    uint64_t decode_len = 8; /* 8*256MB */

If decode len is going to be div 256MiB then we need
a name for that field that makes it clear that it is.

decode_len_256mbytes or something like that and maybe
region_len_bytes to keep things consistent.

Why the spec didn't make our life easier and define decode length
in bytes with some bits that must be zero is beyond me... 


I think we need to make this at least optionally configurable or based
in some fashion on the provided memory backend (divide that up
by number of regions with appropriate rounding perhaps?)
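
One way the two suggestions above could fit together is sketched below. It
is an illustration only: the helper names are hypothetical, and rounding
each region down to a 256 MiB multiple is an assumption about what
"appropriate rounding" would mean here. The backend is split evenly across
the regions, and the decode length is kept in an explicitly named
256 MiB unit.

```c
#include <assert.h>
#include <stdint.h>

#define MIB            (1024ULL * 1024ULL)
#define DC_DECODE_UNIT (256ULL * MIB)

/* Round v down to a multiple of align (align must be non-zero). */
static uint64_t round_down_u64(uint64_t v, uint64_t align)
{
    return v - (v % align);
}

/*
 * Hypothetical default: split the backend evenly across the regions and
 * round each share down to a 256 MiB multiple. Returns 0 if there are no
 * regions or if a share is smaller than 256 MiB.
 */
static uint64_t dc_region_len_bytes(uint64_t backend_size_bytes,
                                    uint8_t num_regions)
{
    if (num_regions == 0) {
        return 0;
    }
    return round_down_u64(backend_size_bytes / num_regions, DC_DECODE_UNIT);
}

/* Decode length expressed in multiples of 256 MiB. */
static uint64_t dc_decode_len_256mbytes(uint64_t region_len_bytes)
{
    return region_len_bytes / DC_DECODE_UNIT;
}
```

For example, a 4 GiB backend split over 2 regions gives 2 GiB per region and
a decode length of 8 units, matching the values hard coded in the patch.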

> +    uint64_t blk_size = 2 * MiB;
> +    CXLDCDRegion *region;
> +
> +    if (ct3d->hostvmem) {
> +        region_base += ct3d->hostvmem->size;
> +    }
> +    if (ct3d->hostpmem) {
> +        region_base += ct3d->hostpmem->size;
> +    }
> +    for (i = 0; i < ct3d->dc.num_regions; i++) {
> +        region = &ct3d->dc.regions[i];
> +        region->base = region_base;
> +        region->decode_len = decode_len;
> +        region->len = region_len;
> +        region->block_size = blk_size;
> +        /* dsmad_handle is set when creating cdat table entries */
> +        region->flags = 0;
> +
> +        region_base += region->len;
> +    }
> +
> +    return 0;

Given it doesn't fail (even after the rest of this series is applied),
why return anything?  Make it void and we can drop the checks below..

> +}
> +
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>  {
>      DeviceState *ds = DEVICE(ct3d);
> @@ -789,6 +819,10 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>          g_free(p_name);
>      }
>  
> +    if (cxl_create_dc_regions(ct3d)) {
> +        return false;
> +    }
> +
>      return true;
>  }
>  
> @@ -1108,6 +1142,7 @@ static Property ct3_props[] = {
>      DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
>      DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
>      DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
> +    DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  



* Re: [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2023-11-07 18:07 ` [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
@ 2024-01-24 15:47   ` Jonathan Cameron
  2024-02-06 22:24     ` fan
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 15:47 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:09 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Add (file/memory backed) host backend, all the dynamic capacity regions
> will share a single, large enough host backend. Set up address space for
> DC regions to support read/write operations to dynamic capacity for DCD.
> 
> With the change, following supports are added:
> 1. Add a new property to type3 device "nonvolatile-dc-memdev" to point to host
>    memory backend for dynamic capacity. Currently, all dc regions share one
>    one host backend.
> 2. Add namespace for dynamic capacity for read/write support;
> 3. Create cdat entries for each dynamic capacity region;
> 4. Fix dvsec range registers to include DC regions.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Some minor comments inline, mostly suggesting pulling refactors out before
you do the new stuff.

Thanks,

Jonathan

> ---
>  hw/cxl/cxl-mailbox-utils.c  |  16 ++-
>  hw/mem/cxl_type3.c          | 198 +++++++++++++++++++++++++++++-------
>  include/hw/cxl/cxl_device.h |   4 +
>  3 files changed, 179 insertions(+), 39 deletions(-)
> 



>  
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 2d67d2015c..152a51306d 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -31,6 +31,7 @@
>  #include "hw/pci/spdm.h"
>  
>  #define DWORD_BYTE 4
> +#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
>  
>  /* Default CDAT entries for a memory region */
>  enum {
> @@ -44,8 +45,9 @@ enum {
>  };
>  
>  static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> -                                         int dsmad_handle, MemoryRegion *mr,
> -                                         bool is_pmem, uint64_t dpa_base)
> +                                         int dsmad_handle, uint64_t size,
> +                                         bool is_pmem, bool is_dynamic,
> +                                         uint64_t dpa_base)
>  {
>      g_autofree CDATDsmas *dsmas = NULL;
>      g_autofree CDATDslbis *dslbis0 = NULL;
> @@ -64,9 +66,10 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>              .length = sizeof(*dsmas),
>          },
>          .DSMADhandle = dsmad_handle,
> -        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> +        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> +            (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
>          .DPA_base = dpa_base,
> -        .DPA_length = memory_region_size(mr),
> +        .DPA_length = size,
>      };
>  
>      /* For now, no memory side cache, plausiblish numbers */
> @@ -150,7 +153,7 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>           */
>          .EFI_memory_type_attr = is_pmem ? 2 : 1,
>          .DPA_offset = 0,
> -        .DPA_length = memory_region_size(mr),
> +        .DPA_length = size,
>      };

Might be better to make the change to this function as a precursor patch before
you introduce the new users.  That would separate the DC bits out from the rest.

>  
>      /* Header always at start of structure */
> @@ -169,21 +172,28 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>      g_autofree CDATSubHeader **table = NULL;
>      CXLType3Dev *ct3d = priv;
>      MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
> +    MemoryRegion *dc_mr = NULL;
>      int dsmad_handle = 0;
>      int cur_ent = 0;
>      int len = 0;
>      int rc, i;
> +    uint64_t vmr_size = 0, pmr_size = 0;

Put these next to the memory region definitions above given they are referring to the
same regions.

>  
> -    if (!ct3d->hostpmem && !ct3d->hostvmem) {
> +    if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
>          return 0;
>      }
>  
> +    if (ct3d->hostpmem && ct3d->hostvmem && ct3d->dc.host_dc) {
> +        warn_report("The device has static ram and pmem and dynamic capacity");

This is the whole how many DVSEC ranges question? 
I hope we resolved that so we don't care about this...

> +    }
> +
>      if (ct3d->hostvmem) {
>          volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
>          if (!volatile_mr) {
>              return -EINVAL;
>          }
>          len += CT3_CDAT_NUM_ENTRIES;
> +        vmr_size = memory_region_size(volatile_mr);
>      }
>  
>      if (ct3d->hostpmem) {

....

> @@ -210,14 +233,38 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>      }
>  
>      if (nonvolatile_mr) {
> -        uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
>          rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> -                                           nonvolatile_mr, true, base);
> +                                           pmr_size, true, false, vmr_size);
>          if (rc < 0) {
>              goto error_cleanup;
>          }
>          cur_ent += CT3_CDAT_NUM_ENTRIES;
>      }
> +
> +    if (dc_mr) {
> +        uint64_t region_base = vmr_size + pmr_size;
> +
> +        /*
> +         * Currently we create cdat entries for each region, should we only
> +         * create dsmas table instead??

We want the whole set.  Need multiple DSMAS for the flags.
DSLBIS refer to DSMAS to identify which memory they cover + they may well be
different for different regions (could be different types of memory).
DSEMTS also refer to DSMAS by handle, so we need those as well.


> +         * We assume all dc regions are non-volatile for now.

As expressed below, I'd really prefer them to start as volatile and we can
consider non-volatile later.

> +         *
> +         */
> +        for (i = 0; i < ct3d->dc.num_regions; i++) {
> +            rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
> +                                               dsmad_handle++,
> +                                               ct3d->dc.regions[i].len,
> +                                               true, true, region_base);
> +            if (rc < 0) {
> +                goto error_cleanup;
> +            }
> +            ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
> +
> +            cur_ent += CT3_CDAT_NUM_ENTRIES;
> +            region_base += ct3d->dc.regions[i].len;
> +        }
> +    }
> +
>      assert(len == cur_ent);
>  
>      *cdat_table = g_steal_pointer(&table);
> @@ -445,11 +492,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>              range2_size_hi = ct3d->hostpmem->size >> 32;
>              range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                               (ct3d->hostpmem->size & 0xF0000000);
> +        } else if (ct3d->dc.host_dc) {
> +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                             (ct3d->dc.host_dc->size & 0xF0000000);

I've forgotten if we came to a conclusion on whether these should include
DC or not...  My gut feeling is no because we don't know what to do
if they are both already in use.

>          }
> -    } else {
> +    } else if (ct3d->hostpmem) {
>          range1_size_hi = ct3d->hostpmem->size >> 32;
>          range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>                           (ct3d->hostpmem->size & 0xF0000000);
> +        if (ct3d->dc.host_dc) {
> +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +                             (ct3d->dc.host_dc->size & 0xF0000000);
> +        }
> +    } else {
> +        range1_size_hi = ct3d->dc.host_dc->size >> 32;
> +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> +            (ct3d->dc.host_dc->size & 0xF0000000);
>      }
>  
>      dvsec = (uint8_t *)&(CXLDVSECDevice){
> @@ -721,6 +781,9 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
>      }
>  }
>  
> +/*
> + * TODO: region parameters are hard coded, may need to change in the future.

Agreed :)  We should look at this fairly soon I think, though as
long as we keep the option of defaults that fall back to what we have here
we can do most of it later. However, I would like the defaults to be derived
from the memory backend size.

> + */
>  static int cxl_create_dc_regions(CXLType3Dev *ct3d)
>  {
>      int i;
> @@ -736,6 +799,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
>      if (ct3d->hostpmem) {
>          region_base += ct3d->hostpmem->size;
>      }
> +

Should be pushed back to the original patch.

>      for (i = 0; i < ct3d->dc.num_regions; i++) {
>          region = &ct3d->dc.regions[i];
>          region->base = region_base;

> @@ -823,6 +888,50 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>          return false;
>      }
>  
> +    ct3d->dc.total_capacity = 0;
> +    if (ct3d->dc.host_dc) {

This confuses me a little. Can we create DC regions without a memory backend?
I don't think we should allow that - in which case the earlier
cxl_create_dc_regions() can move under this check.

> +        MemoryRegion *dc_mr;
> +        char *dc_name;
> +        uint64_t total_region_size = 0;
> +        int i;
> +
> +        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +        if (!dc_mr) {
> +            error_setg(errp, "dynamic capacity must have backing device");
> +            return false;
> +        }
> +        /* FIXME: set dc as nonvolatile for now */

As that's less likely to occur than volatile I'd prefer a default of volatile.

> +        memory_region_set_nonvolatile(dc_mr, true);
> +        memory_region_set_enabled(dc_mr, true);
> +        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
> +        if (ds->id) {
> +            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
> +        } else {
> +            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
> +        }
> +        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
> +        g_free(dc_name);
> +
> +        for (i = 0; i < ct3d->dc.num_regions; i++) {
> +            total_region_size += ct3d->dc.regions[i].len;
> +        }
> +        /* Make sure the host backend is large enough to cover all dc range */

I suppose this is another reasonable way of doing region defaults. Just refuse
them if your defaults don't fit in the provided memory backend.
We can work with that as long as we cycle back around to regions we can
configure from the command line fairly soon. 

> +        if (total_region_size > memory_region_size(dc_mr)) {
> +            error_setg(errp,
> +                "too small host backend size, increase to %lu MiB or more",
> +                total_region_size / MiB);
> +            return false;
> +        }
> +
> +        if (dc_mr->size % CXL_CAPACITY_MULTIPLIER != 0) {
> +            error_setg(errp, "DC region size is unaligned to %lx",
> +                    CXL_CAPACITY_MULTIPLIER);
> +            return false;
> +        }
> +
> +        ct3d->dc.total_capacity = total_region_size;
> +    }
> +
>      return true;
>  }


> @@ -1025,16 +1140,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
>                                         AddressSpace **as,
>                                         uint64_t *dpa_offset)
>  {

>  
> @@ -1042,19 +1165,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
>          return -EINVAL;
>      }
>  
> -    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
> +    if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
>          return -EINVAL;
>      }
>  
> -    if (vmr) {
> -        if (*dpa_offset < memory_region_size(vmr)) {
> -            *as = &ct3d->hostvmem_as;
> -        } else {
> -            *as = &ct3d->hostpmem_as;
> -            *dpa_offset -= memory_region_size(vmr);
> -        }
> -    } else {
> +    if (*dpa_offset < vmr_size) {
> +        *as = &ct3d->hostvmem_as;
> +    } else if (*dpa_offset < vmr_size + pmr_size) {
>          *as = &ct3d->hostpmem_as;
> +        *dpa_offset -= vmr_size;
> +    } else {
> +        *as = &ct3d->dc.host_dc_as;
> +        *dpa_offset -= (vmr_size + pmr_size);
>      }

This code is duplicated below.  As a follow up perhaps we should
add a utility function to get the as and offset within that space.
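
A minimal sketch of what such a utility function might look like, using
stand-in types so that it compiles on its own. The names dpa_layout,
dpa_space and dpa_resolve are hypothetical; in QEMU the result would more
likely be an AddressSpace pointer than an enum. Subtracting each size in
turn avoids summing the sizes, so the comparisons cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Which backing space a device physical address (DPA) falls in. */
enum dpa_space {
    DPA_SPACE_NONE,
    DPA_SPACE_VMEM,
    DPA_SPACE_PMEM,
    DPA_SPACE_DC,
};

/* Sizes of the three spaces, laid out back to back in DPA order. */
struct dpa_layout {
    uint64_t vmr_size;
    uint64_t pmr_size;
    uint64_t dc_size;
};

/*
 * Resolve a DPA to its space and the offset within that space.
 * *offset is only written when a space is found.
 */
static enum dpa_space dpa_resolve(const struct dpa_layout *layout,
                                  uint64_t dpa, uint64_t *offset)
{
    if (dpa < layout->vmr_size) {
        *offset = dpa;
        return DPA_SPACE_VMEM;
    }
    dpa -= layout->vmr_size;

    if (dpa < layout->pmr_size) {
        *offset = dpa;
        return DPA_SPACE_PMEM;
    }
    dpa -= layout->pmr_size;

    if (dpa < layout->dc_size) {
        *offset = dpa;
        return DPA_SPACE_DC;
    }
    return DPA_SPACE_NONE;
}
```

Both cxl_type3_hpa_to_as_and_dpa() and set_cacheline() could then share one
helper of this shape and map the returned space onto the matching address
space.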
  
>      return 0;
> @@ -1143,6 +1265,8 @@ static Property ct3_props[] = {
>      DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
>      DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
>      DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
> +    DEFINE_PROP_LINK("nonvolatile-dc-memdev", CXLType3Dev, dc.host_dc,
> +                    TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -1209,33 +1333,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
>  
>  static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
>  {
> -    MemoryRegion *vmr = NULL, *pmr = NULL;
> +    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
>      AddressSpace *as;
> +    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
>  
>      if (ct3d->hostvmem) {
>          vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +        vmr_size = memory_region_size(vmr);
>      }
>      if (ct3d->hostpmem) {
>          pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +        pmr_size = memory_region_size(pmr);
>      }
> +    if (ct3d->dc.host_dc) {
> +        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +        dc_size = ct3d->dc.total_capacity;
> +     }
>  
> -    if (!vmr && !pmr) {
> +    if (!vmr && !pmr && !dc_mr) {
>          return false;
>      }
>  
> -    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
> +    if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
>          return false;
>      }
>  
> -    if (vmr) {
> -        if (dpa_offset < memory_region_size(vmr)) {
> -            as = &ct3d->hostvmem_as;
> -        } else {
> -            as = &ct3d->hostpmem_as;
> -            dpa_offset -= memory_region_size(vmr);
> -        }
> -    } else {
> +    if (dpa_offset < vmr_size) {
> +        as = &ct3d->hostvmem_as;
> +    } else if (dpa_offset < vmr_size + pmr_size) {
>          as = &ct3d->hostpmem_as;
> +        dpa_offset -= vmr_size;
> +    } else {	
> +        as = &ct3d->dc.host_dc_as;
> +        dpa_offset -= (vmr_size + pmr_size);
>      }
>  
>      address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,




* Re: [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2023-11-07 18:07 ` [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
  2024-01-24 14:51   ` Jonathan Cameron
@ 2024-01-24 15:48   ` Jonathan Cameron
  1 sibling, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 15:48 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni


> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 4f2ef0b899..334c51fddb 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -235,6 +235,7 @@ typedef struct cxl_device_state {
>      uint64_t mem_size;
>      uint64_t pmem_size;
>      uint64_t vmem_size;
> +    bool is_dcd;
Written but never read, so drop this.




* Re: [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2023-11-07 18:07 ` [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
@ 2024-01-24 15:56   ` Jonathan Cameron
  2024-02-23  7:10   ` Wonjae Lee
  1 sibling, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 15:56 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:10 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Add dynamic capacity extent list representative to the definition of
> CXLType3Dev and add get DC extent list mailbox command per
> CXL.spec.3.0:.8.2.9.8.9.2.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
A few minor comments inline.

J
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 73 +++++++++++++++++++++++++++++++++++++
>  hw/mem/cxl_type3.c          |  1 +
>  include/hw/cxl/cxl_device.h | 23 ++++++++++++
>  3 files changed, 97 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 1f512b3e6b..56f4aa237a 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -82,6 +82,7 @@ enum {
>          #define CLEAR_POISON           0x2
>      DCD_CONFIG  = 0x48,
>          #define GET_DC_CONFIG          0x0
> +        #define GET_DYN_CAP_EXT_LIST   0x1
>      PHYSICAL_SWITCH = 0x51,
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>          #define GET_PHYSICAL_PORT_STATE     0x1
> @@ -1286,6 +1287,75 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * CXL r3.0 section 8.2.9.8.9.2:
> + * Get Dynamic Capacity Extent List (Opcode 4810h)

4801h
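
For reference, the opcode follows from the command set and command values
in the enum above: the command set occupies the upper byte and the command
the lower byte. The sketch below is illustrative only; cxl_opcode() is a
hypothetical helper, not an existing QEMU function.

```c
#include <assert.h>
#include <stdint.h>

#define DCD_CONFIG           0x48
#define GET_DC_CONFIG        0x0
#define GET_DYN_CAP_EXT_LIST 0x1

/* Combine a command set (upper byte) and a command (lower byte). */
static uint16_t cxl_opcode(uint8_t command_set, uint8_t command)
{
    return (uint16_t)((command_set << 8) | command);
}
```

So DCD_CONFIG with GET_DC_CONFIG gives 4800h, and DCD_CONFIG with
GET_DYN_CAP_EXT_LIST gives 4801h.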

> + */
> +static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> +                                               uint8_t *payload_in,
> +                                               size_t len_in,
> +                                               uint8_t *payload_out,
> +                                               size_t *len_out,
> +                                               CXLCCI *cci)
> +{
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    struct get_dyn_cap_ext_list_in_pl {
> +        uint32_t extent_cnt;
> +        uint32_t start_extent_id;
> +    } QEMU_PACKED;
> +
> +    struct get_dyn_cap_ext_list_out_pl {
> +        uint32_t count;
> +        uint32_t total_extents;
> +        uint32_t generation_num;
> +        uint8_t rsvd[4];
> +        CXLDCExtentRaw records[];
> +    } QEMU_PACKED;
> +
> +    struct get_dyn_cap_ext_list_in_pl *in = (void *)payload_in;
> +    struct get_dyn_cap_ext_list_out_pl *out = (void *)payload_out;
> +    uint16_t record_count = 0, i = 0, record_done = 0;
> +    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +    CXLDCDExtent *ent;
> +    uint16_t out_pl_len;
> +    uint32_t start_extent_id = in->start_extent_id;
> +
> +    if (start_extent_id > ct3d->dc.total_extent_count) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    record_count = MIN(in->extent_cnt,
> +                       ct3d->dc.total_extent_count - start_extent_id);
> +
> +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +    /* May need more processing here in the future */

Not sure what this comment is referring to... I'd be tempted to just
remove it.

> +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +    memset(out, 0, out_pl_len);

As before. It should be already zeroed.

> +    stl_le_p(&out->count, record_count);
> +    stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
> +    stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
> +
> +    if (record_count > 0) {
> +        QTAILQ_FOREACH(ent, extent_list, node) {
> +            if (i++ < start_extent_id) {
> +                continue;
> +            }
> +            stq_le_p(&out->records[record_done].start_dpa, ent->start_dpa);
> +            stq_le_p(&out->records[record_done].len, ent->len);
> +            memcpy(&out->records[record_done].tag, ent->tag, 0x10);
> +            stw_le_p(&out->records[record_done].shared_seq, ent->shared_seq);
> +            record_done++;
> +            if (record_done == record_count) {
> +                break;
> +            }
> +        }
> +    }
> +
> +    *len_out = out_pl_len;
> +    return CXL_MBOX_SUCCESS;
> +}
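
For reference, the windowing that the handler applies to the extent list can be sketched on its own. This is only an illustration: ext_list_window() is a hypothetical helper, not something in the patch, and it mirrors the start/count clamping done with MIN() above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): given the total number of
 * extents, the requested start index and the requested count, return how
 * many records the reply would carry. Returns -1 when the start index is
 * past the end of the list, which the handler maps to an invalid-input
 * return code.
 */
static int64_t ext_list_window(uint32_t total, uint32_t start, uint32_t want)
{
    uint32_t avail;

    if (start > total) {
        return -1;
    }
    avail = total - start;
    return want < avail ? want : avail;
}
```

A start index equal to the total is accepted and simply yields zero records, which matches the '>' comparison in the quoted code.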
> +



> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 3dc6928bc5..5738c6f434 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -420,6 +420,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
>  
>  #define DCD_MAX_REGION_NUM 8
>  
> +typedef struct CXLDCDExtentRaw {
> +    uint64_t start_dpa;
> +    uint64_t len;
> +    uint8_t tag[0x10];
> +    uint16_t shared_seq;
> +    uint8_t rsvd[0x6];
> +} QEMU_PACKED CXLDCExtentRaw;
Naming mismatch.

> +
> +typedef struct CXLDCDExtent {
> +    uint64_t start_dpa;
> +    uint64_t len;
> +    uint8_t tag[0x10];
> +    uint16_t shared_seq;
> +    uint8_t rsvd[0x6];
> +
> +    QTAILQ_ENTRY(CXLDCDExtent) node;
> +} CXLDCDExtent;

DCD or DC?  I don't really care but inconsistent currently.

> +typedef QTAILQ_HEAD(, CXLDCDExtent) CXLDCDExtentList;
> +
>  typedef struct CXLDCDRegion {
>      uint64_t base;
>      uint64_t decode_len; /* aligned to 256*MiB */
> @@ -470,6 +489,10 @@ struct CXLType3Dev {
>          HostMemoryBackend *host_dc;
>          AddressSpace host_dc_as;
>          uint64_t total_capacity; /* 256M aligned */
> +        CXLDCDExtentList extents;
> +
> +        uint32_t total_extent_count;
> +        uint32_t ext_list_gen_seq;
>  
>          uint8_t num_regions; /* 0-8 regions */
>          CXLDCDRegion regions[DCD_MAX_REGION_NUM];



* Re: [PATCH v3 7/9] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2023-11-07 18:07 ` [PATCH v3 7/9] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
@ 2024-01-24 16:23   ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 16:23 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:11 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Per CXL spec 3.0, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.8.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.8.9.4.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Some minor comments inline. Mostly agreeing we need a pending list.

Jonathan

> ---
>  hw/cxl/cxl-mailbox-utils.c  | 271 ++++++++++++++++++++++++++++++++++++
>  hw/mem/cxl_type3.c          |   3 +-
>  include/hw/cxl/cxl_device.h |   5 +-
>  3 files changed, 277 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 56f4aa237a..9f788b03b6 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -83,6 +83,8 @@ enum {
>      DCD_CONFIG  = 0x48,
>          #define GET_DC_CONFIG          0x0
>          #define GET_DYN_CAP_EXT_LIST   0x1
> +        #define ADD_DYN_CAP_RSP        0x2
> +        #define RELEASE_DYN_CAP        0x3
>      PHYSICAL_SWITCH = 0x51,
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>          #define GET_PHYSICAL_PORT_STATE     0x1
> @@ -1356,6 +1358,269 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * Check whether any bit between addr[nr, nr+size) is set,
> + * return true if any bit is set, otherwise return false
> + */
> +static bool test_any_bits_set(const unsigned long *addr, int nr, int size)
> +{
> +    unsigned long res = find_next_bit(addr, size + nr, nr);
> +
> +    return res < nr + size;
> +}
> +
> +CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> +{
> +    CXLDCDRegion *region = &ct3d->dc.regions[0];
> +    int i;
> +
> +    if (dpa < region->base ||
> +        dpa >= region->base + ct3d->dc.total_capacity) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * CXL r3.0 section 9.13.3: Dynamic Capacity Device (DCD)
> +     *
> +     * Regions are used in increasing-DPA order, with Region 0 being used for
> +     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
> +     * So check from the last region to find where the dpa belongs. Extents that
> +     * cross multiple regions are not allowed.
> +     */
> +    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
> +        region = &ct3d->dc.regions[i];
> +        if (dpa >= region->base) {
> +            /*Should we compare with decode_len or len of the region??*/

Compare with the len of the region. If the DPA falls in the hole after len
but before decode_len, there is little use in finding the region, as there is
nothing we can do with any access there. If it's useful to match on
decode_len here and then reject anything outside len at the callers, that's
fine too.
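
A minimal sketch of the first option (match on len), assuming a cut-down Region type with all lengths in bytes. Region and find_region() are stand-ins for illustration, not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for CXLDCDRegion; only the fields used here. */
typedef struct Region {
    uint64_t base;
    uint64_t decode_len; /* bytes, assumed >= len */
    uint64_t len;        /* bytes of usable capacity */
} Region;

/*
 * Return the region that fully contains [dpa, dpa + len), matching on the
 * region's len rather than its decode_len, or NULL if there is none.
 * Regions are assumed to be sorted by increasing base.
 */
static const Region *find_region(const Region *regions, int num_regions,
                                 uint64_t dpa, uint64_t len)
{
    int i;

    for (i = num_regions - 1; i >= 0; i--) {
        const Region *r = &regions[i];

        if (dpa >= r->base) {
            /* Written to avoid overflow in dpa + len. */
            if (len > r->len || dpa - r->base > r->len - len) {
                return NULL; /* starts in the hole or runs past the end */
            }
            return r;
        }
    }
    return NULL;
}
```

With this form a range starting in the hole between len and decode_len is rejected at lookup time, so callers do not need a second check.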

> +            if (dpa + len > region->base +
> +                    region->decode_len * CXL_CAPACITY_MULTIPLIER)
> +                return NULL;
> +            return region;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> +                                             uint64_t dpa,
> +                                             uint64_t len,
> +                                             uint8_t *tag,
> +                                             uint16_t shared_seq)
> +{
> +    CXLDCDExtent *extent;
> +
> +    extent = g_new0(CXLDCDExtent, 1);
> +    extent->start_dpa = dpa;
> +    extent->len = len;
> +    if (tag) {
> +        memcpy(extent->tag, tag, 0x10);
> +    } else {
> +        memset(extent->tag, 0, 0x10);

You allocated zero filled above. Don't set them here.
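
Something like the following is what I have in mind. It is a sketch only: calloc() stands in for g_new0(), and Extent / extent_new() are illustrative names rather than the patch's types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for CXLDCDExtent (no list linkage). */
typedef struct Extent {
    uint64_t start_dpa;
    uint64_t len;
    uint8_t tag[0x10];
    uint16_t shared_seq;
} Extent;

/*
 * Allocate a zero-filled extent. Because the allocation is already
 * zeroed, the tag only needs to be written when one is supplied.
 */
static Extent *extent_new(uint64_t dpa, uint64_t len,
                          const uint8_t *tag, uint16_t shared_seq)
{
    Extent *e = calloc(1, sizeof(*e)); /* calloc() in place of g_new0() */

    if (!e) {
        return NULL;
    }
    e->start_dpa = dpa;
    e->len = len;
    if (tag) {
        memcpy(e->tag, tag, sizeof(e->tag));
    }
    e->shared_seq = shared_seq;
    return e;
}
```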

> +    }
> +    extent->shared_seq = shared_seq;
> +
> +    QTAILQ_INSERT_TAIL(list, extent, node);
> +}
> +
> +/*
> + * CXL r3.0 Table 8-129: Add Dynamic Capacity Response Input Payload
> + * CXL r3.0 Table 8-131: Release Dynamic Capacity Input Payload
> + */
> +typedef struct updated_dc_extent_list_in_pl {

This doesn't match the QEMU naming convention for types, which is
CamelCase.

CXLUpdateDCExtentListInPl perhaps?


> +    uint32_t num_entries_updated;
> +    uint8_t rsvd[4];

There is a flag in there now (fairly sure this one was an erratum to r3.0),
but it is easier to just use r3.1.  That More flag is vital for some of the
flows as it associates multiple records.  We might not implement it yet, but
we should at least have it in the structure.
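
Roughly like this, combining the rename suggested above with the flag. This is a sketch under assumptions: I'm assuming the r3.1 layout puts a one-byte Flags field directly after the count with bit 0 as More (check the table before copying), PACKED stands in for QEMU_PACKED, and the DC_UPDATE_FLAG_MORE name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKED __attribute__((packed)) /* stand-in for QEMU_PACKED */

/* One entry of the updated extent list. */
typedef struct UpdatedExtent {
    uint64_t start_dpa;
    uint64_t len;
    uint8_t rsvd[8];
} PACKED UpdatedExtent;

/* Assumed layout: count, flags (bit 0 = More), 3 reserved bytes, entries. */
typedef struct CXLUpdateDCExtentListInPl {
    uint32_t num_entries_updated;
    uint8_t flags;
    uint8_t rsvd[3];
    UpdatedExtent updated_entries[];
} PACKED CXLUpdateDCExtentListInPl;

#define DC_UPDATE_FLAG_MORE (1u << 0) /* hypothetical name */
```

The fixed part of the payload stays 8 bytes, so the existing offset of the entries does not move.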

> +    /* CXL r3.0 Table 8-130: Updated Extent List */
> +    struct {
> +        uint64_t start_dpa;
> +        uint64_t len;
> +        uint8_t rsvd[8];
> +    } QEMU_PACKED updated_entries[];
> +} QEMU_PACKED updated_dc_extent_list_in_pl;
> +
> +/*
> + * For the extents in the extent list to operate, check whether they are valid
> + * 1. The extent should be in the range of a valid DC region;
> + * 2. The extent should not cross multiple regions;
> + * 3. The start DPA and the length of the extent should align with the block
> + * size of the region;
> + * 4. The address range of multiple extents in the list should not overlap.
> + */
> +static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
> +        const updated_dc_extent_list_in_pl *in)
> +{
> +    uint64_t min_block_size = UINT64_MAX;
> +    CXLDCDRegion *region = &ct3d->dc.regions[0];
> +    CXLDCDRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +    uint64_t dpa, len;
> +    uint32_t i;
> +
> +    for (i = 0; i < ct3d->dc.num_regions; i++) {
> +        region = &ct3d->dc.regions[i];
> +        min_block_size = MIN(min_block_size, region->block_size);
> +    }
> +
> +    blk_bitmap = bitmap_new((lastregion->len + lastregion->base -

I'd flip the order of those two.  People tend to think of base + length
rather than length + base.

> +                             ct3d->dc.regions[0].base) / min_block_size);
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        region = cxl_find_dc_region(ct3d, dpa, len);
> +        if (!region) {
> +            return CXL_MBOX_INVALID_PA;
> +        }
> +
> +        dpa -= ct3d->dc.regions[0].base;
> +        if (dpa % region->block_size || len % region->block_size) {
> +            return CXL_MBOX_INVALID_EXTENT_LIST;
> +        }
> +        /* the dpa range already covered by some other extents in the list */
> +        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
> +            len / min_block_size)) {
> +            return CXL_MBOX_INVALID_EXTENT_LIST;
> +        }
> +        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
> +   }
> +
> +    return CXL_MBOX_SUCCESS;
> +}
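
The overlap part of the check (item 4 in the comment above the function) can be shown in isolation. A minimal sketch, assuming block-aligned ranges that already passed the region checks; it uses one byte per block instead of the bit-per-block bitmap in the patch to stay dependency free, and Range / ranges_overlap() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Range {
    uint64_t dpa; /* offset from the start of DC capacity, in bytes */
    uint64_t len; /* bytes */
} Range;

/*
 * Return true if any two ranges in the list share a block. Every range
 * is assumed to be block-aligned and to lie inside [0, total_len).
 * An allocation failure is reported as "overlap" so that the caller
 * rejects the list rather than accepting it unchecked.
 */
static bool ranges_overlap(const Range *r, size_t n,
                           uint64_t total_len, uint64_t block_size)
{
    size_t nblocks = total_len / block_size;
    uint8_t *used = calloc(nblocks ? nblocks : 1, 1); /* one byte per block */
    bool overlap = false;
    size_t i;
    uint64_t b;

    if (!used) {
        return true;
    }
    for (i = 0; i < n && !overlap; i++) {
        uint64_t first = r[i].dpa / block_size;
        uint64_t count = r[i].len / block_size;

        for (b = first; b < first + count; b++) {
            if (used[b]) {
                overlap = true; /* block already claimed by an earlier range */
                break;
            }
            used[b] = 1;
        }
    }
    free(used);
    return overlap;
}
```

Marking blocks as they are visited is what lets a single pass over the list catch both exact duplicates and partial overlaps.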
> +
> +/*
> + * CXL r3.0 section 8.2.9.8.9.3: Add Dynamic Capacity Response (opcode 4802h)
> + *
> + * Assume an extent is added only after the response is processed successfully
> + * TODO: for better extent list validation, a better solution would be
> + * maintaining a pending extent list and use it to verify the extent list in
> + * the response.

We really should be doing that given the
"shall report invalid physical address if: One or more extents in the updated
 extent list specify a DPA range that is outside the <of> range of the Extent
 List contained in the Add Capacity Event Record"

As you observe, a simple pending list should work for that.
I think we also have to deal with hardware trying to accept only part
of an extent - though we can just reject that if we like with
Resources Exhausted (because it might require tracking resources we can't
handle - think accepting every other 4k of a 2TiB region).

> + */
> +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    updated_dc_extent_list_in_pl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +    CXLDCDExtent *ent;
> +    uint32_t i;
> +    uint64_t dpa, len;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_SUCCESS;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        /*
> +         * Check if the DPA range of the to-be-added extent overlaps with
> +         * existing extent list maintained by the device.
> +         */
> +        QTAILQ_FOREACH(ent, extent_list, node) {
> +            if (ent->start_dpa <= dpa &&
> +                    dpa + len <= ent->start_dpa + ent->len) {
> +                return CXL_MBOX_INVALID_PA;
> +            /* Overlapping one end of the other */
> +            } else if ((dpa < ent->start_dpa + ent->len &&
> +                        dpa + len > ent->start_dpa + ent->len) ||
> +                       (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
> +                return CXL_MBOX_INVALID_PA;
> +            }
> +        }
> +
> +        /*
> +         * TODO: add a pending extent list based on event log record and
> +         * verify the input response
> +         */
> +
> +        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> +        ct3d->dc.total_extent_count += 1;
> +    }
> +
> +    return CXL_MBOX_SUCCESS;
> +}
> +
> +/*
> + * CXL r3.0 section 8.2.9.8.9.4: Release Dynamic Capacity (opcode 4803h)
> + */
> +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> +                                          uint8_t *payload_in,
> +                                          size_t len_in,
> +                                          uint8_t *payload_out,
> +                                          size_t *len_out,
> +                                          CXLCCI *cci)
> +{
> +    updated_dc_extent_list_in_pl *in = (void *)payload_in;
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +    CXLDCDExtent *ent;
> +    uint32_t i;
> +    uint64_t dpa, len;
> +    CXLRetCode ret;
> +
> +    if (in->num_entries_updated == 0) {
> +        return CXL_MBOX_INVALID_INPUT;
> +    }
> +
> +    ret = cxl_detect_malformed_extent_list(ct3d, in);
> +    if (ret != CXL_MBOX_SUCCESS) {
> +        return ret;
> +    }
> +
> +    for (i = 0; i < in->num_entries_updated; i++) {
> +        dpa = in->updated_entries[i].start_dpa;
> +        len = in->updated_entries[i].len;
> +
> +        QTAILQ_FOREACH(ent, extent_list, node) {
> +            if (ent->start_dpa <= dpa &&
> +                dpa + len <= ent->start_dpa + ent->len) {
> +                /* Remove any partial extents */

This comment needs more detail.
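
What the comment should describe, as I read the code, is that releasing a sub-range of an existing extent leaves up to two remainders (before and after the released range) which are re-inserted as new extents. A standalone sketch of that arithmetic, with illustrative names (Range, split_on_release) that are not in the patch:

```c
#include <assert.h>
#include <stdint.h>

typedef struct Range {
    uint64_t dpa;
    uint64_t len;
} Range;

/*
 * Remove [dpa, dpa + len) from ext. Writes the remaining pieces to out[]
 * and returns how many there are (0, 1 or 2), or -1 if the released range
 * is not fully contained in ext.
 */
static int split_on_release(Range ext, uint64_t dpa, uint64_t len,
                            Range out[2])
{
    uint64_t head, tail;
    int n = 0;

    if (dpa < ext.dpa || len > ext.len || dpa - ext.dpa > ext.len - len) {
        return -1;
    }
    head = dpa - ext.dpa;        /* bytes left in front of the released range */
    tail = ext.len - head - len; /* bytes left after the released range */

    if (head) {
        out[n].dpa = ext.dpa;
        out[n].len = head;
        n++;
    }
    if (tail) {
        out[n].dpa = dpa + len;
        out[n].len = tail;
        n++;
    }
    return n;
}
```

For example, releasing 0x1000 bytes at 0x2000 from an extent covering [0x1000, 0x5000) leaves [0x1000, 0x2000) and [0x3000, 0x5000).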

> +                uint64_t len1 = dpa - ent->start_dpa;
> +                uint64_t len2 = ent->start_dpa + ent->len - dpa - len;
> +
> +                if (len1) {
> +                    cxl_insert_extent_to_extent_list(extent_list,
> +                                                     ent->start_dpa, len1,
> +                                                     NULL, 0);
> +                    ct3d->dc.total_extent_count += 1;
> +                }
> +                if (len2) {
> +                    cxl_insert_extent_to_extent_list(extent_list, dpa + len,
> +                                                     len2, NULL, 0);
> +                    ct3d->dc.total_extent_count += 1;
> +                }
> +                break;
> +                /*Currently we reject the attempt to remove a superset*/

Hmm. That's fine if we do extent fusion. I guess it's fine in general for now
if the Linux support doesn't fuse extents in its tracking,
but I'm not sure the spec allows us to be so picky.

> +            } else if ((dpa < ent->start_dpa + ent->len &&
> +                        dpa + len > ent->start_dpa + ent->len) ||
> +                       (dpa < ent->start_dpa && dpa + len > ent->start_dpa)) {
> +                return CXL_MBOX_INVALID_EXTENT_LIST;
> +            }
> +        }
> +
> +        if (ent) {
> +            QTAILQ_REMOVE(extent_list, ent, node);
> +            g_free(ent);
> +            ct3d->dc.total_extent_count -= 1;
> +        } else {
> +            /* Try to remove a non-existing extent */
> +            return CXL_MBOX_INVALID_PA;
> +        }
> +    }
> +
> +    return CXL_MBOX_SUCCESS;
> +}
> +

...

>  static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c9d792a725..482329a499 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -789,7 +789,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
>      int i;
>      uint64_t region_base = 0;
>      uint64_t region_len =  2 * GiB;
> -    uint64_t decode_len = 8; /* 8*256MB */
> +    uint64_t decode_len = 2 * GiB;

Push this down to the earlier patch that introduced this.
I think it's a good change, but given I commented on the oddity of it
there it would be better if it was never odd!

>      uint64_t blk_size = 2 * MiB;
>      CXLDCDRegion *region;
>  
> @@ -803,6 +803,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
>      for (i = 0; i < ct3d->dc.num_regions; i++) {
>          region = &ct3d->dc.regions[i];
>          region->base = region_base;
> +        /* NOTE: Should be divided by 256 * MiB before be returned to host */

Is that done?  I'd expect to see that change in this patch.
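
To be concrete about what I'd expect, here is a sketch of the conversion, assuming decode_len is now held in bytes and the reported field is in multiples of 256 MiB. CAPACITY_MULTIPLIER stands in for CXL_CAPACITY_MULTIPLIER and the function name is made up. Note that if decode_len is stored in bytes, the decode_len * CXL_CAPACITY_MULTIPLIER in cxl_find_dc_region() in this patch would presumably need to drop the multiplication as well.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define CAPACITY_MULTIPLIER (256 * MIB) /* stand-in for CXL_CAPACITY_MULTIPLIER */

/*
 * Convert a decode length held in bytes to the value reported to the
 * host, which is expressed in multiples of 256 MiB. The input is
 * assumed to be 256 MiB aligned.
 */
static uint64_t decode_len_to_multiples(uint64_t decode_len_bytes)
{
    return decode_len_bytes / CAPACITY_MULTIPLIER;
}
```

With the 2 GiB value used above this gives 8, matching the old hard-coded constant.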

>          region->decode_len = decode_len;
>          region->len = region_len;
>          region->block_size = blk_size;
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 5738c6f434..b3d35fe000 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -130,7 +130,8 @@ typedef enum {
>      CXL_MBOX_INCORRECT_PASSPHRASE = 0x14,
>      CXL_MBOX_UNSUPPORTED_MAILBOX = 0x15,
>      CXL_MBOX_INVALID_PAYLOAD_LENGTH = 0x16,
> -    CXL_MBOX_MAX = 0x17
> +    CXL_MBOX_INVALID_EXTENT_LIST = 0x1E, /* cxl r3.0: Table 8-34*/

This is already in the posted update for the r3.1 codes that I plan
to get upstream ahead of this series. So can drop this bit.

> +    CXL_MBOX_MAX = 0x1F
>  } CXLRetCode;
>  
>  typedef struct CXLCCI CXLCCI;
> @@ -548,4 +549,6 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
>  
>  void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
>  
> +CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> +
>  #endif



* Re: [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2023-11-07 18:07 ` [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
@ 2024-01-24 16:50   ` Jonathan Cameron
  2024-02-08 19:17     ` fan
  2024-02-13 17:44   ` Jonathan Cameron
  1 sibling, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 16:50 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:12 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Since fabric manager emulation is not supported yet, the change implements
> the functions to add/release dynamic capacity extents as QMP interfaces.
> 
> Note: we block any FM issued extent release request if the exact extent
> does not exist in the extent list of the device. We will loose the
> restriction later once we have partial release support in the kernel.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "dpa": 0,
>           "len": 128
>       },
>       {
>           "dpa": 128,
>           "len": 128
>       }
>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) look like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "dpa": 128,
>           "len": 128
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
A few more comments, and I found some answers to previous comments. I should
have read the whole thing first :(

> ---
>  hw/cxl/cxl-mailbox-utils.c  |  25 +++-
>  hw/mem/cxl_type3.c          | 225 +++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3_stubs.c    |  14 +++
>  include/hw/cxl/cxl_device.h |   8 +-
>  include/hw/cxl/cxl_events.h |  15 +++
>  qapi/cxl.json               |  60 +++++++++-
>  6 files changed, 338 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9f788b03b6..8e6a98753a 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1362,7 +1362,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
>   * Check whether any bit between addr[nr, nr+size) is set,
>   * return true if any bit is set, otherwise return false
>   */
> -static bool test_any_bits_set(const unsigned long *addr, int nr, int size)
> +bool test_any_bits_set(const unsigned long *addr, int nr, int size)
>  {
>      unsigned long res = find_next_bit(addr, size + nr, nr);
>  
> @@ -1400,7 +1400,7 @@ CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>      return NULL;
>  }
>  
> -static void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> +void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
>                                               uint64_t dpa,
>                                               uint64_t len,
>                                               uint8_t *tag,
> @@ -1538,15 +1538,28 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
>              }
>          }
>  
> -        /*
> -         * TODO: add a pending extent list based on event log record and
> -         * verify the input response
> -         */

Ahah. I should have read on :)  Ignore my comments on the previous patch
agreeing that such a list was needed.

> +        QTAILQ_FOREACH(ent, &ct3d->dc.extents_pending_to_add, node) {
> +            if (ent->start_dpa <= dpa &&
> +                dpa + len <= ent->start_dpa + ent->len) {
> +                break;
> +            }
> +        }
> +        if (ent) {
> +            QTAILQ_REMOVE(&ct3d->dc.extents_pending_to_add, ent, node);
> +            g_free(ent);
> +        } else {
> +            return CXL_MBOX_INVALID_PA;
> +        }
Flip to simplify logic

           if (!ent) {
                return CXL_MBOX_INVALID_PA;
           }

	   QTAILQ...
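
Spelled out as a standalone sketch, with a plain singly linked list standing in for the QTAILQ and made-up names (Pending, accept_from_pending, the RET_* codes), the early-return shape looks like this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for an entry on the pending-to-add list. */
typedef struct Pending {
    uint64_t dpa;
    uint64_t len;
    struct Pending *next;
} Pending;

enum { RET_SUCCESS = 0, RET_INVALID_PA = 1 }; /* hypothetical return codes */

/*
 * Find the pending entry that contains [dpa, dpa + len). Bail out early
 * if there is none; otherwise unlink and free it.
 */
static int accept_from_pending(Pending **head, uint64_t dpa, uint64_t len)
{
    Pending **pp = head;
    Pending *ent;

    while (*pp && !((*pp)->dpa <= dpa &&
                    dpa + len <= (*pp)->dpa + (*pp)->len)) {
        pp = &(*pp)->next;
    }
    ent = *pp;
    if (!ent) {
        return RET_INVALID_PA;
    }

    *pp = ent->next;
    free(ent);
    return RET_SUCCESS;
}
```

As in the patch, a partial accept consumes the whole pending entry; whether that is the behaviour we want is a separate question.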

>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
>      }
>  
> +    /*
> +     * TODO: extents_pending_to_add needs to be cleared so the extents not
> +     * accepted can be reclaimed base on spec r3.0: 8.2.9.8.9.3
> +     */
> +
>      return CXL_MBOX_SUCCESS;
>  }
>  
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 482329a499..43cea3d818 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -813,6 +813,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
>          region_base += region->len;
>      }
>      QTAILQ_INIT(&ct3d->dc.extents);
> +    QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
>  
>      return 0;
>  }
> @@ -1616,7 +1617,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */
> +    case CXL_EVENT_LOG_DYNCAP:
> +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
>      default:
>          return -EINVAL;
>      }
> @@ -1867,6 +1869,227 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +/* CXL r3.0 Table 8-47: Dynanic Capacity Event Record */
> +static const QemuUUID dynamic_capacity_uuid = {
> +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +typedef enum CXLDCEventType {
> +    DC_EVENT_ADD_CAPACITY = 0x0,
> +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> +    DC_EVENT_NUM
Don't think EVENT_NUM is used. Don't define it unless it's useful.

> +} CXLDCEventType;
> +
> +/*
> + * Check whether the exact extent exists in the list
> + * Return value: true if exists, otherwise false
> + */
> +static bool cxl_dc_extent_exists(CXLDCDExtentList *list, CXLDCExtentRaw *ext)
> +{
> +    CXLDCDExtent *ent;
> +
> +    if (!ext || !list) {
> +        return false;
> +    }
> +
> +    QTAILQ_FOREACH(ent, list, node) {
> +        if (ent->start_dpa != ext->start_dpa) {
> +            continue;
> +        }
> +
> +        /*Found exact extent*/

	   return ent->len == ext->len;

> +        if (ent->len == ext->len) {
> +            return true;
> +        } else {
> +            return false;
> +        }
> +    }
> +    return false;
> +}
> +
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> +                                             CXLDCEventType type, uint16_t hid,
> +                                             uint8_t rid,
> +                                             CXLDCExtentRecordList *records,
> +                                             Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    CXLDCDExtentList *extent_list = NULL;
> +    uint8_t enc_log;
> +    uint64_t offset, len, block_size;
> +    int i;
> +    int rc;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +
> +    obj = object_resolve_path(path, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve path");
> +        return;
> +    }
> +    if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> +        error_setg(errp, "Path not point to a valid CXL type3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +    rc = ct3d_qmp_cxl_event_log_enc(log);
> +    if (rc < 0) {
> +        error_setg(errp, "Unhandled error log type");
> +        return;
> +    }
> +    enc_log = rc;
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset * MiB;
> +        len = list->value->len * MiB;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        num_extents++;
> +        list = list->next;
> +    }
> +    if (num_extents == 0) {
> +        error_setg(errp, "No extents found in the command");
> +        return;
> +    }
> +
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Create Extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset * MiB;
> +        len = list->value->len * MiB;
That suggests it wasn't in MiB unlike the comment below.

> +
> +        extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +
> +        /*
> +         * We block the release request from FM if the exact extent has
> +         * not been accepted by the host yet

If it's released before host accepts it that is fine - drop it from the pending list.
If the host then tries to accept we validate it and fail the accept.

Should really validate no overlap with existing extents in pending list or
accepted lists.
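
The overlap test itself is a one-liner that can be run against both lists. A minimal sketch, assuming non-zero lengths and no wrap-around in dpa + len; Range, ranges_intersect() and overlaps_any() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Range {
    uint64_t dpa;
    uint64_t len;
} Range;

/*
 * Two half-open ranges intersect when each one starts before the other
 * ends. This covers containment in either direction as well as partial
 * overlap at either end.
 */
static bool ranges_intersect(const Range *a, const Range *b)
{
    return a->dpa < b->dpa + b->len && b->dpa < a->dpa + a->len;
}

/* Return true if r intersects any entry of list[0..n). */
static bool overlaps_any(const Range *list, size_t n, const Range *r)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ranges_intersect(&list[i], r)) {
            return true;
        }
    }
    return false;
}
```

Calling overlaps_any() once on the pending entries and once on the accepted entries before queueing a new extent would give the validation described above.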


> +         * TODO: We can loose the restriction by skipping the check if desired
> +         */
> +        if (type == DC_EVENT_RELEASE_CAPACITY ||
> +            type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> +            if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> +                error_setg(errp, "No exact extent found in the extent list");
> +                return;
> +            }
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        list = list->next;
> +        i++;
> +    }
> +
> +    switch (type) {
> +    case DC_EVENT_ADD_CAPACITY:
> +        extent_list = &dcd->dc.extents_pending_to_add;
> +        break;
> +    default:
> +        break;
> +    }
> +    /*
> +     * CXL r3.0 section 8.2.9.1.5: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (extent_list) {
Given this is always the same list
	   if (type == DC_EVENT_ADD_CAPACITY) {
               cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending_to_add,
//local variable here to avoid line length but the basic idea is the same.


> +            cxl_insert_extent_to_extent_list(extent_list,
> +                                             extents[i].start_dpa,
> +                                             extents[i].len,
> +                                             extents[i].tag,
> +                                             extents[i].shared_seq);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}

...

> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index b3d35fe000..ca4f824b11 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -491,6 +491,7 @@ struct CXLType3Dev {
>          AddressSpace host_dc_as;
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCDExtentList extents;
> +        CXLDCDExtentList extents_pending_to_add;
>  
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;
> @@ -550,5 +551,10 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
>  void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
>  
>  CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> -
Avoid the whitespace change either by never adding that blank line or by keeping it here.

> +void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> +                                             uint64_t dpa,
> +                                             uint64_t len,
> +                                             uint8_t *tag,
> +                                             uint16_t shared_seq);
> +bool test_any_bits_set(const unsigned long *addr, int nr, int size);
>  #endif
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index d778487b7e..4f8cb3215d 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -166,4 +166,19 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * CXL r3.0 section Table 8-47: Dynamic Capacity Event Record
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +    CXLEventRecordHdr hdr;
> +    uint8_t type;
> +    uint8_t reserved1;
> +    uint16_t host_id;
> +    uint8_t updated_region_id;
> +    uint8_t reserved2[3];
> +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */

Can't we use that definition here?

> +    uint8_t reserved[0x20];
> +} QEMU_PACKED CXLEventDynamicCapacity;
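
On the question above about reusing the cxl_device.h definition, a minimal
sketch of what a typed member could look like. Everything with a Sketch or
SKETCH_ prefix is a hypothetical stand-in, not the QEMU definition: the raw
extent layout (field names taken from the extents[i] accesses quoted earlier,
tag and padding sizes chosen to reach 0x28 bytes) and the 0x30 byte header
are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU_PACKED; assumes a GCC/Clang-compatible compiler. */
#define SKETCH_PACKED __attribute__((packed))

/* Assumed raw extent layout; sizes of tag[] and rsvd[] are assumptions. */
typedef struct SketchDCExtentRaw {
    uint64_t start_dpa;
    uint64_t len;
    uint8_t tag[0x10];
    uint16_t shared_seq;
    uint8_t rsvd[0x6];
} SKETCH_PACKED SketchDCExtentRaw;

/* Opaque stand-in for CXLEventRecordHdr; the 0x30 size is an assumption. */
typedef struct SketchEventRecordHdr {
    uint8_t raw[0x30];
} SKETCH_PACKED SketchEventRecordHdr;

typedef struct SketchEventDynamicCapacity {
    SketchEventRecordHdr hdr;
    uint8_t type;
    uint8_t reserved1;
    uint16_t host_id;
    uint8_t updated_region_id;
    uint8_t reserved2[3];
    SketchDCExtentRaw dynamic_capacity_extent; /* typed, not uint8_t[0x28] */
    uint8_t reserved[0x20];
} SKETCH_PACKED SketchEventDynamicCapacity;

/* The typed member must occupy exactly the 0x28 bytes the array did. */
static_assert(sizeof(SketchDCExtentRaw) == 0x28,
              "extent must stay 0x28 bytes");
static_assert(offsetof(SketchEventDynamicCapacity, reserved) ==
              offsetof(SketchEventDynamicCapacity, dynamic_capacity_extent) +
              0x28,
              "reserved must directly follow the extent");
```

With a typed member the memcpy() of sizeof(CXLDCExtentRaw) bytes could become
a plain struct assignment, at the cost of cxl_events.h needing to see the
extent definition.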
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8cc4c72fa9..6b631f64f1 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
...

> @@ -361,3 +362,60 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDCExtentRecord:
> +#
> +# Record of a single extent to add/release
> +#
> +# @offset: offset of the extent start related to current region base address
> +# @len: extent size (in MiB)

Why?  Extents can be smaller than that (though we might not have implemented
that yet).  Bytes would be better.

> +#
> +# Since: 8.0
> +##
> +{ 'struct': 'CXLDCExtentRecord',
> +  'data': {
> +      'offset':'uint64',
> +      'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The host will
> +# need to respond to indicate it accepts the capacity before it becomes
> +# available for read and write.

The device will have to have acknowledged the accept though perhaps that is
too much detail.

> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to add/release
> +# @extents: Extents to add
> +#
> +# Since : 8.2

Update for next version.  9.0 is ideal target now.

> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to add/release
> +# @extents: Extents to release
> +#
> +# Since : 8.2
> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}



* Re: [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
  2023-11-07 18:07 ` [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions nifan.cxl
@ 2024-01-24 16:58   ` Jonathan Cameron
  2024-02-09 19:04     ` fan
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-24 16:58 UTC
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue,  7 Nov 2023 10:07:13 -0800
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
> 
> Not all dpa range in the dc regions is valid to access until an extent
Capitalize the acronyms: DPA, DC, etc.

> covering the range has been added. Add a bitmap for each region to
> record whether a dc block in the region has been backed by dc extent.
> For the bitmap, a bit in the bitmap represents a dc block. When a dc
> extent is added, all the bits of the blocks in the extent will be set,
> which will be cleared when the extent is released.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Hi Fan, one query inline and a few comments.

Jonathan

> 
> --
> JC changes:
> - Rebase on what will be next gitlab.com/jic23/qemu CXL staging tree.
> - Drop unnecessary handling of failed bitmap allocations. In common with
>   most QEMU allocations they fail hard anyway.
> - Use previously factored out cxl_find_region() helper
> - Minor editorial stuff in comments such as spec version references
>   according to the standard form I'm trying to push through the code.
> Picked up Jørgen's fix:
> https://lore.kernel.org/qemu-devel/d0d7ca1d-81bc-19b3-4904-d60046ded844@wdc.com/T/#u
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 31 +++++++++------
>  hw/mem/cxl_type3.c          | 78 +++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h | 15 +++++--
>  3 files changed, 109 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 8e6a98753a..6be92fb5ba 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1401,10 +1401,9 @@ CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
>  }
>  
>  void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> -                                             uint64_t dpa,
> -                                             uint64_t len,
> -                                             uint8_t *tag,
> -                                             uint16_t shared_seq)
> +                                      uint64_t dpa, uint64_t len,
> +                                      uint8_t *tag,
> +                                      uint16_t shared_seq)

Avoid noisy whitespace changes like this.


> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 43cea3d818..4ec65a751a 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c

> +/*
> + * Check whether a DPA range [dpa, dpa + len) has been backed with DC extents.
> + * Used when validating read/write to dc regions
> + */
> +bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> +                                  uint64_t len)
> +{
> +    CXLDCDRegion *region;
> +    uint64_t nbits;
> +    long nr;
> +
> +    region = cxl_find_dc_region(ct3d, dpa, len);
> +    if (!region) {
> +        return false;
> +    }
> +
> +    nr = (dpa - region->base) / region->block_size;
> +    nbits = DIV_ROUND_UP(len, region->block_size);
> +    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
I'm not sure how this works... Is it taking a size or an end point?

Linux equivalent takes size, so I'd expect

    return find_next_zero_bit(region->blk_bitmap, nbits, nr);
Perhaps a comment would avoid any future confusion on this.
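
For reference, a minimal sketch of the convention being asked about. It
assumes the (addr, size, offset) form in which size is the bitmap size in
bits, so it acts as an exclusive upper bound on the returned index rather
than a count starting at offset. The sketch_ helpers are hypothetical
bit-at-a-time stand-ins, not the QEMU or Linux implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Scan bits [offset, size) and return the index of the first zero bit,
 * or size if every bit in that window is set.
 */
static unsigned long sketch_find_next_zero_bit(const unsigned long *addr,
                                               unsigned long size,
                                               unsigned long offset)
{
    unsigned long i;

    for (i = offset; i < size; i++) {
        if (!(addr[i / SKETCH_BITS_PER_LONG] &
              (1UL << (i % SKETCH_BITS_PER_LONG)))) {
            return i;
        }
    }
    return size;
}

/* True only if every block in [nr, nr + nbits) is marked as backed. */
static bool sketch_range_backed(const unsigned long *bitmap,
                                unsigned long nr, unsigned long nbits)
{
    /* The search stops at nr + nbits, so "not found" returns nr + nbits. */
    return sketch_find_next_zero_bit(bitmap, nr + nbits, nr) == nr + nbits;
}
```

Under that assumption, passing nbits as size would end the search at bit
index nbits instead of nr + nbits, so a comment stating which convention the
helper follows would settle the question either way.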

> +}
> +
> +/*
> + * Mark the DPA range [dpa, dpa + len) to be unbacked and inaccessible. This
> + * happens when a dc extent is returned by the host.
> + */
> +void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> +                                   uint64_t len)
> +{
> +    CXLDCDRegion *region;
> +    uint64_t nbits;
> +    long nr;
> +
> +    region = cxl_find_dc_region(ct3d, dpa, len);
> +    if (!region) {
> +        return;
> +    }
> +
> +    nr = (dpa - region->base) / region->block_size;
> +    nbits = len / region->block_size;
> +    bitmap_clear(region->blk_bitmap, nr, nbits);
> +}
> +



* Re: [PATCH v3 4/9] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2024-01-24 15:23   ` Jonathan Cameron
@ 2024-01-26 13:00     ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-26 13:00 UTC
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Wed, 24 Jan 2024 15:23:16 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Tue,  7 Nov 2023 10:07:08 -0800
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > With the change, when setting up memory for type3 memory device, we can
> > create DC regions.
> > A property 'num-dc-regions' is added to ct3_props to allow users to pass the
> > number of DC regions to create. To make it easier, other region parameters
> > like region base, length, and block size are hard coded. If needed,
> > these parameters can be added easily.
> > 
> > With the change, we can create DC regions with proper kernel side
> > support as below:
> > 
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> > 
> > echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
> > echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> > 
> > echo 0x40000000 > /sys/bus/cxl/devices/$region/size
> > echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> > echo 1 > /sys/bus/cxl/devices/$region/commit
> > echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> Hi Fan, a few comments inline.
> 
> Jonathan
> 
> > ---
> >  hw/mem/cxl_type3.c | 35 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 35 insertions(+)
> > 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 754c885cd1..2d67d2015c 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -721,6 +721,36 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
> >      }
> >  }
> >  
> > +static int cxl_create_dc_regions(CXLType3Dev *ct3d)
> > +{
> > +    int i;
> > +    uint64_t region_base = 0;
> > +    uint64_t region_len =  2 * GiB;
> > +    uint64_t decode_len = 8; /* 8*256MB */  
> 
> If decode len is going to be div 256MiB then we need
> a name for that field that makes it clear that it is.
> 
> decode_len_256mbytes or something like that and maybe
> region_len_bytes to keep things consistent.
> 
> Why the spec didn't make our life easier and define decode length
> in bytes with some bits that must be zero is beyond me... 
> 
> 
> I think we need to make this at least optionally configurable or based
> in some fashion on the provided memory backend (divide that up
> by number of regions with appropriate rounding perhaps?)

This seems to be a mid-patch-set confusion.  It's fixed in patch 7.
Whilst applying I've made this 2GiB here.
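
A minimal sketch of the naming idea from the quoted review, assuming the
decode length is reported in multiples of 256 MiB as the patch comment says.
The SKETCH_ macros and sketch_ helpers are hypothetical names, not taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL)
#define SKETCH_DECODE_UNIT (256ULL * SKETCH_MIB)

/* A decode length in bytes is only representable if it is 256 MiB aligned. */
static bool sketch_decode_len_valid(uint64_t decode_len_bytes)
{
    return decode_len_bytes != 0 &&
           decode_len_bytes % SKETCH_DECODE_UNIT == 0;
}

/* Convert bytes to 256 MiB units; the unit is carried in the name. */
static uint64_t sketch_decode_len_256mbytes(uint64_t decode_len_bytes)
{
    assert(sketch_decode_len_valid(decode_len_bytes));
    return decode_len_bytes / SKETCH_DECODE_UNIT;
}
```

Keeping every length in bytes internally and converting only at the point
where the value is reported avoids mixing the two units in one function.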

Jonathan



* Re: [PATCH v3 0/9] Enabling DCD emulation support in Qemu
  2023-11-17  0:09 ` [PATCH v3 0/9] Enabling DCD emulation support in Qemu Ira Weiny
@ 2024-01-26 15:21   ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-26 15:21 UTC
  To: Ira Weiny
  Cc: nifan.cxl, qemu-devel, linux-cxl, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris

On Thu, 16 Nov 2023 16:09:03 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> nifan.cxl@ wrote:
> > From: Fan Ni <nifan.cxl@gmail.com>
> > 
> > 
> > The patch series are based on Jonathan's branch cxl-2023-09-26.  
> 
> Finally getting around to trying this new series and the patch series does not
> seem to apply on top of this branch?
> 
> Just to verify is this the top commit this work was based on?
> 
>    d4edf131bbac [jonathan/cxl-2023-09-26] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device
> 
> I seem to have found some issue with CDAT checksumming[1] which I'm not quite
> sure about.
> 
> I went ahead and pulled your latest work from:
> 
>     https://github.com/moking/qemu-jic-clone.git dcd-dev
> 
>     abe893944bb3  hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
> 
> It still has this same problem.
> 

> Before I dig into this, is this the latest dcd branch?
I've pushed out a new tree, but it's definitely in a 'may eat babies' form...

gitlab.com/jic23/qemu cxl-2024-26-01-draft

Only had the most basic of testing so far.  DCD rebase was messy as I've dragged
it into the 'next to send upstream' location and various fixes including
Ira's CDAT one have gone out already.

I'm keen to try and land this in QEMU 9.0 which basically means we have until
the end of Feb to shake out any problems.

Some other work is at least somewhat built on top of this (because of the
need to deal with DCD regions as well as pmem and volatile ones).

Jonathan


> 
> Has anything changed in how you specify DCD devices on the qemu command line
> with this latest work?  Here is what I have:
> 
> ...
> -device cxl-type3,bus=hb0rp0,memdev=cxl-mem0,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem0,id=cxl-dev0,lsa=cxl-lsa0,sn=0
> -device cxl-type3,bus=hb0rp1,memdev=cxl-mem1,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem1,id=cxl-dev1,lsa=cxl-lsa1,sn=1
> -device cxl-type3,bus=hb1rp0,memdev=cxl-mem2,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem2,id=cxl-dev2,lsa=cxl-lsa2,sn=2
> -device cxl-type3,bus=hb1rp1,memdev=cxl-mem3,num-dc-regions=2,nonvolatile-dc-memdev=cxl-dc-mem3,id=cxl-dev3,lsa=cxl-lsa3,sn=3
> ...
> 
> 
> Ira
> 
> [1] https://lore.kernel.org/all/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com/
> 
>  
> > The main changes include,
> > 1. Update cxl_find_dc_region to detect the case the range of the extent cross
> >     multiple DC regions.
> > 2. Add comments to explain the checks performed in function
> >     cxl_detect_malformed_extent_list. (Jonathan)
> > 3. Minimize the checks in cmd_dcd_add_dyn_cap_rsp.(Jonathan)
> > 4. Update total_extent_count in add/release dynamic capacity response function.
> >     (Ira and Jorgen Hansen).
> > 5. Fix the logic issue in test_bits and renamed it to
> >     test_any_bits_set to clear its function.
> > 6. Add pending extent list for dc extent add event.
> > 7. When add extent response is received, use the pending-to-add list to
> >     verify the extents are valid.
> > 8. Add test_any_bits_set and cxl_insert_extent_to_extent_list declaration to
> >     cxl_device.h so it can be used in different files.
> > 9. Updated ct3d_qmp_cxl_event_log_enc to include dynamic capacity event
> >     log type.
> > 10. Extract the functionality to delete extent from extent list to a helper
> >     function.
> > 11. Move the update of the bitmap which reflects which blocks are backed with
> > dc extents from the moment when a dc extent is offered to the moment when it
> > is accepted from the host.
> > 12. Free dc_name after calling address_space_init to avoid memory leak when
> >     returning early. (Nathan)
> > 13. Add code to detect and reject QMP requests without any extents. (Jonathan)
> > 14. Add code to detect and reject QMP requests where the extent len is 0.
> > 15. Change the QMP interface and move the region-id out of extents and now
> >     each command only takes care of extent add/release request in a single
> >     region. (Jonathan)
> > 16. Change the region bitmap length from decode_len to len.
> > 17. Rename "dpa" to "offset" in the add/release dc extent qmp interface.
> >     (Jonathan)
> > 18. Block any dc extent release command if the exact extent is not already in
> >     the extent list of the device.
> > 
> > The code is tested together with Ira's kernel DCD support:
> > https://github.com/weiny2/linux-kernel/tree/dcd-v3-2023-10-30
> > 
> > Cover letter from v2 is here:
> > https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#m63039621087023691c9749a0af1212deb5549ddf
> > 
> > Last version (v2) is here:
> > https://lore.kernel.org/linux-cxl/20230725183939.2741025-1-fan.ni@samsung.com/
> > 
> > More DCD related discussions are here:
> > https://lore.kernel.org/linux-cxl/650cc29ab3f64_50d07294e7@iweiny-mobl.notmuch/
> > 
> > 
> > 
> > Fan Ni (9):
> >   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> >     payload of identify memory device command
> >   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> >     and mailbox command support
> >   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
> >     type3 memory devices
> >   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
> >     devices
> >   hw/mem/cxl_type3: Add host backend and address space handling for DC
> >     regions
> >   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
> >     list mailbox support
> >   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> >     dynamic capacity response
> >   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
> >     extents
> >   hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
> > 
> >  hw/cxl/cxl-mailbox-utils.c  | 469 +++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3.c          | 548 +++++++++++++++++++++++++++++++++---
> >  hw/mem/cxl_type3_stubs.c    |  14 +
> >  include/hw/cxl/cxl_device.h |  64 ++++-
> >  include/hw/cxl/cxl_events.h |  15 +
> >  qapi/cxl.json               |  60 +++-
> >  6 files changed, 1123 insertions(+), 47 deletions(-)
> > 
> > -- 
> > 2.42.0
> >   
> 
> 
> 



* Re: [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-01-24 14:51   ` Jonathan Cameron
@ 2024-01-29 17:32     ` fan
  2024-01-30  9:44       ` Jonathan Cameron
  2024-02-01 19:58     ` fan
  1 sibling, 1 reply; 37+ messages in thread
From: fan @ 2024-01-29 17:32 UTC
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Wed, Jan 24, 2024 at 02:51:18PM +0000, Jonathan Cameron wrote:
> On Tue,  7 Nov 2023 10:07:06 -0800
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Per cxl spec 3.0, add dynamic capacity region representative based on
> > Table 8-126 and extend the cxl type3 device definition to include dc region
> > information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> > Configuration' mailbox support.
> > 
> > Note: decode_len of a dc region is aligned to 256*MiB, need to be divided by
> > 256 * MiB before returned to the host for "Get Dynamic Capacity Configuration"
> > mailbox command.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Hi Fan,
> 
> I'm looking at how to move these much earlier in my tree on basis that
> they should be our main focus for merging in this QEMU cycle.
> 
> Whilst I do that rebase, I'm taking a closer look at the code.
> I'm targetting rebasing on upstream qemu + the two patch sets I just
> sent out:
> [PATCH 00/12 qemu] CXL emulation fixes and minor cleanup. 
> [PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1
> 
> It would be good to document why these commands should be optional (which I think
> comes down to the annoying fact that Get Dynamic Capacity Configuration isn't
> allowed to return 0 regions, but instead should not be available as a command
> if DCD isn't supported).
> 
> Note this requires us to carry Gregory's patches to make the CCI command list
> constructed at runtime rather than baked in ahead of this set.
> 
> So another question is should we jump directly to the r3.1 version of DCD?
> I think we probably should as it includes some additions that are necessary
> for a bunch of the potential use cases.
> 

Hi Jonathan,

Thanks for taking the time to review the patches.
I will redo the patches and align them with CXL spec r3.1. Before
that, I need some clarifications.
As you mentioned above, for the next version I will use upstream QEMU + the
two patch sets you mentioned as the base; that is clear to me.
However, you also mentioned Gregory's patches that construct the CCI command
list at runtime. I think you meant we should also include that patch set
before DCD, so that if DCD is not supported the Get Dynamic Capacity
Configuration command will not be available in the first place. Am I
right? If so, could you point me to the latest version of the CCI patches
I should use? I see the CCI rework patches, but I am not sure whether we
need all of them or whether they are the latest.

Thanks,
Fan

> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 80 +++++++++++++++++++++++++++++++++++++
> >  hw/mem/cxl_type3.c          |  6 +++
> >  include/hw/cxl/cxl_device.h | 17 ++++++++
> >  3 files changed, 103 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 8eceedfa87..f80dd6474f 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -80,6 +80,8 @@ enum {
> >          #define GET_POISON_LIST        0x0
> >          #define INJECT_POISON          0x1
> >          #define CLEAR_POISON           0x2
> > +    DCD_CONFIG  = 0x48,
> > +        #define GET_DC_CONFIG          0x0
> >      PHYSICAL_SWITCH = 0x51,
> >          #define IDENTIFY_SWITCH_DEVICE      0x0
> >          #define GET_PHYSICAL_PORT_STATE     0x1
> > @@ -1210,6 +1212,74 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
> >      return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +/*
> > + * CXL r3.0 section 8.2.9.8.9.1: Get Dynamic Capacity Configuration
> 
> As per the patch set I just sent out, I want to standardize on references
> to r3.1 because it's all that is easy to get.  However if we decide to do r3.0
> DCD first then upgrade it later, then clearly these need to stick to r3.0 for
> now.
> 
> > + * (Opcode: 4800h)
> > + */
> > +static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
> > +                                             uint8_t *payload_in,
> > +                                             size_t len_in,
> > +                                             uint8_t *payload_out,
> > +                                             size_t *len_out,
> > +                                             CXLCCI *cci)
> > +{
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    struct get_dyn_cap_config_in_pl {
> > +        uint8_t region_cnt;
> > +        uint8_t start_region_id;
> > +    } QEMU_PACKED;
> > +
> > +    struct get_dyn_cap_config_out_pl {
> > +        uint8_t num_regions;
> > +        uint8_t rsvd1[7];
> 
> This changed in r3.1 (errata? - I haven't checked)
> Should be 'regions returned' in first byte.
> 
> > +        struct {
> > +            uint64_t base;
> > +            uint64_t decode_len;
> > +            uint64_t region_len;
> > +            uint64_t block_size;
> > +            uint32_t dsmadhandle;
> 
> > +            uint8_t flags;
> > +            uint8_t rsvd2[3];
> > +        } QEMU_PACKED records[];
> 
> There are two fields after this as well.
> Total number of supported extents and number of available extents.
> 
> That annoyingly means we can't use the structure to tell us where
> to find all the fields...
> 
> 
> > +    } QEMU_PACKED;
> > +
> > +    struct get_dyn_cap_config_in_pl *in = (void *)payload_in;
> > +    struct get_dyn_cap_config_out_pl *out = (void *)payload_out;
> > +    uint16_t record_count = 0, i;
> 
> Better to split that on to 2 lines. Never hide setting a value
> in the middle of a set of declarations.
> 
> > +    uint16_t out_pl_len;
> > +    uint8_t start_region_id = in->start_region_id;
> > +
> > +    if (start_region_id >= ct3d->dc.num_regions) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
> > +            in->region_cnt);
> > +
> > +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> 
> For r3.1 + 8 for the two trailing fields.
> 
> > +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> > +
> > +    memset(out, 0, out_pl_len);
> 
> As part of the cci rework we started zeroing the whole mailbox payload space
> after copying out the input payload.
> https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-device-utils.c#L204
> 
> So shouldn't need this (unless we have a bug)
> 
> Jonathan
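
A minimal sketch of the trailing-field point in the quoted review: with a
flexible array of records in the middle, the fields after it cannot be
struct members, so their offset has to be computed from the record count.
The trailer layout (two 32-bit counts) is an assumption inferred from the
"+ 8" remark and should be checked against the spec; all Sketch and sketch_
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU_PACKED; assumes a GCC/Clang-compatible compiler. */
#define SKETCH_PACKED __attribute__((packed))

/* Per-region record, mirroring the fields quoted in the patch. */
typedef struct SketchDCRegionRecord {
    uint64_t base;
    uint64_t decode_len;
    uint64_t region_len;
    uint64_t block_size;
    uint32_t dsmadhandle;
    uint8_t flags;
    uint8_t rsvd2[3];
} SKETCH_PACKED SketchDCRegionRecord;

typedef struct SketchDynCapConfigOut {
    uint8_t num_regions;
    uint8_t rsvd1[7];
    SketchDCRegionRecord records[];
} SKETCH_PACKED SketchDynCapConfigOut;

/* Assumed trailer: two 32-bit counts placed after the last record. */
typedef struct SketchDynCapConfigTrailer {
    uint32_t num_extents_supported;
    uint32_t num_extents_available;
} SKETCH_PACKED SketchDynCapConfigTrailer;

/* Byte offset of the trailer; depends on how many records are returned. */
static size_t sketch_trailer_offset(uint16_t record_count)
{
    return sizeof(SketchDynCapConfigOut) +
           (size_t)record_count * sizeof(SketchDCRegionRecord);
}

/* Total output payload length including the trailer. */
static size_t sketch_out_pl_len(uint16_t record_count)
{
    return sketch_trailer_offset(record_count) +
           sizeof(SketchDynCapConfigTrailer);
}

/* Locate the trailer inside a payload buffer holding record_count records. */
static SketchDynCapConfigTrailer *sketch_trailer(uint8_t *payload_out,
                                                 uint16_t record_count)
{
    return (SketchDynCapConfigTrailer *)(payload_out +
                                         sketch_trailer_offset(record_count));
}
```

The trailer struct is packed, so the pointer returned by sketch_trailer()
carries no alignment requirement beyond one byte.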


* Re: [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-01-29 17:32     ` fan
@ 2024-01-30  9:44       ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-01-30  9:44 UTC
  To: fan
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Mon, 29 Jan 2024 09:32:39 -0800
fan <nifan.cxl@gmail.com> wrote:

> On Wed, Jan 24, 2024 at 02:51:18PM +0000, Jonathan Cameron wrote:
> > On Tue,  7 Nov 2023 10:07:06 -0800
> > nifan.cxl@gmail.com wrote:
> >   
> > > From: Fan Ni <fan.ni@samsung.com>
> > > 
> > > Per cxl spec 3.0, add dynamic capacity region representative based on
> > > Table 8-126 and extend the cxl type3 device definition to include dc region
> > > information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> > > Configuration' mailbox support.
> > > 
> > > Note: decode_len of a dc region is aligned to 256*MiB, need to be divided by
> > > 256 * MiB before returned to the host for "Get Dynamic Capacity Configuration"
> > > mailbox command.
> > > 
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> > 
> > Hi Fan,
> > 
> > I'm looking at how to move these much earlier in my tree on basis that
> > they should be our main focus for merging in this QEMU cycle.
> > 
> > Whilst I do that rebase, I'm taking a closer look at the code.
> > I'm targetting rebasing on upstream qemu + the two patch sets I just
> > sent out:
> > [PATCH 00/12 qemu] CXL emulation fixes and minor cleanup. 
> > [PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1
> > 
> > It would be good to document why these commands should be optional (which I think
> > comes down to the annoying fact that Get Dynamic Capacity Configuration isn't
> > allowed to return 0 regions, but instead should not be available as a command
> > if DCD isn't supported).
> > 
> > Note this requires us to carry Gregory's patches to make the CCI command list
> > constructed at runtime rather than baked in ahead of this set.
> > 
> > So another question is should we jump directly to the r3.1 version of DCD?
> > I think we probably should as it includes some additions that are necessary
> > for a bunch of the potential use cases.
> >   
> 
> Hi Jonathan,
> 
> Thanks for taking time to review the patches. 
> I will redo the patches and make them align with cxl spec v3.1. Before
> that, I need some clarifications.
> As you mentioned above, for the next version, I will use upstream qemu + the
> two patchsets you mentioned above as base, that is clear to me.
> However, you mentioned Gregory's patches above constructing CCI command list
> at runtime, I think you meant we should also include that patchset
> before DCD so if DCD is not supported, the Get Dynamic capacity
> configuration command will not be available at the first place, am I
> right? If so, could you point me to the latest patches of the mentioned
> CCI work I should use? I see the CCI rework patches, but not sure if we
> should have them all or they are the latest.

Only the two before DCD in this tree:
https://gitlab.com/jic23/qemu/-/commits/cxl-2024-26-01-draft/?ref_type=heads

hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference 
hw/cxl/mailbox: interface to add CCI commands to an existing CCI 

There is one more sneaky fix on that tree that isn't related to these that I
put behind the spec version updates because it was a pain to rebase.
So fine to ignore that one.

Everything else ahead of DCD has been sent to the list, hopefully for a merge.

Jonathan

> 
> Thanks,
> Fan
> 
> >   
> > > ---
> > >  hw/cxl/cxl-mailbox-utils.c  | 80 +++++++++++++++++++++++++++++++++++++
> > >  hw/mem/cxl_type3.c          |  6 +++
> > >  include/hw/cxl/cxl_device.h | 17 ++++++++
> > >  3 files changed, 103 insertions(+)
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index 8eceedfa87..f80dd6474f 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -80,6 +80,8 @@ enum {
> > >          #define GET_POISON_LIST        0x0
> > >          #define INJECT_POISON          0x1
> > >          #define CLEAR_POISON           0x2
> > > +    DCD_CONFIG  = 0x48,
> > > +        #define GET_DC_CONFIG          0x0
> > >      PHYSICAL_SWITCH = 0x51,
> > >          #define IDENTIFY_SWITCH_DEVICE      0x0
> > >          #define GET_PHYSICAL_PORT_STATE     0x1
> > > @@ -1210,6 +1212,74 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
> > >      return CXL_MBOX_SUCCESS;
> > >  }
> > >  
> > > +/*
> > > + * CXL r3.0 section 8.2.9.8.9.1: Get Dynamic Capacity Configuration  
> > 
> > As per the patch set I just sent out, I want to standardize on references
> > to r3.1 because it's all that is easy to get.  However if we decide to do r3.0
> > DCD first then upgrade it later, then clearly these need to stick to r3.0 for
> > now.
> >   
> > > + * (Opcode: 4800h)
> > > + */
> > > +static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
> > > +                                             uint8_t *payload_in,
> > > +                                             size_t len_in,
> > > +                                             uint8_t *payload_out,
> > > +                                             size_t *len_out,
> > > +                                             CXLCCI *cci)
> > > +{
> > > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > > +    struct get_dyn_cap_config_in_pl {
> > > +        uint8_t region_cnt;
> > > +        uint8_t start_region_id;
> > > +    } QEMU_PACKED;
> > > +
> > > +    struct get_dyn_cap_config_out_pl {
> > > +        uint8_t num_regions;
> > > +        uint8_t rsvd1[7];  
> > 
> > This changed in r3.1 (errata? - I haven't checked)
> > Should be 'regions returned' in first byte.
> >   
> > > +        struct {
> > > +            uint64_t base;
> > > +            uint64_t decode_len;
> > > +            uint64_t region_len;
> > > +            uint64_t block_size;
> > > +            uint32_t dsmadhandle;  
> >   
> > > +            uint8_t flags;
> > > +            uint8_t rsvd2[3];
> > > +        } QEMU_PACKED records[];  
> > 
> > There are two fields after this as well.
> > Total number of supported extents and number of available extents.
> > 
> > That annoyingly means we can't use the structure to tell us where
> > to find all the fields...
> > 
> >   
> > > +    } QEMU_PACKED;
> > > +
> > > +    struct get_dyn_cap_config_in_pl *in = (void *)payload_in;
> > > +    struct get_dyn_cap_config_out_pl *out = (void *)payload_out;
> > > +    uint16_t record_count = 0, i;  
> > 
> > Better to split that on to 2 lines. Never hide setting a value
> > in the middle of a set of declarations.
> >   
> > > +    uint16_t out_pl_len;
> > > +    uint8_t start_region_id = in->start_region_id;
> > > +
> > > +    if (start_region_id >= ct3d->dc.num_regions) {
> > > +        return CXL_MBOX_INVALID_INPUT;
> > > +    }
> > > +
> > > +    record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
> > > +            in->region_cnt);
> > > +
> > > +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);  
> > 
> > For r3.1 + 8 for the two trailing fields.
> >   
> > > +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> > > +
> > > +    memset(out, 0, out_pl_len);  
> > 
> > As part of the cci rework we started zeroing the whole mailbox payload space
> > after copying out the input payload.
> > https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-device-utils.c#L204
> > 
> > So shouldn't need this (unless we have a bug)
> > 
> > Jonathan  



* Re: [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-01-24 14:51   ` Jonathan Cameron
  2024-01-29 17:32     ` fan
@ 2024-02-01 19:58     ` fan
  2024-02-02 11:52       ` Jonathan Cameron
  1 sibling, 1 reply; 37+ messages in thread
From: fan @ 2024-02-01 19:58 UTC
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Wed, Jan 24, 2024 at 02:51:18PM +0000, Jonathan Cameron wrote:
> On Tue,  7 Nov 2023 10:07:06 -0800
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Per cxl spec 3.0, add dynamic capacity region representative based on
> > Table 8-126 and extend the cxl type3 device definition to include dc region
> > information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> > Configuration' mailbox support.
> > 
> > Note: decode_len of a dc region is aligned to 256*MiB, need to be divided by
> > 256 * MiB before returned to the host for "Get Dynamic Capacity Configuration"
> > mailbox command.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Hi Fan,
> 
> I'm looking at how to move these much earlier in my tree on the basis that
> they should be our main focus for merging in this QEMU cycle.
> 
> Whilst I do that rebase, I'm taking a closer look at the code.
> I'm targeting rebasing on upstream qemu + the two patch sets I just
> sent out:
> [PATCH 00/12 qemu] CXL emulation fixes and minor cleanup. 
> [PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1
> 
> It would be good to document why these commands should be optional (which I think
> comes down to the annoying fact that Get Dynamic Capacity Configuration isn't
> allowed to return 0 regions, but instead should not be available as a command
> if DCD isn't supported).
> 
> Note this requires us to carry Gregory's patches to make the CCI command list
> constructed at runtime rather than baked in ahead of this set.
> 
> So another question is should we jump directly to the r3.1 version of DCD?
> I think we probably should as it includes some additions that are necessary
> for a bunch of the potential use cases.
> 

Based on CXL spec r3.1, the Get Dynamic Capacity Configuration output
payload (Table 8-164) has 4 extra items after the variable region configuration
structures. C does not allow members after a flexible array member, so should
we move the 4 newly added items before the variable region configuration
structures?

Fan

> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 80 +++++++++++++++++++++++++++++++++++++
> >  hw/mem/cxl_type3.c          |  6 +++
> >  include/hw/cxl/cxl_device.h | 17 ++++++++
> >  3 files changed, 103 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 8eceedfa87..f80dd6474f 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -80,6 +80,8 @@ enum {
> >          #define GET_POISON_LIST        0x0
> >          #define INJECT_POISON          0x1
> >          #define CLEAR_POISON           0x2
> > +    DCD_CONFIG  = 0x48,
> > +        #define GET_DC_CONFIG          0x0
> >      PHYSICAL_SWITCH = 0x51,
> >          #define IDENTIFY_SWITCH_DEVICE      0x0
> >          #define GET_PHYSICAL_PORT_STATE     0x1
> > @@ -1210,6 +1212,74 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
> >      return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +/*
> > + * CXL r3.0 section 8.2.9.8.9.1: Get Dynamic Capacity Configuration
> 
> As per the patch set I just sent out, I want to standardize on references
> to r3.1 because it's all that is easy to get.  However, if we decide to do r3.0
> DCD first and then upgrade it later, then clearly these need to stick to r3.0 for
> now.
> 
> > + * (Opcode: 4800h)
> > + */
> > +static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
> > +                                             uint8_t *payload_in,
> > +                                             size_t len_in,
> > +                                             uint8_t *payload_out,
> > +                                             size_t *len_out,
> > +                                             CXLCCI *cci)
> > +{
> > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > +    struct get_dyn_cap_config_in_pl {
> > +        uint8_t region_cnt;
> > +        uint8_t start_region_id;
> > +    } QEMU_PACKED;
> > +
> > +    struct get_dyn_cap_config_out_pl {
> > +        uint8_t num_regions;
> > +        uint8_t rsvd1[7];
> 
> This changed in r3.1 (errata? - I haven't checked)
> Should be 'regions returned' in first byte.
> 
> > +        struct {
> > +            uint64_t base;
> > +            uint64_t decode_len;
> > +            uint64_t region_len;
> > +            uint64_t block_size;
> > +            uint32_t dsmadhandle;
> 
> > +            uint8_t flags;
> > +            uint8_t rsvd2[3];
> > +        } QEMU_PACKED records[];
> 
> There are two fields after this as well.
> Total number of supported extents and number of available extents.
> 
> That annoyingly means we can't use the structure to tell us where
> to find all the fields...
> 
> 
> > +    } QEMU_PACKED;
> > +
> > +    struct get_dyn_cap_config_in_pl *in = (void *)payload_in;
> > +    struct get_dyn_cap_config_out_pl *out = (void *)payload_out;
> > +    uint16_t record_count = 0, i;
> 
> Better to split that on to 2 lines. Never hide setting a value
> in the middle of a set of declarations.
> 
> > +    uint16_t out_pl_len;
> > +    uint8_t start_region_id = in->start_region_id;
> > +
> > +    if (start_region_id >= ct3d->dc.num_regions) {
> > +        return CXL_MBOX_INVALID_INPUT;
> > +    }
> > +
> > +    record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
> > +            in->region_cnt);
> > +
> > +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> 
> For r3.1 + 8 for the two trailing fields.
> 
> > +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> > +
> > +    memset(out, 0, out_pl_len);
> 
> As part of the cci rework we started zeroing the whole mailbox payload space
> after copying out the input payload.
> https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-device-utils.c#L204
> 
> So shouldn't need this (unless we have a bug)
> 
> Jonathan


* Re: [PATCH v3 2/9] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-02-01 19:58     ` fan
@ 2024-02-02 11:52       ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-02-02 11:52 UTC
  To: fan
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Thu, 1 Feb 2024 11:58:43 -0800
fan <nifan.cxl@gmail.com> wrote:

> On Wed, Jan 24, 2024 at 02:51:18PM +0000, Jonathan Cameron wrote:
> > On Tue,  7 Nov 2023 10:07:06 -0800
> > nifan.cxl@gmail.com wrote:
> >   
> > > From: Fan Ni <fan.ni@samsung.com>
> > > 
> > > Per cxl spec 3.0, add dynamic capacity region representative based on
> > > Table 8-126 and extend the cxl type3 device definition to include dc region
> > > information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> > > Configuration' mailbox support.
> > > 
> > > Note: decode_len of a dc region is aligned to 256*MiB, need to be divided by
> > > 256 * MiB before returned to the host for "Get Dynamic Capacity Configuration"
> > > mailbox command.
> > > 
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>  
> > 
> > Hi Fan,
> > 
> > I'm looking at how to move these much earlier in my tree on the basis that
> > they should be our main focus for merging in this QEMU cycle.
> > 
> > Whilst I do that rebase, I'm taking a closer look at the code.
> > I'm targeting rebasing on upstream qemu + the two patch sets I just
> > sent out:
> > [PATCH 00/12 qemu] CXL emulation fixes and minor cleanup. 
> > [PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1
> > 
> > It would be good to document why these commands should be optional (which I think
> > comes down to the annoying fact that Get Dynamic Capacity Configuration isn't
> > allowed to return 0 regions, but instead should not be available as a command
> > if DCD isn't supported).
> > 
> > Note this requires us to carry Gregory's patches to make the CCI command list
> > constructed at runtime rather than baked in ahead of this set.
> > 
> > So another question is should we jump directly to the r3.1 version of DCD?
> > I think we probably should as it includes some additions that are necessary
> > for a bunch of the potential use cases.
> >   
> 
> Based on CXL spec r3.1, the Get Dynamic Capacity Configuration output
> payload (Table 8-164) has 4 extra items after the variable region configuration
> structures. C does not allow members after a flexible array member, so should
> we move the 4 newly added items before the variable region configuration
> structures?

You will just need to manage that size explicitly rather than using a variable
element at the end.  Add some helpers to find the offset in the structure
and it shouldn't be too ugly.

Can't reorganize it just because they made the spec hideous :(
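
A minimal standalone sketch of the "helpers to find the offset" idea follows.
It is not the QEMU implementation: the struct and function names are
hypothetical, the trailer is assumed to be four 32-bit fields (going by the
"4 extra items" mentioned above), the header is assumed to carry 'regions
returned' in its second byte, and the GCC/Clang packed attribute stands in for
QEMU_PACKED. The payload is split into a fixed header, N records, and a
trailer whose position is computed from the record count instead of being
expressed as a struct member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header that precedes the region records. */
struct dc_cfg_hdr {
    uint8_t num_regions;
    uint8_t regions_returned;   /* assumed r3.1 field, per the review */
    uint8_t rsvd[6];
} __attribute__((packed));

/* One region record, same members as the records[] entry in the patch. */
struct dc_cfg_record {
    uint64_t base;
    uint64_t decode_len;
    uint64_t region_len;
    uint64_t block_size;
    uint32_t dsmadhandle;
    uint8_t flags;
    uint8_t rsvd[3];
} __attribute__((packed));

/* Hypothetical trailer: field names and widths are assumptions. */
struct dc_cfg_trailer {
    uint32_t num_extents_supported;
    uint32_t num_extents_available;
    uint32_t num_tags_supported;
    uint32_t num_tags_available;
} __attribute__((packed));

/* Byte offset of the trailer for a payload carrying record_count records. */
static size_t dc_cfg_trailer_offset(uint8_t record_count)
{
    return sizeof(struct dc_cfg_hdr) +
           (size_t)record_count * sizeof(struct dc_cfg_record);
}

/* Total output payload length, trailer included. */
static size_t dc_cfg_payload_len(uint8_t record_count)
{
    return dc_cfg_trailer_offset(record_count) +
           sizeof(struct dc_cfg_trailer);
}

/*
 * Copy the trailer into place. memcpy avoids unaligned stores; values are
 * written in host byte order here, so real code would still need to convert
 * them to the little-endian layout the mailbox expects.
 */
static void dc_cfg_write_trailer(uint8_t *payload, uint8_t record_count,
                                 const struct dc_cfg_trailer *t)
{
    memcpy(payload + dc_cfg_trailer_offset(record_count), t, sizeof(*t));
}
```

With this shape the length check in the handler would compare
dc_cfg_payload_len(record_count) against the mailbox limit, and the on-wire
order of the fields stays exactly as the spec table lists them.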

> 
> Fan
> 
> >   
> > > ---
> > >  hw/cxl/cxl-mailbox-utils.c  | 80 +++++++++++++++++++++++++++++++++++++
> > >  hw/mem/cxl_type3.c          |  6 +++
> > >  include/hw/cxl/cxl_device.h | 17 ++++++++
> > >  3 files changed, 103 insertions(+)
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index 8eceedfa87..f80dd6474f 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -80,6 +80,8 @@ enum {
> > >          #define GET_POISON_LIST        0x0
> > >          #define INJECT_POISON          0x1
> > >          #define CLEAR_POISON           0x2
> > > +    DCD_CONFIG  = 0x48,
> > > +        #define GET_DC_CONFIG          0x0
> > >      PHYSICAL_SWITCH = 0x51,
> > >          #define IDENTIFY_SWITCH_DEVICE      0x0
> > >          #define GET_PHYSICAL_PORT_STATE     0x1
> > > @@ -1210,6 +1212,74 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
> > >      return CXL_MBOX_SUCCESS;
> > >  }
> > >  
> > > +/*
> > > + * CXL r3.0 section 8.2.9.8.9.1: Get Dynamic Capacity Configuration  
> > 
> > As per the patch set I just sent out, I want to standardize on references
> > to r3.1 because it's all that is easy to get.  However, if we decide to do r3.0
> > DCD first and then upgrade it later, then clearly these need to stick to r3.0 for
> > now.
> >   
> > > + * (Opcode: 4800h)
> > > + */
> > > +static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
> > > +                                             uint8_t *payload_in,
> > > +                                             size_t len_in,
> > > +                                             uint8_t *payload_out,
> > > +                                             size_t *len_out,
> > > +                                             CXLCCI *cci)
> > > +{
> > > +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > > +    struct get_dyn_cap_config_in_pl {
> > > +        uint8_t region_cnt;
> > > +        uint8_t start_region_id;
> > > +    } QEMU_PACKED;
> > > +
> > > +    struct get_dyn_cap_config_out_pl {
> > > +        uint8_t num_regions;
> > > +        uint8_t rsvd1[7];  
> > 
> > This changed in r3.1 (errata? - I haven't checked)
> > Should be 'regions returned' in first byte.
> >   
> > > +        struct {
> > > +            uint64_t base;
> > > +            uint64_t decode_len;
> > > +            uint64_t region_len;
> > > +            uint64_t block_size;
> > > +            uint32_t dsmadhandle;  
> >   
> > > +            uint8_t flags;
> > > +            uint8_t rsvd2[3];
> > > +        } QEMU_PACKED records[];  
> > 
> > There are two fields after this as well.
> > Total number of supported extents and number of available extents.
> > 
> > That annoyingly means we can't use the structure to tell us where
> > to find all the fields...
> > 
> >   
> > > +    } QEMU_PACKED;
> > > +
> > > +    struct get_dyn_cap_config_in_pl *in = (void *)payload_in;
> > > +    struct get_dyn_cap_config_out_pl *out = (void *)payload_out;
> > > +    uint16_t record_count = 0, i;  
> > 
> > Better to split that on to 2 lines. Never hide setting a value
> > in the middle of a set of declarations.
> >   
> > > +    uint16_t out_pl_len;
> > > +    uint8_t start_region_id = in->start_region_id;
> > > +
> > > +    if (start_region_id >= ct3d->dc.num_regions) {
> > > +        return CXL_MBOX_INVALID_INPUT;
> > > +    }
> > > +
> > > +    record_count = MIN(ct3d->dc.num_regions - in->start_region_id,
> > > +            in->region_cnt);
> > > +
> > > +    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);  
> > 
> > For r3.1 + 8 for the two trailing fields.
> >   
> > > +    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> > > +
> > > +    memset(out, 0, out_pl_len);  
> > 
> > As part of the cci rework we started zeroing the whole mailbox payload space
> > after copying out the input payload.
> > https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-device-utils.c#L204
> > 
> > So shouldn't need this (unless we have a bug)
> > 
> > Jonathan  



* Re: [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-01-24 15:47   ` Jonathan Cameron
@ 2024-02-06 22:24     ` fan
  2024-02-13  9:28       ` Jonathan Cameron
  0 siblings, 1 reply; 37+ messages in thread
From: fan @ 2024-02-06 22:24 UTC
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Wed, Jan 24, 2024 at 03:47:21PM +0000, Jonathan Cameron wrote:
> On Tue,  7 Nov 2023 10:07:09 -0800
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Add (file/memory backed) host backend, all the dynamic capacity regions
> > will share a single, large enough host backend. Set up address space for
> > DC regions to support read/write operations to dynamic capacity for DCD.
> > 
> > With this change, the following support is added:
> > 1. Add a new property to type3 device "nonvolatile-dc-memdev" to point to host
> >    memory backend for dynamic capacity. Currently, all dc regions share
> >    one host backend.
> > 2. Add namespace for dynamic capacity for read/write support;
> > 3. Create cdat entries for each dynamic capacity region;
> > 4. Fix dvsec range registers to include DC regions.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> Some minor comments inline, mostly suggesting pulling refactors out before
> you do the new stuff.
> 
> Thanks,
> 
> Jonathan

Hi Jonathan,
   I have one question about the DVSEC setting inline.
   Please search for "QUESTION:".

> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  16 ++-
> >  hw/mem/cxl_type3.c          | 198 +++++++++++++++++++++++++++++-------
> >  include/hw/cxl/cxl_device.h |   4 +
> >  3 files changed, 179 insertions(+), 39 deletions(-)
> > 
> 
> 
> 
> >  
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 2d67d2015c..152a51306d 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -31,6 +31,7 @@
> >  #include "hw/pci/spdm.h"
> >  
> >  #define DWORD_BYTE 4
> > +#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
> >  
> >  /* Default CDAT entries for a memory region */
> >  enum {
> > @@ -44,8 +45,9 @@ enum {
> >  };
> >  
> >  static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> > -                                         int dsmad_handle, MemoryRegion *mr,
> > -                                         bool is_pmem, uint64_t dpa_base)
> > +                                         int dsmad_handle, uint64_t size,
> > +                                         bool is_pmem, bool is_dynamic,
> > +                                         uint64_t dpa_base)
> >  {
> >      g_autofree CDATDsmas *dsmas = NULL;
> >      g_autofree CDATDslbis *dslbis0 = NULL;
> > @@ -64,9 +66,10 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >              .length = sizeof(*dsmas),
> >          },
> >          .DSMADhandle = dsmad_handle,
> > -        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> > +        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> > +            (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
> >          .DPA_base = dpa_base,
> > -        .DPA_length = memory_region_size(mr),
> > +        .DPA_length = size,
> >      };
> >  
> >      /* For now, no memory side cache, plausiblish numbers */
> > @@ -150,7 +153,7 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >           */
> >          .EFI_memory_type_attr = is_pmem ? 2 : 1,
> >          .DPA_offset = 0,
> > -        .DPA_length = memory_region_size(mr),
> > +        .DPA_length = size,
> >      };
> 
> Might be better to make the change to this function as a precursor patch before
> you introduce the new users.  Will separate the DC bits out from the rest.
> 
> >  
> >      /* Header always at start of structure */
> > @@ -169,21 +172,28 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> >      g_autofree CDATSubHeader **table = NULL;
> >      CXLType3Dev *ct3d = priv;
> >      MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
> > +    MemoryRegion *dc_mr = NULL;
> >      int dsmad_handle = 0;
> >      int cur_ent = 0;
> >      int len = 0;
> >      int rc, i;
> > +    uint64_t vmr_size = 0, pmr_size = 0;
> 
> Put these next to the memory region definitions above given they are referring to the
> same regions.
> 
> >  
> > -    if (!ct3d->hostpmem && !ct3d->hostvmem) {
> > +    if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
> >          return 0;
> >      }
> >  
> > +    if (ct3d->hostpmem && ct3d->hostvmem && ct3d->dc.host_dc) {
> > +        warn_report("The device has static ram and pmem and dynamic capacity");
> 
> This is the whole how many DVSEC ranges question? 
> I hope we resolved that so we don't care about this...
> 
> > +    }
> > +
> >      if (ct3d->hostvmem) {
> >          volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
> >          if (!volatile_mr) {
> >              return -EINVAL;
> >          }
> >          len += CT3_CDAT_NUM_ENTRIES;
> > +        vmr_size = memory_region_size(volatile_mr);
> >      }
> >  
> >      if (ct3d->hostpmem) {
> 
> ....
> 
> > @@ -210,14 +233,38 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> >      }
> >  
> >      if (nonvolatile_mr) {
> > -        uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
> >          rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> > -                                           nonvolatile_mr, true, base);
> > +                                           pmr_size, true, false, vmr_size);
> >          if (rc < 0) {
> >              goto error_cleanup;
> >          }
> >          cur_ent += CT3_CDAT_NUM_ENTRIES;
> >      }
> > +
> > +    if (dc_mr) {
> > +        uint64_t region_base = vmr_size + pmr_size;
> > +
> > +        /*
> > +         * Currently we create cdat entries for each region, should we only
> > +         * create dsmas table instead??
> 
> We want the whole set.  Need multiple DSMAS for the flags.
> SLBIS refer to DSMAS to identify which memory they cover + they may well be
> different for different regions (could be different types of memory).
> DSEMTS also by DSMAS handle so we need those as well
> 
> 
> > +         * We assume all dc regions are non-volatile for now.
> 
> As expressed below. I'd really prefer them to start as volatile and we can
> consider non volatile later.
> 
> > +         *
> > +         */
> > +        for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +            rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
> > +                                               dsmad_handle++,
> > +                                               ct3d->dc.regions[i].len,
> > +                                               true, true, region_base);
> > +            if (rc < 0) {
> > +                goto error_cleanup;
> > +            }
> > +            ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
> > +
> > +            cur_ent += CT3_CDAT_NUM_ENTRIES;
> > +            region_base += ct3d->dc.regions[i].len;
> > +        }
> > +    }
> > +
> >      assert(len == cur_ent);
> >  
> >      *cdat_table = g_steal_pointer(&table);
> > @@ -445,11 +492,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> >              range2_size_hi = ct3d->hostpmem->size >> 32;
> >              range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> >                               (ct3d->hostpmem->size & 0xF0000000);
> > +        } else if (ct3d->dc.host_dc) {
> > +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > +                             (ct3d->dc.host_dc->size & 0xF0000000);
> 
> I've forgotten if we came to a conclusion on whether these should include
> DC or not...  My gut feeling is no because we don't know what to do
> if they are both already in use.
> 

QUESTION:

If we do not include DC, and the device has only dynamic capacity (no static
RAM/PMEM capacity), then the range registers will not be set. Is that what we
want?

Fan
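
To make the question concrete, here is a minimal standalone sketch (not the
QEMU code) of what range selection looks like when only static capacity is
considered. The names dvsec_range, encode_range and fill_static_ranges are
hypothetical; the bit packing is copied from the quoted build_dvsecs() hunk,
and the sizes are passed in as plain integers rather than read from memory
backends. With no static RAM and no PMEM, both ranges stay all-zero, which is
the case being asked about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one DVSEC range size register pair. */
struct dvsec_range {
    uint32_t size_hi;
    uint32_t size_lo;
};

/* Same packing as the quoted hunk: low bits plus size bits 31:28. */
static struct dvsec_range encode_range(uint64_t size)
{
    struct dvsec_range r = {
        .size_hi = (uint32_t)(size >> 32),
        .size_lo = (2 << 5) | (2 << 2) | 0x3 |
                   (uint32_t)(size & 0xF0000000),
    };
    return r;
}

/*
 * Fill range 1 and range 2 from static capacity only (sizes in bytes,
 * 0 meaning "not present"). Dynamic capacity is deliberately ignored, so
 * a DC-only device leaves both ranges zeroed.
 */
static void fill_static_ranges(uint64_t vmem_size, uint64_t pmem_size,
                               struct dvsec_range *r1,
                               struct dvsec_range *r2)
{
    *r1 = (struct dvsec_range){ 0, 0 };
    *r2 = (struct dvsec_range){ 0, 0 };

    if (vmem_size) {
        *r1 = encode_range(vmem_size);
        if (pmem_size) {
            *r2 = encode_range(pmem_size);
        }
    } else if (pmem_size) {
        *r1 = encode_range(pmem_size);
    }
}
```

Calling fill_static_ranges(0, 0, &r1, &r2) leaves every field at zero, so the
registers would advertise no valid range at all for a DC-only device.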

> >          }
> > -    } else {
> > +    } else if (ct3d->hostpmem) {
> >          range1_size_hi = ct3d->hostpmem->size >> 32;
> >          range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> >                           (ct3d->hostpmem->size & 0xF0000000);
> > +        if (ct3d->dc.host_dc) {
> > +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > +                             (ct3d->dc.host_dc->size & 0xF0000000);
> > +        }
> > +    } else {
> > +        range1_size_hi = ct3d->dc.host_dc->size >> 32;
> > +        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > +            (ct3d->dc.host_dc->size & 0xF0000000);
> >      }
> >  
> >      dvsec = (uint8_t *)&(CXLDVSECDevice){
> > @@ -721,6 +781,9 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
> >      }
> >  }
> >  
> > +/*
> > + * TODO: region parameters are hard coded, may need to change in the future.
> 
> Agreed :)  We should look at this fairly soon I think, though as
> long as we keep option of defaults that fall back to what we have here
> we can do most of it later. However I would like the defaults to be derived
> from the memory backend size.
> 
> > + */
> >  static int cxl_create_dc_regions(CXLType3Dev *ct3d)
> >  {
> >      int i;
> > @@ -736,6 +799,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
> >      if (ct3d->hostpmem) {
> >          region_base += ct3d->hostpmem->size;
> >      }
> > +
> 
> Should be pushed back to the original patch.
> 
> >      for (i = 0; i < ct3d->dc.num_regions; i++) {
> >          region = &ct3d->dc.regions[i];
> >          region->base = region_base;
> 
> > @@ -823,6 +888,50 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >          return false;
> >      }
> >  
> > +    ct3d->dc.total_capacity = 0;
> > +    if (ct3d->dc.host_dc) {
> 
> This confuses me a little. Can we create DC regions without a memory backend?
> I don't think we should allow that - in which case the earlier
> cxl_create_dc_regions() can move under this check.
> 
> > +        MemoryRegion *dc_mr;
> > +        char *dc_name;
> > +        uint64_t total_region_size = 0;
> > +        int i;
> > +
> > +        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +        if (!dc_mr) {
> > +            error_setg(errp, "dynamic capacity must have backing device");
> > +            return false;
> > +        }
> > +        /* FIXME: set dc as nonvolatile for now */
> 
> As that's less likely to occur than volatile I'd prefer a default of volatile.
> 
> > +        memory_region_set_nonvolatile(dc_mr, true);
> > +        memory_region_set_enabled(dc_mr, true);
> > +        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
> > +        if (ds->id) {
> > +            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
> > +        } else {
> > +            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
> > +        }
> > +        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
> > +        g_free(dc_name);
> > +
> > +        for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +            total_region_size += ct3d->dc.regions[i].len;
> > +        }
> > +        /* Make sure the host backend is large enough to cover all dc range */
> 
> I suppose this is another reasonable way of doing region defaults. Just refuse
> them if your defaults don't fit in the provided memory backend.
> We can work with that as long as we cycle back around to regions we can
> configure from the command line fairly soon. 
> 
> > +        if (total_region_size > memory_region_size(dc_mr)) {
> > +            error_setg(errp,
> > +                "too small host backend size, increase to %lu MiB or more",
> > +                total_region_size / MiB);
> > +            return false;
> > +        }
> > +
> > +        if (dc_mr->size % CXL_CAPACITY_MULTIPLIER != 0) {
> > +            error_setg(errp, "DC region size is unaligned to %lx",
> > +                    CXL_CAPACITY_MULTIPLIER);
> > +            return false;
> > +        }
> > +
> > +        ct3d->dc.total_capacity = total_region_size;
> > +    }
> > +
> >      return true;
> >  }
> 
> 
> > @@ -1025,16 +1140,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> >                                         AddressSpace **as,
> >                                         uint64_t *dpa_offset)
> >  {
> 
> >  
> > @@ -1042,19 +1165,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> >          return -EINVAL;
> >      }
> >  
> > -    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
> > +    if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
> >          return -EINVAL;
> >      }
> >  
> > -    if (vmr) {
> > -        if (*dpa_offset < memory_region_size(vmr)) {
> > -            *as = &ct3d->hostvmem_as;
> > -        } else {
> > -            *as = &ct3d->hostpmem_as;
> > -            *dpa_offset -= memory_region_size(vmr);
> > -        }
> > -    } else {
> > +    if (*dpa_offset < vmr_size) {
> > +        *as = &ct3d->hostvmem_as;
> > +    } else if (*dpa_offset < vmr_size + pmr_size) {
> >          *as = &ct3d->hostpmem_as;
> > +        *dpa_offset -= vmr_size;
> > +    } else {
> > +        *as = &ct3d->dc.host_dc_as;
> > +        *dpa_offset -= (vmr_size + pmr_size);
> >      }
> 
> This code is duplicated below.  As a follow up perhaps we should
> add a utility function to get the as and offset within that space.
>   
> >      return 0;
> > @@ -1143,6 +1265,8 @@ static Property ct3_props[] = {
> >      DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
> >      DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
> >      DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
> > +    DEFINE_PROP_LINK("nonvolatile-dc-memdev", CXLType3Dev, dc.host_dc,
> > +                    TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > @@ -1209,33 +1333,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
> >  
> >  static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
> >  {
> > -    MemoryRegion *vmr = NULL, *pmr = NULL;
> > +    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> >      AddressSpace *as;
> > +    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> >  
> >      if (ct3d->hostvmem) {
> >          vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > +        vmr_size = memory_region_size(vmr);
> >      }
> >      if (ct3d->hostpmem) {
> >          pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > +        pmr_size = memory_region_size(pmr);
> >      }
> > +    if (ct3d->dc.host_dc) {
> > +        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +        dc_size = ct3d->dc.total_capacity;
> > +     }
> >  
> > -    if (!vmr && !pmr) {
> > +    if (!vmr && !pmr && !dc_mr) {
> >          return false;
> >      }
> >  
> > -    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
> > +    if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
> >          return false;
> >      }
> >  
> > -    if (vmr) {
> > -        if (dpa_offset < memory_region_size(vmr)) {
> > -            as = &ct3d->hostvmem_as;
> > -        } else {
> > -            as = &ct3d->hostpmem_as;
> > -            dpa_offset -= memory_region_size(vmr);
> > -        }
> > -    } else {
> > +    if (dpa_offset < vmr_size) {
> > +        as = &ct3d->hostvmem_as;
> > +    } else if (dpa_offset < vmr_size + pmr_size) {
> >          as = &ct3d->hostpmem_as;
> > +        dpa_offset -= vmr_size;
> > +    } else {	
> > +        as = &ct3d->dc.host_dc_as;
> > +        dpa_offset -= (vmr_size + pmr_size);
> >      }
> >  
> >      address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
> 
> 
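
The utility function suggested in the review (one place that maps a DPA to
an address space and an offset within it) could look roughly like the sketch
below. This is a standalone illustration under stated assumptions, not the
QEMU code: the enum and the name dpa_to_space are hypothetical, and the three
sizes are passed in as plain integers where the real code would read them
from the device state. The DPA layout is the one used in the quoted hunks:
volatile memory first, then persistent memory, then dynamic capacity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three AddressSpace pointers. */
enum dpa_space {
    DPA_SPACE_NONE,   /* DPA is outside the device */
    DPA_SPACE_VMEM,
    DPA_SPACE_PMEM,
    DPA_SPACE_DC,
};

/*
 * Map a device physical address to the space that backs it and store the
 * offset within that space in *offset. A size of 0 means the corresponding
 * backend is absent, which simply makes its window empty.
 */
static enum dpa_space dpa_to_space(uint64_t dpa, uint64_t vmr_size,
                                   uint64_t pmr_size, uint64_t dc_size,
                                   uint64_t *offset)
{
    if (dpa >= vmr_size + pmr_size + dc_size) {
        return DPA_SPACE_NONE;
    }
    if (dpa < vmr_size) {
        *offset = dpa;
        return DPA_SPACE_VMEM;
    }
    if (dpa < vmr_size + pmr_size) {
        *offset = dpa - vmr_size;
        return DPA_SPACE_PMEM;
    }
    *offset = dpa - (vmr_size + pmr_size);
    return DPA_SPACE_DC;
}
```

Both cxl_type3_hpa_to_as_and_dpa() and set_cacheline() could then share the
same range checks; the cache-line variant would additionally need to verify
that dpa + CXL_CACHE_LINE_SIZE does not run past the end of the device.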


* Re: [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-01-24 16:50   ` Jonathan Cameron
@ 2024-02-08 19:17     ` fan
  2024-02-13  9:29       ` Jonathan Cameron
  0 siblings, 1 reply; 37+ messages in thread
From: fan @ 2024-02-08 19:17 UTC
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Wed, Jan 24, 2024 at 04:50:04PM +0000, Jonathan Cameron wrote:
> On Tue,  7 Nov 2023 10:07:12 -0800
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Since fabric manager emulation is not supported yet, the change implements
> > the functions to add/release dynamic capacity extents as QMP interfaces.
> > 
> > Note: we block any FM issued extent release request if the exact extent
> > does not exist in the extent list of the device. We will loosen the
> > restriction later once we have partial release support in the kernel.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two contiguous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "dpa": 0,
> >           "len": 128
> >       },
> >       {
> >           "dpa": 128,
> >           "len": 128
> >       }
> >       ]
> >   }
> > }
> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "dpa": 128,
> >           "len": 128
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> A few more comments, and I found some answers to previous comments. I should have
> read the whole thing first :(

Hi Jonathan,
One reply to your comment is inline. Search for "REPLY" to locate it.

> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  25 +++-
> >  hw/mem/cxl_type3.c          | 225 +++++++++++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3_stubs.c    |  14 +++
> >  include/hw/cxl/cxl_device.h |   8 +-
> >  include/hw/cxl/cxl_events.h |  15 +++
> >  qapi/cxl.json               |  60 +++++++++-
> >  6 files changed, 338 insertions(+), 9 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 9f788b03b6..8e6a98753a 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1362,7 +1362,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> >   * Check whether any bit between addr[nr, nr+size) is set,
> >   * return true if any bit is set, otherwise return false
> >   */
> > -static bool test_any_bits_set(const unsigned long *addr, int nr, int size)
> > +bool test_any_bits_set(const unsigned long *addr, int nr, int size)
> >  {
> >      unsigned long res = find_next_bit(addr, size + nr, nr);
> >  
> > @@ -1400,7 +1400,7 @@ CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> >      return NULL;
> >  }
> >  
> > -static void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> > +void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> >                                               uint64_t dpa,
> >                                               uint64_t len,
> >                                               uint8_t *tag,
> > @@ -1538,15 +1538,28 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
> >              }
> >          }
> >  
> > -        /*
> > -         * TODO: add a pending extent list based on event log record and
> > -         * verify the input response
> > -         */
> 
> Ahah. I should have read on :)  Ignore comments on previous agreeing
> that such a list was needed.
> 
> > +        QTAILQ_FOREACH(ent, &ct3d->dc.extents_pending_to_add, node) {
> > +            if (ent->start_dpa <= dpa &&
> > +                dpa + len <= ent->start_dpa + ent->len) {
> > +                break;
> > +            }
> > +        }
> > +        if (ent) {
> > +            QTAILQ_REMOVE(&ct3d->dc.extents_pending_to_add, ent, node);
> > +            g_free(ent);
> > +        } else {
> > +            return CXL_MBOX_INVALID_PA;
> > +        }
> Flip to simplify logic
> 
>            if (!ent) {
>                 return CXL_MBOX_INVALID_PA;
>            }
> 
> 	   QTAILQ...
> 
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> >      }
> >  
> > +    /*
> > +     * TODO: extents_pending_to_add needs to be cleared so the extents not
> > +     * accepted can be reclaimed base on spec r3.0: 8.2.9.8.9.3
> > +     */
> > +
> >      return CXL_MBOX_SUCCESS;
> >  }
> >  
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 482329a499..43cea3d818 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -813,6 +813,7 @@ static int cxl_create_dc_regions(CXLType3Dev *ct3d)
> >          region_base += region->len;
> >      }
> >      QTAILQ_INIT(&ct3d->dc.extents);
> > +    QTAILQ_INIT(&ct3d->dc.extents_pending_to_add);
> >  
> >      return 0;
> >  }
> > @@ -1616,7 +1617,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> >          return CXL_EVENT_TYPE_FAIL;
> >      case CXL_EVENT_LOG_FATAL:
> >          return CXL_EVENT_TYPE_FATAL;
> > -/* DCD not yet supported */
> > +    case CXL_EVENT_LOG_DYNCAP:
> > +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
> >      default:
> >          return -EINVAL;
> >      }
> > @@ -1867,6 +1869,227 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
> >      }
> >  }
> >  
> > +/* CXL r3.0 Table 8-47: Dynamic Capacity Event Record */
> > +static const QemuUUID dynamic_capacity_uuid = {
> > +    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> > +                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> > +};
> > +
> > +typedef enum CXLDCEventType {
> > +    DC_EVENT_ADD_CAPACITY = 0x0,
> > +    DC_EVENT_RELEASE_CAPACITY = 0x1,
> > +    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
> > +    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
> > +    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
> > +    DC_EVENT_CAPACITY_RELEASED = 0x5,
> > +    DC_EVENT_NUM
> Don't think DC_EVENT_NUM is used. Don't define it unless it's useful.
> 
> > +} CXLDCEventType;
> > +
> > +/*
> > + * Check whether the exact extent exists in the list
> > + * Return value: true if exists, otherwise false
> > + */
> > +static bool cxl_dc_extent_exists(CXLDCDExtentList *list, CXLDCExtentRaw *ext)
> > +{
> > +    CXLDCDExtent *ent;
> > +
> > +    if (!ext || !list) {
> > +        return false;
> > +    }
> > +
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        if (ent->start_dpa != ext->start_dpa) {
> > +            continue;
> > +        }
> > +
> > +        /*Found exact extent*/
> 
> 	   return ent->len == ext->len;
> 
> > +        if (ent->len == ext->len) {
> > +            return true;
> > +        } else {
> > +            return false;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog log,
> > +                                             CXLDCEventType type, uint16_t hid,
> > +                                             uint8_t rid,
> > +                                             CXLDCExtentRecordList *records,
> > +                                             Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    CXLDCDExtentList *extent_list = NULL;
> > +    uint8_t enc_log;
> > +    uint64_t offset, len, block_size;
> > +    int i;
> > +    int rc;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +
> > +    obj = object_resolve_path(path, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve path");
> > +        return;
> > +    }
> > +    if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> > +        error_setg(errp, "Path not point to a valid CXL type3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +    rc = ct3d_qmp_cxl_event_log_enc(log);
> > +    if (rc < 0) {
> > +        error_setg(errp, "Unhandled error log type");
> > +        return;
> > +    }
> > +    enc_log = rc;
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset * MiB;
> > +        len = list->value->len * MiB;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        num_extents++;
> > +        list = list->next;
> > +    }
> > +    if (num_extents == 0) {
> > +        error_setg(errp, "No extents found in the command");
> > +        return;
> > +    }
> > +
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Create Extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset * MiB;
> > +        len = list->value->len * MiB;
> That suggests it wasn't in MiB unlike the comment below.
> 
> > +
> > +        extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +
> > +        /*
> > +         * We block the release request from FM if the exact extent has
> > +         * not been accepted by the host yet
> 
> If it's released before host accepts it that is fine - drop it from the pending list.
> If the host then tries to accept we validate it and fail the accept.
> 
> Should really validate no overlap with existing extents in pending list or
> accepted lists.
> 
> 
> > +         * TODO: We can loosen the restriction by skipping the check if desired
> > +         */
> > +        if (type == DC_EVENT_RELEASE_CAPACITY ||
> > +            type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > +            if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > +                error_setg(errp, "No exact extent found in the extent list");
> > +                return;
> > +            }
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        list = list->next;
> > +        i++;
> > +    }
> > +
> > +    switch (type) {
> > +    case DC_EVENT_ADD_CAPACITY:
> > +        extent_list = &dcd->dc.extents_pending_to_add;
> > +        break;
> > +    default:
> > +        break;
> > +    }
> > +    /*
> > +     * CXL r3.0 section 8.2.9.1.5: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (extent_list) {
> Given this is always the same list
> 	   if (type == DC_EVENT_ADD_CAPACITY) {
>                cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending_to_add,
> //local variable here to avoid line length but the basic idea is the same.
> 
> 
> > +            cxl_insert_extent_to_extent_list(extent_list,
> > +                                             extents[i].start_dpa,
> > +                                             extents[i].len,
> > +                                             extents[i].tag,
> > +                                             extents[i].shared_seq);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {
> > +            cxl_event_irq_assert(dcd);
> > +        }
> > +    }
> > +}
> 
> ...
> 
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index b3d35fe000..ca4f824b11 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -491,6 +491,7 @@ struct CXLType3Dev {
> >          AddressSpace host_dc_as;
> >          uint64_t total_capacity; /* 256M aligned */
> >          CXLDCDExtentList extents;
> > +        CXLDCDExtentList extents_pending_to_add;
> >  
> >          uint32_t total_extent_count;
> >          uint32_t ext_list_gen_seq;
> > @@ -550,5 +551,10 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
> >  void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
> >  
> >  CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
> > -
> Avoid the whitespace change either by never adding that blank line or by keeping it here.
> 
> > +void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> > +                                             uint64_t dpa,
> > +                                             uint64_t len,
> > +                                             uint8_t *tag,
> > +                                             uint16_t shared_seq);
> > +bool test_any_bits_set(const unsigned long *addr, int nr, int size);
> >  #endif
> > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > index d778487b7e..4f8cb3215d 100644
> > --- a/include/hw/cxl/cxl_events.h
> > +++ b/include/hw/cxl/cxl_events.h
> > @@ -166,4 +166,19 @@ typedef struct CXLEventMemoryModule {
> >      uint8_t reserved[0x3d];
> >  } QEMU_PACKED CXLEventMemoryModule;
> >  
> > +/*
> > + * CXL r3.0 section Table 8-47: Dynamic Capacity Event Record
> > + * All fields little endian.
> > + */
> > +typedef struct CXLEventDynamicCapacity {
> > +    CXLEventRecordHdr hdr;
> > +    uint8_t type;
> > +    uint8_t reserved1;
> > +    uint16_t host_id;
> > +    uint8_t updated_region_id;
> > +    uint8_t reserved2[3];
> > +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> 
> Can't we use that definition here?

REPLY: 

I left it as it is to avoid including cxl_device.h in cxl_events.h.

Do you think we need to include the file and use the definition here?

Fan

> 
> > +    uint8_t reserved[0x20];
> > +} QEMU_PACKED CXLEventDynamicCapacity;
> > +
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..6b631f64f1 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> ...
> 
> > @@ -361,3 +362,60 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDCExtentRecord:
> > +#
> > +# Record of a single extent to add/release
> > +#
> > +# @offset: offset of the extent start related to current region base address
> > +# @len: extent size (in MiB)
> 
> Why?  Extents can be smaller than that (though we might not have implemented
> that yet).  Bytes would be better.
> 
> > +#
> > +# Since: 8.0
> > +##
> > +{ 'struct': 'CXLDCExtentRecord',
> > +  'data': {
> > +      'offset':'uint64',
> > +      'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The host will
> > +# need to respond to indicate it accepts the capacity before it becomes
> > +# available for read and write.
> 
> The device will have to have acknowledged the accept though perhaps that is
> too much detail.
> 
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to add/release
> > +# @extents: Extents to add
> > +#
> > +# Since : 8.2
> 
> Update for next version.  9.0 is ideal target now.
> 
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to add/release
> > +# @extents: Extents to release
> > +#
> > +# Since : 8.2
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> 


* Re: [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
  2024-01-24 16:58   ` Jonathan Cameron
@ 2024-02-09 19:04     ` fan
  2024-02-13  9:31       ` Jonathan Cameron
  0 siblings, 1 reply; 37+ messages in thread
From: fan @ 2024-02-09 19:04 UTC (permalink / raw
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Wed, Jan 24, 2024 at 04:58:15PM +0000, Jonathan Cameron wrote:
> On Tue,  7 Nov 2023 10:07:13 -0800
> nifan.cxl@gmail.com wrote:
> 
> > From: Fan Ni <fan.ni@samsung.com>
> > 
> > Not all dpa range in the dc regions is valid to access until an extent
> DPA ... DC etc
> 
> > covering the range has been added. Add a bitmap for each region to
> > record whether a dc block in the region has been backed by dc extent.
> > For the bitmap, a bit in the bitmap represents a dc block. When a dc
> > extent is added, all the bits of the blocks in the extent will be set,
> > which will be cleared when the extent is released.
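
For illustration, the bookkeeping described above can be sketched as follows.
This is a minimal sketch under stated assumptions, not the code from the
patch: struct region_sketch and the region_*() helpers are hypothetical names,
the bitmap is a single 64-bit word (so at most 64 blocks), and len must be
nonzero. Unlike the patch, the sketch derives the first and last block index
from the byte range, so an access that does not start on a block boundary
still checks every block it touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified DC region: at most 64 blocks, one bit per block. */
struct region_sketch {
    uint64_t base;        /* DPA of the first byte of the region */
    uint64_t block_size;  /* size of one DC block in bytes */
    uint64_t blk_bitmap;  /* bit i set => block i is backed by an extent */
};

/* Build a mask covering every block touched by [dpa, dpa + len), len > 0. */
static uint64_t block_mask(const struct region_sketch *r,
                           uint64_t dpa, uint64_t len)
{
    uint64_t first = (dpa - r->base) / r->block_size;
    uint64_t last = (dpa + len - 1 - r->base) / r->block_size;
    uint64_t mask = 0;

    for (uint64_t i = first; i <= last; i++) {
        mask |= UINT64_C(1) << i;
    }
    return mask;
}

/* An extent was accepted: mark its blocks as backed. */
void region_set_backed(struct region_sketch *r, uint64_t dpa, uint64_t len)
{
    r->blk_bitmap |= block_mask(r, dpa, len);
}

/* An extent was released: mark its blocks as unbacked. */
void region_clear_backed(struct region_sketch *r, uint64_t dpa, uint64_t len)
{
    r->blk_bitmap &= ~block_mask(r, dpa, len);
}

/* An access is valid only if every block it touches is backed. */
bool region_test_backed(const struct region_sketch *r,
                        uint64_t dpa, uint64_t len)
{
    uint64_t mask = block_mask(r, dpa, len);

    return (r->blk_bitmap & mask) == mask;
}

/* Small self-check exercising add, access and release. */
void region_sketch_self_check(void)
{
    struct region_sketch r = { .base = 0x1000, .block_size = 0x100 };

    region_set_backed(&r, 0x1000, 0x200);            /* blocks 0 and 1 */
    assert(region_test_backed(&r, 0x10f0, 0x20));    /* straddles 0 and 1 */
    assert(!region_test_backed(&r, 0x1100, 0x200));  /* block 2 not backed */
    region_clear_backed(&r, 0x1100, 0x100);          /* release block 1 */
    assert(!region_test_backed(&r, 0x1000, 0x200));
}
```
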
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> 
> Hi Fan, one query inline and a few comments.
> 
> Jonathan
> 
> > 
> > --
> > JC changes:
> > - Rebase on what will be next gitlab.com/jic23/qemu CXL staging tree.
> > - Drop unnecessary handling of failed bitmap allocations. In common with
> >   most QEMU allocations they fail hard anyway.
> > - Use previously factored out cxl_find_region() helper
> > - Minor editorial stuff in comments such as spec version references
> >   according to the standard form I'm trying to push through the code.
> > Picked up Jørgen's fix:
> > https://lore.kernel.org/qemu-devel/d0d7ca1d-81bc-19b3-4904-d60046ded844@wdc.com/T/#u
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 31 +++++++++------
> >  hw/mem/cxl_type3.c          | 78 +++++++++++++++++++++++++++++++++++++
> >  include/hw/cxl/cxl_device.h | 15 +++++--
> >  3 files changed, 109 insertions(+), 15 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 8e6a98753a..6be92fb5ba 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1401,10 +1401,9 @@ CXLDCDRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
> >  }
> >  
> >  void cxl_insert_extent_to_extent_list(CXLDCDExtentList *list,
> > -                                             uint64_t dpa,
> > -                                             uint64_t len,
> > -                                             uint8_t *tag,
> > -                                             uint16_t shared_seq)
> > +                                      uint64_t dpa, uint64_t len,
> > +                                      uint8_t *tag,
> > +                                      uint16_t shared_seq)
> 
> avoid noisy whitespace changes like this.
> 
> 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 43cea3d818..4ec65a751a 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> 
> > +/*
> > + * Check whether a DPA range [dpa, dpa + len) has been backed with DC extents.
> > + * Used when validating read/write to dc regions
> > + */
> > +bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> > +                                  uint64_t len)
> > +{
> > +    CXLDCDRegion *region;
> > +    uint64_t nbits;
> > +    long nr;
> > +
> > +    region = cxl_find_dc_region(ct3d, dpa, len);
> > +    if (!region) {
> > +        return false;
> > +    }
> > +
> > +    nr = (dpa - region->base) / region->block_size;
> > +    nbits = DIV_ROUND_UP(len, region->block_size);
> > +    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
> I'm not sure how this works... Is it taking a size or an end point?
> 
> Linux equivalent takes size, so I'd expect
> 
>     return find_next_zero_bit(region->blk_bitmap, nbits, nr);
> Perhaps a comment would avoid any future confusion on this.
> 

My understanding is that size is the size of the bitmap, which is also the
end of the range to check, not the length of the range to check.

find_next_zero_bit(bitmap, size, offset) searches the bitmap range
[offset, size) for the next unset bit. For the test above we want to check
the range [nr, nr + nbits), so the arguments passed to the function should
be right.

By definition, the function returns size whenever offset >= size, because
size is the end of the range. So if we pass nbits and nr to the function and
nr >= nbits, which can be common (roughly, (dpa - region_base) >= len), the
check will always return true; that is not what we want.

To sum up, the second parameter of the function should always be the end
of the range to check; in our case, it is nr + nbits.
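
To make the [offset, size) semantics concrete, here is a minimal,
self-contained sketch. sketch_find_next_zero_bit() is a simplified stand-in
written for this example (it is not QEMU's implementation), and
range_backed() / length_as_size() are hypothetical helper names. The only
behaviour it assumes is the one described above: the search covers
[offset, size) and returns size when no zero bit is found.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Simplified stand-in for find_next_zero_bit(): scan bits [offset, size)
 * and return the index of the first clear bit, or size if there is none
 * (including the case offset >= size, where nothing is scanned).
 */
unsigned long sketch_find_next_zero_bit(const unsigned long *map,
                                        unsigned long size,
                                        unsigned long offset)
{
    unsigned long i;

    for (i = offset; i < size; i++) {
        unsigned long mask = 1UL << (i % SKETCH_BITS_PER_LONG);

        if (!(map[i / SKETCH_BITS_PER_LONG] & mask)) {
            return i;
        }
    }
    return size;
}

/* End of range as "size": true only if all bits in [nr, nr + nbits) are set. */
bool range_backed(const unsigned long *map, unsigned long nr,
                  unsigned long nbits)
{
    return sketch_find_next_zero_bit(map, nr + nbits, nr) == nr + nbits;
}

/* Length as "size": nothing is scanned once nr >= nbits. */
bool length_as_size(const unsigned long *map, unsigned long nr,
                    unsigned long nbits)
{
    return sketch_find_next_zero_bit(map, nbits, nr);
}

/* Small self-check: only blocks 0..3 are backed. */
void find_zero_self_check(void)
{
    unsigned long map[1] = { 0x0fUL };

    assert(range_backed(map, 0, 4));    /* blocks 0..3 are all backed */
    assert(!range_backed(map, 2, 4));   /* blocks 4 and 5 are not backed */
    assert(length_as_size(map, 8, 2));  /* wrongly true: 8 and 9 unbacked */
}
```

With only blocks 0..3 backed, length_as_size(map, 8, 2) still reports true
because offset 8 is already past size 2, so no bit is examined and the
nonzero return value 2 is treated as success.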

Fan



> > +}
> > +
> > +/*
> > + * Mark the DPA range [dpa, dpa + len) to be unbacked and inaccessible. This
> > + * happens when a dc extent is returned by the host.
> > + */
> > +void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> > +                                   uint64_t len)
> > +{
> > +    CXLDCDRegion *region;
> > +    uint64_t nbits;
> > +    long nr;
> > +
> > +    region = cxl_find_dc_region(ct3d, dpa, len);
> > +    if (!region) {
> > +        return;
> > +    }
> > +
> > +    nr = (dpa - region->base) / region->block_size;
> > +    nbits = len / region->block_size;
> > +    bitmap_clear(region->blk_bitmap, nr, nbits);
> > +}
> > +
> 


* Re: [PATCH v3 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-02-06 22:24     ` fan
@ 2024-02-13  9:28       ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-02-13  9:28 UTC (permalink / raw
  To: fan
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni


> > >      *cdat_table = g_steal_pointer(&table);
> > > @@ -445,11 +492,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> > >              range2_size_hi = ct3d->hostpmem->size >> 32;
> > >              range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > >                               (ct3d->hostpmem->size & 0xF0000000);
> > > +        } else if (ct3d->dc.host_dc) {
> > > +            range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > > +            range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > > +                             (ct3d->dc.host_dc->size & 0xF0000000);  
> > 
> > I've forgotten if we came to a conclusion on whether these should include
> > DC or not...  My gut feeling is no because we don't know what to do
> > if they are both already in use.
> >   
> 
> QUESTION:
> 
> If we do not include DC, and there is no static ram/pmem capacity and
> only dynamic capacity, then the range registers will not be set, is that
> what we want?

I think that's a valid interpretation of the specification.
So for now go with that.

p.s. Sorry for slow response.  Debugging had me distracted from catching
up with the list.


* Re: [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-02-08 19:17     ` fan
@ 2024-02-13  9:29       ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-02-13  9:29 UTC (permalink / raw
  To: fan
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni


> > >  #endif
> > > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > > index d778487b7e..4f8cb3215d 100644
> > > --- a/include/hw/cxl/cxl_events.h
> > > +++ b/include/hw/cxl/cxl_events.h
> > > @@ -166,4 +166,19 @@ typedef struct CXLEventMemoryModule {
> > >      uint8_t reserved[0x3d];
> > >  } QEMU_PACKED CXLEventMemoryModule;
> > >  
> > > +/*
> > > + * CXL r3.0 section Table 8-47: Dynamic Capacity Event Record
> > > + * All fields little endian.
> > > + */
> > > +typedef struct CXLEventDynamicCapacity {
> > > +    CXLEventRecordHdr hdr;
> > > +    uint8_t type;
> > > +    uint8_t reserved1;
> > > +    uint16_t host_id;
> > > +    uint8_t updated_region_id;
> > > +    uint8_t reserved2[3];
> > > +    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */  
> > 
> > Can't we use that definition here?  
> 
> REPLY: 
> 
> I left it as it is to avoid including cxl_device.h in cxl_events.h.
> 
> Do you think we need to include the file and use the definition here?

I don't feel strongly either way.

Jonathan

> 
> Fan


* Re: [PATCH v3 9/9] hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
  2024-02-09 19:04     ` fan
@ 2024-02-13  9:31       ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-02-13  9:31 UTC (permalink / raw
  To: fan
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni


> > > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > > index 43cea3d818..4ec65a751a 100644
> > > --- a/hw/mem/cxl_type3.c
> > > +++ b/hw/mem/cxl_type3.c  
> >   
> > > +/*
> > > + * Check whether a DPA range [dpa, dpa + len) has been backed with DC extents.
> > > + * Used when validating read/write to dc regions
> > > + */
> > > +bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> > > +                                  uint64_t len)
> > > +{
> > > +    CXLDCDRegion *region;
> > > +    uint64_t nbits;
> > > +    long nr;
> > > +
> > > +    region = cxl_find_dc_region(ct3d, dpa, len);
> > > +    if (!region) {
> > > +        return false;
> > > +    }
> > > +
> > > +    nr = (dpa - region->base) / region->block_size;
> > > +    nbits = DIV_ROUND_UP(len, region->block_size);
> > > +    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;  
> > I'm not sure how this works... Is it taking a size or an end point?
> > 
> > Linux equivalent takes size, so I'd expect
> > 
> >     return find_next_zero_bit(region->blk_bitmap, nbits, nr);
> > Perhaps a comment would avoid any future confusion on this.
> >   
> 
> My understanding is that size is the size of the bitmap, which is also the
> end of the range to check, not the length of the range to check.
> 
> find_next_zero_bit(bitmap, size, offset) searches the bitmap range
> [offset, size) for the next unset bit. For the test above we want to check
> the range [nr, nr + nbits), so the arguments passed to the function should
> be right.
> 
> By definition, the function returns size whenever offset >= size, because
> size is the end of the range. So if we pass nbits and nr to the function and
> nr >= nbits, which can be common (roughly, (dpa - region_base) >= len), the
> check will always return true; that is not what we want.
> 
> To sum up, the second parameter of the function should always be the end
> of the range to check; in our case, it is nr + nbits.
Ok. Thanks for the explanation. That sounds good to me

Jonathan

> 
> Fan


* Re: [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2023-11-07 18:07 ` [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
  2024-01-24 16:50   ` Jonathan Cameron
@ 2024-02-13 17:44   ` Jonathan Cameron
  2024-02-13 18:21     ` fan
  1 sibling, 1 reply; 37+ messages in thread
From: Jonathan Cameron @ 2024-02-13 17:44 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8cc4c72fa9..6b631f64f1 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -25,7 +25,8 @@
>    'data': ['informational',
>             'warning',
>             'failure',
> -           'fatal']
> +           'fatal',
> +           'dyncap']
>   }
Needs a docs update.

Upstream QEMU seems to have gained some stricter checks
so this just broke my build after a rebase.

Jonathan


* Re: [PATCH v3 0/9] Enabling DCD emulation support in Qemu
  2023-11-07 18:07 [PATCH v3 0/9] Enabling DCD emulation support in Qemu nifan.cxl
                   ` (9 preceding siblings ...)
  2023-11-17  0:09 ` [PATCH v3 0/9] Enabling DCD emulation support in Qemu Ira Weiny
@ 2024-02-13 18:18 ` fan
  2024-02-19 16:18   ` Jonathan Cameron
  10 siblings, 1 reply; 37+ messages in thread
From: fan @ 2024-02-13 18:18 UTC (permalink / raw
  To: nifan.cxl
  Cc: qemu-devel, jonathan.cameron, linux-cxl, ira.weiny,
	dan.j.williams, a.manzanares, dave, nmtadam.samsung, nifan,
	jim.harris

On Tue, Nov 07, 2023 at 10:07:04AM -0800, nifan.cxl@gmail.com wrote:
> From: Fan Ni <nifan.cxl@gmail.com>
> 
> 
> The patch series are based on Jonathan's branch cxl-2023-09-26.
> 
> The main changes include,
> 1. Update cxl_find_dc_region to detect the case the range of the extent cross
>     multiple DC regions.
> 2. Add comments to explain the checks performed in function
>     cxl_detect_malformed_extent_list. (Jonathan)
> 3. Minimize the checks in cmd_dcd_add_dyn_cap_rsp.(Jonathan)
> 4. Update total_extent_count in add/release dynamic capacity response function.
>     (Ira and Jorgen Hansen).
> 5. Fix the logic issue in test_bits and renamed it to
>     test_any_bits_set to clear its function.
> 6. Add pending extent list for dc extent add event.
> 7. When add extent response is received, use the pending-to-add list to
>     verify the extents are valid.
> 8. Add test_any_bits_set and cxl_insert_extent_to_extent_list declaration to
>     cxl_device.h so it can be used in different files.
> 9. Updated ct3d_qmp_cxl_event_log_enc to include dynamic capacity event
>     log type.
> 10. Extract the functionality to delete extent from extent list to a helper
>     function.
> 11. Move the update of the bitmap which reflects which blocks are backed with
> dc extents from the moment when a dc extent is offered to the moment when it
> is accepted from the host.
> 12. Free dc_name after calling address_space_init to avoid memory leak when
>     returning early. (Nathan)
> 13. Add code to detect and reject QMP requests without any extents. (Jonathan)
> 14. Add code to detect and reject QMP requests where the extent len is 0.
> 15. Change the QMP interface and move the region-id out of extents and now
>     each command only takes care of extent add/release request in a single
>     region. (Jonathan)
> 16. Change the region bitmap length from decode_len to len.
> 17. Rename "dpa" to "offset" in the add/release dc extent qmp interface.
>     (Jonathan)
> 18. Block any dc extent release command if the exact extent is not already in
>     the extent list of the device.
> 
> The code is tested together with Ira's kernel DCD support:
> https://github.com/weiny2/linux-kernel/tree/dcd-v3-2023-10-30
> 
> Cover letter from v2 is here:
> https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#m63039621087023691c9749a0af1212deb5549ddf
> 
> Last version (v2) is here:
> https://lore.kernel.org/linux-cxl/20230725183939.2741025-1-fan.ni@samsung.com/
> 
> More DCD related discussions are here:
> https://lore.kernel.org/linux-cxl/650cc29ab3f64_50d07294e7@iweiny-mobl.notmuch/
> 
> 
> 
> Fan Ni (9):
>   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>     payload of identify memory device command
>   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>     and mailbox command support
>   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>     type3 memory devices
>   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>     devices
>   hw/mem/cxl_type3: Add host backend and address space handling for DC
>     regions
>   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>     list mailbox support
>   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>     dynamic capacity response
>   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>     extents
>   hw/mem/cxl_type3: Add dpa range validation for accesses to dc regions
> 
>  hw/cxl/cxl-mailbox-utils.c  | 469 +++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          | 548 +++++++++++++++++++++++++++++++++---
>  hw/mem/cxl_type3_stubs.c    |  14 +
>  include/hw/cxl/cxl_device.h |  64 ++++-
>  include/hw/cxl/cxl_events.h |  15 +
>  qapi/cxl.json               |  60 +++-
>  6 files changed, 1123 insertions(+), 47 deletions(-)
> 
> -- 
> 2.42.0
> 

Hi Jonathan,

I have updated the patch set based on your feedback and aligned the code
to cxl spec r3.1.

Here is the new code:
https://github.com/moking/qemu/tree/dcd-v4

I plan to send it out for review early next week. Before then, I want to see
whether there is any kernel-side update for DCD this week so I can test more.

If the plan needs to be adjusted to align with the merge window, please
let me know.

v3[1]->v4: 

The code is rebased on mainstream QEMU with the following patch series:

[PATCH 00/12 qemu] CXL emulation fixes and minor cleanup.
[PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1
hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference
hw/cxl/mailbox: interface to add CCI commands to an existing CCI

Main changes include:

1. Updated the specification references to align with cxl spec r3.1.
2. Added extra elements to the get DC region configuration output payload and
processed them accordingly in mailbox command 4800h.
3. Removed the unwanted space.
4. Refactored ct3_build_cdat_entries_for_mr and extract it as a separate patch.
5. Updated cxl_create_dc_regions function to derive region len from host
backend size.
6. Changed the logic for creating DC regions when host backend and address
space processing is introduced, now cxl_create_dc_regions is called only
when host backend exists.
7. Updated the name of the definitions related to DC extents for consistency.
8. Updated dynamic capacity event record definition to align with spec r3.1.
9. Changed the dynamic capacity request process logic: for a release request,
extra checks are done against the pending list to remove an extent yet to be added.
10. Changed the return value of cxl_create_dc_regions so the return can be used
to remove the extent from the list if needed.
11. Offset and size in the qmp interface are changed to be byte-wise while the
original is MiB-wise.
12. Fixed bugs in handling bitmap for dpa range existence.
13. NOTE: in previous version DC is set to non-volatile, while in this version
we change it to volatile per Jonathan's suggestion.
14. Updated the doc in qapi/cxl.json.
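
To make items 11 and 12 concrete, here is a minimal standalone sketch of a
byte-wise offset/len check and a per-block bitmap recording which blocks are
backed by DC extents. It is not the code in the series: BLOCK_SIZE, REGION_LEN,
BlkBitmap and all function names below are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define BLOCK_SIZE (2 * MiB)   /* hypothetical DC region block size */
#define REGION_LEN (64 * MiB)  /* hypothetical DC region length */
#define NBLOCKS (REGION_LEN / BLOCK_SIZE)

/* One bit per block; a set bit means the block is backed by an extent. */
typedef struct {
    uint8_t bits[(NBLOCKS + 7) / 8];
} BlkBitmap;

/* Convert a MiB-wise value (old interface) to bytes (new interface). */
static uint64_t mib_to_bytes(uint64_t mib)
{
    return mib * MiB;
}

/*
 * A byte-wise [offset, offset + len) range is accepted only if it is
 * non-empty, block aligned, and fully inside the region.
 */
static bool range_valid(uint64_t offset, uint64_t len)
{
    if (len == 0 || offset % BLOCK_SIZE != 0 || len % BLOCK_SIZE != 0) {
        return false;
    }
    return offset < REGION_LEN && len <= REGION_LEN - offset;
}

/* Mark every block in the range as backed. */
static void set_range(BlkBitmap *bm, uint64_t offset, uint64_t len)
{
    assert(range_valid(offset, len));
    for (uint64_t b = offset / BLOCK_SIZE; b < (offset + len) / BLOCK_SIZE; b++) {
        bm->bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* Return true if at least one block in the range is already backed. */
static bool any_bits_set(const BlkBitmap *bm, uint64_t offset, uint64_t len)
{
    assert(range_valid(offset, len));
    for (uint64_t b = offset / BLOCK_SIZE; b < (offset + len) / BLOCK_SIZE; b++) {
        if (bm->bits[b / 8] & (1u << (b % 8))) {
            return true;
        }
    }
    return false;
}
```

In this sketch, an add request would call any_bits_set() first to detect an
overlap with blocks that are already backed, and set_range() only once the
extent is accepted by the host.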

Thanks to Jonathan for the detailed review of the last version[1].

The code is tested with Ira's last kernel DCD patch set [2] with some minor
bug fixes[3]. Tested operations include:
1. Create DC region;
2. Add/release DC extents;
3. Convert DC capacity into system RAM.


v3: 
[1] https://lore.kernel.org/linux-cxl/20231107180907.553451-1-nifan.cxl@gmail.com/T/#t
[2] https://github.com/weiny2/linux-kernel/tree/dcd-v3-2023-10-30
[3] https://github.com/moking/linux-dcd/commit/9d24fa6e5d39f934623220953caecc080f93e964


* Re: [PATCH v3 8/9] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-02-13 17:44   ` Jonathan Cameron
@ 2024-02-13 18:21     ` fan
  0 siblings, 0 replies; 37+ messages in thread
From: fan @ 2024-02-13 18:21 UTC (permalink / raw
  To: Jonathan Cameron
  Cc: nifan.cxl, qemu-devel, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On Tue, Feb 13, 2024 at 05:44:05PM +0000, Jonathan Cameron wrote:
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..6b631f64f1 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -25,7 +25,8 @@
> >    'data': ['informational',
> >             'warning',
> >             'failure',
> > -           'fatal']
> > +           'fatal',
> > +           'dyncap']
> >   }
> Needs a docs update.
> 
> Upstream QEMU seems to have gained some stricter checks
> so this just broke my build after a rebase.
> 
> Jonathan

Thanks. Updated.

FYI, the new version is complete:
https://lore.kernel.org/linux-cxl/ZcuyZ0Nwq31z8YIr@debian/T/#m07b4b4586e2f421a617f08a002b196d932a88966

Thanks,
Fan


* Re: [PATCH v3 0/9] Enabling DCD emulation support in Qemu
  2024-02-13 18:18 ` fan
@ 2024-02-19 16:18   ` Jonathan Cameron
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Cameron @ 2024-02-19 16:18 UTC (permalink / raw
  To: fan
  Cc: qemu-devel, linux-cxl, ira.weiny, dan.j.williams, a.manzanares,
	dave, nmtadam.samsung, nifan, jim.harris

On Tue, 13 Feb 2024 10:18:15 -0800
fan <nifan.cxl@gmail.com> wrote:

> Hi Jonathan,
> 
> I have updated the patch set based on your feedback and aligned the code
> to cxl spec r3.1.
> 
> Here is the new code:
> https://github.com/moking/qemu/tree/dcd-v4
> 
> I plan to send it out for review early next week to see if there is any kernel
> side update for dcd this week so I can test more.

Excellent!

> 
> If the plan needs to be adjusted to align with the merge window, please
> let me know.

I'm focused on the TCG and physmem fixes right now, but would like to do
a detailed review of your new version later this week.

We have a few more weeks - probably want a final version to be on list by
end of this month - so there is a bit of time before the soft feature freeze
on the 12th March https://wiki.qemu.org/Planning/9.0

> 
> v3[1]->v4: 
> 
> The code is rebased on mainstream QEMU with the following patch series:
> 
> [PATCH 00/12 qemu] CXL emulation fixes and minor cleanup.
> [PATCH 0/5 qemu] hw/cxl: Update CXL emulation to reflect and reference r3.1
Those 2 series are now upstream :)

> hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference
> hw/cxl/mailbox: interface to add CCI commands to an existing CCI
> 
> Main changes include:
> 
> 1. Updated the specification references to align with cxl spec r3.1.
> 2. Added extra elements to the get DC region configuration output payload and
> processed them accordingly in mailbox command 4800h.
> 3. Removed the unwanted space.
> 4. Refactored ct3_build_cdat_entries_for_mr and extract it as a separate patch.
> 5. Updated cxl_create_dc_regions function to derive region len from host
> backend size.
> 6. Changed the logic for creating DC regions when host backend and address
> space processing is introduced, now cxl_create_dc_regions is called only
> when host backend exists.
> 7. Updated the name of the definitions related to DC extents for consistency.
> 8. Updated dynamic capacity event record definition to align with spec r3.1.
> 9. Changed the dynamic capacity request process logic: for a release request,
> extra checks are done against the pending list to remove an extent yet to be added.
> 10. Changed the return value of cxl_create_dc_regions so the return can be used
> to remove the extent from the list if needed.
> 11. Offset and size in the qmp interface are changed to be byte-wise while the
> original is MiB-wise.
> 12. Fixed bugs in handling bitmap for dpa range existence.
> 13. NOTE: in previous version DC is set to non-volatile, while in this version
> we change it to volatile per Jonathan's suggestion.
> 14. Updated the doc in qapi/cxl.json.

All sounds good. I won't attempt to review in the git tree - I've gotten far
too used to email flows for review, but I should be able to get on it fairly
quickly once posted.

> 
> Thank Jonathan for the detailed review of the last version[1].
> 
> The code is tested with Ira's last kernel DCD patch set [2] with some minor
> bug fixes[3]. Tested operations include:
> 1. create DC region;
> 2. Add/release DC extents;
> 3. convert DC capacity into system RAM;

I guess that will hit the TCG bugs/missing features if we end up with page tables
in it.  Should have same problems as for non DC regions.

Review feedback has been helpful on the TCG changes so they should be in 9.0 I think.
Will go via different paths to the CXL support however so no idea when they'll be
in relative to DC support.

Thanks,

Jonathan


> 
> 
> v3: 
> [1] https://lore.kernel.org/linux-cxl/20231107180907.553451-1-nifan.cxl@gmail.com/T/#t
> [2] https://github.com/weiny2/linux-kernel/tree/dcd-v3-2023-10-30
> [3] https://github.com/moking/linux-dcd/commit/9d24fa6e5d39f934623220953caecc080f93e964



* Re: [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2023-11-07 18:07 ` [PATCH v3 6/9] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
  2024-01-24 15:56   ` Jonathan Cameron
@ 2024-02-23  7:10   ` Wonjae Lee
  1 sibling, 0 replies; 37+ messages in thread
From: Wonjae Lee @ 2024-02-23  7:10 UTC (permalink / raw
  To: nifan.cxl, qemu-devel
  Cc: jonathan.cameron, linux-cxl, ira.weiny, dan.j.williams,
	a.manzanares, dave, nmtadam.samsung, nifan, jim.harris, Fan Ni

On 2023-11-08 3:07 AM, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> Add dynamic capacity extent list representative to the definition of
> CXLType3Dev and add get DC extent list mailbox command per
> CXL spec r3.0, section 8.2.9.8.9.2.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>   hw/cxl/cxl-mailbox-utils.c  | 73 +++++++++++++++++++++++++++++++++++++
>   hw/mem/cxl_type3.c          |  1 +
>   include/hw/cxl/cxl_device.h | 23 ++++++++++++
>   3 files changed, 97 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 1f512b3e6b..56f4aa237a 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -82,6 +82,7 @@ enum {
>           #define CLEAR_POISON           0x2
>       DCD_CONFIG  = 0x48,
>           #define GET_DC_CONFIG          0x0
> +        #define GET_DYN_CAP_EXT_LIST   0x1
>       PHYSICAL_SWITCH = 0x51,
>           #define IDENTIFY_SWITCH_DEVICE      0x0
>           #define GET_PHYSICAL_PORT_STATE     0x1
> @@ -1286,6 +1287,75 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
>       return CXL_MBOX_SUCCESS;
>   }
>   
> +/*
> + * CXL r3.0 section 8.2.9.8.9.2:
> + * Get Dynamic Capacity Extent List (Opcode 4801h)
> + */
> +static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
> +                                               uint8_t *payload_in,
> +                                               size_t len_in,
> +                                               uint8_t *payload_out,
> +                                               size_t *len_out,
> +                                               CXLCCI *cci)
> +{
> +    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> +    struct get_dyn_cap_ext_list_in_pl {
> +        uint32_t extent_cnt;
> +        uint32_t start_extent_id;
> +    } QEMU_PACKED;
> +
> +    struct get_dyn_cap_ext_list_out_pl {
> +        uint32_t count;
> +        uint32_t total_extents;
> +        uint32_t generation_num;
> +        uint8_t rsvd[4];
> +        CXLDCExtentRaw records[];
> +    } QEMU_PACKED;
> +
> +    struct get_dyn_cap_ext_list_in_pl *in = (void *)payload_in;
> +    struct get_dyn_cap_ext_list_out_pl *out = (void *)payload_out;
> +    uint16_t record_count = 0, i = 0, record_done = 0;
> +    CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +    CXLDCDExtent *ent;
> +    uint16_t out_pl_len;
> +    uint32_t start_extent_id = in->start_extent_id;
> +
> +    if (start_extent_id > ct3d->dc.total_extent_count) {

Hello,

Shouldn't it be >= rather than >?

Thanks,
Wonjae
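
To illustrate the question, here is a small standalone sketch comparing the two
comparisons. It assumes extent indices are 0-based, so valid start indices are
0 .. total - 1; that is an assumption, and the helper names are hypothetical
rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check as posted: rejects only a start index strictly beyond the count. */
static bool start_id_rejected_gt(uint32_t start, uint32_t total)
{
    return start > total;
}

/* Stricter alternative: also rejects start == total. */
static bool start_id_rejected_ge(uint32_t start, uint32_t total)
{
    return start >= total;
}

/* Number of records that exist at or after the start index. */
static uint32_t records_available(uint32_t start, uint32_t total)
{
    return start < total ? total - start : 0;
}
```

With start == total, the posted check accepts the request even though no
record is available at that index. One thing to keep in mind with >= is that it
would also reject start == 0 on an empty list (total == 0), so which behaviour
is wanted for an empty list probably needs to be decided as well.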

