* [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-14 20:15 ` Jason Gunthorpe
2025-02-18 5:05 ` Tian, Kevin
2025-01-25 0:30 ` [PATCH v6 02/14] iommufd/fault: Add an iommufd_fault_init() helper Nicolin Chen
` (13 subsequent siblings)
14 siblings, 2 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
There is no need to keep them in the header. The vEVENTQ version of these
two functions will turn out to be a different implementation and will not
be shared with this fault version. Thus, move them out of the header.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 25 -------------------------
drivers/iommu/iommufd/fault.c | 25 +++++++++++++++++++++++++
2 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 0b1bafc7fd99..034df9b256f4 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -451,31 +451,6 @@ struct iommufd_fault {
struct wait_queue_head wait_queue;
};
-/* Fetch the first node out of the fault->deliver list */
-static inline struct iopf_group *
-iommufd_fault_deliver_fetch(struct iommufd_fault *fault)
-{
- struct list_head *list = &fault->deliver;
- struct iopf_group *group = NULL;
-
- spin_lock(&fault->lock);
- if (!list_empty(list)) {
- group = list_first_entry(list, struct iopf_group, node);
- list_del(&group->node);
- }
- spin_unlock(&fault->lock);
- return group;
-}
-
-/* Restore a node back to the head of the fault->deliver list */
-static inline void iommufd_fault_deliver_restore(struct iommufd_fault *fault,
- struct iopf_group *group)
-{
- spin_lock(&fault->lock);
- list_add(&group->node, &fault->deliver);
- spin_unlock(&fault->lock);
-}
-
struct iommufd_attach_handle {
struct iommu_attach_handle handle;
struct iommufd_device *idev;
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index d9a937450e55..e89d27bb9548 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -258,6 +258,31 @@ static void iommufd_compose_fault_message(struct iommu_fault *fault,
hwpt_fault->cookie = cookie;
}
+/* Fetch the first node out of the fault->deliver list */
+static struct iopf_group *
+iommufd_fault_deliver_fetch(struct iommufd_fault *fault)
+{
+ struct list_head *list = &fault->deliver;
+ struct iopf_group *group = NULL;
+
+ spin_lock(&fault->lock);
+ if (!list_empty(list)) {
+ group = list_first_entry(list, struct iopf_group, node);
+ list_del(&group->node);
+ }
+ spin_unlock(&fault->lock);
+ return group;
+}
+
+/* Restore a node back to the head of the fault->deliver list */
+static void iommufd_fault_deliver_restore(struct iommufd_fault *fault,
+ struct iopf_group *group)
+{
+ spin_lock(&fault->lock);
+ list_add(&group->node, &fault->deliver);
+ spin_unlock(&fault->lock);
+}
+
static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
size_t count, loff_t *ppos)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header
2025-01-25 0:30 ` [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header Nicolin Chen
@ 2025-02-14 20:15 ` Jason Gunthorpe
2025-02-18 5:05 ` Tian, Kevin
1 sibling, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-14 20:15 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:30PM -0800, Nicolin Chen wrote:
> There is no need to keep them in the header. The vEVENTQ version of these
> two functions will turn out to be a different implementation and will not
> be shared with this fault version. Thus, move them out of the header.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> drivers/iommu/iommufd/iommufd_private.h | 25 -------------------------
> drivers/iommu/iommufd/fault.c | 25 +++++++++++++++++++++++++
> 2 files changed, 25 insertions(+), 25 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* RE: [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header
2025-01-25 0:30 ` [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header Nicolin Chen
2025-02-14 20:15 ` Jason Gunthorpe
@ 2025-02-18 5:05 ` Tian, Kevin
1 sibling, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:05 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> There is no need to keep them in the header. The vEVENTQ version of these
> two functions will turn out to be a different implementation and will not
> be shared with this fault version. Thus, move them out of the header.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
* [PATCH v6 02/14] iommufd/fault: Add an iommufd_fault_init() helper
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
2025-01-25 0:30 ` [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-01-25 0:30 ` [PATCH v6 03/14] iommufd: Abstract an iommufd_eventq from iommufd_fault Nicolin Chen
` (12 subsequent siblings)
14 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
The infrastructure of a fault object will be shared with a new vEVENTQ
object in a following change. Add an iommufd_fault_init helper and an
INIT_FAULT_FOPS macro for a vEVENTQ allocator to use too.
Reorder the iommufd_ctx_get and refcount_inc, to keep them symmetrical
with the iommufd_fault_fops_release().
Since the new vEVENTQ doesn't need the "response" xarray and its "mutex",
keep the xa_init_flags and mutex_init in their original locations.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/fault.c | 70 +++++++++++++++++++++--------------
1 file changed, 42 insertions(+), 28 deletions(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index e89d27bb9548..08d940204169 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -400,20 +400,49 @@ static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
return 0;
}
-static const struct file_operations iommufd_fault_fops = {
- .owner = THIS_MODULE,
- .open = nonseekable_open,
- .read = iommufd_fault_fops_read,
- .write = iommufd_fault_fops_write,
- .poll = iommufd_fault_fops_poll,
- .release = iommufd_fault_fops_release,
-};
+#define INIT_FAULT_FOPS(read_op, write_op) \
+ ((const struct file_operations){ \
+ .owner = THIS_MODULE, \
+ .open = nonseekable_open, \
+ .read = read_op, \
+ .write = write_op, \
+ .poll = iommufd_fault_fops_poll, \
+ .release = iommufd_fault_fops_release, \
+ })
+
+static int iommufd_fault_init(struct iommufd_fault *fault, char *name,
+ struct iommufd_ctx *ictx,
+ const struct file_operations *fops)
+{
+ struct file *filep;
+ int fdno;
+
+ spin_lock_init(&fault->lock);
+ INIT_LIST_HEAD(&fault->deliver);
+ init_waitqueue_head(&fault->wait_queue);
+
+ filep = anon_inode_getfile(name, fops, fault, O_RDWR);
+ if (IS_ERR(filep))
+ return PTR_ERR(filep);
+
+ fault->ictx = ictx;
+ iommufd_ctx_get(fault->ictx);
+ fault->filep = filep;
+ refcount_inc(&fault->obj.users);
+
+ fdno = get_unused_fd_flags(O_CLOEXEC);
+ if (fdno < 0)
+ fput(filep);
+ return fdno;
+}
+
+static const struct file_operations iommufd_fault_fops =
+ INIT_FAULT_FOPS(iommufd_fault_fops_read, iommufd_fault_fops_write);
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
{
struct iommu_fault_alloc *cmd = ucmd->cmd;
struct iommufd_fault *fault;
- struct file *filep;
int fdno;
int rc;
@@ -424,28 +453,14 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
if (IS_ERR(fault))
return PTR_ERR(fault);
- fault->ictx = ucmd->ictx;
- INIT_LIST_HEAD(&fault->deliver);
xa_init_flags(&fault->response, XA_FLAGS_ALLOC1);
mutex_init(&fault->mutex);
- spin_lock_init(&fault->lock);
- init_waitqueue_head(&fault->wait_queue);
-
- filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
- fault, O_RDWR);
- if (IS_ERR(filep)) {
- rc = PTR_ERR(filep);
- goto out_abort;
- }
- refcount_inc(&fault->obj.users);
- iommufd_ctx_get(fault->ictx);
- fault->filep = filep;
-
- fdno = get_unused_fd_flags(O_CLOEXEC);
+ fdno = iommufd_fault_init(fault, "[iommufd-pgfault]", ucmd->ictx,
+ &iommufd_fault_fops);
if (fdno < 0) {
rc = fdno;
- goto out_fput;
+ goto out_abort;
}
cmd->out_fault_id = fault->obj.id;
@@ -461,8 +476,7 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
return 0;
out_put_fdno:
put_unused_fd(fdno);
-out_fput:
- fput(filep);
+ fput(fault->filep);
out_abort:
iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
--
2.43.0
* [PATCH v6 03/14] iommufd: Abstract an iommufd_eventq from iommufd_fault
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
2025-01-25 0:30 ` [PATCH v6 01/14] iommufd/fault: Move two fault functions out of the header Nicolin Chen
2025-01-25 0:30 ` [PATCH v6 02/14] iommufd/fault: Add an iommufd_fault_init() helper Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-14 20:23 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 04/14] iommufd: Rename fault.c to eventq.c Nicolin Chen
` (11 subsequent siblings)
14 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
The fault object was designed exclusively for hwpt's IO page faults (PRI).
But its queue implementation can be reused for other purposes too, such as
hardware IRQ and event injections to user space.
Meanwhile, a fault object holds a list of faults, so it is more accurate to
call it a "fault queue". Building on the reuse idea above, abstract a new
iommufd_eventq as a common structure embedded in struct iommufd_fault,
similar to hwpt_paging holding a common hwpt.
Add a common iommufd_eventq_ops and iommufd_eventq_init to prepare for an
IOMMUFD_OBJ_VEVENTQ (vIOMMU Event Queue).
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 28 ++++--
drivers/iommu/iommufd/fault.c | 111 +++++++++++++-----------
drivers/iommu/iommufd/hw_pagetable.c | 6 +-
3 files changed, 82 insertions(+), 63 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 034df9b256f4..ee365c85dda9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -433,20 +433,13 @@ void iopt_remove_access(struct io_pagetable *iopt,
u32 iopt_access_list_id);
void iommufd_access_destroy_object(struct iommufd_object *obj);
-/*
- * An iommufd_fault object represents an interface to deliver I/O page faults
- * to the user space. These objects are created/destroyed by the user space and
- * associated with hardware page table objects during page-table allocation.
- */
-struct iommufd_fault {
+struct iommufd_eventq {
struct iommufd_object obj;
struct iommufd_ctx *ictx;
struct file *filep;
spinlock_t lock; /* protects the deliver list */
struct list_head deliver;
- struct mutex mutex; /* serializes response flows */
- struct xarray response;
struct wait_queue_head wait_queue;
};
@@ -459,12 +452,29 @@ struct iommufd_attach_handle {
/* Convert an iommu attach handle to iommufd handle. */
#define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle)
+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to the user space. These objects are created/destroyed by the user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+ struct iommufd_eventq common;
+ struct mutex mutex; /* serializes response flows */
+ struct xarray response;
+};
+
+static inline struct iommufd_fault *
+eventq_to_fault(struct iommufd_eventq *eventq)
+{
+ return container_of(eventq, struct iommufd_fault, common);
+}
+
static inline struct iommufd_fault *
iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
{
return container_of(iommufd_get_object(ucmd->ictx, id,
IOMMUFD_OBJ_FAULT),
- struct iommufd_fault, obj);
+ struct iommufd_fault, common.obj);
}
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 08d940204169..0da39c3dfcdb 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -17,6 +17,8 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"
+/* IOMMUFD_OBJ_FAULT Functions */
+
static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
{
struct device *dev = idev->dev;
@@ -111,13 +113,13 @@ static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt,
INIT_LIST_HEAD(&free_list);
mutex_lock(&fault->mutex);
- spin_lock(&fault->lock);
- list_for_each_entry_safe(group, next, &fault->deliver, node) {
+ spin_lock(&fault->common.lock);
+ list_for_each_entry_safe(group, next, &fault->common.deliver, node) {
if (group->attach_handle != &handle->handle)
continue;
list_move(&group->node, &free_list);
}
- spin_unlock(&fault->lock);
+ spin_unlock(&fault->common.lock);
list_for_each_entry_safe(group, next, &free_list, node) {
list_del(&group->node);
@@ -219,7 +221,9 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
void iommufd_fault_destroy(struct iommufd_object *obj)
{
- struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+ struct iommufd_eventq *eventq =
+ container_of(obj, struct iommufd_eventq, obj);
+ struct iommufd_fault *fault = eventq_to_fault(eventq);
struct iopf_group *group, *next;
unsigned long index;
@@ -229,7 +233,7 @@ void iommufd_fault_destroy(struct iommufd_object *obj)
* accessing this pointer. Therefore, acquiring the mutex here
* is unnecessary.
*/
- list_for_each_entry_safe(group, next, &fault->deliver, node) {
+ list_for_each_entry_safe(group, next, &fault->common.deliver, node) {
list_del(&group->node);
iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
iopf_free_group(group);
@@ -262,15 +266,15 @@ static void iommufd_compose_fault_message(struct iommu_fault *fault,
static struct iopf_group *
iommufd_fault_deliver_fetch(struct iommufd_fault *fault)
{
- struct list_head *list = &fault->deliver;
+ struct list_head *list = &fault->common.deliver;
struct iopf_group *group = NULL;
- spin_lock(&fault->lock);
+ spin_lock(&fault->common.lock);
if (!list_empty(list)) {
group = list_first_entry(list, struct iopf_group, node);
list_del(&group->node);
}
- spin_unlock(&fault->lock);
+ spin_unlock(&fault->common.lock);
return group;
}
@@ -278,16 +282,17 @@ iommufd_fault_deliver_fetch(struct iommufd_fault *fault)
static void iommufd_fault_deliver_restore(struct iommufd_fault *fault,
struct iopf_group *group)
{
- spin_lock(&fault->lock);
- list_add(&group->node, &fault->deliver);
- spin_unlock(&fault->lock);
+ spin_lock(&fault->common.lock);
+ list_add(&group->node, &fault->common.deliver);
+ spin_unlock(&fault->common.lock);
}
static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
size_t count, loff_t *ppos)
{
size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
- struct iommufd_fault *fault = filep->private_data;
+ struct iommufd_eventq *eventq = filep->private_data;
+ struct iommufd_fault *fault = eventq_to_fault(eventq);
struct iommu_hwpt_pgfault data = {};
struct iommufd_device *idev;
struct iopf_group *group;
@@ -336,7 +341,8 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b
size_t count, loff_t *ppos)
{
size_t response_size = sizeof(struct iommu_hwpt_page_response);
- struct iommufd_fault *fault = filep->private_data;
+ struct iommufd_eventq *eventq = filep->private_data;
+ struct iommufd_fault *fault = eventq_to_fault(eventq);
struct iommu_hwpt_page_response response;
struct iopf_group *group;
size_t done = 0;
@@ -376,59 +382,61 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b
return done == 0 ? rc : done;
}
-static __poll_t iommufd_fault_fops_poll(struct file *filep,
- struct poll_table_struct *wait)
+/* Common Event Queue Functions */
+
+static __poll_t iommufd_eventq_fops_poll(struct file *filep,
+ struct poll_table_struct *wait)
{
- struct iommufd_fault *fault = filep->private_data;
+ struct iommufd_eventq *eventq = filep->private_data;
__poll_t pollflags = EPOLLOUT;
- poll_wait(filep, &fault->wait_queue, wait);
- spin_lock(&fault->lock);
- if (!list_empty(&fault->deliver))
+ poll_wait(filep, &eventq->wait_queue, wait);
+ spin_lock(&eventq->lock);
+ if (!list_empty(&eventq->deliver))
pollflags |= EPOLLIN | EPOLLRDNORM;
- spin_unlock(&fault->lock);
+ spin_unlock(&eventq->lock);
return pollflags;
}
-static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
+static int iommufd_eventq_fops_release(struct inode *inode, struct file *filep)
{
- struct iommufd_fault *fault = filep->private_data;
+ struct iommufd_eventq *eventq = filep->private_data;
- refcount_dec(&fault->obj.users);
- iommufd_ctx_put(fault->ictx);
+ refcount_dec(&eventq->obj.users);
+ iommufd_ctx_put(eventq->ictx);
return 0;
}
-#define INIT_FAULT_FOPS(read_op, write_op) \
+#define INIT_EVENTQ_FOPS(read_op, write_op) \
((const struct file_operations){ \
.owner = THIS_MODULE, \
.open = nonseekable_open, \
.read = read_op, \
.write = write_op, \
- .poll = iommufd_fault_fops_poll, \
- .release = iommufd_fault_fops_release, \
+ .poll = iommufd_eventq_fops_poll, \
+ .release = iommufd_eventq_fops_release, \
})
-static int iommufd_fault_init(struct iommufd_fault *fault, char *name,
- struct iommufd_ctx *ictx,
- const struct file_operations *fops)
+static int iommufd_eventq_init(struct iommufd_eventq *eventq, char *name,
+ struct iommufd_ctx *ictx,
+ const struct file_operations *fops)
{
struct file *filep;
int fdno;
- spin_lock_init(&fault->lock);
- INIT_LIST_HEAD(&fault->deliver);
- init_waitqueue_head(&fault->wait_queue);
+ spin_lock_init(&eventq->lock);
+ INIT_LIST_HEAD(&eventq->deliver);
+ init_waitqueue_head(&eventq->wait_queue);
- filep = anon_inode_getfile(name, fops, fault, O_RDWR);
+ filep = anon_inode_getfile(name, fops, eventq, O_RDWR);
if (IS_ERR(filep))
return PTR_ERR(filep);
- fault->ictx = ictx;
- iommufd_ctx_get(fault->ictx);
- fault->filep = filep;
- refcount_inc(&fault->obj.users);
+ eventq->ictx = ictx;
+ iommufd_ctx_get(eventq->ictx);
+ eventq->filep = filep;
+ refcount_inc(&eventq->obj.users);
fdno = get_unused_fd_flags(O_CLOEXEC);
if (fdno < 0)
@@ -437,7 +445,7 @@ static int iommufd_fault_init(struct iommufd_fault *fault, char *name,
}
static const struct file_operations iommufd_fault_fops =
- INIT_FAULT_FOPS(iommufd_fault_fops_read, iommufd_fault_fops_write);
+ INIT_EVENTQ_FOPS(iommufd_fault_fops_read, iommufd_fault_fops_write);
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
{
@@ -449,36 +457,37 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
if (cmd->flags)
return -EOPNOTSUPP;
- fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+ fault = __iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT,
+ common.obj);
if (IS_ERR(fault))
return PTR_ERR(fault);
xa_init_flags(&fault->response, XA_FLAGS_ALLOC1);
mutex_init(&fault->mutex);
- fdno = iommufd_fault_init(fault, "[iommufd-pgfault]", ucmd->ictx,
- &iommufd_fault_fops);
+ fdno = iommufd_eventq_init(&fault->common, "[iommufd-pgfault]",
+ ucmd->ictx, &iommufd_fault_fops);
if (fdno < 0) {
rc = fdno;
goto out_abort;
}
- cmd->out_fault_id = fault->obj.id;
+ cmd->out_fault_id = fault->common.obj.id;
cmd->out_fault_fd = fdno;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
if (rc)
goto out_put_fdno;
- iommufd_object_finalize(ucmd->ictx, &fault->obj);
+ iommufd_object_finalize(ucmd->ictx, &fault->common.obj);
- fd_install(fdno, fault->filep);
+ fd_install(fdno, fault->common.filep);
return 0;
out_put_fdno:
put_unused_fd(fdno);
- fput(fault->filep);
+ fput(fault->common.filep);
out_abort:
- iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+ iommufd_object_abort_and_destroy(ucmd->ictx, &fault->common.obj);
return rc;
}
@@ -491,11 +500,11 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
hwpt = group->attach_handle->domain->fault_data;
fault = hwpt->fault;
- spin_lock(&fault->lock);
- list_add_tail(&group->node, &fault->deliver);
- spin_unlock(&fault->lock);
+ spin_lock(&fault->common.lock);
+ list_add_tail(&group->node, &fault->common.deliver);
+ spin_unlock(&fault->common.lock);
- wake_up_interruptible(&fault->wait_queue);
+ wake_up_interruptible(&fault->common.wait_queue);
return 0;
}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index ce03c3804651..12a576f1f13d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -14,7 +14,7 @@ static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
iommu_domain_free(hwpt->domain);
if (hwpt->fault)
- refcount_dec(&hwpt->fault->obj.users);
+ refcount_dec(&hwpt->fault->common.obj.users);
}
void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
@@ -403,8 +403,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
hwpt->fault = fault;
hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
hwpt->domain->fault_data = hwpt;
- refcount_inc(&fault->obj.users);
- iommufd_put_object(ucmd->ictx, &fault->obj);
+ refcount_inc(&fault->common.obj.users);
+ iommufd_put_object(ucmd->ictx, &fault->common.obj);
}
cmd->out_hwpt_id = hwpt->obj.id;
--
2.43.0
* Re: [PATCH v6 03/14] iommufd: Abstract an iommufd_eventq from iommufd_fault
2025-01-25 0:30 ` [PATCH v6 03/14] iommufd: Abstract an iommufd_eventq from iommufd_fault Nicolin Chen
@ 2025-02-14 20:23 ` Jason Gunthorpe
0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-14 20:23 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:32PM -0800, Nicolin Chen wrote:
> The fault object was designed exclusively for hwpt's IO page faults (PRI).
> But its queue implementation can be reused for other purposes too, such as
> hardware IRQ and event injections to user space.
>
> Meanwhile, a fault object holds a list of faults, so it is more accurate to
> call it a "fault queue". Building on the reuse idea above, abstract a new
> iommufd_eventq as a common structure embedded in struct iommufd_fault,
> similar to hwpt_paging holding a common hwpt.
>
> Add a common iommufd_eventq_ops and iommufd_eventq_init to prepare for an
> IOMMUFD_OBJ_VEVENTQ (vIOMMU Event Queue).
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> drivers/iommu/iommufd/iommufd_private.h | 28 ++++--
> drivers/iommu/iommufd/fault.c | 111 +++++++++++++-----------
> drivers/iommu/iommufd/hw_pagetable.c | 6 +-
> 3 files changed, 82 insertions(+), 63 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* [PATCH v6 04/14] iommufd: Rename fault.c to eventq.c
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (2 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 03/14] iommufd: Abstract an iommufd_eventq from iommufd_fault Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-01-25 0:30 ` [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC Nicolin Chen
` (10 subsequent siblings)
14 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
Rename the file, aligning with the new eventq object.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/Makefile | 2 +-
drivers/iommu/iommufd/{fault.c => eventq.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename drivers/iommu/iommufd/{fault.c => eventq.c} (100%)
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index cb784da6cddc..71d692c9a8f4 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
iommufd-y := \
device.o \
- fault.o \
+ eventq.o \
hw_pagetable.o \
io_pagetable.o \
ioas.o \
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/eventq.c
similarity index 100%
rename from drivers/iommu/iommufd/fault.c
rename to drivers/iommu/iommufd/eventq.c
--
2.43.0
* [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (3 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 04/14] iommufd: Rename fault.c to eventq.c Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:13 ` Tian, Kevin
2025-02-18 15:29 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper Nicolin Chen
` (9 subsequent siblings)
14 siblings, 2 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
Introduce a new IOMMUFD_OBJ_VEVENTQ object for the vIOMMU Event Queue, which
provides user space (VMM) with another FD to read vIOMMU events.
Allow a vIOMMU object to allocate vEVENTQs, with the condition that each
vIOMMU can have only one vEVENTQ per type.
Add iommufd_veventq_alloc() with iommufd_veventq_ops for the new ioctl.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 88 ++++++++++
include/linux/iommufd.h | 3 +
include/uapi/linux/iommufd.h | 85 ++++++++++
drivers/iommu/iommufd/eventq.c | 206 +++++++++++++++++++++++-
drivers/iommu/iommufd/main.c | 7 +
drivers/iommu/iommufd/viommu.c | 2 +
6 files changed, 390 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index ee365c85dda9..7a8feedcea2e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -519,6 +519,80 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
}
+/* An iommufd_vevent represents a vIOMMU event in an iommufd_veventq */
+struct iommufd_vevent {
+ struct iommufd_vevent_header header;
+ struct list_head node; /* for iommufd_eventq::deliver */
+ bool on_list;
+ ssize_t data_len;
+ u64 event_data[] __counted_by(data_len);
+};
+
+#define vevent_for_overflow(vevent) \
+ (vevent->header.flags & IOMMU_VEVENTQ_FLAG_OVERFLOW)
+
+/*
+ * An iommufd_veventq object represents an interface to deliver vIOMMU events to
+ * the user space. It is created/destroyed by the user space and associated with
+ * vIOMMU object(s) during the allocations.
+ */
+struct iommufd_veventq {
+ struct iommufd_eventq common;
+ struct iommufd_viommu *viommu;
+ struct list_head node; /* for iommufd_viommu::veventqs */
+ struct iommufd_vevent overflow; /* pre-allocated overflow node */
+
+ unsigned int type;
+ unsigned int depth;
+
+ atomic_t num_events;
+ atomic_t sequence;
+};
+
+static inline struct iommufd_veventq *
+eventq_to_veventq(struct iommufd_eventq *eventq)
+{
+ return container_of(eventq, struct iommufd_veventq, common);
+}
+
+static inline struct iommufd_veventq *
+iommufd_get_veventq(struct iommufd_ucmd *ucmd, u32 id)
+{
+ return container_of(iommufd_get_object(ucmd->ictx, id,
+ IOMMUFD_OBJ_VEVENTQ),
+ struct iommufd_veventq, common.obj);
+}
+
+int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_veventq_destroy(struct iommufd_object *obj);
+void iommufd_veventq_abort(struct iommufd_object *obj);
+
+static inline void iommufd_vevent_handler(struct iommufd_veventq *veventq,
+ struct iommufd_vevent *vevent)
+{
+ struct iommufd_eventq *eventq = &veventq->common;
+
+ /*
+ * Remove the overflow node and add the new node at the same time. Note
+ * it is possible that vevent == &veventq->overflow for sequence update
+ */
+ spin_lock(&eventq->lock);
+ if (veventq->overflow.on_list) {
+ list_del(&veventq->overflow.node);
+ veventq->overflow.on_list = false;
+ }
+ list_add_tail(&vevent->node, &eventq->deliver);
+ vevent->on_list = true;
+ vevent->header.sequence = atomic_read(&veventq->sequence);
+ if (atomic_read(&veventq->sequence) == INT_MAX)
+ atomic_set(&veventq->sequence, 0);
+ else
+ atomic_inc(&veventq->sequence);
+ spin_unlock(&eventq->lock);
+
+ wake_up_interruptible(&eventq->wait_queue);
+}
+
static inline struct iommufd_viommu *
iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
{
@@ -527,6 +601,20 @@ iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
struct iommufd_viommu, obj);
}
+static inline struct iommufd_veventq *
+iommufd_viommu_find_veventq(struct iommufd_viommu *viommu, u32 type)
+{
+ struct iommufd_veventq *veventq, *next;
+
+ lockdep_assert_held(&viommu->veventqs_rwsem);
+
+ list_for_each_entry_safe(veventq, next, &viommu->veventqs, node) {
+ if (veventq->type == type)
+ return veventq;
+ }
+ return NULL;
+}
+
int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_viommu_destroy(struct iommufd_object *obj);
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 11110c749200..8948b1836940 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -34,6 +34,7 @@ enum iommufd_object_type {
IOMMUFD_OBJ_FAULT,
IOMMUFD_OBJ_VIOMMU,
IOMMUFD_OBJ_VDEVICE,
+ IOMMUFD_OBJ_VEVENTQ,
#ifdef CONFIG_IOMMUFD_TEST
IOMMUFD_OBJ_SELFTEST,
#endif
@@ -93,6 +94,8 @@ struct iommufd_viommu {
const struct iommufd_viommu_ops *ops;
struct xarray vdevs;
+ struct list_head veventqs;
+ struct rw_semaphore veventqs_rwsem;
unsigned int type;
};
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 78747b24bd0f..08cbc6bc3725 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -55,6 +55,7 @@ enum {
IOMMUFD_CMD_VIOMMU_ALLOC = 0x90,
IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
+ IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
};
/**
@@ -1014,4 +1015,88 @@ struct iommu_ioas_change_process {
#define IOMMU_IOAS_CHANGE_PROCESS \
_IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
+/**
+ * enum iommu_veventq_flag - flag for struct iommufd_vevent_header
+ * @IOMMU_VEVENTQ_FLAG_OVERFLOW: the vEVENTQ has overflowed
+ */
+enum iommu_veventq_flag {
+ IOMMU_VEVENTQ_FLAG_OVERFLOW = (1ULL << 0),
+};
+
+/**
+ * struct iommufd_vevent_header - Virtual Event Header for a vEVENTQ Status
+ * @flags: Combination of enum iommu_veventq_flag
+ * @sequence: The sequence index of a vEVENT in the vEVENTQ, with a range of
+ * [0, INT_MAX] where the following index of INT_MAX is 0
+ * @__reserved: Must be 0
+ *
+ * Each iommufd_vevent_header reports a sequence index of the following vEVENT:
+ * ---------------------------------------------------------------------------
+ * || header0 {sequence=0} | data0 | header1 {sequence=1} | data1 |...| dataN ||
+ * ---------------------------------------------------------------------------
+ * And this sequence index is expected to increase monotonically from that of
+ * the previous vEVENT. If two adjacent sequence indexes have a delta larger
+ * than 1, it indicates that an overflow occurred and that delta - 1 vEVENTs
+ * were lost due to the overflow (e.g. two lost vEVENTs):
+ * ---------------------------------------------------------------------------
+ * || ... | header3 {sequence=3} | data3 | header6 {sequence=6} | data6 | ... ||
+ * ---------------------------------------------------------------------------
+ * If an overflow occurred at the tail of the vEVENTQ and there is no following
+ * vEVENT providing the next sequence index, a special overflow header would be
+ * added to the tail of the vEVENTQ, with no type-specific data following it:
+ * ---------------------------------------------------------------------------
+ * ||...| header3 {sequence=3} | data3 | header4 {flags=OVERFLOW, sequence=4} ||
+ * ---------------------------------------------------------------------------
+ */
+struct iommufd_vevent_header {
+ __aligned_u64 flags;
+ __u32 sequence;
+ __u32 __reserved;
+};
+
+/**
+ * enum iommu_veventq_type - Virtual Event Queue Type
+ * @IOMMU_VEVENTQ_TYPE_DEFAULT: Reserved for future use
+ */
+enum iommu_veventq_type {
+ IOMMU_VEVENTQ_TYPE_DEFAULT = 0,
+};
+
+/**
+ * struct iommu_veventq_alloc - ioctl(IOMMU_VEVENTQ_ALLOC)
+ * @size: sizeof(struct iommu_veventq_alloc)
+ * @flags: Must be 0
+ * @viommu_id: virtual IOMMU ID to associate the vEVENTQ with
+ * @type: Type of the vEVENTQ. Must be defined in enum iommu_veventq_type
+ * @veventq_depth: Maximum number of events in the vEVENTQ
+ * @out_veventq_id: The ID of the new vEVENTQ
+ * @out_veventq_fd: The fd of the new vEVENTQ. User space must close the
+ * successfully returned fd after using it
+ * @__reserved: Must be 0
+ *
+ * Explicitly allocate a virtual event queue interface for a vIOMMU. A vIOMMU
+ * can have multiple FDs for different types, but is confined to one per @type.
+ * User space should open the @out_veventq_fd to read vEVENTs out of a vEVENTQ,
+ * if there are vEVENTs available. A vEVENTQ will overflow if the number of
+ * vEVENTs hits @veventq_depth.
+ *
+ * Each vEVENT in a vEVENTQ encloses a struct iommufd_vevent_header followed by
+ * a type-specific data structure, in a normal case:
+ * -------------------------------------------------------------
+ * || header0 | data0 | header1 | data1 | ... | headerN | dataN ||
+ * -------------------------------------------------------------
+ * unless a trailing overflow is logged (refer to struct iommufd_vevent_header).
+ */
+struct iommu_veventq_alloc {
+ __u32 size;
+ __u32 flags;
+ __u32 viommu_id;
+ __u32 type;
+ __u32 veventq_depth;
+ __u32 out_veventq_id;
+ __u32 out_veventq_fd;
+ __u32 __reserved;
+};
+#define IOMMU_VEVENTQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VEVENTQ_ALLOC)
#endif
diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c
index 0da39c3dfcdb..a08c8ebaea62 100644
--- a/drivers/iommu/iommufd/eventq.c
+++ b/drivers/iommu/iommufd/eventq.c
@@ -382,13 +382,141 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b
return done == 0 ? rc : done;
}
+/* IOMMUFD_OBJ_VEVENTQ Functions */
+
+void iommufd_veventq_abort(struct iommufd_object *obj)
+{
+ struct iommufd_eventq *eventq =
+ container_of(obj, struct iommufd_eventq, obj);
+ struct iommufd_veventq *veventq = eventq_to_veventq(eventq);
+ struct iommufd_viommu *viommu = veventq->viommu;
+ struct iommufd_vevent *cur, *next;
+
+ lockdep_assert_held_write(&viommu->veventqs_rwsem);
+
+ list_for_each_entry_safe(cur, next, &eventq->deliver, node) {
+ list_del(&cur->node);
+ kfree(cur);
+ }
+
+ refcount_dec(&viommu->obj.users);
+ list_del(&veventq->node);
+}
+
+void iommufd_veventq_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_veventq *veventq = eventq_to_veventq(
+ container_of(obj, struct iommufd_eventq, obj));
+
+ down_write(&veventq->viommu->veventqs_rwsem);
+ iommufd_veventq_abort(obj);
+ up_write(&veventq->viommu->veventqs_rwsem);
+}
+
+static struct iommufd_vevent *
+iommufd_veventq_deliver_fetch(struct iommufd_veventq *veventq)
+{
+ struct iommufd_eventq *eventq = &veventq->common;
+ struct list_head *list = &eventq->deliver;
+ struct iommufd_vevent *vevent = NULL;
+
+ spin_lock(&eventq->lock);
+ if (!list_empty(list)) {
+ vevent = list_first_entry(list, struct iommufd_vevent, node);
+ list_del(&vevent->node);
+ vevent->on_list = false;
+ }
+ /* Make a copy of the overflow node for copy_to_user */
+ if (vevent == &veventq->overflow) {
+ vevent = kzalloc(sizeof(*vevent), GFP_ATOMIC);
+ if (vevent)
+ memcpy(vevent, &veventq->overflow, sizeof(*vevent));
+ }
+ spin_unlock(&eventq->lock);
+ return vevent;
+}
+
+static void iommufd_veventq_deliver_restore(struct iommufd_veventq *veventq,
+ struct iommufd_vevent *vevent)
+{
+ struct iommufd_eventq *eventq = &veventq->common;
+ struct list_head *list = &eventq->deliver;
+
+ spin_lock(&eventq->lock);
+ if (vevent_for_overflow(vevent)) {
+ /* Remove the copy of the overflow node */
+ kfree(vevent);
+ vevent = NULL;
+ /* An empty list needs the overflow node back */
+ if (list_empty(list))
+ vevent = &veventq->overflow;
+ }
+ if (vevent) {
+ list_add(&vevent->node, list);
+ vevent->on_list = true;
+ }
+ spin_unlock(&eventq->lock);
+}
+
+static ssize_t iommufd_veventq_fops_read(struct file *filep, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct iommufd_eventq *eventq = filep->private_data;
+ struct iommufd_veventq *veventq = eventq_to_veventq(eventq);
+ struct iommufd_vevent_header *hdr;
+ struct iommufd_vevent *cur;
+ size_t done = 0;
+ int rc = 0;
+
+ if (*ppos)
+ return -ESPIPE;
+
+ while ((cur = iommufd_veventq_deliver_fetch(veventq))) {
+ /* Validate the remaining bytes against the header size */
+ if (done >= count || sizeof(*hdr) > count - done) {
+ iommufd_veventq_deliver_restore(veventq, cur);
+ break;
+ }
+ hdr = &cur->header;
+
+ /* If being a normal vEVENT, validate against the full size */
+ if (!vevent_for_overflow(cur) &&
+ sizeof(*hdr) + cur->data_len > count - done) {
+ iommufd_veventq_deliver_restore(veventq, cur);
+ break;
+ }
+
+ if (copy_to_user(buf + done, hdr, sizeof(*hdr))) {
+ iommufd_veventq_deliver_restore(veventq, cur);
+ rc = -EFAULT;
+ break;
+ }
+ done += sizeof(*hdr);
+
+ if (cur->data_len &&
+ copy_to_user(buf + done, cur->event_data, cur->data_len)) {
+ iommufd_veventq_deliver_restore(veventq, cur);
+ rc = -EFAULT;
+ break;
+ }
+ atomic_dec(&veventq->num_events);
+ done += cur->data_len;
+ kfree(cur);
+ }
+
+ return done == 0 ? rc : done;
+}
+
/* Common Event Queue Functions */
static __poll_t iommufd_eventq_fops_poll(struct file *filep,
struct poll_table_struct *wait)
{
struct iommufd_eventq *eventq = filep->private_data;
- __poll_t pollflags = EPOLLOUT;
+ __poll_t pollflags = 0;
+
+ if (eventq->obj.type == IOMMUFD_OBJ_FAULT)
+ pollflags |= EPOLLOUT;
poll_wait(filep, &eventq->wait_queue, wait);
spin_lock(&eventq->lock);
@@ -403,6 +531,10 @@ static int iommufd_eventq_fops_release(struct inode *inode, struct file *filep)
{
struct iommufd_eventq *eventq = filep->private_data;
+ if (eventq->obj.type == IOMMUFD_OBJ_VEVENTQ) {
+ atomic_set(&eventq_to_veventq(eventq)->sequence, 0);
+ atomic_set(&eventq_to_veventq(eventq)->num_events, 0);
+ }
refcount_dec(&eventq->obj.users);
iommufd_ctx_put(eventq->ictx);
return 0;
@@ -508,3 +640,75 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
return 0;
}
+
+static const struct file_operations iommufd_veventq_fops =
+ INIT_EVENTQ_FOPS(iommufd_veventq_fops_read, NULL);
+
+int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_veventq_alloc *cmd = ucmd->cmd;
+ struct iommufd_veventq *veventq;
+ struct iommufd_viommu *viommu;
+ int fdno;
+ int rc;
+
+ if (cmd->flags || cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
+ return -EOPNOTSUPP;
+ if (!cmd->veventq_depth)
+ return -EINVAL;
+
+ viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+ if (IS_ERR(viommu))
+ return PTR_ERR(viommu);
+
+ down_write(&viommu->veventqs_rwsem);
+
+ if (iommufd_viommu_find_veventq(viommu, cmd->type)) {
+ rc = -EEXIST;
+ goto out_unlock_veventqs;
+ }
+
+ veventq = __iommufd_object_alloc(ucmd->ictx, veventq,
+ IOMMUFD_OBJ_VEVENTQ, common.obj);
+ if (IS_ERR(veventq)) {
+ rc = PTR_ERR(veventq);
+ goto out_unlock_veventqs;
+ }
+
+ veventq->type = cmd->type;
+ veventq->viommu = viommu;
+ refcount_inc(&viommu->obj.users);
+ atomic_set(&veventq->sequence, 0);
+ atomic_set(&veventq->num_events, 0);
+ veventq->depth = cmd->veventq_depth;
+ list_add_tail(&veventq->node, &viommu->veventqs);
+ veventq->overflow.header.flags = IOMMU_VEVENTQ_FLAG_OVERFLOW;
+
+ fdno = iommufd_eventq_init(&veventq->common, "[iommufd-viommu-event]",
+ ucmd->ictx, &iommufd_veventq_fops);
+ if (fdno < 0) {
+ rc = fdno;
+ goto out_abort;
+ }
+
+ cmd->out_veventq_id = veventq->common.obj.id;
+ cmd->out_veventq_fd = fdno;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+ if (rc)
+ goto out_put_fdno;
+
+ iommufd_object_finalize(ucmd->ictx, &veventq->common.obj);
+ fd_install(fdno, veventq->common.filep);
+ goto out_unlock_veventqs;
+
+out_put_fdno:
+ put_unused_fd(fdno);
+ fput(veventq->common.filep);
+out_abort:
+ iommufd_object_abort_and_destroy(ucmd->ictx, &veventq->common.obj);
+out_unlock_veventqs:
+ up_write(&viommu->veventqs_rwsem);
+ iommufd_put_object(ucmd->ictx, &viommu->obj);
+ return rc;
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index a11e9cfd790f..0d451601fb9a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -308,6 +308,7 @@ union ucmd_buffer {
struct iommu_ioas_unmap unmap;
struct iommu_option option;
struct iommu_vdevice_alloc vdev;
+ struct iommu_veventq_alloc veventq;
struct iommu_vfio_ioas vfio_ioas;
struct iommu_viommu_alloc viommu;
#ifdef CONFIG_IOMMUFD_TEST
@@ -363,6 +364,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
struct iommu_vdevice_alloc, virt_id),
+ IOCTL_OP(IOMMU_VEVENTQ_ALLOC, iommufd_veventq_alloc,
+ struct iommu_veventq_alloc, out_veventq_fd),
IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
__reserved),
IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
@@ -505,6 +508,10 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
[IOMMUFD_OBJ_VDEVICE] = {
.destroy = iommufd_vdevice_destroy,
},
+ [IOMMUFD_OBJ_VEVENTQ] = {
+ .destroy = iommufd_veventq_destroy,
+ .abort = iommufd_veventq_abort,
+ },
[IOMMUFD_OBJ_VIOMMU] = {
.destroy = iommufd_viommu_destroy,
},
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 69b88e8c7c26..01df2b985f02 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -59,6 +59,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
viommu->ictx = ucmd->ictx;
viommu->hwpt = hwpt_paging;
refcount_inc(&viommu->hwpt->common.obj.users);
+ INIT_LIST_HEAD(&viommu->veventqs);
+ init_rwsem(&viommu->veventqs_rwsem);
/*
* It is the most likely case that a physical IOMMU is unpluggable. A
* pluggable IOMMU instance (if exists) is responsible for refcounting
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* RE: [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-01-25 0:30 ` [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC Nicolin Chen
@ 2025-02-18 5:13 ` Tian, Kevin
2025-02-18 17:53 ` Nicolin Chen
2025-02-18 15:29 ` Jason Gunthorpe
1 sibling, 1 reply; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:13 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
> +
> +/*
> + * An iommufd_veventq object represents an interface to deliver vIOMMU
> + * events to the user space. It is created/destroyed by the user space and
> + * associated with vIOMMU object(s) during the allocations.
s/object(s)/object/, given the eventq cannot be shared between vIOMMUs.
> +static inline void iommufd_vevent_handler(struct iommufd_veventq *veventq,
> + struct iommufd_vevent *vevent)
> +{
> + struct iommufd_eventq *eventq = &veventq->common;
> +
> + /*
> + * Remove the overflow node and add the new node at the same time. Note
> + * it is possible that vevent == &veventq->overflow for sequence update
> + */
> + spin_lock(&eventq->lock);
> + if (veventq->overflow.on_list) {
> + list_del(&veventq->overflow.node);
> + veventq->overflow.on_list = false;
> + }
We can save one field 'on_list' in every entry by:
if (list_is_last(&veventq->overflow.node, &eventq->deliver))
list_del(&veventq->overflow.node);
> +
> +/**
> + * struct iommufd_vevent_header - Virtual Event Header for a vEVENTQ Status
> + * @flags: Combination of enum iommu_veventq_flag
> + * @sequence: The sequence index of a vEVENT in the vEVENTQ, with a range of
> + * [0, INT_MAX] where the following index of INT_MAX is 0
> + * @__reserved: Must be 0
> + *
> + * Each iommufd_vevent_header reports a sequence index of the following vEVENT:
> + * ---------------------------------------------------------------------------
> + * || header0 {sequence=0} | data0 | header1 {sequence=1} | data1 |...| dataN ||
> + * ---------------------------------------------------------------------------
> + * And this sequence index is expected to be monotonic to the sequence index of
> + * the previous vEVENT. If two adjacent sequence indexes has a delta larger than
> + * 1, it indicates that an overflow occurred to the vEVENTQ and that delta - 1
> + * number of vEVENTs lost due to the overflow (e.g. two lost vEVENTs):
> + * ---------------------------------------------------------------------------
> + * || ... | header3 {sequence=3} | data3 | header6 {sequence=6} | data6 | ... ||
> + * ---------------------------------------------------------------------------
> + * If an overflow occurred to the tail of the vEVENTQ and there is no following
> + * vEVENT providing the next sequence index, a special overflow header would be
> + * added to the tail of the vEVENTQ, where there would be no more type-specific
> + * data following the vEVENTQ:
> + * ---------------------------------------------------------------------------
> + * ||...| header3 {sequence=3} | data4 | header4 {flags=OVERFLOW, sequence=4} ||
> + * ---------------------------------------------------------------------------
> + */
> +struct iommufd_vevent_header {
> + __aligned_u64 flags;
> + __u32 sequence;
> + __u32 __reserved;
> +};
Is there a reason that flags must be u64? At a glance all flags fields
(except the one in iommu_hwpt_vtd_s1) in iommufd uAPIs are u32, which
would cut the size of the header by half...
> +void iommufd_veventq_abort(struct iommufd_object *obj)
> +{
> + struct iommufd_eventq *eventq =
> + container_of(obj, struct iommufd_eventq, obj);
> + struct iommufd_veventq *veventq = eventq_to_veventq(eventq);
> + struct iommufd_viommu *viommu = veventq->viommu;
> + struct iommufd_vevent *cur, *next;
> +
> + lockdep_assert_held_write(&viommu->veventqs_rwsem);
> +
> + list_for_each_entry_safe(cur, next, &eventq->deliver, node) {
> + list_del(&cur->node);
> + kfree(cur);
kfree() doesn't apply to the overflow node.
otherwise it looks good to me:
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
* Re: [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-02-18 5:13 ` Tian, Kevin
@ 2025-02-18 17:53 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 17:53 UTC (permalink / raw)
To: Tian, Kevin
Cc: jgg@nvidia.com, corbet@lwn.net, will@kernel.org, joro@8bytes.org,
suravee.suthikulpanit@amd.com, robin.murphy@arm.com,
dwmw2@infradead.org, baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Tue, Feb 18, 2025 at 05:13:47AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, January 25, 2025 8:31 AM
> > +
> > +/*
> > + * An iommufd_veventq object represents an interface to deliver vIOMMU
> > + * events to the user space. It is created/destroyed by the user space and
> > + * associated with vIOMMU object(s) during the allocations.
>
> s/object(s)/object/, given the eventq cannot be shared between vIOMMUs.
Done. Adding an "a" too.
> > +static inline void iommufd_vevent_handler(struct iommufd_veventq *veventq,
> > + struct iommufd_vevent *vevent)
> > +{
> > + struct iommufd_eventq *eventq = &veventq->common;
> > +
> > + /*
> > + * Remove the overflow node and add the new node at the same time. Note
> > + * it is possible that vevent == &veventq->overflow for sequence update
> > + */
> > + spin_lock(&eventq->lock);
> > + if (veventq->overflow.on_list) {
> > + list_del(&veventq->overflow.node);
> > + veventq->overflow.on_list = false;
> > + }
>
> We can save one field 'on_list' in every entry by:
>
> if (list_is_last(&veventq->overflow.node, &eventq->deliver))
> list_del(&veventq->overflow.node);
Hmm. Given that the overflow node, if on the list, should always be
the last one... yes!
> > +struct iommufd_vevent_header {
> > + __aligned_u64 flags;
> > + __u32 sequence;
> > + __u32 __reserved;
> > +};
>
> Is there a reason that flags must be u64? At a glance all flags fields
> (except the one in iommu_hwpt_vtd_s1) in iommufd uAPIs are u32
> which can cut the size of the header by half...
Not having a particular reason yet. Just thought that a 64-bit
could make the uAPI more expandable. It's true that u32 would
be cleaner. I will make a change.
>
> > +void iommufd_veventq_abort(struct iommufd_object *obj)
> > +{
> > + struct iommufd_eventq *eventq =
> > + container_of(obj, struct iommufd_eventq, obj);
> > + struct iommufd_veventq *veventq = eventq_to_veventq(eventq);
> > + struct iommufd_viommu *viommu = veventq->viommu;
> > + struct iommufd_vevent *cur, *next;
> > +
> > + lockdep_assert_held_write(&viommu->veventqs_rwsem);
> > +
> > + list_for_each_entry_safe(cur, next, &eventq->deliver, node) {
> > + list_del(&cur->node);
> > + kfree(cur);
>
> kfree() doesn't apply to the overflow node.
Oh right, that's missed.
> otherwise it looks good to me:
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Thanks!
Nicolin
* Re: [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-01-25 0:30 ` [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC Nicolin Chen
2025-02-18 5:13 ` Tian, Kevin
@ 2025-02-18 15:29 ` Jason Gunthorpe
2025-02-18 17:47 ` Nicolin Chen
1 sibling, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 15:29 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:34PM -0800, Nicolin Chen wrote:
> + list_add_tail(&vevent->node, &eventq->deliver);
> + vevent->on_list = true;
> + vevent->header.sequence = atomic_read(&veventq->sequence);
> + if (atomic_read(&veventq->sequence) == INT_MAX)
> + atomic_set(&veventq->sequence, 0);
> + else
> + atomic_inc(&veventq->sequence);
> + spin_unlock(&eventq->lock);
This is all locked, we don't need veventq->sequence to be an atomic?
The bounding can be done with some simple math:
veventq->sequence = (veventq->sequence + 1) & INT_MAX;
> +static struct iommufd_vevent *
> +iommufd_veventq_deliver_fetch(struct iommufd_veventq *veventq)
> +{
> + struct iommufd_eventq *eventq = &veventq->common;
> + struct list_head *list = &eventq->deliver;
> + struct iommufd_vevent *vevent = NULL;
> +
> + spin_lock(&eventq->lock);
> + if (!list_empty(list)) {
> + vevent = list_first_entry(list, struct iommufd_vevent, node);
> + list_del(&vevent->node);
> + vevent->on_list = false;
> + }
> + /* Make a copy of the overflow node for copy_to_user */
> + if (vevent == &veventq->overflow) {
> + vevent = kzalloc(sizeof(*vevent), GFP_ATOMIC);
> + if (vevent)
> + memcpy(vevent, &veventq->overflow, sizeof(*vevent));
> + }
This error handling is wonky, if we can't allocate then we shouldn't
have done the list_del. Just return NULL which will cause
iommufd_veventq_fops_read() to exit and userspace will try again.
> @@ -403,6 +531,10 @@ static int iommufd_eventq_fops_release(struct inode *inode, struct file *filep)
> {
> struct iommufd_eventq *eventq = filep->private_data;
>
> + if (eventq->obj.type == IOMMUFD_OBJ_VEVENTQ) {
> + atomic_set(&eventq_to_veventq(eventq)->sequence, 0);
> + atomic_set(&eventq_to_veventq(eventq)->num_events, 0);
> + }
Why? We are about to free the memory?
> +int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_veventq_alloc *cmd = ucmd->cmd;
> + struct iommufd_veventq *veventq;
> + struct iommufd_viommu *viommu;
> + int fdno;
> + int rc;
> +
> + if (cmd->flags || cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
> + return -EOPNOTSUPP;
> + if (!cmd->veventq_depth)
> + return -EINVAL;
Check __reserved for 0 too
Jason
* Re: [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-02-18 15:29 ` Jason Gunthorpe
@ 2025-02-18 17:47 ` Nicolin Chen
2025-02-18 18:08 ` Jason Gunthorpe
0 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 17:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 11:29:59AM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 24, 2025 at 04:30:34PM -0800, Nicolin Chen wrote:
> > + list_add_tail(&vevent->node, &eventq->deliver);
> > + vevent->on_list = true;
> > + vevent->header.sequence = atomic_read(&veventq->sequence);
> > + if (atomic_read(&veventq->sequence) == INT_MAX)
> > + atomic_set(&veventq->sequence, 0);
> > + else
> > + atomic_inc(&veventq->sequence);
> > + spin_unlock(&eventq->lock);
>
> This is all locked, we don't need veventq->sequence to be an atomic?
>
> The bounding can be done with some simple math:
>
> veventq->sequence = (veventq->sequence + 1) & INT_MAX;
Ack. Perhaps we can reuse eventq->lock to fence @num_events too.
> > +static struct iommufd_vevent *
> > +iommufd_veventq_deliver_fetch(struct iommufd_veventq *veventq)
> > +{
> > + struct iommufd_eventq *eventq = &veventq->common;
> > + struct list_head *list = &eventq->deliver;
> > + struct iommufd_vevent *vevent = NULL;
> > +
> > + spin_lock(&eventq->lock);
> > + if (!list_empty(list)) {
> > + vevent = list_first_entry(list, struct iommufd_vevent, node);
> > + list_del(&vevent->node);
> > + vevent->on_list = false;
> > + }
> > + /* Make a copy of the overflow node for copy_to_user */
> > + if (vevent == &veventq->overflow) {
> > + vevent = kzalloc(sizeof(*vevent), GFP_ATOMIC);
> > + if (vevent)
> > + memcpy(vevent, &veventq->overflow, sizeof(*vevent));
> > + }
>
> This error handling is wonky, if we can't allocate then we shouldn't
> have done the list_del. Just return NULL which will cause
> iommufd_veventq_fops_read() to exit and userspace will try again.
OK.
We have two cases to support here:
1) Normal vevent node -- list_del and return the node.
2) Overflow node -- list_del and return a copy.
I think we can do:
if (!list_empty(list)) {
struct iommufd_vevent *next;
next = list_first_entry(list, struct iommufd_vevent, node);
if (next == &veventq->overflow) {
/* Make a copy of the overflow node for copy_to_user */
vevent = kzalloc(sizeof(*vevent), GFP_ATOMIC);
if (!vevent)
goto out_unlock;
}
list_del(&next->node);
if (vevent)
memcpy(vevent, next, sizeof(*vevent));
else
vevent = next;
}
> > @@ -403,6 +531,10 @@ static int iommufd_eventq_fops_release(struct inode *inode, struct file *filep)
> > {
> > struct iommufd_eventq *eventq = filep->private_data;
> >
> > + if (eventq->obj.type == IOMMUFD_OBJ_VEVENTQ) {
> > + atomic_set(&eventq_to_veventq(eventq)->sequence, 0);
> > + atomic_set(&eventq_to_veventq(eventq)->num_events, 0);
> > + }
>
> Why? We are about to free the memory?
Ack. I thought about a re-entry of an open(). But release() does
lose the event_fd completely, and user space wouldn't be able to
open the same fd again.
> > +int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
> > +{
> > + struct iommu_veventq_alloc *cmd = ucmd->cmd;
> > + struct iommufd_veventq *veventq;
> > + struct iommufd_viommu *viommu;
> > + int fdno;
> > + int rc;
> > +
> > + if (cmd->flags || cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
> > + return -EOPNOTSUPP;
> > + if (!cmd->veventq_depth)
> > + return -EINVAL;
>
> Check __reserved for 0 too
Kevin is suggesting a 32-bit flag field, so I think we can drop
the __reserved in that case.
Thanks
Nicolin
* Re: [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-02-18 17:47 ` Nicolin Chen
@ 2025-02-18 18:08 ` Jason Gunthorpe
2025-02-18 18:15 ` Nicolin Chen
0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 18:08 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 09:47:50AM -0800, Nicolin Chen wrote:
> I think we can do:
> if (!list_empty(list)) {
> struct iommufd_vevent *next;
>
> next = list_first_entry(list, struct iommufd_vevent, node);
> if (next == &veventq->overflow) {
> /* Make a copy of the overflow node for copy_to_user */
> vevent = kzalloc(sizeof(*vevent), GFP_ATOMIC);
> if (!vevent)
> goto out_unlock;
> }
> list_del(&next->node);
> if (vevent)
> memcpy(vevent, next, sizeof(*vevent));
> else
> vevent = next;
> }
That looks right
> > > +int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
> > > +{
> > > + struct iommu_veventq_alloc *cmd = ucmd->cmd;
> > > + struct iommufd_veventq *veventq;
> > > + struct iommufd_viommu *viommu;
> > > + int fdno;
> > > + int rc;
> > > +
> > > + if (cmd->flags || cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
> > > + return -EOPNOTSUPP;
> > > + if (!cmd->veventq_depth)
> > > + return -EINVAL;
> >
> > Check __reserved for 0 too
>
> Kevin is suggesting a 32-bit flag field, so I think we can drop
> the __reserved in that case.
Those are different structs?
Jason
* Re: [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
2025-02-18 18:08 ` Jason Gunthorpe
@ 2025-02-18 18:15 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 18:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 02:08:05PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 18, 2025 at 09:47:50AM -0800, Nicolin Chen wrote:
> > > > +int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
> > > > +{
> > > > + struct iommu_veventq_alloc *cmd = ucmd->cmd;
> > > > + struct iommufd_veventq *veventq;
> > > > + struct iommufd_viommu *viommu;
> > > > + int fdno;
> > > > + int rc;
> > > > +
> > > > + if (cmd->flags || cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
> > > > + return -EOPNOTSUPP;
> > > > + if (!cmd->veventq_depth)
> > > > + return -EINVAL;
> > >
> > > Check __reserved for 0 too
> >
> > Kevin is suggesting a 32-bit flag field, so I think we can drop
> > the __reserved in that case.
>
> Those are different structs?
Oops. Right. I will check the __reserved.
Thanks
Nicolin
* [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (4 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 05/14] iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 15:31 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper Nicolin Chen
` (8 subsequent siblings)
14 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
This is a reverse search vs. iommufd_viommu_find_dev, as drivers may want
to convert a (physical) struct device pointer to its virtual device ID for
an event injection into the user space VM.
Again, this avoids exposing more core structures to the drivers than the
iommufd_viommu alone.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommufd.h | 9 +++++++++
drivers/iommu/iommufd/driver.c | 24 ++++++++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 8948b1836940..05cb393aff0a 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -190,6 +190,8 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
enum iommufd_object_type type);
struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
unsigned long vdev_id);
+int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+ struct device *dev, unsigned long *vdev_id);
#else /* !CONFIG_IOMMUFD_DRIVER_CORE */
static inline struct iommufd_object *
_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
@@ -203,6 +205,13 @@ iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
{
return NULL;
}
+
+static inline int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+ struct device *dev,
+ unsigned long *vdev_id)
+{
+ return -ENOENT;
+}
#endif /* CONFIG_IOMMUFD_DRIVER_CORE */
/*
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 2d98b04ff1cb..185c4fde8987 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -49,5 +49,29 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
}
EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, "IOMMUFD");
+/* Return -ENOENT if device is not associated to the vIOMMU */
+int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
+ struct device *dev, unsigned long *vdev_id)
+{
+ struct iommufd_vdevice *vdev;
+ unsigned long index;
+ int rc = -ENOENT;
+
+ if (WARN_ON_ONCE(!vdev_id))
+ return -EINVAL;
+
+ xa_lock(&viommu->vdevs);
+ xa_for_each(&viommu->vdevs, index, vdev) {
+ if (vdev->dev == dev) {
+ *vdev_id = (unsigned long)vdev->id;
+ rc = 0;
+ break;
+ }
+ }
+ xa_unlock(&viommu->vdevs);
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_viommu_get_vdev_id, "IOMMUFD");
+
MODULE_DESCRIPTION("iommufd code shared with builtin modules");
MODULE_LICENSE("GPL");
--
2.43.0
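The reverse lookup the patch adds walks the vIOMMU's vdevs xarray comparing device pointers. Its core can be sketched in plain userspace C over an array standing in for the xarray (all names below are illustrative mocks, not the kernel API):

```c
#include <stddef.h>

/* Userspace mock of the vIOMMU's vdevs table: (device, virtual ID) pairs */
struct mock_vdev {
	const void *dev;	/* stands in for struct device * */
	unsigned long id;	/* virtual device ID seen by the VM */
};

/*
 * Reverse lookup: physical device pointer -> virtual device ID.
 * Returns 0 and fills *vdev_id on a match; -1 (standing in for -ENOENT)
 * when the device is not associated with the vIOMMU.
 */
static int mock_get_vdev_id(const struct mock_vdev *vdevs, size_t n,
			    const void *dev, unsigned long *vdev_id)
{
	for (size_t i = 0; i < n; i++) {
		if (vdevs[i].dev == dev) {
			*vdev_id = vdevs[i].id;
			return 0;
		}
	}
	return -1;
}
```

The real helper additionally takes xa_lock() around the walk and WARNs on a NULL vdev_id pointer, as the diff above shows.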
* Re: [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
2025-01-25 0:30 ` [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper Nicolin Chen
@ 2025-02-18 15:31 ` Jason Gunthorpe
2025-02-20 5:17 ` Nicolin Chen
0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 15:31 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:35PM -0800, Nicolin Chen wrote:
> This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want
> to convert a struct device pointer (physical) to its virtual device ID for
> an event injection to the user space VM.
>
> Again, this avoids exposing more core structures to the drivers, than the
> iommufd_viommu alone.
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> include/linux/iommufd.h | 9 +++++++++
> drivers/iommu/iommufd/driver.c | 24 ++++++++++++++++++++++++
> 2 files changed, 33 insertions(+)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> + xa_lock(&viommu->vdevs);
> + xa_for_each(&viommu->vdevs, index, vdev) {
> + if (vdev->dev == dev) {
> + *vdev_id = (unsigned long)vdev->id;
I don't think we need this cast
Jason
* Re: [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
2025-02-18 15:31 ` Jason Gunthorpe
@ 2025-02-20 5:17 ` Nicolin Chen
2025-02-20 16:19 ` Jason Gunthorpe
0 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-20 5:17 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 11:31:54AM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 24, 2025 at 04:30:35PM -0800, Nicolin Chen wrote:
> > This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want
> > to convert a struct device pointer (physical) to its virtual device ID for
> > an event injection to the user space VM.
> >
> > Again, this avoids exposing more core structures to the drivers, than the
> > iommufd_viommu alone.
> >
> > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> > include/linux/iommufd.h | 9 +++++++++
> > drivers/iommu/iommufd/driver.c | 24 ++++++++++++++++++++++++
> > 2 files changed, 33 insertions(+)
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
> > + xa_lock(&viommu->vdevs);
> > + xa_for_each(&viommu->vdevs, index, vdev) {
> > + if (vdev->dev == dev) {
> > + *vdev_id = (unsigned long)vdev->id;
>
> I don't think we need this cast
The left side is ulong for xarray index, while the right side is
__aligned_u64 for uAPI. Could there be a gcc warning when somebody
builds the kernel having a BITS_PER_LONG=32?
iommufd_vdevice_alloc_ioctl() does test vdev->id against ULONG_MAX
though..
Thanks
Nicolin
* Re: [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
2025-02-20 5:17 ` Nicolin Chen
@ 2025-02-20 16:19 ` Jason Gunthorpe
0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-20 16:19 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Wed, Feb 19, 2025 at 09:17:18PM -0800, Nicolin Chen wrote:
> On Tue, Feb 18, 2025 at 11:31:54AM -0400, Jason Gunthorpe wrote:
> > On Fri, Jan 24, 2025 at 04:30:35PM -0800, Nicolin Chen wrote:
> > > This is a reverse search v.s. iommufd_viommu_find_dev, as drivers may want
> > > to convert a struct device pointer (physical) to its virtual device ID for
> > > an event injection to the user space VM.
> > >
> > > Again, this avoids exposing more core structures to the drivers, than the
> > > iommufd_viommu alone.
> > >
> > > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > ---
> > > include/linux/iommufd.h | 9 +++++++++
> > > drivers/iommu/iommufd/driver.c | 24 ++++++++++++++++++++++++
> > > 2 files changed, 33 insertions(+)
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > > + xa_lock(&viommu->vdevs);
> > > + xa_for_each(&viommu->vdevs, index, vdev) {
> > > + if (vdev->dev == dev) {
> > > + *vdev_id = (unsigned long)vdev->id;
> >
> > I don't think we need this cast
>
> The left side is ulong for xarray index, while the right side is
> __aligned_u64 for uAPI. Could there be a gcc warning when somebody
> builds the kernel having a BITS_PER_LONG=32?
No. The kernel is full of these implicit casts
Jason
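A minimal userspace check of the point Jason makes: assigning a 64-bit value to an unsigned long is an implicit conversion that compiles cleanly at the kernel's usual warning levels; only an opt-in option such as -Wconversion would flag the potential narrowing on a 32-bit build. A hedged sketch:

```c
#include <stdint.h>

/*
 * Implicit narrowing: uint64_t -> unsigned long compiles without a
 * diagnostic under -Wall/-Wextra; the value is reduced modulo 2^N when
 * unsigned long is narrower (e.g. BITS_PER_LONG=32).
 */
static unsigned long narrow(uint64_t id)
{
	return id;	/* implicit conversion, no cast needed */
}
```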
* [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (5 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 06/14] iommufd/viommu: Add iommufd_viommu_get_vdev_id helper Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:14 ` Tian, Kevin
2025-02-18 15:35 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 08/14] iommufd/selftest: Require vdev_id when attaching to a nested domain Nicolin Chen
` (7 subsequent siblings)
14 siblings, 2 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
Similar to iommu_report_device_fault, this allows IOMMU drivers to report
vIOMMU events from threaded IRQ handlers to user space hypervisors.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommufd.h | 11 +++++++++
drivers/iommu/iommufd/driver.c | 45 ++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 05cb393aff0a..60eff9272551 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -11,6 +11,7 @@
#include <linux/refcount.h>
#include <linux/types.h>
#include <linux/xarray.h>
+#include <uapi/linux/iommufd.h>
struct device;
struct file;
@@ -192,6 +193,9 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
unsigned long vdev_id);
int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
struct device *dev, unsigned long *vdev_id);
+int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
+ enum iommu_veventq_type type, void *event_data,
+ size_t data_len);
#else /* !CONFIG_IOMMUFD_DRIVER_CORE */
static inline struct iommufd_object *
_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
@@ -212,6 +216,13 @@ static inline int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
{
return -ENOENT;
}
+
+static inline int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
+ enum iommu_veventq_type type,
+ void *event_data, size_t data_len)
+{
+ return -EOPNOTSUPP;
+}
#endif /* CONFIG_IOMMUFD_DRIVER_CORE */
/*
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 185c4fde8987..9e52ce66204c 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -73,5 +73,50 @@ int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
}
EXPORT_SYMBOL_NS_GPL(iommufd_viommu_get_vdev_id, "IOMMUFD");
+/*
+ * Typically called in driver's threaded IRQ handler.
+ * The @type and @event_data must be defined in include/uapi/linux/iommufd.h
+ */
+int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
+ enum iommu_veventq_type type, void *event_data,
+ size_t data_len)
+{
+ struct iommufd_veventq *veventq;
+ struct iommufd_vevent *vevent;
+ int rc = 0;
+
+ if (WARN_ON_ONCE(!data_len || !event_data))
+ return -EINVAL;
+
+ down_read(&viommu->veventqs_rwsem);
+
+ veventq = iommufd_viommu_find_veventq(viommu, type);
+ if (!veventq) {
+ rc = -EOPNOTSUPP;
+ goto out_unlock_veventqs;
+ }
+
+ if (atomic_read(&veventq->num_events) == veventq->depth) {
+ vevent = &veventq->overflow;
+ goto out_set_header;
+ }
+
+ vevent = kmalloc(struct_size(vevent, event_data, data_len), GFP_KERNEL);
+ if (!vevent) {
+ rc = -ENOMEM;
+ goto out_unlock_veventqs;
+ }
+ memcpy(vevent->event_data, event_data, data_len);
+ vevent->data_len = data_len;
+ atomic_inc(&veventq->num_events);
+
+out_set_header:
+ iommufd_vevent_handler(veventq, vevent);
+out_unlock_veventqs:
+ up_read(&viommu->veventqs_rwsem);
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_viommu_report_event, "IOMMUFD");
+
MODULE_DESCRIPTION("iommufd code shared with builtin modules");
MODULE_LICENSE("GPL");
--
2.43.0
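The queue-full policy in iommufd_viommu_report_event above — once num_events reaches depth, reuse the static overflow node instead of allocating — can be sketched with counters in userspace C (mock names; the real list handling, locking, and payload copy are omitted):

```c
/*
 * Userspace mock of a bounded vEVENTQ: queued events plus a single
 * static "overflow" slot that absorbs reports when the queue is full.
 */
struct mock_veventq {
	unsigned depth;		/* max queued events */
	unsigned num_events;	/* currently queued */
	unsigned lost_events;	/* reports absorbed by the overflow node */
};

/* Mimics the report helper's queue-full branch */
static void mock_report_event(struct mock_veventq *q)
{
	if (q->num_events == q->depth) {
		/* Full: reuse the static overflow node, payload is lost */
		q->lost_events++;
		return;
	}
	q->num_events++;	/* normally: kmalloc + memcpy + enqueue */
}
```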
* RE: [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-01-25 0:30 ` [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper Nicolin Chen
@ 2025-02-18 5:14 ` Tian, Kevin
2025-02-18 15:35 ` Jason Gunthorpe
1 sibling, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:14 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> Similar to iommu_report_device_fault, this allows IOMMU drivers to report
> vIOMMU events from threaded IRQ handlers to user space hypervisors.
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
* Re: [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-01-25 0:30 ` [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper Nicolin Chen
2025-02-18 5:14 ` Tian, Kevin
@ 2025-02-18 15:35 ` Jason Gunthorpe
2025-02-19 6:58 ` Tian, Kevin
1 sibling, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 15:35 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:36PM -0800, Nicolin Chen wrote:
> +int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
> + enum iommu_veventq_type type, void *event_data,
> + size_t data_len)
> +{
> + struct iommufd_veventq *veventq;
> + struct iommufd_vevent *vevent;
> + int rc = 0;
> +
> + if (WARN_ON_ONCE(!data_len || !event_data))
> + return -EINVAL;
> +
> + down_read(&viommu->veventqs_rwsem);
> +
> + veventq = iommufd_viommu_find_veventq(viommu, type);
> + if (!veventq) {
> + rc = -EOPNOTSUPP;
> + goto out_unlock_veventqs;
> + }
> +
> + if (atomic_read(&veventq->num_events) == veventq->depth) {
> + vevent = &veventq->overflow;
> + goto out_set_header;
> + }
> +
> + vevent = kmalloc(struct_size(vevent, event_data, data_len), GFP_KERNEL);
> + if (!vevent) {
> + rc = -ENOMEM;
> + goto out_unlock_veventqs;
This should record an overflow too
Jason
* RE: [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-02-18 15:35 ` Jason Gunthorpe
@ 2025-02-19 6:58 ` Tian, Kevin
2025-02-20 21:16 ` Nicolin Chen
0 siblings, 1 reply; 60+ messages in thread
From: Tian, Kevin @ 2025-02-19 6:58 UTC (permalink / raw)
To: Jason Gunthorpe, Nicolin Chen
Cc: corbet@lwn.net, will@kernel.org, joro@8bytes.org,
suravee.suthikulpanit@amd.com, robin.murphy@arm.com,
dwmw2@infradead.org, baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 18, 2025 11:36 PM
>
> On Fri, Jan 24, 2025 at 04:30:36PM -0800, Nicolin Chen wrote:
> > +int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
> > + enum iommu_veventq_type type, void
> *event_data,
> > + size_t data_len)
> > +{
> > + struct iommufd_veventq *veventq;
> > + struct iommufd_vevent *vevent;
> > + int rc = 0;
> > +
> > + if (WARN_ON_ONCE(!data_len || !event_data))
> > + return -EINVAL;
> > +
> > + down_read(&viommu->veventqs_rwsem);
> > +
> > + veventq = iommufd_viommu_find_veventq(viommu, type);
> > + if (!veventq) {
> > + rc = -EOPNOTSUPP;
> > + goto out_unlock_veventqs;
> > + }
> > +
> > + if (atomic_read(&veventq->num_events) == veventq->depth) {
> > + vevent = &veventq->overflow;
> > + goto out_set_header;
> > + }
> > +
> > + vevent = kmalloc(struct_size(vevent, event_data, data_len),
> GFP_KERNEL);
> > + if (!vevent) {
> > + rc = -ENOMEM;
> > + goto out_unlock_veventqs;
>
> This should record an overflow too
>
In that case probably we want to rename 'overflow' to 'lost_event'
which counts lost events for whatever reasons (overflow, oom, etc.)
* Re: [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-02-19 6:58 ` Tian, Kevin
@ 2025-02-20 21:16 ` Nicolin Chen
2025-02-21 4:27 ` Tian, Kevin
0 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-20 21:16 UTC (permalink / raw)
To: Jason Gunthorpe, Tian, Kevin
Cc: corbet@lwn.net, will@kernel.org, joro@8bytes.org,
suravee.suthikulpanit@amd.com, robin.murphy@arm.com,
dwmw2@infradead.org, baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Wed, Feb 19, 2025 at 06:58:16AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 18, 2025 11:36 PM
> >
> > On Fri, Jan 24, 2025 at 04:30:36PM -0800, Nicolin Chen wrote:
> > > +int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
> > > + enum iommu_veventq_type type, void
> > *event_data,
> > > + size_t data_len)
> > > +{
> > > + struct iommufd_veventq *veventq;
> > > + struct iommufd_vevent *vevent;
> > > + int rc = 0;
> > > +
> > > + if (WARN_ON_ONCE(!data_len || !event_data))
> > > + return -EINVAL;
> > > +
> > > + down_read(&viommu->veventqs_rwsem);
> > > +
> > > + veventq = iommufd_viommu_find_veventq(viommu, type);
> > > + if (!veventq) {
> > > + rc = -EOPNOTSUPP;
> > > + goto out_unlock_veventqs;
> > > + }
> > > +
> > > + if (atomic_read(&veventq->num_events) == veventq->depth) {
> > > + vevent = &veventq->overflow;
> > > + goto out_set_header;
> > > + }
> > > +
> > > + vevent = kmalloc(struct_size(vevent, event_data, data_len),
> > GFP_KERNEL);
> > > + if (!vevent) {
> > > + rc = -ENOMEM;
> > > + goto out_unlock_veventqs;
> >
> > This should record an overflow too
> >
>
> In that case probably we want to rename 'overflow' to 'lost_event'
> which counts lost events for whatever reasons (overflow, oom, etc.)
Naming-wise, I think it may sound better to call the overflow node
just an 'exception' that concludes with lost events for different
reasons.
A complication is that this 'exception' would give userspace a very
implicit reason, as the node just reports the number of lost events
vs. providing the reason for each lost event.
With this, maybe the IOMMU_VEVENTQ_FLAG_OVERFLOW in the uAPI should
be just IOMMU_VEVENTQ_FLAG_EXCEPTION? Any better idea?
Thanks
Nicolin
* RE: [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-02-20 21:16 ` Nicolin Chen
@ 2025-02-21 4:27 ` Tian, Kevin
2025-02-21 13:39 ` Jason Gunthorpe
0 siblings, 1 reply; 60+ messages in thread
From: Tian, Kevin @ 2025-02-21 4:27 UTC (permalink / raw)
To: Nicolin Chen, Jason Gunthorpe
Cc: corbet@lwn.net, will@kernel.org, joro@8bytes.org,
suravee.suthikulpanit@amd.com, robin.murphy@arm.com,
dwmw2@infradead.org, baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, February 21, 2025 5:16 AM
>
> On Wed, Feb 19, 2025 at 06:58:16AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 18, 2025 11:36 PM
> > >
> > > On Fri, Jan 24, 2025 at 04:30:36PM -0800, Nicolin Chen wrote:
> > > > +int iommufd_viommu_report_event(struct iommufd_viommu
> *viommu,
> > > > + enum iommu_veventq_type type, void
> > > *event_data,
> > > > + size_t data_len)
> > > > +{
> > > > + struct iommufd_veventq *veventq;
> > > > + struct iommufd_vevent *vevent;
> > > > + int rc = 0;
> > > > +
> > > > + if (WARN_ON_ONCE(!data_len || !event_data))
> > > > + return -EINVAL;
> > > > +
> > > > + down_read(&viommu->veventqs_rwsem);
> > > > +
> > > > + veventq = iommufd_viommu_find_veventq(viommu, type);
> > > > + if (!veventq) {
> > > > + rc = -EOPNOTSUPP;
> > > > + goto out_unlock_veventqs;
> > > > + }
> > > > +
> > > > + if (atomic_read(&veventq->num_events) == veventq->depth) {
> > > > + vevent = &veventq->overflow;
> > > > + goto out_set_header;
> > > > + }
> > > > +
> > > > + vevent = kmalloc(struct_size(vevent, event_data, data_len),
> > > GFP_KERNEL);
> > > > + if (!vevent) {
> > > > + rc = -ENOMEM;
> > > > + goto out_unlock_veventqs;
> > >
> > > This should record an overflow too
> > >
> >
> > In that case probably we want to rename 'overflow' to 'lost_event'
> > which counts lost events for whatever reasons (overflow, oom, etc.)
>
> Naming-wise, I think it may sound better to call the overflow node
> just an 'exception' that concludes with lost eventsfor different
> reasons.
>
> A complication is that this 'exception' would give userspace a very
> implicit reason as the node just report the number of lost events
> v.s. providing the reason to each lost event.
>
> With this, maybe the IOMMU_VEVENTQ_FLAG_OVERFLOW in the uAPI
> should
> be just IOMMU_VEVENTQ_FLAG_EXCEPTION? Any better idea?
>
'Exception' is broader than just lost events.
We may view it in this way. When the overflow node is not used, the
user detects the sequence gap between two event entries. In that
case the gap literally reveals the number of lost events, with no
reason given for why those events were lost.
With that in mind we don't need to provide a reason for the static
node, and IOMMU_EVENTQ_FLAG_LOST_EVENTS sounds closer to the
real intention.
Given it's a renaming, let's have consensus from Jason before you
make a change.
Thanks
Kevin
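The sequence-gap detection Kevin describes is, on the user side, just unsigned arithmetic over per-event sequence numbers; a hypothetical sketch (field layout and read loop omitted):

```c
#include <stdint.h>

/*
 * Userspace side: infer lost events from the gap between the sequence
 * numbers of two consecutively read event entries. Unsigned arithmetic
 * handles wraparound; a gap of exactly 1 means nothing was lost.
 */
static uint32_t lost_between(uint32_t prev_seq, uint32_t next_seq)
{
	return next_seq - prev_seq - 1;
}
```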
* Re: [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper
2025-02-21 4:27 ` Tian, Kevin
@ 2025-02-21 13:39 ` Jason Gunthorpe
0 siblings, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 13:39 UTC (permalink / raw)
To: Tian, Kevin
Cc: Nicolin Chen, corbet@lwn.net, will@kernel.org, joro@8bytes.org,
suravee.suthikulpanit@amd.com, robin.murphy@arm.com,
dwmw2@infradead.org, baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Fri, Feb 21, 2025 at 04:27:32AM +0000, Tian, Kevin wrote:
> With that in mind we don't need provide reason for the static node,
> and IOMMU_EVENTQ_FLAG_LOST_EVENTS sounds closer to the
> real intention.
I also like lost events - it is simple and to the point
Jason
* [PATCH v6 08/14] iommufd/selftest: Require vdev_id when attaching to a nested domain
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (6 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 07/14] iommufd/viommu: Add iommufd_viommu_report_event helper Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:15 ` Tian, Kevin
2025-01-25 0:30 ` [PATCH v6 09/14] iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage Nicolin Chen
` (6 subsequent siblings)
14 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
When attaching a device to a vIOMMU-based nested domain, a vdev_id must be
present. Add a piece of code that hard-requires it, preparing for the
vEVENTQ support in the following patch. Then, update the TEST_F.
A HWPT-based nested domain will return a NULL new_viommu, so there is no
such vDEVICE requirement.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/selftest.c | 24 ++++++++++++++++++++++++
tools/testing/selftests/iommu/iommufd.c | 5 +++++
2 files changed, 29 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index a0de6d6d4e68..d786561359c4 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -161,7 +161,10 @@ enum selftest_obj_type {
struct mock_dev {
struct device dev;
+ struct mock_viommu *viommu;
+ struct rw_semaphore viommu_rwsem;
unsigned long flags;
+ unsigned long vdev_id;
int id;
u32 cache[MOCK_DEV_CACHE_NUM];
};
@@ -193,10 +196,30 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
struct device *dev)
{
struct mock_dev *mdev = to_mock_dev(dev);
+ struct mock_viommu *new_viommu = NULL;
+ unsigned long vdev_id = 0;
+ int rc;
if (domain->dirty_ops && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY))
return -EINVAL;
+ iommu_group_mutex_assert(dev);
+ if (domain->type == IOMMU_DOMAIN_NESTED) {
+ new_viommu = to_mock_nested(domain)->mock_viommu;
+ if (new_viommu) {
+ rc = iommufd_viommu_get_vdev_id(&new_viommu->core, dev,
+ &vdev_id);
+ if (rc)
+ return rc;
+ }
+ }
+ if (new_viommu != mdev->viommu) {
+ down_write(&mdev->viommu_rwsem);
+ mdev->viommu = new_viommu;
+ mdev->vdev_id = vdev_id;
+ up_write(&mdev->viommu_rwsem);
+ }
+
return 0;
}
@@ -861,6 +884,7 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
if (!mdev)
return ERR_PTR(-ENOMEM);
+ init_rwsem(&mdev->viommu_rwsem);
device_initialize(&mdev->dev);
mdev->flags = dev_flags;
mdev->dev.release = mock_dev_release;
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index a1b2b657999d..212e5d62e13d 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2736,6 +2736,7 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
uint32_t iopf_hwpt_id;
uint32_t fault_id;
uint32_t fault_fd;
+ uint32_t vdev_id;
if (self->device_id) {
test_ioctl_fault_alloc(&fault_id, &fault_fd);
@@ -2752,6 +2753,10 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
&iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
sizeof(data));
+ /* Must allocate vdevice before attaching to a nested hwpt */
+ test_err_mock_domain_replace(ENOENT, self->stdev_id,
+ iopf_hwpt_id);
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
EXPECT_ERRNO(EBUSY,
_test_ioctl_destroy(self->fd, iopf_hwpt_id));
--
2.43.0
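The attach rule the patch enforces reduces to: if the nested domain is vIOMMU-backed, the vdev_id lookup must succeed before the attach proceeds, while HWPT-based nesting (NULL viommu) skips the check entirely. A userspace sketch with mock names:

```c
#include <stddef.h>

/* Userspace mock: a nested domain optionally backed by a vIOMMU */
struct mock_domain {
	const void *viommu;	/* NULL for HWPT-based nesting */
};

/* Stand-in lookup that always fails, mimicking "no vDEVICE allocated" */
static int mock_lookup_vdev_id(const void *viommu, const void *dev,
			       unsigned long *vdev_id)
{
	(void)viommu; (void)dev;
	*vdev_id = 0;
	return -2;	/* stands in for -ENOENT */
}

static int mock_nested_attach(const struct mock_domain *domain,
			      const void *dev)
{
	unsigned long vdev_id;

	if (domain->viommu)	/* vIOMMU-based: vdev_id is mandatory */
		return mock_lookup_vdev_id(domain->viommu, dev, &vdev_id);
	return 0;		/* HWPT-based: no vDEVICE requirement */
}
```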
* RE: [PATCH v6 08/14] iommufd/selftest: Require vdev_id when attaching to a nested domain
2025-01-25 0:30 ` [PATCH v6 08/14] iommufd/selftest: Require vdev_id when attaching to a nested domain Nicolin Chen
@ 2025-02-18 5:15 ` Tian, Kevin
0 siblings, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:15 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> When attaching a device to a vIOMMU-based nested domain, vdev_id must
> be
> present. Add a piece of code hard-requesting it, preparing for a vEVENTQ
> support in the following patch. Then, update the TEST_F.
>
> A HWPT-based nested domain will return a NULL new_viommu, thus no such
> a
> vDEVICE requirement.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
* [PATCH v6 09/14] iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (7 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 08/14] iommufd/selftest: Require vdev_id when attaching to a nested domain Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:16 ` Tian, Kevin
2025-01-25 0:30 ` [PATCH v6 10/14] iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage Nicolin Chen
` (5 subsequent siblings)
14 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
The handler will get the vDEVICE object from the given mdev and convert it
to its per-vIOMMU virtual ID, to mimic a real IOMMU driver.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_test.h | 10 ++++++++++
drivers/iommu/iommufd/selftest.c | 30 ++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index a6b7a163f636..87e9165cea27 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -24,6 +24,7 @@ enum {
IOMMU_TEST_OP_MD_CHECK_IOTLB,
IOMMU_TEST_OP_TRIGGER_IOPF,
IOMMU_TEST_OP_DEV_CHECK_CACHE,
+ IOMMU_TEST_OP_TRIGGER_VEVENT,
};
enum {
@@ -145,6 +146,9 @@ struct iommu_test_cmd {
__u32 id;
__u32 cache;
} check_dev_cache;
+ struct {
+ __u32 dev_id;
+ } trigger_vevent;
};
__u32 last;
};
@@ -212,4 +216,10 @@ struct iommu_viommu_invalidate_selftest {
__u32 cache_id;
};
+#define IOMMU_VEVENTQ_TYPE_SELFTEST 0xbeefbeef
+
+struct iommu_viommu_event_selftest {
+ __u32 virt_id;
+};
+
#endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d786561359c4..0ebaaf795676 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -1632,6 +1632,34 @@ static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
return 0;
}
+static int iommufd_test_trigger_vevent(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct iommu_viommu_event_selftest test = {};
+ struct iommufd_device *idev;
+ struct mock_dev *mdev;
+ int rc = -ENOENT;
+
+ idev = iommufd_get_device(ucmd, cmd->trigger_vevent.dev_id);
+ if (IS_ERR(idev))
+ return PTR_ERR(idev);
+ mdev = to_mock_dev(idev->dev);
+
+ down_read(&mdev->viommu_rwsem);
+ if (!mdev->viommu || !mdev->vdev_id)
+ goto out_unlock;
+
+ test.virt_id = mdev->vdev_id;
+ rc = iommufd_viommu_report_event(&mdev->viommu->core,
+ IOMMU_VEVENTQ_TYPE_SELFTEST, &test,
+ sizeof(test));
+out_unlock:
+ up_read(&mdev->viommu_rwsem);
+ iommufd_put_object(ucmd->ictx, &idev->obj);
+
+ return rc;
+}
+
void iommufd_selftest_destroy(struct iommufd_object *obj)
{
struct selftest_obj *sobj = to_selftest_obj(obj);
@@ -1713,6 +1741,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
cmd->dirty.flags);
case IOMMU_TEST_OP_TRIGGER_IOPF:
return iommufd_test_trigger_iopf(ucmd, cmd);
+ case IOMMU_TEST_OP_TRIGGER_VEVENT:
+ return iommufd_test_trigger_vevent(ucmd, cmd);
default:
return -EOPNOTSUPP;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* RE: [PATCH v6 09/14] iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage
2025-01-25 0:30 ` [PATCH v6 09/14] iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage Nicolin Chen
@ 2025-02-18 5:16 ` Tian, Kevin
0 siblings, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:16 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> The handler will get vDEVICE object from the given mdev and convert it to
> its per-vIOMMU virtual ID to mimic a real IOMMU driver.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v6 10/14] iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (8 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 09/14] iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:19 ` Tian, Kevin
2025-01-25 0:30 ` [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ Nicolin Chen
` (4 subsequent siblings)
14 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
Trigger vEVENTs by feeding an idev ID, and validate that the returned
virt_ids equal the value that was set in the vDEVICE.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 115 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd.c | 31 +++++
.../selftests/iommu/iommufd_fail_nth.c | 7 ++
3 files changed, 153 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index d979f5b0efe8..38ff1c51dd97 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -9,6 +9,7 @@
#include <sys/ioctl.h>
#include <stdint.h>
#include <assert.h>
+#include <poll.h>
#include "../kselftest_harness.h"
#include "../../../../drivers/iommu/iommufd/iommufd_test.h"
@@ -936,3 +937,117 @@ static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id,
EXPECT_ERRNO(_errno, \
_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \
virt_id, vdev_id))
+
+static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
+ __u32 *veventq_id, __u32 *veventq_fd)
+{
+ struct iommu_veventq_alloc cmd = {
+ .size = sizeof(cmd),
+ .type = type,
+ .veventq_depth = 2,
+ .viommu_id = viommu_id,
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_VEVENTQ_ALLOC, &cmd);
+ if (ret)
+ return ret;
+ if (veventq_id)
+ *veventq_id = cmd.out_veventq_id;
+ if (veventq_fd)
+ *veventq_fd = cmd.out_veventq_fd;
+ return 0;
+}
+
+#define test_cmd_veventq_alloc(viommu_id, type, veventq_id, veventq_fd) \
+ ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+ veventq_id, veventq_fd))
+#define test_err_veventq_alloc(_errno, viommu_id, type, veventq_id, \
+ veventq_fd) \
+ EXPECT_ERRNO(_errno, \
+ _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+ veventq_id, veventq_fd))
+
+static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
+{
+ struct iommu_test_cmd trigger_vevent_cmd = {
+ .size = sizeof(trigger_vevent_cmd),
+ .op = IOMMU_TEST_OP_TRIGGER_VEVENT,
+ .trigger_vevent = {
+ .dev_id = dev_id,
+ },
+ };
+ int ret = 0;
+
+ while (nvevents--) {
+ ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
+ &trigger_vevent_cmd);
+ if (ret < 0)
+ return -1;
+ }
+ return ret;
+}
+
+#define test_cmd_trigger_vevents(dev_id, nvevents) \
+ ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id, nvevents))
+
+static int _test_cmd_read_vevents(int fd, __u32 event_fd, __u32 nvevents,
+ __u32 virt_id, int *prev_seq)
+{
+ struct pollfd pollfd = { .fd = event_fd, .events = POLLIN };
+ struct iommu_viommu_event_selftest *event;
+ struct iommufd_vevent_header *hdr;
+ ssize_t bytes;
+ void *data;
+ int ret, i;
+
+ ret = poll(&pollfd, 1, 1000);
+ if (ret < 0)
+ return -1;
+
+ data = calloc(nvevents, sizeof(*hdr) + sizeof(*event));
+ if (!data) {
+ errno = ENOMEM;
+ return -1;
+ }
+
+ bytes = read(event_fd, data,
+ nvevents * (sizeof(*hdr) + sizeof(*event)));
+ if (bytes <= 0) {
+ errno = EFAULT;
+ ret = -1;
+ goto out_free;
+ }
+
+ for (i = 0; i < nvevents; i++) {
+ hdr = data + i * (sizeof(*hdr) + sizeof(*event));
+
+ if (hdr->flags & IOMMU_VEVENTQ_FLAG_OVERFLOW ||
+ hdr->sequence - *prev_seq > 1) {
+ *prev_seq = hdr->sequence;
+ errno = EOVERFLOW;
+ ret = -1;
+ goto out_free;
+ }
+ *prev_seq = hdr->sequence;
+ event = (void *)hdr + sizeof(*hdr);
+ if (event->virt_id != virt_id) {
+ errno = EINVAL;
+ ret = -1;
+ goto out_free;
+ }
+ }
+
+ ret = 0;
+out_free:
+ free(data);
+ return ret;
+}
+
+#define test_cmd_read_vevents(event_fd, nvevents, virt_id, prev_seq) \
+ ASSERT_EQ(0, _test_cmd_read_vevents(self->fd, event_fd, nvevents, \
+ virt_id, prev_seq))
+#define test_err_read_vevents(_errno, event_fd, nvevents, virt_id, prev_seq) \
+ EXPECT_ERRNO(_errno, \
+ _test_cmd_read_vevents(self->fd, event_fd, nvevents, \
+ virt_id, prev_seq))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 212e5d62e13d..dd453aae8fed 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2774,15 +2774,46 @@ TEST_F(iommufd_viommu, vdevice_alloc)
uint32_t viommu_id = self->viommu_id;
uint32_t dev_id = self->device_id;
uint32_t vdev_id = 0;
+ uint32_t veventq_id;
+ uint32_t veventq_fd;
+ int prev_seq = -1;
if (dev_id) {
+ /* Must allocate vdevice before attaching to a nested hwpt */
+ test_err_mock_domain_replace(ENOENT, self->stdev_id,
+ self->nested_hwpt_id);
+
+ /* Allocate a vEVENTQ with veventq_depth=2 */
+ test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
+ &veventq_id, &veventq_fd);
+ test_err_veventq_alloc(EEXIST, viommu_id,
+ IOMMU_VEVENTQ_TYPE_SELFTEST, NULL, NULL);
/* Set vdev_id to 0x99, unset it, and set to 0x88 */
test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+ test_cmd_mock_domain_replace(self->stdev_id,
+ self->nested_hwpt_id);
+ test_cmd_trigger_vevents(dev_id, 1);
+ test_cmd_read_vevents(veventq_fd, 1, 0x99, &prev_seq);
test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99,
&vdev_id);
+ test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
test_ioctl_destroy(vdev_id);
+
+ /* Try again with 0x88 */
test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id);
+ test_cmd_mock_domain_replace(self->stdev_id,
+ self->nested_hwpt_id);
+ /* Trigger an overflow with three events */
+ test_cmd_trigger_vevents(dev_id, 3);
+ test_err_read_vevents(EOVERFLOW, veventq_fd, 3, 0x88,
+ &prev_seq);
+ /* Overflow must be gone after the previous reads */
+ test_cmd_trigger_vevents(dev_id, 1);
+ test_cmd_read_vevents(veventq_fd, 1, 0x88, &prev_seq);
+ close(veventq_fd);
+ test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
test_ioctl_destroy(vdev_id);
+ test_ioctl_destroy(veventq_id);
} else {
test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL);
}
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 64b1f8e1b0cf..99a7f7897bb2 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -620,6 +620,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
};
struct iommu_test_hw_info info;
uint32_t fault_id, fault_fd;
+ uint32_t veventq_id, veventq_fd;
uint32_t fault_hwpt_id;
uint32_t ioas_id;
uint32_t ioas_id2;
@@ -692,6 +693,12 @@ TEST_FAIL_NTH(basic_fail_nth, device)
IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)))
return -1;
+ if (_test_cmd_veventq_alloc(self->fd, viommu_id,
+ IOMMU_VEVENTQ_TYPE_SELFTEST, &veventq_id,
+ &veventq_fd))
+ return -1;
+ close(veventq_fd);
+
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* RE: [PATCH v6 10/14] iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage
2025-01-25 0:30 ` [PATCH v6 10/14] iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage Nicolin Chen
@ 2025-02-18 5:19 ` Tian, Kevin
0 siblings, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:19 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> Trigger vEVENTs by feeding an idev ID and validating the returned output
> virt_ids whether they equal to the value that was set to the vDEVICE.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (9 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 10/14] iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-01-28 8:21 ` Bagas Sanjaya
2025-02-18 17:02 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster Nicolin Chen
` (3 subsequent siblings)
14 siblings, 2 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
With the introduction of the new FAULT and vEVENTQ objects, update the
documentation to reflect them.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
Documentation/userspace-api/iommufd.rst | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
index 70289d6815d2..b0df15865dec 100644
--- a/Documentation/userspace-api/iommufd.rst
+++ b/Documentation/userspace-api/iommufd.rst
@@ -63,6 +63,13 @@ Following IOMMUFD objects are exposed to userspace:
space usually has mappings from guest-level I/O virtual addresses to guest-
level physical addresses.
+- IOMMUFD_OBJ_FAULT, representing a software queue for an HWPT reporting IO
+ page faults using the IOMMU HW's PRI (Page Request Interface). This queue
+ object provides user space with an FD to poll the page fault events and to
+ respond to those events. A FAULT object must be created first to get a
+ fault_id, which can then be used to allocate a fault-enabled HWPT via the
+ IOMMU_HWPT_ALLOC command by setting IOMMU_HWPT_FAULT_ID_VALID in its flags.
+
- IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance,
passed to or shared with a VM. It may be some HW-accelerated virtualization
features and some SW resources used by the VM. For examples:
@@ -109,6 +116,14 @@ Following IOMMUFD objects are exposed to userspace:
vIOMMU, which is a separate ioctl call from attaching the same device to an
HWPT_PAGING that the vIOMMU holds.
+- IOMMUFD_OBJ_VEVENTQ, representing a software queue for a vIOMMU to report its
+ events, such as translation faults that occur on a nested stage-1 (excluding
+ I/O page faults, which should go through IOMMUFD_OBJ_FAULT) and HW-specific
+ events. This queue object provides user space an FD to poll/read the vIOMMU
+ events. A vIOMMU object must be created first to get its viommu_id, which can
+ then be used to allocate a vEVENTQ. Each vIOMMU can support multiple types of
+ vEVENTs, but is confined to one vEVENTQ per vEVENTQ type.
+
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel
@@ -251,8 +266,10 @@ User visible objects are backed by following datastructures:
- iommufd_device for IOMMUFD_OBJ_DEVICE.
- iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
- iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
+- iommufd_fault for IOMMUFD_OBJ_FAULT.
- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
+- iommufd_veventq for IOMMUFD_OBJ_VEVENTQ.
Several terminologies when looking at these datastructures:
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ
2025-01-25 0:30 ` [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ Nicolin Chen
@ 2025-01-28 8:21 ` Bagas Sanjaya
2025-02-18 17:02 ` Jason Gunthorpe
1 sibling, 0 replies; 60+ messages in thread
From: Bagas Sanjaya @ 2025-01-28 8:21 UTC (permalink / raw)
To: Nicolin Chen, jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:40PM -0800, Nicolin Chen wrote:
> diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
> index 70289d6815d2..b0df15865dec 100644
> --- a/Documentation/userspace-api/iommufd.rst
> +++ b/Documentation/userspace-api/iommufd.rst
> @@ -63,6 +63,13 @@ Following IOMMUFD objects are exposed to userspace:
> space usually has mappings from guest-level I/O virtual addresses to guest-
> level physical addresses.
>
> +- IOMMUFD_FAULT, representing a software queue for an HWPT reporting IO page
> + faults using the IOMMU HW's PRI (Page Request Interface). This queue object
> + provides user space an FD to poll the page fault events and also to respond
> + to those events. A FAULT object must be created first to get a fault_id that
> + could be then used to allocate a fault-enabled HWPT via the IOMMU_HWPT_ALLOC
> + command by setting the IOMMU_HWPT_FAULT_ID_VALID bit in its flags field.
> +
> - IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance,
> passed to or shared with a VM. It may be some HW-accelerated virtualization
> features and some SW resources used by the VM. For examples:
> @@ -109,6 +116,14 @@ Following IOMMUFD objects are exposed to userspace:
> vIOMMU, which is a separate ioctl call from attaching the same device to an
> HWPT_PAGING that the vIOMMU holds.
>
> +- IOMMUFD_OBJ_VEVENTQ, representing a software queue for a vIOMMU to report its
> + events such as translation faults occurred to a nested stage-1 (excluding I/O
> + page faults that should go through IOMMUFD_OBJ_FAULT) and HW-specific events.
> + This queue object provides user space an FD to poll/read the vIOMMU events. A
> + vIOMMU object must be created first to get its viommu_id, which could be then
> + used to allocate a vEVENTQ. Each vIOMMU can support multiple types of vEVENTS,
> + but is confined to one vEVENTQ per vEVENTQ type.
> +
> All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
>
> The diagrams below show relationships between user-visible objects and kernel
> @@ -251,8 +266,10 @@ User visible objects are backed by following datastructures:
> - iommufd_device for IOMMUFD_OBJ_DEVICE.
> - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
> - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
> +- iommufd_fault for IOMMUFD_OBJ_FAULT.
> - iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
> - iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
> +- iommufd_veventq for IOMMUFD_OBJ_VEVENTQ.
>
> Several terminologies when looking at these datastructures:
>
Looks good, thanks!
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ
2025-01-25 0:30 ` [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ Nicolin Chen
2025-01-28 8:21 ` Bagas Sanjaya
@ 2025-02-18 17:02 ` Jason Gunthorpe
1 sibling, 0 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 17:02 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:40PM -0800, Nicolin Chen wrote:
> With the introduction of the new objects, update the doc to reflect that.
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> Documentation/userspace-api/iommufd.rst | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v6 12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (10 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 11/14] Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 17:08 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU Nicolin Chen
` (2 subsequent siblings)
14 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
Use the new struct arm_smmu_vmaster to store all vSMMU-related data. The
vsid (Virtual Stream ID) will be the first use case. Add a rw_semaphore to
protect it.
Also add a pair of arm_smmu_attach_prepare/commit_vmaster helpers to set or
unset the master->vmaster pointer. Put these helpers inside the existing
arm_smmu_attach_prepare/commit(). Note that the identity and blocked ops
don't call arm_smmu_attach_prepare/commit(), so simply call the new helpers
at the top of their attach handlers; this way a device attaching to an
identity/blocked domain can unset master->vmaster when the device is moving
away from a nested domain.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 23 ++++++++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 45 +++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 ++++++++++++-
3 files changed, 99 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index bd9d7c85576a..4435ad7db776 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -799,6 +799,11 @@ struct arm_smmu_stream {
struct rb_node node;
};
+struct arm_smmu_vmaster {
+ struct arm_vsmmu *vsmmu;
+ unsigned long vsid;
+};
+
struct arm_smmu_event {
u8 stall : 1,
ssv : 1,
@@ -824,6 +829,8 @@ struct arm_smmu_master {
struct arm_smmu_device *smmu;
struct device *dev;
struct arm_smmu_stream *streams;
+ struct arm_smmu_vmaster *vmaster;
+ struct rw_semaphore vmaster_rwsem;
/* Locked by the iommu core using the group mutex */
struct arm_smmu_ctx_desc_cfg cd_table;
unsigned int num_streams;
@@ -972,6 +979,7 @@ struct arm_smmu_attach_state {
bool disable_ats;
ioasid_t ssid;
/* Resulting state */
+ struct arm_smmu_vmaster *vmaster;
bool ats_enabled;
};
@@ -1055,9 +1063,24 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
struct iommu_domain *parent,
struct iommufd_ctx *ictx,
unsigned int viommu_type);
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+ struct iommu_domain *domain);
+void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
#else
#define arm_smmu_hw_info NULL
#define arm_vsmmu_alloc NULL
+
+static inline int
+arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+ struct iommu_domain *domain)
+{
+ return 0; /* NOP */
+}
+
+static inline void
+arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
+{
+}
#endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index c7cc613050d9..98138088fd16 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -85,6 +85,51 @@ static void arm_smmu_make_nested_domain_ste(
}
}
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+ struct iommu_domain *domain)
+{
+ struct arm_smmu_nested_domain *nested_domain;
+ struct arm_smmu_vmaster *vmaster;
+ unsigned long vsid;
+ int ret;
+
+ iommu_group_mutex_assert(state->master->dev);
+
+ if (domain->type != IOMMU_DOMAIN_NESTED)
+ return 0;
+ nested_domain = to_smmu_nested_domain(domain);
+
+ /* Skip invalid vSTE */
+ if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
+ return 0;
+
+ ret = iommufd_viommu_get_vdev_id(&nested_domain->vsmmu->core,
+ state->master->dev, &vsid);
+ if (ret)
+ return ret;
+
+ vmaster = kzalloc(sizeof(*vmaster), GFP_KERNEL);
+ if (!vmaster)
+ return -ENOMEM;
+ vmaster->vsmmu = nested_domain->vsmmu;
+ vmaster->vsid = vsid;
+ state->vmaster = vmaster;
+
+ return 0;
+}
+
+void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
+{
+ struct arm_smmu_master *master = state->master;
+
+ down_write(&master->vmaster_rwsem);
+ if (state->vmaster != master->vmaster) {
+ kfree(master->vmaster);
+ master->vmaster = state->vmaster;
+ }
+ up_write(&master->vmaster_rwsem);
+}
+
static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
struct device *dev)
{
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ea76f25c0661..686c171dd273 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2802,6 +2802,7 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
struct arm_smmu_domain *smmu_domain =
to_smmu_domain_devices(new_domain);
unsigned long flags;
+ int ret;
/*
* arm_smmu_share_asid() must not see two domains pointing to the same
@@ -2831,9 +2832,15 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
}
if (smmu_domain) {
+ ret = arm_smmu_attach_prepare_vmaster(state, new_domain);
+ if (ret)
+ return ret;
+
master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL);
- if (!master_domain)
+ if (!master_domain) {
+ kfree(state->vmaster);
return -ENOMEM;
+ }
master_domain->master = master;
master_domain->ssid = state->ssid;
if (new_domain->type == IOMMU_DOMAIN_NESTED)
@@ -2860,6 +2867,7 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
spin_unlock_irqrestore(&smmu_domain->devices_lock,
flags);
kfree(master_domain);
+ kfree(state->vmaster);
return -EINVAL;
}
@@ -2892,6 +2900,8 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
lockdep_assert_held(&arm_smmu_asid_lock);
+ arm_smmu_attach_commit_vmaster(state);
+
if (state->ats_enabled && !master->ats_enabled) {
arm_smmu_enable_ats(master);
} else if (state->ats_enabled && master->ats_enabled) {
@@ -3158,8 +3168,17 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain,
static int arm_smmu_attach_dev_identity(struct iommu_domain *domain,
struct device *dev)
{
+ int ret;
struct arm_smmu_ste ste;
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_attach_state state = {
+ .master = master,
+ };
+
+ ret = arm_smmu_attach_prepare_vmaster(&state, domain);
+ if (ret)
+ return ret;
+ arm_smmu_attach_commit_vmaster(&state);
arm_smmu_make_bypass_ste(master->smmu, &ste);
arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS);
@@ -3178,7 +3197,17 @@ static struct iommu_domain arm_smmu_identity_domain = {
static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain,
struct device *dev)
{
+ int ret;
struct arm_smmu_ste ste;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_attach_state state = {
+ .master = master,
+ };
+
+ ret = arm_smmu_attach_prepare_vmaster(&state, domain);
+ if (ret)
+ return ret;
+ arm_smmu_attach_commit_vmaster(&state);
arm_smmu_make_abort_ste(&ste);
arm_smmu_attach_dev_ste(domain, dev, &ste,
@@ -3428,6 +3457,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
master->dev = dev;
master->smmu = smmu;
+ init_rwsem(&master->vmaster_rwsem);
dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master);
--
2.43.0
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v6 12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
2025-01-25 0:30 ` [PATCH v6 12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster Nicolin Chen
@ 2025-02-18 17:08 ` Jason Gunthorpe
2025-02-20 7:16 ` Nicolin Chen
0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 17:08 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:41PM -0800, Nicolin Chen wrote:
> + int ret;
> struct arm_smmu_ste ste;
> struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> + struct arm_smmu_attach_state state = {
> + .master = master,
> + };
> +
> + ret = arm_smmu_attach_prepare_vmaster(&state, domain);
> + if (ret)
> + return ret;
> + arm_smmu_attach_commit_vmaster(&state);
I think you should make this into just a arm_smmu_clear_vmaster()
with less complication..
Jason
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v6 12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
2025-02-18 17:08 ` Jason Gunthorpe
@ 2025-02-20 7:16 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-20 7:16 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 01:08:24PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 24, 2025 at 04:30:41PM -0800, Nicolin Chen wrote:
> > + int ret;
> > struct arm_smmu_ste ste;
> > struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > + struct arm_smmu_attach_state state = {
> > + .master = master,
> > + };
> > +
> > + ret = arm_smmu_attach_prepare_vmaster(&state, domain);
> > + if (ret)
> > + return ret;
> > + arm_smmu_attach_commit_vmaster(&state);
>
> I think you should make this into just a arm_smmu_clear_vmaster()
> with less complication..
Ack:
+void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master)
+{
+ mutex_lock(&master->smmu->streams_mutex);
+ kfree(master->vmaster);
+ master->vmaster = NULL;
+ mutex_unlock(&master->smmu->streams_mutex);
+}
[...]
@@ -3162,6 +3172,7 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain,
struct arm_smmu_ste ste;
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ arm_smmu_master_clear_vmaster(master);
arm_smmu_make_bypass_ste(master->smmu, &ste);
arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS);
return 0;
@@ -3180,7 +3191,9 @@ static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain,
struct device *dev)
{
struct arm_smmu_ste ste;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ arm_smmu_master_clear_vmaster(master);
arm_smmu_make_abort_ste(&ste);
arm_smmu_attach_dev_ste(domain, dev, &ste,
STRTAB_STE_1_S1DSS_TERMINATE);
Thanks
Nicolin
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (11 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:21 ` Tian, Kevin
2025-02-18 17:18 ` Jason Gunthorpe
2025-01-25 0:30 ` [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations Nicolin Chen
2025-02-14 8:03 ` [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
14 siblings, 2 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
Aside from the IOPF framework, iommufd provides an additional pathway to
report hardware events, via the vEVENTQ of vIOMMU infrastructure.
Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1 events
in the threaded IRQ handler.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +++
include/uapi/linux/iommufd.h | 15 +++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 15 +++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 58 +++++++++++--------
4 files changed, 70 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 4435ad7db776..d24c3d8ee397 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -1066,6 +1066,7 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
struct iommu_domain *domain);
void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
+int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, u64 *evt);
#else
#define arm_smmu_hw_info NULL
#define arm_vsmmu_alloc NULL
@@ -1081,6 +1082,12 @@ static inline void
arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
{
}
+
+static inline int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster,
+ u64 *evt)
+{
+ return -EOPNOTSUPP;
+}
#endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 08cbc6bc3725..cbc30eff302d 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -1058,9 +1058,24 @@ struct iommufd_vevent_header {
/**
* enum iommu_veventq_type - Virtual Event Queue Type
* @IOMMU_VEVENTQ_TYPE_DEFAULT: Reserved for future use
+ * @IOMMU_VEVENTQ_TYPE_ARM_SMMUV3: ARM SMMUv3 Virtual Event Queue
*/
enum iommu_veventq_type {
IOMMU_VEVENTQ_TYPE_DEFAULT = 0,
+ IOMMU_VEVENTQ_TYPE_ARM_SMMUV3 = 1,
+};
+
+/**
+ * struct iommu_vevent_arm_smmuv3 - ARM SMMUv3 Virtual Event
+ * (IOMMU_VEVENTQ_TYPE_ARM_SMMUV3)
+ * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
+ * (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ *
+ * StreamID field reports a virtual device ID. To receive a virtual event for a
+ * device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
+ */
+struct iommu_vevent_arm_smmuv3 {
+ __aligned_le64 evt[4];
};
/**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 98138088fd16..ceeed907a714 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -443,4 +443,19 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
return &vsmmu->core;
}
+int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, u64 *evt)
+{
+ struct iommu_vevent_arm_smmuv3 vevt;
+ int i;
+
+ vevt.evt[0] = cpu_to_le64((evt[0] & ~EVTQ_0_SID) |
+ FIELD_PREP(EVTQ_0_SID, vmaster->vsid));
+ for (i = 1; i < EVTQ_ENT_DWORDS; i++)
+ vevt.evt[i] = cpu_to_le64(evt[i]);
+
+ return iommufd_viommu_report_event(&vmaster->vsmmu->core,
+ IOMMU_VEVENTQ_TYPE_ARM_SMMUV3, &vevt,
+ sizeof(vevt));
+}
+
MODULE_IMPORT_NS("IOMMUFD");
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 686c171dd273..59fbc342a095 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1812,8 +1812,8 @@ static void arm_smmu_decode_event(struct arm_smmu_device *smmu, u64 *raw,
mutex_unlock(&smmu->streams_mutex);
}
-static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
- struct arm_smmu_event *event)
+static int arm_smmu_handle_event(struct arm_smmu_device *smmu, u64 *evt,
+ struct arm_smmu_event *event)
{
int ret = 0;
u32 perm = 0;
@@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
return -EOPNOTSUPP;
}
- if (!event->stall)
- return -EOPNOTSUPP;
-
- if (event->read)
- perm |= IOMMU_FAULT_PERM_READ;
- else
- perm |= IOMMU_FAULT_PERM_WRITE;
+ if (event->stall) {
+ if (event->read)
+ perm |= IOMMU_FAULT_PERM_READ;
+ else
+ perm |= IOMMU_FAULT_PERM_WRITE;
- if (event->instruction)
- perm |= IOMMU_FAULT_PERM_EXEC;
+ if (event->instruction)
+ perm |= IOMMU_FAULT_PERM_EXEC;
- if (event->privileged)
- perm |= IOMMU_FAULT_PERM_PRIV;
+ if (event->privileged)
+ perm |= IOMMU_FAULT_PERM_PRIV;
- flt->type = IOMMU_FAULT_PAGE_REQ;
- flt->prm = (struct iommu_fault_page_request) {
- .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
- .grpid = event->stag,
- .perm = perm,
- .addr = event->iova,
- };
+ flt->type = IOMMU_FAULT_PAGE_REQ;
+ flt->prm = (struct iommu_fault_page_request){
+ .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
+ .grpid = event->stag,
+ .perm = perm,
+ .addr = event->iova,
+ };
- if (event->ssv) {
- flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
- flt->prm.pasid = event->ssid;
+ if (event->ssv) {
+ flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+ flt->prm.pasid = event->ssid;
+ }
}
mutex_lock(&smmu->streams_mutex);
@@ -1865,7 +1864,16 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
goto out_unlock;
}
- ret = iommu_report_device_fault(master->dev, &fault_evt);
+ if (event->stall) {
+ ret = iommu_report_device_fault(master->dev, &fault_evt);
+ } else {
+ down_read(&master->vmaster_rwsem);
+ if (master->vmaster && !event->s2)
+ ret = arm_vmaster_report_event(master->vmaster, evt);
+ else
+ ret = -EFAULT; /* Unhandled events should be pinned */
+ up_read(&master->vmaster_rwsem);
+ }
out_unlock:
mutex_unlock(&smmu->streams_mutex);
return ret;
@@ -1943,7 +1951,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
do {
while (!queue_remove_raw(q, evt)) {
arm_smmu_decode_event(smmu, evt, &event);
- if (arm_smmu_handle_event(smmu, &event))
+ if (arm_smmu_handle_event(smmu, evt, &event))
arm_smmu_dump_event(smmu, evt, &event, &rs);
put_device(event.dev);
--
2.43.0
* RE: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-01-25 0:30 ` [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU Nicolin Chen
@ 2025-02-18 5:21 ` Tian, Kevin
2025-02-18 17:18 ` Jason Gunthorpe
1 sibling, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:21 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> Aside from the IOPF framework, iommufd provides an additional pathway to
> report hardware events, via the vEVENTQ of vIOMMU infrastructure.
>
> Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1
> events
> in the threaded IRQ handler.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-01-25 0:30 ` [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU Nicolin Chen
2025-02-18 5:21 ` Tian, Kevin
@ 2025-02-18 17:18 ` Jason Gunthorpe
2025-02-18 18:28 ` Nicolin Chen
1 sibling, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 17:18 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:42PM -0800, Nicolin Chen wrote:
> @@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
> return -EOPNOTSUPP;
> }
There is still the filter at the top:
switch (event->id) {
case EVT_ID_TRANSLATION_FAULT:
case EVT_ID_ADDR_SIZE_FAULT:
case EVT_ID_ACCESS_FAULT:
case EVT_ID_PERMISSION_FAULT:
break;
default:
return -EOPNOTSUPP;
}
Is that right here or should more event types be forwarded to the
guest?
> mutex_lock(&smmu->streams_mutex);
[..]
> - ret = iommu_report_device_fault(master->dev, &fault_evt);
> + if (event->stall) {
> + ret = iommu_report_device_fault(master->dev, &fault_evt);
> + } else {
> + down_read(&master->vmaster_rwsem);
This already holds the streams_mutex across all of this, do you think
we should get rid of the vmaster_rwsem and hold the streams_mutex on
write instead?
Jason
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-18 17:18 ` Jason Gunthorpe
@ 2025-02-18 18:28 ` Nicolin Chen
2025-02-18 18:50 ` Jason Gunthorpe
0 siblings, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 18:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 01:18:21PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 24, 2025 at 04:30:42PM -0800, Nicolin Chen wrote:
>
> > @@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
> > return -EOPNOTSUPP;
> > }
>
> There is still the filter at the top:
>
> switch (event->id) {
> case EVT_ID_TRANSLATION_FAULT:
> case EVT_ID_ADDR_SIZE_FAULT:
> case EVT_ID_ACCESS_FAULT:
> case EVT_ID_PERMISSION_FAULT:
> break;
> default:
> return -EOPNOTSUPP;
> }
>
> Is that right here or should more event types be forwarded to the
> guest?
That doesn't seem to be right. Something like EVT_ID_BAD_CD_CONFIG
should be forwarded too. I will go through the list.
> > mutex_lock(&smmu->streams_mutex);
> [..]
>
> > - ret = iommu_report_device_fault(master->dev, &fault_evt);
> > + if (event->stall) {
> > + ret = iommu_report_device_fault(master->dev, &fault_evt);
> > + } else {
> > + down_read(&master->vmaster_rwsem);
>
> This already holds the streams_mutex across all of this, do you think
> we should get rid of the vmaster_rwsem and hold the streams_mutex on
> write instead?
They are per master v.s. per smmu. The latter one would make master
commits/attaches exclusive, which feels unnecessary to me, although
it would make the code here slightly cleaner..
Thanks
Nicolin
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-18 18:28 ` Nicolin Chen
@ 2025-02-18 18:50 ` Jason Gunthorpe
2025-02-18 19:02 ` Nicolin Chen
2025-02-20 20:45 ` Nicolin Chen
0 siblings, 2 replies; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 18:50 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 10:28:04AM -0800, Nicolin Chen wrote:
> On Tue, Feb 18, 2025 at 01:18:21PM -0400, Jason Gunthorpe wrote:
> > On Fri, Jan 24, 2025 at 04:30:42PM -0800, Nicolin Chen wrote:
> >
> > > @@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
> > > return -EOPNOTSUPP;
> > > }
> >
> > There is still the filter at the top:
> >
> > switch (event->id) {
> > case EVT_ID_TRANSLATION_FAULT:
> > case EVT_ID_ADDR_SIZE_FAULT:
> > case EVT_ID_ACCESS_FAULT:
> > case EVT_ID_PERMISSION_FAULT:
> > break;
> > default:
> > return -EOPNOTSUPP;
> > }
> >
> > Is that right here or should more event types be forwarded to the
> > guest?
>
> That doesn't seem to be right. Something like EVT_ID_BAD_CD_CONFIG
> should be forwarded too. I will go through the list.
I think the above should decode into a 'faultable' path because they
all decode to something with an IOVA
The rest should decode to things that include a SID and the SID decode
should always be forwarded to the VM. Maybe there are small
exclusions, but generally that is how I would see it..
> > This already holds the streams_mutex across all of this, do you think
> > we should get rid of the vmaster_rwsem and hold the streams_mutex on
> > write instead?
>
> They are per master v.s. per smmu. The latter one would make master
> commits/attaches exclusive, which feels unnecessary to me, although
> it would make the code here slightly cleaner..
I'd pay the cost on the attach side to have a single lock on the fault
side..
Jason
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-18 18:50 ` Jason Gunthorpe
@ 2025-02-18 19:02 ` Nicolin Chen
2025-02-18 19:08 ` Jason Gunthorpe
2025-02-20 20:45 ` Nicolin Chen
1 sibling, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 19:02 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 02:50:46PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 18, 2025 at 10:28:04AM -0800, Nicolin Chen wrote:
> > On Tue, Feb 18, 2025 at 01:18:21PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Jan 24, 2025 at 04:30:42PM -0800, Nicolin Chen wrote:
> > >
> > > > @@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
> > > > return -EOPNOTSUPP;
> > > > }
> > >
> > > There is still the filter at the top:
> > >
> > > switch (event->id) {
> > > case EVT_ID_TRANSLATION_FAULT:
> > > case EVT_ID_ADDR_SIZE_FAULT:
> > > case EVT_ID_ACCESS_FAULT:
> > > case EVT_ID_PERMISSION_FAULT:
> > > break;
> > > default:
> > > return -EOPNOTSUPP;
> > > }
> > >
> > > Is that right here or should more event types be forwarded to the
> > > guest?
> >
> > That doesn't seem to be right. Something like EVT_ID_BAD_CD_CONFIG
> > should be forwarded too. I will go through the list.
>
> I think the above should decode into a 'faultable' path because they
> all decode to something with an IOVA
>
> The rest should decode to things that include a SID and the SID decode
> should always be forwarded to the VM. Maybe there are small
> exclusions, but generally that is how I would see it..
Ack. The SMMU spec defines three types:
"Three categories of events might be recorded into the Event queue:
• Configuration errors.
• Faults from the translation process.
• Miscellaneous."
The driver cares about the first two only, as you remarked here.
> > > This already holds the streams_mutex across all of this, do you think
> > > we should get rid of the vmaster_rwsem and hold the streams_mutex on
> > > write instead?
> >
> > They are per master v.s. per smmu. The latter one would make master
> > commits/attaches exclusive, which feels unnecessary to me, although
> > it would make the code here slightly cleaner..
>
> I'd pay the cost on the attach side to have a single lock on the fault
> side..
OK. Maybe a small patch to turn the streams_mutex to streams_rwsem?
Thanks
Nicolin
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-18 19:02 ` Nicolin Chen
@ 2025-02-18 19:08 ` Jason Gunthorpe
2025-02-18 19:27 ` Nicolin Chen
0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 19:08 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 11:02:23AM -0800, Nicolin Chen wrote:
> > > > This already holds the streams_mutex across all of this, do you think
> > > > we should get rid of the vmaster_rwsem and hold the streams_mutex on
> > > > write instead?
> > >
> > > They are per master v.s. per smmu. The latter one would make master
> > > commits/attaches exclusive, which feels unnecessary to me, although
> > > it would make the code here slightly cleaner..
> >
> > I'd pay the cost on the attach side to have a single lock on the fault
> > side..
>
> OK. Maybe a small patch to turn the streams_mutex to streams_rwsem?
I don't think the interrupt path is multithreaded, is it? So only 1
reader anyhow?
Jason
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-18 19:08 ` Jason Gunthorpe
@ 2025-02-18 19:27 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 19:27 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 03:08:46PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 18, 2025 at 11:02:23AM -0800, Nicolin Chen wrote:
> > > > > This already holds the streams_mutex across all of this, do you think
> > > > > we should get rid of the vmaster_rwsem and hold the streams_mutex on
> > > > > write instead?
> > > >
> > > > They are per master v.s. per smmu. The latter one would make master
> > > > commits/attaches exclusive, which feels unnecessary to me, although
> > > > it would make the code here slightly cleaner..
> > >
> > > I'd pay the cost on the attach side to have a single lock on the fault
> > > side..
> >
> > OK. Maybe a small patch to turn the streams_mutex to streams_rwsem?
>
> I don't think the interrupt path is multithreaded, is it? So only 1
> reader anyhow?
Right, it's IRQF_ONESHOT. I will keep that unchanged.
Thanks
Nicolin
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-18 18:50 ` Jason Gunthorpe
2025-02-18 19:02 ` Nicolin Chen
@ 2025-02-20 20:45 ` Nicolin Chen
2025-02-20 23:24 ` Jason Gunthorpe
1 sibling, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-20 20:45 UTC (permalink / raw)
To: Jason Gunthorpe, robin.murphy, will
Cc: kevin.tian, corbet, joro, suravee.suthikulpanit, dwmw2, baolu.lu,
shuah, linux-kernel, iommu, linux-arm-kernel, linux-kselftest,
linux-doc, eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 02:50:46PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 18, 2025 at 10:28:04AM -0800, Nicolin Chen wrote:
> > On Tue, Feb 18, 2025 at 01:18:21PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Jan 24, 2025 at 04:30:42PM -0800, Nicolin Chen wrote:
> > >
> > > > @@ -1831,31 +1831,30 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu,
> > > > return -EOPNOTSUPP;
> > > > }
> > >
> > > There is still the filter at the top:
> > >
> > > switch (event->id) {
> > > case EVT_ID_TRANSLATION_FAULT:
> > > case EVT_ID_ADDR_SIZE_FAULT:
> > > case EVT_ID_ACCESS_FAULT:
> > > case EVT_ID_PERMISSION_FAULT:
> > > break;
> > > default:
> > > return -EOPNOTSUPP;
> > > }
> > >
> > > Is that right here or should more event types be forwarded to the
> > > guest?
> >
> > That doesn't seem to be right. Something like EVT_ID_BAD_CD_CONFIG
> > should be forwarded too. I will go through the list.
>
> I think the above should decode into a 'faultable' path because they
> all decode to something with an IOVA
>
> The rest should decode to things that include a SID and the SID decode
> should always be forwarded to the VM. Maybe there are small
> exclusions, but generally that is how I would see it..
I think we are safe to add these:
------------------------------------------------------------
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index fd2f13a63f27..be9746ecdc65 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -1067,7 +1067,16 @@ enum iommu_veventq_type {
* struct iommu_vevent_arm_smmuv3 - ARM SMMUv3 Virtual Event
* (IOMMU_VEVENTQ_TYPE_ARM_SMMUV3)
* @evt: 256-bit ARM SMMUv3 Event record, little-endian.
- * (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ * Reported event records: (Refer to "7.3 Event records" in SMMUv3 HW Spec)
+ * - 0x02 C_BAD_STREAMID
+ * - 0x04 C_BAD_STE
+ * - 0x06 F_STREAM_DISABLED
+ * - 0x08 C_BAD_SUBSTREAMID
+ * - 0x0a C_BAD_STE
+ * - 0x10 F_TRANSLATION
+ * - 0x11 F_ADDR_SIZE
+ * - 0x12 F_ACCESS
+ * - 0x13 F_PERMISSION
*
* StreamID field reports a virtual device ID. To receive a virtual event for a
* device, a vDEVICE must be allocated via IOMMU_VDEVICE_ALLOC.
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0efda55ad6bd..f3aa9ce16058 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1827,7 +1827,15 @@ static int arm_smmu_handle_event(struct arm_smmu_device *smmu, u64 *evt,
case EVT_ID_ADDR_SIZE_FAULT:
case EVT_ID_ACCESS_FAULT:
case EVT_ID_PERMISSION_FAULT:
+ case EVT_ID_BAD_CD_CONFIG:
+ case EVT_ID_BAD_STE_CONFIG:
+ case EVT_ID_BAD_STREAMID_CONFIG:
+ case EVT_ID_BAD_SUBSTREAMID_CONFIG:
+ case EVT_ID_STREAM_DISABLED_FAULT:
break;
+ case EVT_ID_STE_FETCH_FAULT:
+ case EVT_ID_CD_FETCH_FAULT:
+ /* FIXME need to replace fetch_addr with IPA? */
default:
return -EOPNOTSUPP;
}
------------------------------------------------------------
All of the supported events require vSID replacement. The faults that
carry addresses deal with stage-1 IOVA or IPA, i.e. what the VM sees
as its IOVA and PA. So, they can simply be forwarded.
But F_CD_FETCH and F_STE_FETCH seem to be more complicated, as both
report a PA in their FetchAddr fields, although the spec does mention
that both might be injected into a guest VM:
- "Note: This event might be injected into a guest VM, as though
from a virtual SMMU, when a hypervisor receives a stage 2
Translation-related fault indicating CD fetch as a cause (with
CLASS == CD)."
- "Note: This event might be injected into a guest VM, as though
from a virtual SMMU, when a hypervisor detects invalid guest
configuration that would cause a guest STE fetch from an illegal
IPA."
For F_CD_FETCH, at least the CD table pointer in the nested STE is
an IPA, and all the entries in the CD table that can be 2-level are
IPAs as well. So, we would need some kind of reverse translation from
a PA to an IPA using its stage-2 mapping. I am not sure what the best
way to do that is...
For F_STE_FETCH, the host prepared the nested STE, so there is no
IPA involved. We would have to ask the VMM to fill the field, since
the VMM can derive the STE's IPA from the vSID. One thing I am not
sure of is whether the FetchAddr is STE-size aligned or not, though
we could carry the offset in the FetchAddr field via the vEVENT by
masking away any upper bits...
I wonder if @Robin or @Will may also shed some light on these two
events.
Otherwise, perhaps not-supporting them in this series might be a
safer bet?
Thanks
Nicolin
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-20 20:45 ` Nicolin Chen
@ 2025-02-20 23:24 ` Jason Gunthorpe
2025-02-21 8:10 ` Nicolin Chen
0 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-20 23:24 UTC (permalink / raw)
To: Nicolin Chen
Cc: robin.murphy, will, kevin.tian, corbet, joro,
suravee.suthikulpanit, dwmw2, baolu.lu, shuah, linux-kernel,
iommu, linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Thu, Feb 20, 2025 at 12:45:46PM -0800, Nicolin Chen wrote:
> ------------------------------------------------------------
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index fd2f13a63f27..be9746ecdc65 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -1067,7 +1067,16 @@ enum iommu_veventq_type {
> * struct iommu_vevent_arm_smmuv3 - ARM SMMUv3 Virtual Event
> * (IOMMU_VEVENTQ_TYPE_ARM_SMMUV3)
> * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
> - * (Refer to "7.3 Event records" in SMMUv3 HW Spec)
> + * Reported event records: (Refer to "7.3 Event records" in SMMUv3 HW Spec)
> + * - 0x02 C_BAD_STREAMID
This is documented as 'Transaction StreamID out of range.' so it would
be a hypervisor kernel bug to hit it
> + * - 0x04 C_BAD_STE
I'm not sure we do enough validation to reject all bad STE fragments
so it makes sense this could happen.
> + * - 0x06 F_STREAM_DISABLED
This looked guest triggerable to me.. so it makes sense
> + * - 0x08 C_BAD_SUBSTREAMID
> + * - 0x0a C_BAD_STE
Typo, this is C_BAD_CD
> + * - 0x10 F_TRANSLATION
> + * - 0x11 F_ADDR_SIZE
> + * - 0x12 F_ACCESS
> + * - 0x13 F_PERMISSION
List makes sense to me otherwise
> But F_CD_FETCH and F_STE_FETCH seem to be complicated here, as both
F_STE_FETCH would indicate a hypervisor failure managing the stream
table so no need to forward it.
> report PA in their FetchAddr fields, although the spec does mention
> both might be injected to a guest VM:
> - "Note: This event might be injected into a guest VM, as though
> from a virtual SMMU, when a hypervisor receives a stage 2
> Translation-related fault indicating CD fetch as a cause (with
> CLASS == CD)."
That sounds like the VMM should be catching the
F_TRANSLATION and convert it for the CLASS=CD
> For F_CD_FETCH, at least the CD table pointer in the nested STE is
> an IPA, and all the entries in the CD table that can be 2-level are
> IPAs as well. So, we need some kinda reverse translation from a PA
> to IPA using its stage-2 mapping. I am not sure what's the best way
> to do that...
And if the F_TRANSLATION covers the case then maybe this just stays in
the hypervisor?
> Otherwise, perhaps not-supporting them in this series might be a
> safer bet?
Yeah, I would consider skipping F_CD_FETCH. May also just try it out
and see what events come out on a CD fetch failure..
Jason
* Re: [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
2025-02-20 23:24 ` Jason Gunthorpe
@ 2025-02-21 8:10 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-21 8:10 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: robin.murphy, will, kevin.tian, corbet, joro,
suravee.suthikulpanit, dwmw2, baolu.lu, shuah, linux-kernel,
iommu, linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Thu, Feb 20, 2025 at 07:24:07PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 20, 2025 at 12:45:46PM -0800, Nicolin Chen wrote:
> > ------------------------------------------------------------
> > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > index fd2f13a63f27..be9746ecdc65 100644
> > --- a/include/uapi/linux/iommufd.h
> > +++ b/include/uapi/linux/iommufd.h
> > @@ -1067,7 +1067,16 @@ enum iommu_veventq_type {
> > * struct iommu_vevent_arm_smmuv3 - ARM SMMUv3 Virtual Event
> > * (IOMMU_VEVENTQ_TYPE_ARM_SMMUV3)
> > * @evt: 256-bit ARM SMMUv3 Event record, little-endian.
> > - * (Refer to "7.3 Event records" in SMMUv3 HW Spec)
> > + * Reported event records: (Refer to "7.3 Event records" in SMMUv3 HW Spec)
> > + * - 0x02 C_BAD_STREAMID
>
> This is documented as 'Transaction StreamID out of range.' so it would
> be a hypervisor kernel bug to hit it
I see. Dropping it.
> > + * - 0x04 C_BAD_STE
>
> I'm not sure we do enough validation to reject all bad STE fragments
> so it makes sense this could happen.
>
> > + * - 0x06 F_STREAM_DISABLED
>
> This looked guest triggerable to me.. so it makes sense
Keeping these two.
> > + * - 0x08 C_BAD_SUBSTREAMID
> > + * - 0x0a C_BAD_STE
>
> Typo, this is C_BAD_CD
Fixed.
> > But F_CD_FETCH and F_STE_FETCH seem to be complicated here, as both
>
> F_STE_FETCH would indicate a hypervisor failure managing the stream
> table so no need to forward it.
>
> > report PA in their FetchAddr fields, although the spec does mention
> > both might be injected to a guest VM:
> > - "Note: This event might be injected into a guest VM, as though
> > from a virtual SMMU, when a hypervisor receives a stage 2
> > Translation-related fault indicating CD fetch as a cause (with
> > CLASS == CD)."
>
> That sounds like the VMM should be catching the
> F_TRANSLATION and convert it for the CLASS=CD
>
> > For F_CD_FETCH, at least the CD table pointer in the nested STE is
> > an IPA, and all the entries in the CD table that can be 2-level are
> > IPAs as well. So, we need some kinda reverse translation from a PA
> > to IPA using its stage-2 mapping. I am not sure what's the best way
> > to do that...
>
> And if the F_TRANSLATION covers the case then maybe this just stays in
> the hypervisor?
> > Otherwise, perhaps not-supporting them in this series might be a
> > safer bet?
>
> Yeah, I would consider skipping F_CD_FETCH. May also just try it out
> and see what events come out on a CD fetch failure..
I will skip these two for now. Meanwhile, will try some hack to
trigger a FETCH fault.
Thanks
Nicolin
* [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (12 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 13/14] iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU Nicolin Chen
@ 2025-01-25 0:30 ` Nicolin Chen
2025-02-18 5:24 ` Tian, Kevin
` (2 more replies)
2025-02-14 8:03 ` [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
14 siblings, 3 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-01-25 0:30 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
There is a DoS concern with the hardware event queue shared among the
devices passed through to VMs: too many translation failures belonging
to the VMs could overflow the shared hardware event queue, if those VMs
or their VMMs don't handle/recover the devices properly.
The MEV bit in the STE allows the SMMU HW to be configured to merge
similar event records, though merging is not guaranteed. Set it in a
nested STE for DoS mitigation.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 2 ++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++--
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d24c3d8ee397..7181001fc5d7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -266,6 +266,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_1_S1COR GENMASK_ULL(5, 4)
#define STRTAB_STE_1_S1CSH GENMASK_ULL(7, 6)
+#define STRTAB_STE_1_MEV (1UL << 19)
#define STRTAB_STE_1_S2FWB (1UL << 25)
#define STRTAB_STE_1_S1STALLD (1UL << 27)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index ceeed907a714..20a0e39d7caa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -43,6 +43,8 @@ static void arm_smmu_make_nested_cd_table_ste(
target->data[0] |= nested_domain->ste[0] &
~cpu_to_le64(STRTAB_STE_0_CFG);
target->data[1] |= nested_domain->ste[1];
+ /* Merge events for DoS mitigations on eventq */
+ target->data[1] |= STRTAB_STE_1_MEV;
}
/*
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 59fbc342a095..14e079cfb8b6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1051,7 +1051,7 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW |
- STRTAB_STE_1_EATS);
+ STRTAB_STE_1_EATS | STRTAB_STE_1_MEV);
used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
/*
@@ -1067,7 +1067,7 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
if (cfg & BIT(1)) {
used_bits[1] |=
cpu_to_le64(STRTAB_STE_1_S2FWB | STRTAB_STE_1_EATS |
- STRTAB_STE_1_SHCFG);
+ STRTAB_STE_1_SHCFG | STRTAB_STE_1_MEV);
used_bits[2] |=
cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
--
2.43.0
* RE: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-01-25 0:30 ` [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations Nicolin Chen
@ 2025-02-18 5:24 ` Tian, Kevin
2025-02-18 18:17 ` Pranjal Shrivastava
2025-02-18 17:21 ` Jason Gunthorpe
2025-02-20 9:09 ` Nicolin Chen
2 siblings, 1 reply; 60+ messages in thread
From: Tian, Kevin @ 2025-02-18 5:24 UTC (permalink / raw)
To: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org
Cc: joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 25, 2025 8:31 AM
>
> There is a DoS concern on the shared hardware event queue among devices
> passed through to VMs, that too many translation failures that belong to
> VMs could overflow the shared hardware event queue if those VMs or their
> VMMs don't handle/recover the devices properly.
This statement is not specific to the nested configuration.
>
> The MEV bit in the STE allows to configure the SMMU HW to merge similar
> event records, though there is no guarantee. Set it in a nested STE for
> DoS mitigations.
Is MEV available only in nested mode? Otherwise it perhaps makes
sense to turn it on in all configurations in IOMMUFD paths...
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-02-18 5:24 ` Tian, Kevin
@ 2025-02-18 18:17 ` Pranjal Shrivastava
2025-02-18 18:52 ` Jason Gunthorpe
2025-02-18 18:53 ` Nicolin Chen
0 siblings, 2 replies; 60+ messages in thread
From: Pranjal Shrivastava @ 2025-02-18 18:17 UTC (permalink / raw)
To: Tian, Kevin
Cc: Nicolin Chen, jgg@nvidia.com, corbet@lwn.net, will@kernel.org,
joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Tue, Feb 18, 2025 at 05:24:08AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, January 25, 2025 8:31 AM
> >
> > There is a DoS concern on the shared hardware event queue among devices
> > passed through to VMs, that too many translation failures that belong to
> > VMs could overflow the shared hardware event queue if those VMs or their
> > VMMs don't handle/recover the devices properly.
>
> This statement is not specific to the nested configuration.
>
> >
> > The MEV bit in the STE allows to configure the SMMU HW to merge similar
> > event records, though there is no guarantee. Set it in a nested STE for
> > DoS mitigations.
>
> Is MEV available only in nested mode? Otherwise it perhaps makes
> sense to turn it on in all configurations in IOMMUFD paths...
MEV is available at all times (if implemented by the HW) and doesn't
depend on the nested mode. As per the Arm SMMUv3 spec (section 3.5.5):
Events can be merged where all of the following conditions are upheld:
- The event types and all fields are identical, except fields explicitly
indicated in section 7.3 Event records.
- If present, the Stall field is 0. Stall fault records are not merged.
I'm not sure to what extent, but I think *trying* to merge similar events
should reduce the chances of overflowing the hw eventq.
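To make the merge rule concrete, here's a rough userspace C sketch of the
eligibility check described above. The struct layout and field set are
invented purely for illustration; real SMMUv3 event records are multi-word
queue entries whose mergeable fields vary per event type:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented, simplified event record -- for illustration only */
struct evt_record {
	unsigned int type;       /* event type, e.g. F_TRANSLATION */
	unsigned long long sid;  /* StreamID of the faulting device */
	unsigned long long addr; /* faulting input address */
	bool stall;              /* Stall field, when present */
};

/*
 * Per spec section 3.5.5: two records may merge only when the event
 * types and all (mergeable) fields are identical, and neither record
 * has the Stall field set.
 */
static bool evt_can_merge(const struct evt_record *a,
			  const struct evt_record *b)
{
	if (a->stall || b->stall)
		return false; /* Stall fault records are never merged */
	return a->type == b->type && a->sid == b->sid &&
	       a->addr == b->addr;
}
```

So repeated identical translation faults from one misbehaving device can
collapse into a single queue entry, which is exactly the overflow
pressure MEV is meant to relieve.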
> Is MEV available only in nested mode? Otherwise it perhaps makes
> sense to turn it on in all configurations in IOMMUFD paths...
I think the arm-smmu-v3's iommufd implementation only supports nested,
which could be the reason.
Thanks,
Praan
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-02-18 18:17 ` Pranjal Shrivastava
@ 2025-02-18 18:52 ` Jason Gunthorpe
2025-02-20 7:12 ` Nicolin Chen
2025-02-18 18:53 ` Nicolin Chen
1 sibling, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 18:52 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: Tian, Kevin, Nicolin Chen, corbet@lwn.net, will@kernel.org,
joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Tue, Feb 18, 2025 at 06:17:15PM +0000, Pranjal Shrivastava wrote:
> > Is MEV available only in nested mode? Otherwise it perhaps makes
> > sense to turn it on in all configurations in IOMMUFD paths...
>
> I think the arm-smmu-v3's iommufd implementation only supports nested
> which could be the reason.
I think starting with MEV in this limited case is reasonable.
I agree it makes sense to always turn it on from a production
perspective..
Jason
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-02-18 18:52 ` Jason Gunthorpe
@ 2025-02-20 7:12 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-20 7:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pranjal Shrivastava, Tian, Kevin, corbet@lwn.net, will@kernel.org,
joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Tue, Feb 18, 2025 at 02:52:29PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 18, 2025 at 06:17:15PM +0000, Pranjal Shrivastava wrote:
>
> > > Is MEV available only in nested mode? Otherwise it perhaps makes
> > > sense to turn it on in all configurations in IOMMUFD paths...
> >
> > I think the arm-smmu-v3's iommufd implementation only supports nested
> > which could be the reason.
>
> I think starting with MEV in this limited case is reasonable.
>
> I agree it makes sense to always turn it on from a production
> perspective..
Then, I will just add a line to the commit log:
"In the future, we might want to enable MEV for non-nested cases too,
such as domain->type == IOMMU_DOMAIN_UNMANAGED or even IOMMU_DOMAIN_DMA."
Thanks
Nicolin
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-02-18 18:17 ` Pranjal Shrivastava
2025-02-18 18:52 ` Jason Gunthorpe
@ 2025-02-18 18:53 ` Nicolin Chen
2025-02-20 16:15 ` Pranjal Shrivastava
1 sibling, 1 reply; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 18:53 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: Tian, Kevin, jgg@nvidia.com, corbet@lwn.net, will@kernel.org,
joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Tue, Feb 18, 2025 at 06:17:15PM +0000, Pranjal Shrivastava wrote:
> On Tue, Feb 18, 2025 at 05:24:08AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Saturday, January 25, 2025 8:31 AM
> > >
> > > There is a DoS concern on the shared hardware event queue among devices
> > > passed through to VMs, that too many translation failures that belong to
> > > VMs could overflow the shared hardware event queue if those VMs or their
> > > VMMs don't handle/recover the devices properly.
> >
> > This statement is not specific to the nested configuration.
> >
> > >
> > > The MEV bit in the STE allows to configure the SMMU HW to merge similar
> > > event records, though there is no guarantee. Set it in a nested STE for
> > > DoS mitigations.
> >
> > Is MEV available only in nested mode? Otherwise it perhaps makes
> > sense to turn it on in all configurations in IOMMUFD paths...
>
> MEV is available at all times (if implemented by the HW) and doesn't
> depend on the nested mode. As per the Arm SMMUv3 spec (section 3.5.5):
>
> Events can be merged where all of the following conditions are upheld:
> - The event types and all fields are identical, except fields explicitly
> indicated in section 7.3 Event records.
>
> - If present, the Stall field is 0. Stall fault records are not merged.
>
> I'm not sure to what extent, but I think *trying* to merge similar events
> should reduce the chances of overflowing the hw eventq.
>
> > Is MEV available only in nested mode? Otherwise it perhaps makes
> > sense to turn it on in all configurations in IOMMUFD paths...
>
> I think the arm-smmu-v3's iommufd implementation only supports nested
> which could be the reason.
I guess what Kevin is saying is that non-nested STEs should set MEV
as well, e.g. BYPASS and ABORT, and perhaps the stage-1-only case too
where the attaching domain = UNMANAGED.
Thanks
Nicolin
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-02-18 18:53 ` Nicolin Chen
@ 2025-02-20 16:15 ` Pranjal Shrivastava
0 siblings, 0 replies; 60+ messages in thread
From: Pranjal Shrivastava @ 2025-02-20 16:15 UTC (permalink / raw)
To: Nicolin Chen
Cc: Tian, Kevin, jgg@nvidia.com, corbet@lwn.net, will@kernel.org,
joro@8bytes.org, suravee.suthikulpanit@amd.com,
robin.murphy@arm.com, dwmw2@infradead.org,
baolu.lu@linux.intel.com, shuah@kernel.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
eric.auger@redhat.com, jean-philippe@linaro.org, mdf@kernel.org,
mshavit@google.com, shameerali.kolothum.thodi@huawei.com,
smostafa@google.com, ddutile@redhat.com, Liu, Yi L,
patches@lists.linux.dev
On Tue, Feb 18, 2025 at 10:53:55AM -0800, Nicolin Chen wrote:
> > > Is MEV available only in nested mode? Otherwise it perhaps makes
> > > sense to turn it on in all configurations in IOMMUFD paths...
> >
> > I think the arm-smmu-v3's iommufd implementation only supports nested
> > which could be the reason.
>
> I guess what Kevin says is that non-nested STE should set the MEV
> as well, e.g. BYPASS and ABORT, and perhaps stage-1-only case too
> where the attaching domain = UNMANAGED.
>
Ohh okay, got it. Thanks!
Praan
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-01-25 0:30 ` [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations Nicolin Chen
2025-02-18 5:24 ` Tian, Kevin
@ 2025-02-18 17:21 ` Jason Gunthorpe
2025-02-18 18:14 ` Nicolin Chen
2025-02-20 9:09 ` Nicolin Chen
2 siblings, 1 reply; 60+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 17:21 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:43PM -0800, Nicolin Chen wrote:
> There is a DoS concern on the shared hardware event queue among devices
> passed through to VMs, that too many translation failures that belong to
> VMs could overflow the shared hardware event queue if those VMs or their
> VMMs don't handle/recover the devices properly.
>
> The MEV bit in the STE allows to configure the SMMU HW to merge similar
> event records, though there is no guarantee. Set it in a nested STE for
> DoS mitigations.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 2 ++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++--
> 3 files changed, 5 insertions(+), 2 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1051,7 +1051,7 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW |
> - STRTAB_STE_1_EATS);
> + STRTAB_STE_1_EATS | STRTAB_STE_1_MEV);
> used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
You also ran the test suite?
Jason
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-02-18 17:21 ` Jason Gunthorpe
@ 2025-02-18 18:14 ` Nicolin Chen
0 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-18 18:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, corbet, will, joro, suravee.suthikulpanit,
robin.murphy, dwmw2, baolu.lu, shuah, linux-kernel, iommu,
linux-arm-kernel, linux-kselftest, linux-doc, eric.auger,
jean-philippe, mdf, mshavit, shameerali.kolothum.thodi, smostafa,
ddutile, yi.l.liu, patches
On Tue, Feb 18, 2025 at 01:21:20PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 24, 2025 at 04:30:43PM -0800, Nicolin Chen wrote:
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1051,7 +1051,7 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> > cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> > STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> > STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW |
> > - STRTAB_STE_1_EATS);
> > + STRTAB_STE_1_EATS | STRTAB_STE_1_MEV);
> > used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
>
> You also ran the test suite?
Yes, I enabled that in my config, and didn't see anything wrong:
[ 10.832738] # Subtest: arm-smmu-v3-kunit-test
[ 10.837549] # module: arm_smmu_v3_test
[ 10.844208] ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
[ 10.844339] ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
[ 10.850507] ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
[ 10.856669] ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
[ 10.862934] ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
[ 10.869200] ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
[ 10.875550] ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
[ 10.881899] ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
[ 10.888512] ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
[ 10.895482] ok 10 arm_smmu_v3_write_ste_test_s2_to_abort
[ 10.902457] ok 11 arm_smmu_v3_write_ste_test_abort_to_s2
[ 10.908355] ok 12 arm_smmu_v3_write_ste_test_s2_to_bypass
[ 10.914263] ok 13 arm_smmu_v3_write_ste_test_bypass_to_s2
[ 10.920269] ok 14 arm_smmu_v3_write_ste_test_s1_to_s2
[ 10.926267] ok 15 arm_smmu_v3_write_ste_test_s2_to_s1
[ 10.931900] ok 16 arm_smmu_v3_write_ste_test_non_hitless
[ 10.937536] ok 17 arm_smmu_v3_write_cd_test_s1_clear
[ 10.943435] ok 18 arm_smmu_v3_write_cd_test_s1_change_asid
[ 10.948995] ok 19 arm_smmu_v3_write_ste_test_s1_to_s2_stall
[ 10.955074] ok 20 arm_smmu_v3_write_ste_test_s2_to_s1_stall
[ 10.961244] ok 21 arm_smmu_v3_write_cd_test_sva_clear
[ 10.967419] ok 22 arm_smmu_v3_write_cd_test_sva_release
[ 10.972941] # arm-smmu-v3-kunit-test: pass:22 fail:0 skip:0 total:22
[ 10.985141] ok 1 arm-smmu-v3-kunit-test
Thanks
Nicolin
* Re: [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
2025-01-25 0:30 ` [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations Nicolin Chen
2025-02-18 5:24 ` Tian, Kevin
2025-02-18 17:21 ` Jason Gunthorpe
@ 2025-02-20 9:09 ` Nicolin Chen
2 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-20 9:09 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:43PM -0800, Nicolin Chen wrote:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> index ceeed907a714..20a0e39d7caa 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> @@ -43,6 +43,8 @@ static void arm_smmu_make_nested_cd_table_ste(
> target->data[0] |= nested_domain->ste[0] &
> ~cpu_to_le64(STRTAB_STE_0_CFG);
> target->data[1] |= nested_domain->ste[1];
> + /* Merge events for DoS mitigations on eventq */
> + target->data[1] |= STRTAB_STE_1_MEV;
This should have cpu_to_le64(). Fixed accordingly.
Thanks
Nicolin
* Re: [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ)
2025-01-25 0:30 [PATCH v6 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) Nicolin Chen
` (13 preceding siblings ...)
2025-01-25 0:30 ` [PATCH v6 14/14] iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations Nicolin Chen
@ 2025-02-14 8:03 ` Nicolin Chen
14 siblings, 0 replies; 60+ messages in thread
From: Nicolin Chen @ 2025-02-14 8:03 UTC (permalink / raw)
To: jgg, kevin.tian, corbet, will
Cc: joro, suravee.suthikulpanit, robin.murphy, dwmw2, baolu.lu, shuah,
linux-kernel, iommu, linux-arm-kernel, linux-kselftest, linux-doc,
eric.auger, jean-philippe, mdf, mshavit,
shameerali.kolothum.thodi, smostafa, ddutile, yi.l.liu, patches
On Fri, Jan 24, 2025 at 04:30:29PM -0800, Nicolin Chen wrote:
> This is on Github:
> https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v6
>
> Testing with RMR patches for MSI:
> https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v6-with-rmr
> Paring QEMU branch for testing:
> https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v6
>
> Changelog
> v6
> * Drop supports_veventq viommu op
> * Split bug/cosmetics fixes out of the series
> * Drop the blocking mutex around copy_to_user()
> * Add veventq_depth in uAPI to limit vEVENTQ size
> * Revise the documentation for a clear description
> * Fix sparse warnings in arm_vmaster_report_event()
> * Rework iommufd_viommu_get_vdev_id() to return -ENOENT v.s. 0
> * Allow Abort/Bypass STEs to allocate vEVENTQ and set STE.MEV for DoS
> mitigations
I rebased a v7 onto rc2, but there is no changelog at all from v6 to v7
except Bagas's "Reviewed-by".
So, I think we can still review this version?
Thanks
Nicolin