gfs2.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness
@ 2024-08-27 18:02 Alexander Aring
  2024-08-27 18:02 ` [RFC 1/7] dlm: fix possible lkb_resource null dereference Alexander Aring
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

Hi,

I send this rfc patch series to show a (for me) usable use-case for the
DLM net-namespace functionality that is currently pending, see [0]. This
patch-series introduce the DLM verifier to check on DLM correctness on
any workload running on DLM with net-namespace feature. E.g. on gfs2 you
can just run some filesystem benchmark tests and see if DLM works as
aspected.

This comes very useful when DLM recovery kicks in e.g. when nodes
leaving the lockspace due e.g. fencing and recovery solves lock
dependencies transparently from the user. However there is no "fake
fencing switch" yet for DLM net-namespaces, but might be an idea for
future functionality.

There could be bugs in the verifier, that I don't care if they exists...
We need to check whats happening when the verifier complains but so far
everything looks fine. It just an issue if the verifier doesn't say
anything but a small bug introduced in DLM and the verifier will
complain a lot.

There might be still improvements in the DLM verifier. I needed to
change a little bit the python scripts to generate the code but I did
not add them here to this patch series. Also checkpatch complains about
some things in the verifier code but I oriented myself mostly to the
other existing verifiers. There is a printout of all holders if those
violates the DLM compatible locking states. I might improve them when I
actually try to figure out an existing problem, but for now this
printout is very minimal.

I mainly do this work because I prepare more changes in the DLM recovery
code in future to scale with lockspaces with a lot of members that we
can easily try out with the net-namespace functionality.

I cc here the rcu people, may they also get some ideas to check on lock
correctness using tracing kernel verifier subsystem.

- Alex

[0] https://lore.kernel.org/gfs2/20240814143414.1877505-1-aahringo@redhat.com/

Alexander Aring (7):
  dlm: fix possible lkb_resource null dereference
  dlm: fix swapped args sb_flags vs sb_status
  dlm: make add_to_waiters() that is can't fail
  dlm: add our_nodeid to tracepoints
  dlm: add lkb rv mode to ast tracepoint
  dlm: add more tracepoints for DLM kernel verifier
  rv: add dlm compatible lock state kernel verifier

 Documentation/trace/rv/monitor_dlm.rst |  77 +++++
 fs/dlm/ast.c                           |  30 +-
 fs/dlm/dlm_internal.h                  |   3 +
 fs/dlm/lock.c                          |  64 ++--
 fs/dlm/lockspace.c                     |   4 +
 fs/dlm/user.c                          |   9 +-
 include/trace/events/dlm.h             | 121 ++++++-
 include/trace/events/rv.h              |   9 +
 kernel/trace/rv/Kconfig                |  18 +
 kernel/trace/rv/Makefile               |   1 +
 kernel/trace/rv/monitors/dlm/dlm.c     | 445 +++++++++++++++++++++++++
 kernel/trace/rv/monitors/dlm/dlm.h     |  38 +++
 kernel/trace/rv/monitors/dlm/dlm_da.h  | 143 ++++++++
 tools/verification/models/dlm.dot      |  14 +
 14 files changed, 907 insertions(+), 69 deletions(-)
 create mode 100644 Documentation/trace/rv/monitor_dlm.rst
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm.c
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm.h
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm_da.h
 create mode 100644 tools/verification/models/dlm.dot

-- 
2.43.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC 1/7] dlm: fix possible lkb_resource null dereference
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-27 18:02 ` [RFC 2/7] dlm: fix swapped args sb_flags vs sb_status Alexander Aring
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

This patch fixes a possible null pointer dereference when this function is
called from request_lock() as lkb->lkb_resource is not assigned yet,
only after validate_lock_args() by calling attach_lkb(). Another issue
is that a resource name could be a non printable bytearray and we cannot
assume to be ASCII coded.

In this patch we just drop the printout of the resource name, the lkb id
is enough to make a possible connection to a resource name if this
exists.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 0e8d2b9bf908..121d2976986b 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2861,16 +2861,14 @@ static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	case -EINVAL:
 		/* annoy the user because dlm usage is wrong */
 		WARN_ON(1);
-		log_error(ls, "%s %d %x %x %x %d %d %s", __func__,
+		log_error(ls, "%s %d %x %x %x %d %d", __func__,
 			  rv, lkb->lkb_id, dlm_iflags_val(lkb), args->flags,
-			  lkb->lkb_status, lkb->lkb_wait_type,
-			  lkb->lkb_resource->res_name);
+			  lkb->lkb_status, lkb->lkb_wait_type);
 		break;
 	default:
-		log_debug(ls, "%s %d %x %x %x %d %d %s", __func__,
+		log_debug(ls, "%s %d %x %x %x %d %d", __func__,
 			  rv, lkb->lkb_id, dlm_iflags_val(lkb), args->flags,
-			  lkb->lkb_status, lkb->lkb_wait_type,
-			  lkb->lkb_resource->res_name);
+			  lkb->lkb_status, lkb->lkb_wait_type);
 		break;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 2/7] dlm: fix swapped args sb_flags vs sb_status
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
  2024-08-27 18:02 ` [RFC 1/7] dlm: fix possible lkb_resource null dereference Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-27 18:02 ` [RFC 3/7] dlm: make add_to_waiters() that is can't fail Alexander Aring
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

The arguments got swapped by commit 986ae3c2a8df ("dlm: fix race between
final callback and remove") fixing this now.

Fixes: 986ae3c2a8df ("dlm: fix race between final callback and remove")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/ast.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 742b30b61c19..0fe8d80ce5e8 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -30,7 +30,7 @@ static void dlm_run_callback(uint32_t ls_id, uint32_t lkb_id, int8_t mode,
 		trace_dlm_bast(ls_id, lkb_id, mode, res_name, res_length);
 		bastfn(astparam, mode);
 	} else if (flags & DLM_CB_CAST) {
-		trace_dlm_ast(ls_id, lkb_id, sb_status, sb_flags, res_name,
+		trace_dlm_ast(ls_id, lkb_id, sb_flags, sb_status, res_name,
 			      res_length);
 		lksb->sb_status = sb_status;
 		lksb->sb_flags = sb_flags;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 3/7] dlm: make add_to_waiters() that is can't fail
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
  2024-08-27 18:02 ` [RFC 1/7] dlm: fix possible lkb_resource null dereference Alexander Aring
  2024-08-27 18:02 ` [RFC 2/7] dlm: fix swapped args sb_flags vs sb_status Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-27 18:02 ` [RFC 4/7] dlm: add our_nodeid to tracepoints Alexander Aring
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

If add_to_waiters() fails we have a problem because the previous called
functions such as validate_lock_args() or validate_unlock_args() sets
specific lkb values that are set for a request, there exists no way back
to revert those changes. When there is a pending lock request the
original request arguments will be overwritten with unknown
consequences.

The good news are that I believe those cases that we fail in
add_to_waiters() can't happen or very unlikely to happen (only if the DLM
user does stupid API things), but if so we have the above mentioned
problem.

There are two conditions that will be removed here. The first one is the
-EINVAL case which contains is_overlap_unlock() or (is_overlap_cancel()
and mstype == DLM_MSG_CANCEL).

The is_overlap_unlock() is missing for the normal UNLOCK case which is
moved to validate_unlock_args(). The is_overlap_cancel() already happens
in validate_unlock_args() when DLM_LKF_CANCEL is set. In case of
validate_lock_args() we check on is_overlap() when it is not a new request,
on a new request the lkb is always new and does not have those values set.

The -EBUSY check can't happen in case as for non new lock requests (when
DLM_LKF_CONVERT is set) we already check in validate_lock_args() for
lkb_wait_type and is_overlap(). Then there is only
validate_unlock_args() that will never hit the default case because
dlm_unlock() will produce DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c | 43 ++++++++++++++-----------------------------
 1 file changed, 14 insertions(+), 29 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 121d2976986b..8cb5a537bfd3 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1703,19 +1703,11 @@ static int msg_reply_type(int mstype)
 /* add/remove lkb from global waiters list of lkb's waiting for
    a reply from a remote node */
 
-static int add_to_waiters(struct dlm_lkb *lkb, int mstype, int to_nodeid)
+static void add_to_waiters(struct dlm_lkb *lkb, int mstype, int to_nodeid)
 {
 	struct dlm_ls *ls = lkb->lkb_resource->res_ls;
-	int error = 0;
 
 	spin_lock_bh(&ls->ls_waiters_lock);
-
-	if (is_overlap_unlock(lkb) ||
-	    (is_overlap_cancel(lkb) && (mstype == DLM_MSG_CANCEL))) {
-		error = -EINVAL;
-		goto out;
-	}
-
 	if (lkb->lkb_wait_type || is_overlap_cancel(lkb)) {
 		switch (mstype) {
 		case DLM_MSG_UNLOCK:
@@ -1725,7 +1717,11 @@ static int add_to_waiters(struct dlm_lkb *lkb, int mstype, int to_nodeid)
 			set_bit(DLM_IFL_OVERLAP_CANCEL_BIT, &lkb->lkb_iflags);
 			break;
 		default:
-			error = -EBUSY;
+			/* should never happen as validate_lock_args() checks
+			 * on lkb_wait_type and validate_unlock_args() only
+			 * creates UNLOCK or CANCEL messages.
+			 */
+			WARN_ON_ONCE(1);
 			goto out;
 		}
 		lkb->lkb_wait_count++;
@@ -1747,12 +1743,7 @@ static int add_to_waiters(struct dlm_lkb *lkb, int mstype, int to_nodeid)
 	hold_lkb(lkb);
 	list_add(&lkb->lkb_wait_reply, &ls->ls_waiters);
  out:
-	if (error)
-		log_error(ls, "addwait error %x %d flags %x %d %d %s",
-			  lkb->lkb_id, error, dlm_iflags_val(lkb), mstype,
-			  lkb->lkb_wait_type, lkb->lkb_resource->res_name);
 	spin_unlock_bh(&ls->ls_waiters_lock);
-	return error;
 }
 
 /* We clear the RESEND flag because we might be taking an lkb off the waiters
@@ -2926,13 +2917,16 @@ static int validate_unlock_args(struct dlm_lkb *lkb, struct dlm_args *args)
 		goto out;
 	}
 
+	if (is_overlap_unlock(lkb))
+		goto out;
+
 	/* cancel not allowed with another cancel/unlock in progress */
 
 	if (args->flags & DLM_LKF_CANCEL) {
 		if (lkb->lkb_exflags & DLM_LKF_CANCEL)
 			goto out;
 
-		if (is_overlap(lkb))
+		if (is_overlap_cancel(lkb))
 			goto out;
 
 		if (test_bit(DLM_IFL_RESEND_BIT, &lkb->lkb_iflags)) {
@@ -2970,9 +2964,6 @@ static int validate_unlock_args(struct dlm_lkb *lkb, struct dlm_args *args)
 		if (lkb->lkb_exflags & DLM_LKF_FORCEUNLOCK)
 			goto out;
 
-		if (is_overlap_unlock(lkb))
-			goto out;
-
 		if (test_bit(DLM_IFL_RESEND_BIT, &lkb->lkb_iflags)) {
 			set_bit(DLM_IFL_OVERLAP_UNLOCK_BIT, &lkb->lkb_iflags);
 			rv = -EBUSY;
@@ -3608,10 +3599,7 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
 
 	to_nodeid = r->res_nodeid;
 
-	error = add_to_waiters(lkb, mstype, to_nodeid);
-	if (error)
-		return error;
-
+	add_to_waiters(lkb, mstype, to_nodeid);
 	error = create_message(r, lkb, to_nodeid, mstype, &ms, &mh);
 	if (error)
 		goto fail;
@@ -3714,10 +3702,7 @@ static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb)
 
 	to_nodeid = dlm_dir_nodeid(r);
 
-	error = add_to_waiters(lkb, DLM_MSG_LOOKUP, to_nodeid);
-	if (error)
-		return error;
-
+	add_to_waiters(lkb, DLM_MSG_LOOKUP, to_nodeid);
 	error = create_message(r, NULL, to_nodeid, DLM_MSG_LOOKUP, &ms, &mh);
 	if (error)
 		goto fail;
@@ -6342,8 +6327,8 @@ int dlm_debug_add_lkb_to_waiters(struct dlm_ls *ls, uint32_t lkb_id,
 	if (error)
 		return error;
 
-	error = add_to_waiters(lkb, mstype, to_nodeid);
+	add_to_waiters(lkb, mstype, to_nodeid);
 	dlm_put_lkb(lkb);
-	return error;
+	return 0;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 4/7] dlm: add our_nodeid to tracepoints
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
                   ` (2 preceding siblings ...)
  2024-08-27 18:02 ` [RFC 3/7] dlm: make add_to_waiters() that is can't fail Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-27 18:02 ` [RFC 5/7] dlm: add lkb rv mode to ast tracepoint Alexander Aring
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

This patch adds our_nodeid to some DLM tracepoints that are necessary
for the DLM kernel verifier to know from which nodeid the traceevent
comes from. This is useful when using DLM in net-namespaces to get a
whole cluster-view of DLM in traces.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/ast.c               | 23 +++++++++++++----------
 fs/dlm/dlm_internal.h      |  1 +
 fs/dlm/user.c              |  9 +++++----
 include/trace/events/dlm.h | 36 +++++++++++++++++++++++-------------
 4 files changed, 42 insertions(+), 27 deletions(-)

diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 0fe8d80ce5e8..01de0d4b9450 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -18,20 +18,21 @@
 #include "user.h"
 #include "ast.h"
 
-static void dlm_run_callback(uint32_t ls_id, uint32_t lkb_id, int8_t mode,
-			     uint32_t flags, uint8_t sb_flags, int sb_status,
-			     struct dlm_lksb *lksb,
+static void dlm_run_callback(int our_nodeid, uint32_t ls_id, uint32_t lkb_id,
+			     int8_t mode, uint32_t flags, uint8_t sb_flags,
+			     int sb_status, struct dlm_lksb *lksb,
 			     void (*astfn)(void *astparam),
 			     void (*bastfn)(void *astparam, int mode),
 			     void *astparam, const char *res_name,
 			     size_t res_length)
 {
 	if (flags & DLM_CB_BAST) {
-		trace_dlm_bast(ls_id, lkb_id, mode, res_name, res_length);
+		trace_dlm_bast(our_nodeid, ls_id, lkb_id, mode, res_name,
+			       res_length);
 		bastfn(astparam, mode);
 	} else if (flags & DLM_CB_CAST) {
-		trace_dlm_ast(ls_id, lkb_id, sb_flags, sb_status, res_name,
-			      res_length);
+		trace_dlm_ast(our_nodeid, ls_id, lkb_id, sb_flags, sb_status,
+			      res_name, res_length);
 		lksb->sb_status = sb_status;
 		lksb->sb_flags = sb_flags;
 		astfn(astparam);
@@ -40,8 +41,8 @@ static void dlm_run_callback(uint32_t ls_id, uint32_t lkb_id, int8_t mode,
 
 static void dlm_do_callback(struct dlm_callback *cb)
 {
-	dlm_run_callback(cb->ls_id, cb->lkb_id, cb->mode, cb->flags,
-			 cb->sb_flags, cb->sb_status, cb->lkb_lksb,
+	dlm_run_callback(cb->our_nodeid, cb->ls_id, cb->lkb_id, cb->mode,
+			 cb->flags, cb->sb_flags, cb->sb_status, cb->lkb_lksb,
 			 cb->astfn, cb->bastfn, cb->astparam,
 			 cb->res_name, cb->res_length);
 	dlm_free_cb(cb);
@@ -130,6 +131,7 @@ int dlm_get_cb(struct dlm_lkb *lkb, uint32_t flags, int mode,
 		return -ENOMEM;
 
 	/* for tracing */
+	(*cb)->our_nodeid = ls->ls_dn->our_node->id;
 	(*cb)->lkb_id = lkb->lkb_id;
 	(*cb)->ls_id = ls->ls_global_id;
 	memcpy((*cb)->res_name, rsb->res_name, rsb->res_length);
@@ -185,8 +187,9 @@ void dlm_add_cb(struct dlm_lkb *lkb, uint32_t flags, int mode, int status,
 			list_add(&cb->list, &ls->ls_cb_delay);
 	} else {
 		if (test_bit(LSFL_SOFTIRQ, &ls->ls_flags)) {
-			dlm_run_callback(ls->ls_global_id, lkb->lkb_id, mode, flags,
-					 sbflags, status, lkb->lkb_lksb,
+			dlm_run_callback(ls->ls_dn->our_node->id,
+					 ls->ls_global_id, lkb->lkb_id, mode,
+					 flags, sbflags, status, lkb->lkb_lksb,
 					 lkb->lkb_astfn, lkb->lkb_bastfn,
 					 lkb->lkb_astparam, rsb->res_name,
 					 rsb->res_length);
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 2de5ef2653cd..bc3ff1b64e0c 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -234,6 +234,7 @@ struct dlm_callback {
 	bool			copy_lvb;
 	struct dlm_lksb		*lkb_lksb;
 	unsigned char		lvbptr[DLM_USER_LVB_LEN];
+	int			our_nodeid;
 
 	union {
 		void			*astparam;	/* caller's ast arg */
diff --git a/fs/dlm/user.c b/fs/dlm/user.c
index 1b682f8f95b6..c4d6e67ff63e 100644
--- a/fs/dlm/user.c
+++ b/fs/dlm/user.c
@@ -868,13 +868,14 @@ static ssize_t device_read(struct file *file, char __user *buf, size_t count,
 	spin_unlock_bh(&proc->asts_spin);
 
 	if (cb->flags & DLM_CB_BAST) {
-		trace_dlm_bast(cb->ls_id, cb->lkb_id, cb->mode, cb->res_name,
-			       cb->res_length);
+		trace_dlm_bast(cb->our_nodeid, cb->ls_id, cb->lkb_id,
+			       cb->mode, cb->res_name, cb->res_length);
 	} else if (cb->flags & DLM_CB_CAST) {
 		cb->lkb_lksb->sb_status = cb->sb_status;
 		cb->lkb_lksb->sb_flags = cb->sb_flags;
-		trace_dlm_ast(cb->ls_id, cb->lkb_id, cb->sb_status,
-			      cb->sb_flags, cb->res_name, cb->res_length);
+		trace_dlm_ast(cb->our_nodeid, cb->ls_id, cb->lkb_id,
+			      cb->sb_status, cb->sb_flags, cb->res_name,
+			      cb->res_length);
 	}
 
 	ret = copy_result_to_user(&cb->ua,
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
index af160082c9e3..2621bb7ac3a8 100644
--- a/include/trace/events/dlm.h
+++ b/include/trace/events/dlm.h
@@ -98,6 +98,7 @@ TRACE_EVENT(dlm_lock_start,
 	TP_ARGS(ls, lkb, name, namelen, mode, flags),
 
 	TP_STRUCT__entry(
+		__field(unsigned int, our_nodeid)
 		__field(__u32, ls_id)
 		__field(__u32, lkb_id)
 		__field(int, mode)
@@ -109,6 +110,7 @@ TRACE_EVENT(dlm_lock_start,
 	TP_fast_assign(
 		struct dlm_rsb *r;
 
+		__entry->our_nodeid = ls->ls_dn->our_node->id;
 		__entry->ls_id = ls->ls_global_id;
 		__entry->lkb_id = lkb->lkb_id;
 		__entry->mode = mode;
@@ -123,8 +125,8 @@ TRACE_EVENT(dlm_lock_start,
 			       __get_dynamic_array_len(res_name));
 	),
 
-	TP_printk("ls_id=%u lkb_id=%x mode=%s flags=%s res_name=%s",
-		  __entry->ls_id, __entry->lkb_id,
+	TP_printk("our_nodeid=%u ls_id=%u lkb_id=%x mode=%s flags=%s res_name=%s",
+		  __entry->our_nodeid, __entry->ls_id, __entry->lkb_id,
 		  show_lock_mode(__entry->mode),
 		  show_lock_flags(__entry->flags),
 		  __print_hex_str(__get_dynamic_array(res_name),
@@ -141,6 +143,7 @@ TRACE_EVENT(dlm_lock_end,
 	TP_ARGS(ls, lkb, name, namelen, mode, flags, error, kernel_lock),
 
 	TP_STRUCT__entry(
+		__field(unsigned int, our_nodeid)
 		__field(__u32, ls_id)
 		__field(__u32, lkb_id)
 		__field(int, mode)
@@ -153,6 +156,7 @@ TRACE_EVENT(dlm_lock_end,
 	TP_fast_assign(
 		struct dlm_rsb *r;
 
+		__entry->our_nodeid = ls->ls_dn->our_node->id;
 		__entry->ls_id = ls->ls_global_id;
 		__entry->lkb_id = lkb->lkb_id;
 		__entry->mode = mode;
@@ -178,8 +182,8 @@ TRACE_EVENT(dlm_lock_end,
 
 	),
 
-	TP_printk("ls_id=%u lkb_id=%x mode=%s flags=%s error=%d res_name=%s",
-		  __entry->ls_id, __entry->lkb_id,
+	TP_printk("our_nodeid=%u ls_id=%u lkb_id=%x mode=%s flags=%s error=%d res_name=%s",
+		  __entry->our_nodeid, __entry->ls_id, __entry->lkb_id,
 		  show_lock_mode(__entry->mode),
 		  show_lock_flags(__entry->flags), __entry->error,
 		  __print_hex_str(__get_dynamic_array(res_name),
@@ -189,12 +193,13 @@ TRACE_EVENT(dlm_lock_end,
 
 TRACE_EVENT(dlm_bast,
 
-	TP_PROTO(__u32 ls_id, __u32 lkb_id, int mode,
+	TP_PROTO(unsigned int our_nodeid, __u32 ls_id, __u32 lkb_id, int mode,
 		 const char *res_name, size_t res_length),
 
-	TP_ARGS(ls_id, lkb_id, mode, res_name, res_length),
+	TP_ARGS(our_nodeid, ls_id, lkb_id, mode, res_name, res_length),
 
 	TP_STRUCT__entry(
+		__field(unsigned int, our_nodeid)
 		__field(__u32, ls_id)
 		__field(__u32, lkb_id)
 		__field(int, mode)
@@ -202,6 +207,7 @@ TRACE_EVENT(dlm_bast,
 	),
 
 	TP_fast_assign(
+		__entry->our_nodeid = our_nodeid;
 		__entry->ls_id = ls_id;
 		__entry->lkb_id = lkb_id;
 		__entry->mode = mode;
@@ -210,8 +216,8 @@ TRACE_EVENT(dlm_bast,
 		       __get_dynamic_array_len(res_name));
 	),
 
-	TP_printk("ls_id=%u lkb_id=%x mode=%s res_name=%s",
-		  __entry->ls_id, __entry->lkb_id,
+	TP_printk("our_nodeid=%u ls_id=%u lkb_id=%x mode=%s res_name=%s",
+		  __entry->our_nodeid, __entry->ls_id, __entry->lkb_id,
 		  show_lock_mode(__entry->mode),
 		  __print_hex_str(__get_dynamic_array(res_name),
 				  __get_dynamic_array_len(res_name)))
@@ -220,12 +226,15 @@ TRACE_EVENT(dlm_bast,
 
 TRACE_EVENT(dlm_ast,
 
-	TP_PROTO(__u32 ls_id, __u32 lkb_id, __u8 sb_flags, int sb_status,
-		 const char *res_name, size_t res_length),
+	TP_PROTO(unsigned int our_nodeid, __u32 ls_id, __u32 lkb_id,
+		 __u8 sb_flags, int sb_status, const char *res_name,
+		 size_t res_length),
 
-	TP_ARGS(ls_id, lkb_id, sb_flags, sb_status, res_name, res_length),
+	TP_ARGS(our_nodeid, ls_id, lkb_id, sb_flags, sb_status, res_name,
+		res_length),
 
 	TP_STRUCT__entry(
+		__field(unsigned int, our_nodeid)
 		__field(__u32, ls_id)
 		__field(__u32, lkb_id)
 		__field(__u8, sb_flags)
@@ -234,6 +243,7 @@ TRACE_EVENT(dlm_ast,
 	),
 
 	TP_fast_assign(
+		__entry->our_nodeid = our_nodeid;
 		__entry->ls_id = ls_id;
 		__entry->lkb_id = lkb_id;
 		__entry->sb_flags = sb_flags;
@@ -243,8 +253,8 @@ TRACE_EVENT(dlm_ast,
 		       __get_dynamic_array_len(res_name));
 	),
 
-	TP_printk("ls_id=%u lkb_id=%x sb_flags=%s sb_status=%d res_name=%s",
-		  __entry->ls_id, __entry->lkb_id,
+	TP_printk("our_nodeid=%u ls_id=%u lkb_id=%x sb_flags=%s sb_status=%d res_name=%s",
+		  __entry->our_nodeid, __entry->ls_id, __entry->lkb_id,
 		  show_dlm_sb_flags(__entry->sb_flags), __entry->sb_status,
 		  __print_hex_str(__get_dynamic_array(res_name),
 				  __get_dynamic_array_len(res_name)))
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 5/7] dlm: add lkb rv mode to ast tracepoint
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
                   ` (3 preceding siblings ...)
  2024-08-27 18:02 ` [RFC 4/7] dlm: add our_nodeid to tracepoints Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-27 18:02 ` [RFC 6/7] dlm: add more tracepoints for DLM kernel verifier Alexander Aring
  2024-08-27 18:02 ` [RFC 7/7] rv: add dlm compatible lock state " Alexander Aring
  6 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

This patch adds the lkb_rv_mode to the ast tracepoint. The lkb_rv_mode
is the requested mode by dlm_lock() requests. We cannot use lkb_mode as
dlm internally sometimes changes this value and for cases like the dlm
kernel verifier we want to check on DLM correctness from what the user
is seeing. This new tracepoint events tells us what the requested mode
was.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/ast.c               |  9 +++++----
 fs/dlm/dlm_internal.h      |  2 ++
 fs/dlm/lock.c              |  1 +
 fs/dlm/user.c              |  2 +-
 include/trace/events/dlm.h | 11 +++++++----
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 01de0d4b9450..fc2ca37011d8 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -24,7 +24,7 @@ static void dlm_run_callback(int our_nodeid, uint32_t ls_id, uint32_t lkb_id,
 			     void (*astfn)(void *astparam),
 			     void (*bastfn)(void *astparam, int mode),
 			     void *astparam, const char *res_name,
-			     size_t res_length)
+			     size_t res_length, int rv_mode)
 {
 	if (flags & DLM_CB_BAST) {
 		trace_dlm_bast(our_nodeid, ls_id, lkb_id, mode, res_name,
@@ -32,7 +32,7 @@ static void dlm_run_callback(int our_nodeid, uint32_t ls_id, uint32_t lkb_id,
 		bastfn(astparam, mode);
 	} else if (flags & DLM_CB_CAST) {
 		trace_dlm_ast(our_nodeid, ls_id, lkb_id, sb_flags, sb_status,
-			      res_name, res_length);
+			      res_name, res_length, rv_mode);
 		lksb->sb_status = sb_status;
 		lksb->sb_flags = sb_flags;
 		astfn(astparam);
@@ -44,7 +44,7 @@ static void dlm_do_callback(struct dlm_callback *cb)
 	dlm_run_callback(cb->our_nodeid, cb->ls_id, cb->lkb_id, cb->mode,
 			 cb->flags, cb->sb_flags, cb->sb_status, cb->lkb_lksb,
 			 cb->astfn, cb->bastfn, cb->astparam,
-			 cb->res_name, cb->res_length);
+			 cb->res_name, cb->res_length, cb->rv_mode);
 	dlm_free_cb(cb);
 }
 
@@ -134,6 +134,7 @@ int dlm_get_cb(struct dlm_lkb *lkb, uint32_t flags, int mode,
 	(*cb)->our_nodeid = ls->ls_dn->our_node->id;
 	(*cb)->lkb_id = lkb->lkb_id;
 	(*cb)->ls_id = ls->ls_global_id;
+	(*cb)->rv_mode = lkb->lkb_rv_mode;
 	memcpy((*cb)->res_name, rsb->res_name, rsb->res_length);
 	(*cb)->res_length = rsb->res_length;
 
@@ -192,7 +193,7 @@ void dlm_add_cb(struct dlm_lkb *lkb, uint32_t flags, int mode, int status,
 					 flags, sbflags, status, lkb->lkb_lksb,
 					 lkb->lkb_astfn, lkb->lkb_bastfn,
 					 lkb->lkb_astparam, rsb->res_name,
-					 rsb->res_length);
+					 rsb->res_length, lkb->lkb_rv_mode);
 		} else {
 			rv = dlm_get_queue_cb(lkb, flags, mode, status, sbflags, &cb);
 			if (!rv)
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index bc3ff1b64e0c..3f630696f7ab 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -247,6 +247,7 @@ struct dlm_callback {
 	size_t			res_length;
 	uint32_t		ls_id;
 	uint32_t		lkb_id;
+	int			rv_mode;
 
 	struct list_head	list;
 };
@@ -296,6 +297,7 @@ struct dlm_lkb {
 		void			*lkb_astparam;	/* caller's ast arg */
 		struct dlm_user_args	*lkb_ua;
 	};
+	int			lkb_rv_mode;
 	struct rcu_head		rcu;
 };
 
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 8cb5a537bfd3..21bb9603a0df 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2844,6 +2844,7 @@ static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	lkb->lkb_lksb = args->lksb;
 	lkb->lkb_lvbptr = args->lksb->sb_lvbptr;
 	lkb->lkb_ownpid = (int) current->pid;
+	lkb->lkb_rv_mode = args->mode;
 	rv = 0;
  out:
 	switch (rv) {
diff --git a/fs/dlm/user.c b/fs/dlm/user.c
index c4d6e67ff63e..75fb85676e90 100644
--- a/fs/dlm/user.c
+++ b/fs/dlm/user.c
@@ -875,7 +875,7 @@ static ssize_t device_read(struct file *file, char __user *buf, size_t count,
 		cb->lkb_lksb->sb_flags = cb->sb_flags;
 		trace_dlm_ast(cb->our_nodeid, cb->ls_id, cb->lkb_id,
 			      cb->sb_status, cb->sb_flags, cb->res_name,
-			      cb->res_length);
+			      cb->res_length, cb->rv_mode);
 	}
 
 	ret = copy_result_to_user(&cb->ua,
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
index 2621bb7ac3a8..f8d7ca451760 100644
--- a/include/trace/events/dlm.h
+++ b/include/trace/events/dlm.h
@@ -228,10 +228,10 @@ TRACE_EVENT(dlm_ast,
 
 	TP_PROTO(unsigned int our_nodeid, __u32 ls_id, __u32 lkb_id,
 		 __u8 sb_flags, int sb_status, const char *res_name,
-		 size_t res_length),
+		 size_t res_length, int rv_mode),
 
 	TP_ARGS(our_nodeid, ls_id, lkb_id, sb_flags, sb_status, res_name,
-		res_length),
+		res_length, rv_mode),
 
 	TP_STRUCT__entry(
 		__field(unsigned int, our_nodeid)
@@ -239,6 +239,7 @@ TRACE_EVENT(dlm_ast,
 		__field(__u32, lkb_id)
 		__field(__u8, sb_flags)
 		__field(int, sb_status)
+		__field(int, rv_mode)
 		__dynamic_array(unsigned char, res_name, res_length)
 	),
 
@@ -248,16 +249,18 @@ TRACE_EVENT(dlm_ast,
 		__entry->lkb_id = lkb_id;
 		__entry->sb_flags = sb_flags;
 		__entry->sb_status = sb_status;
+		__entry->rv_mode = rv_mode;
 
 		memcpy(__get_dynamic_array(res_name), res_name,
 		       __get_dynamic_array_len(res_name));
 	),
 
-	TP_printk("our_nodeid=%u ls_id=%u lkb_id=%x sb_flags=%s sb_status=%d res_name=%s",
+	TP_printk("our_nodeid=%u ls_id=%u lkb_id=%x sb_flags=%s sb_status=%d res_name=%s rv_mode=%d",
 		  __entry->our_nodeid, __entry->ls_id, __entry->lkb_id,
 		  show_dlm_sb_flags(__entry->sb_flags), __entry->sb_status,
 		  __print_hex_str(__get_dynamic_array(res_name),
-				  __get_dynamic_array_len(res_name)))
+				  __get_dynamic_array_len(res_name)),
+		  __entry->rv_mode)
 
 );
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 6/7] dlm: add more tracepoints for DLM kernel verifier
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
                   ` (4 preceding siblings ...)
  2024-08-27 18:02 ` [RFC 5/7] dlm: add lkb rv mode to ast tracepoint Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-27 18:02 ` [RFC 7/7] rv: add dlm compatible lock state " Alexander Aring
  6 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

This patch adds more useful tracepoints for the kernel verifier. The
lock/unlock validation tracepoints are useful to store the request state
mode that the lock must change to, at this time the lock request cannot
fail anymore. (It can fail but there might be upcoming changes because
we can't deal with that e.g. -ENOMEM cases).

Another tracepoint is dlm_release_lockspace() that signals us that a
node is leaving the lockspace and all locks holding should be dropped.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/lock.c              | 10 +++--
 fs/dlm/lockspace.c         |  4 ++
 include/trace/events/dlm.h | 80 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 21bb9603a0df..597418cba76a 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2811,7 +2811,8 @@ static int set_unlock_args(uint32_t flags, void *astarg, struct dlm_args *args)
 }
 
 static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
-			      struct dlm_args *args)
+			      struct dlm_args *args, const char *res_name,
+			      size_t res_length)
 {
 	int rv = -EBUSY;
 
@@ -2845,6 +2846,7 @@ static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	lkb->lkb_lvbptr = args->lksb->sb_lvbptr;
 	lkb->lkb_ownpid = (int) current->pid;
 	lkb->lkb_rv_mode = args->mode;
+	trace_dlm_lock_validated(ls, lkb, args, res_name, res_length);
 	rv = 0;
  out:
 	switch (rv) {
@@ -2988,6 +2990,7 @@ static int validate_unlock_args(struct dlm_lkb *lkb, struct dlm_args *args)
 	lkb->lkb_exflags |= args->flags;
 	dlm_set_sbflags_val(lkb, 0);
 	lkb->lkb_astparam = args->astparam;
+	trace_dlm_unlock_validated(ls, lkb, args);
 	rv = 0;
  out:
 	switch (rv) {
@@ -3264,7 +3267,7 @@ static int request_lock(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	struct dlm_rsb *r;
 	int error;
 
-	error = validate_lock_args(ls, lkb, args);
+	error = validate_lock_args(ls, lkb, args, name, len);
 	if (error)
 		return error;
 
@@ -3295,7 +3298,8 @@ static int convert_lock(struct dlm_ls *ls, struct dlm_lkb *lkb,
 	hold_rsb(r);
 	lock_rsb(r);
 
-	error = validate_lock_args(ls, lkb, args);
+	error = validate_lock_args(ls, lkb, args, r->res_name,
+				   r->res_length);
 	if (error)
 		goto out;
 
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 092f7017b896..c19d797264c5 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -9,6 +9,8 @@
 *******************************************************************************
 ******************************************************************************/
 
+#include <trace/events/dlm.h>
+
 #include <linux/netdevice.h>
 #include <linux/module.h>
 
@@ -781,6 +783,8 @@ static int release_lockspace(struct dlm_net *dn, struct dlm_ls *ls, int force)
 		return rv;
 	}
 
+	trace_dlm_release_lockspace(dn->our_node->id, ls->ls_global_id);
+
 	if (dn->ls_count == 1)
 		dlm_midcomms_version_wait(ls->ls_dn);
 
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
index f8d7ca451760..facad6251e43 100644
--- a/include/trace/events/dlm.h
+++ b/include/trace/events/dlm.h
@@ -338,6 +338,86 @@ TRACE_EVENT(dlm_unlock_end,
 
 );
 
+TRACE_EVENT(dlm_release_lockspace,
+
+	TP_PROTO(unsigned int our_nodeid, __u32 ls_id),
+
+	TP_ARGS(our_nodeid, ls_id),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, our_nodeid)
+		__field(__u32, ls_id)
+	),
+
+	TP_fast_assign(
+		__entry->our_nodeid = our_nodeid;
+		__entry->ls_id = ls_id;
+	),
+
+	TP_printk("our_nodeid=%u ls_id=%u",
+		  __entry->our_nodeid, __entry->ls_id)
+
+);
+
+TRACE_EVENT(dlm_lock_validated,
+
+	TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, struct dlm_args *args,
+		 const char *res_name, size_t res_length),
+
+	TP_ARGS(ls, lkb, args, res_name, res_length),
+
+	TP_STRUCT__entry(
+		__field(uint32_t, our_nodeid)
+		__field(uint32_t, ls_id)
+		__dynamic_array(unsigned char, res_name, res_length)
+		__field(int, mode)
+	),
+
+	TP_fast_assign(
+		__entry->our_nodeid = ls->ls_dn->our_node->id;
+		__entry->ls_id = ls->ls_global_id;
+		memcpy(__get_dynamic_array(res_name), res_name,
+		       __get_dynamic_array_len(res_name));
+		__entry->mode = args->mode;
+	),
+
+	TP_printk("our_nodeid=%u ls_id=%u res_name=%s mode=%d",
+		  __entry->our_nodeid, __entry->ls_id,
+		  __print_hex_str(__get_dynamic_array(res_name),
+				  __get_dynamic_array_len(res_name)),
+		  __entry->mode)
+
+);
+
+TRACE_EVENT(dlm_unlock_validated,
+
+	TP_PROTO(struct dlm_ls *ls, struct dlm_lkb *lkb, struct dlm_args *args),
+
+	TP_ARGS(ls, lkb, args),
+
+	TP_STRUCT__entry(
+		__field(uint32_t, our_nodeid)
+		__field(uint32_t, ls_id)
+		__dynamic_array(unsigned char, res_name,
+				lkb->lkb_resource->res_length)
+	),
+
+	TP_fast_assign(
+		struct dlm_rsb *r = lkb->lkb_resource;
+
+		__entry->our_nodeid = ls->ls_dn->our_node->id;
+		__entry->ls_id = ls->ls_global_id;
+		memcpy(__get_dynamic_array(res_name), r->res_name,
+		       __get_dynamic_array_len(res_name));
+	),
+
+	TP_printk("our_nodeid=%u ls_id=%u res_name=%s",
+		  __entry->our_nodeid, __entry->ls_id,
+		  __print_hex_str(__get_dynamic_array(res_name),
+				  __get_dynamic_array_len(res_name)))
+
+);
+
 DECLARE_EVENT_CLASS(dlm_rcom_template,
 
 	TP_PROTO(uint32_t dst, uint32_t h_seq, const struct dlm_rcom *rc),
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 7/7] rv: add dlm compatible lock state kernel verifier
  2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
                   ` (5 preceding siblings ...)
  2024-08-27 18:02 ` [RFC 6/7] dlm: add more tracepoints for DLM kernel verifier Alexander Aring
@ 2024-08-27 18:02 ` Alexander Aring
  2024-08-30 12:55   ` Alexander Aring
  6 siblings, 1 reply; 9+ messages in thread
From: Alexander Aring @ 2024-08-27 18:02 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams, aahringo

This patch adds the DLM kernel lock state verifier. It can be simply
activated by:

echo "dlm" > /sys/kernel/tracing/rv/enabled_monitors
echo "printk" > /sys/kernel/tracing/rv/monitors/dlm/reactors

then run any kind of workload on DLM to check on DLM correctness in
sense of compatible lock states and their current holders. For example
there cannot be two lock holders or more for a specific lock holding the
exclusive lock.

IMPORTANT: This kernel verifier for DLM only makes sense to use it with
combination of DLM's net-namespace feature to run a DLM cluster on one
Linux kernel instance. This offers us a whole cluster view in the Linux
tracing subsystem that this verifier takes advantage of!

The above note is the reason why this verifier works kinda different
than other verifiers. We build a layer to have a cluster view and check
if a lock state is still compatible with all current lock holder states.
That's why we have a rhashtable in the verifier to keep track of all
current cluster wide locks. However we use da events to check if our
model is violating or not.

As example gfs2 can be used to produce some kind of filesystem benchmark
that produces DLM lock handling. This verifier will then check if DLM is
working as expected.

Most useful feature is checking DLM recovery handling if nodes
leave the lockspace and DLM is still correct.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 Documentation/trace/rv/monitor_dlm.rst |  77 +++++
 include/trace/events/rv.h              |   9 +
 kernel/trace/rv/Kconfig                |  18 +
 kernel/trace/rv/Makefile               |   1 +
 kernel/trace/rv/monitors/dlm/dlm.c     | 445 +++++++++++++++++++++++++
 kernel/trace/rv/monitors/dlm/dlm.h     |  38 +++
 kernel/trace/rv/monitors/dlm/dlm_da.h  | 143 ++++++++
 tools/verification/models/dlm.dot      |  14 +
 8 files changed, 745 insertions(+)
 create mode 100644 Documentation/trace/rv/monitor_dlm.rst
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm.c
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm.h
 create mode 100644 kernel/trace/rv/monitors/dlm/dlm_da.h
 create mode 100644 tools/verification/models/dlm.dot

diff --git a/Documentation/trace/rv/monitor_dlm.rst b/Documentation/trace/rv/monitor_dlm.rst
new file mode 100644
index 000000000000..95cdf3d1a904
--- /dev/null
+++ b/Documentation/trace/rv/monitor_dlm.rst
@@ -0,0 +1,77 @@
+Monitor dlm
+============
+
+- Name: dlm - dlm runtime lock compatibility verifier.
+  Only makes sense with DLM net-names paces because we
+  need the whole traced cluster view.
+- Type: per-dlm_lock deterministic automaton
+- Author: Alexander Aring <aahringo@redhat.com>
+
+Description
+-----------
+
+This is a per-dlm lock compatibility monitor, with the following
+definition::
+
+               |
+    with       |
+   others      v
+ compatible  +-------------+
+  +--------- |             |
+  |          |    valid    |
+  +--------> |             |
+             +-------------+
+               |
+               | all_unlock
+               v
+             #=============#
+             H     free    H
+             #=============#
+
+This model is on a per cluster wide DLM lock basis. Each cluster
+node can hold a specific lock resource in a certain lock mode.
+This lock mode is either compatible or not compatible with all
+other nodes holding the particular lock resource in a different
+lock mode. A simple lock state is the Exclusive Lock state. Two
+nodes can never held the exclusive lock for a specific lock
+resource at the same time. This is what "with others compatible"
+edge means when a node changes the lock state and checks if the
+lock state is still compatible with other holders of the lock, we
+are still in the valid state.
+
+If all holders for a specific lock resource that we track switch
+to unlock state, we free the monitoring resource as we don't track
+the lock correctness anymore. The lock can be monitored again if
+the same lock resource switches to a valid lock state.
+
+This monitor introduce also another lock state to signal that a
+lock state is in transition. The user signals a lock state change
+and we waiting for a lock state completion (ast) callback. At this
+time the user cannot assume to hold the state in a certain state
+until completion and we need to ignore lock holders they are in
+transition.
+
+IMPORTANT NOTE:
+
+This monitor makes only sense when having a cluster wide view in
+the local Linux tracing subsystem. For now this means a DLM user
+should construct a cluster with several nodes by using
+net-namespaces. This will allow the DLM monitor to track cluster
+wide lock changes. The monitor also works on a real cluster with
+several machines as nodes, but it will not make any sense as we
+don't check on any cluster-wide DLM correctness. Only for per-node
+local DLM correctness, which is unlikely to break.
+
+There might be ideas to use time synchronized tracing to get a
+cluster wide Linux tracing view and run the kernel verifier on
+a real cluster, however this isn't supported yet and only an idea
+to how to might handle it.
+
+This monitor is different than other current monitors. It builds
+an nonexitent layer that represents the current cluster state that
+we don't track in such a way in DLM. That's why it only works
+with DLM and net-namespaces together.
+
+Specification
+-------------
+Grapviz Dot file in tools/verification/models/dlm.dot
diff --git a/include/trace/events/rv.h b/include/trace/events/rv.h
index 56592da9301c..cd031a4d994d 100644
--- a/include/trace/events/rv.h
+++ b/include/trace/events/rv.h
@@ -66,6 +66,15 @@ DEFINE_EVENT(error_da_monitor, error_wip,
 	     TP_PROTO(char *state, char *event),
 	     TP_ARGS(state, event));
 #endif /* CONFIG_RV_MON_WIP */
+#ifdef CONFIG_RV_MON_DLM
+DEFINE_EVENT(event_da_monitor, event_dlm,
+	    TP_PROTO(char *state, char *event, char *next_state, bool final_state),
+	    TP_ARGS(state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor, error_dlm,
+	     TP_PROTO(char *state, char *event),
+	     TP_ARGS(state, event));
+#endif /* CONFIG_RV_MON_DLM */
 #endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
 
 #ifdef CONFIG_DA_MON_EVENTS_ID
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 831779607e84..bc12b72088c4 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -50,6 +50,24 @@ config RV_MON_WWNR
 	  For further information, see:
 	    Documentation/trace/rv/monitor_wwnr.rst
 
+config RV_MON_DLM
+	depends on RV
+	depends on DLM
+	depends on NET_NS
+	bool "dlm monitor"
+	help
+	  Enable dlm (runtime lock compatibility verifier) sample monitor,
+	  this monitor will check on DLM lock correctness in sense of
+	  checking on compatible lock modes during DLM runtime. E.g. two
+	  cluster wide lock holders that holding the exclusive lock state
+	  for a specific lock.
+
+	  IMPORTANT: the verifier only works on DLMs net-namespace feature
+	  that is e.g. supported by gfs2.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_dlm.rst
+
 config RV_REACTORS
 	bool "Runtime verification reactors"
 	default y
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 963d14875b45..b1ac0d69ebef 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -3,6 +3,7 @@
 obj-$(CONFIG_RV) += rv.o
 obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
 obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
+obj-$(CONFIG_RV_MON_DLM) += monitors/dlm/dlm.o
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
 obj-$(CONFIG_RV_REACT_PANIC) += reactor_panic.o
diff --git a/kernel/trace/rv/monitors/dlm/dlm.c b/kernel/trace/rv/monitors/dlm/dlm.c
new file mode 100644
index 000000000000..2f384b5b08b6
--- /dev/null
+++ b/kernel/trace/rv/monitors/dlm/dlm.c
@@ -0,0 +1,445 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/console.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "dlm_rv"
+#include <trace/events/rv.h>
+#include <trace/events/dlm.h>
+
+#include "dlm.h"
+#include "dlm_da.h"
+#include "../../../fs/dlm/lock.h"
+
+/* out of DLM DLM API mode values */
+#define STATE_MODE_UNLOCK -1
+#define STATE_MODE_IN_TRANSITION -2
+
+struct dlm_rv_lock_key {
+	uint32_t ls_id;
+	char res_name[DLM_RESNAME_MAXLEN];
+};
+
+struct dlm_rv_lock {
+	union rv_dlm_lock_monitor rv;
+
+	struct dlm_rv_lock_key key;
+	struct rhash_head node;
+
+	struct list_head holders;
+
+	/* holder for lock list */
+	struct list_head list;
+};
+
+/* protected by dlm_rv_hash_lock */
+struct dlm_rv_holder {
+	unsigned int nodeid;
+	int mode;
+
+	struct list_head list;
+};
+
+static const struct rhashtable_params dlm_rv_hash_params = {
+	.nelem_hint = 3, /* start small */
+	.key_len = sizeof(struct dlm_rv_lock_key),
+	.key_offset = offsetof(struct dlm_rv_lock, key),
+	.head_offset = offsetof(struct dlm_rv_lock, node),
+	.automatic_shrinking = true,
+};
+
+static LIST_HEAD(dlm_rv_locks);
+static struct rhashtable dlm_rv_hash;
+static DEFINE_SPINLOCK(dlm_rv_hash_lock);
+static struct kmem_cache *lk_cache;
+static struct kmem_cache *hl_cache;
+
+/*
+ * Entry point for the per-dlm_lock monitor.
+ */
+#define DECLARE_DA_MON_PER_DLM_LOCK(name, type)							\
+												\
+DECLARE_AUTOMATA_HELPERS(name, type)								\
+DECLARE_DA_MON_GENERIC_HELPERS(name, type)							\
+DECLARE_DA_MON_MODEL_HANDLER_PER_DLM_LOCK(name, type)						\
+DECLARE_DA_MON_INIT_PER_DLM_LOCK(name, type)							\
+DECLARE_DA_MON_MONITOR_HANDLER_PER_DLM_LOCK(name, type)
+
+static struct rv_monitor rv_dlm;
+DECLARE_DA_MON_PER_DLM_LOCK(dlm, unsigned char);
+
+static struct dlm_rv_lock *
+lookup_lock(uint32_t ls_id, const char *res_name, size_t res_length)
+{
+	struct dlm_rv_lock_key key = {
+		.ls_id = ls_id,
+	};
+
+	WARN_ON(res_length > DLM_RESNAME_MAXLEN);
+
+	/* the key.res_name is DLM_RESNAME_MAXLEN */
+	memcpy(&key.res_name, res_name, res_length);
+	return rhashtable_lookup_fast(&dlm_rv_hash, &key, dlm_rv_hash_params);
+}
+
+static struct dlm_rv_holder *
+lookup_holder(struct dlm_rv_lock *lk, unsigned int nodeid)
+{
+	struct dlm_rv_holder *iter, *hl = NULL;
+
+	list_for_each_entry(iter, &lk->holders, list) {
+		if (iter->nodeid == nodeid) {
+			hl = iter;
+			break;
+		}
+	}
+
+	return hl;
+}
+
+/* set a specific mode to a lock node holder identified by the nodeid */
+static void set_holder_state(struct dlm_rv_lock *lk, unsigned int nodeid,
+			     int mode)
+{
+	struct dlm_rv_holder *hl;
+
+	hl = lookup_holder(lk, nodeid);
+	if (hl) {
+		hl->mode = mode;
+		return;
+	}
+
+	/* we only create holders they are not start with UNLOCK */
+	if (mode == STATE_MODE_UNLOCK)
+		return;
+
+	hl = kmem_cache_zalloc(hl_cache, GFP_ATOMIC);
+	if (WARN_ON_ONCE(!hl))
+		return;
+
+	hl->nodeid = nodeid;
+	hl->mode = mode;
+
+	list_add(&hl->list, &lk->holders);
+}
+
+/* check if all lock holders except the one from nodeid is still
+ * compatible with the mode given by mode. Usually the nodeid which
+ * is skipped has the applied mode as parameter to check if the
+ * state change is valid.
+ */
+static int check_valid_lock_holders(struct dlm_rv_lock *lk, int mode,
+				    unsigned int nodeid)
+{
+	struct dlm_rv_holder *hl;
+
+	list_for_each_entry(hl, &lk->holders, list) {
+		/* ignore ourself
+		 * ignore pending lock states
+		 */
+		if (hl->nodeid == nodeid ||
+		    hl->mode == STATE_MODE_IN_TRANSITION)
+			continue;
+
+		if (!dlm_modes_compat(mode, hl->mode))
+			return 0;
+	}
+
+	return 1;
+}
+
+/* check if all holders for a lock are in unlock state */
+static int check_all_unlock_holders(struct dlm_rv_lock *lk)
+{
+	struct dlm_rv_holder *hl;
+
+	/* should never happen but when we delete the lk */
+	if (WARN_ON(list_empty(&lk->holders)))
+		return 1;
+
+	list_for_each_entry(hl, &lk->holders, list) {
+		if (hl->mode != STATE_MODE_UNLOCK)
+			return 0;
+	}
+
+	return 1;
+}
+
+/* drop all lock holders for a specific lock */
+static void drop_all_lock_holders(struct dlm_rv_lock *lk)
+{
+	struct dlm_rv_holder *hl, *tmp;
+
+	list_for_each_entry_safe(hl, tmp, &lk->holders, list) {
+		list_del(&hl->list);
+		kmem_cache_free(hl_cache, hl);
+	}
+}
+
+/* unlock specific lock holder if available and if all lock holders
+ * are in unlock state, we remove and free the lock.
+ */
+static void unlock_lock_holder(struct dlm_rv_lock *lk, unsigned int nodeid)
+{
+	int rv;
+
+	set_holder_state(lk, nodeid, STATE_MODE_UNLOCK);
+	rv = check_all_unlock_holders(lk);
+	if (rv) {
+		drop_all_lock_holders(lk);
+
+		list_del(&lk->list);
+		rhashtable_remove_fast(&dlm_rv_hash, &lk->node,
+				       dlm_rv_hash_params);
+		/* move into final state */
+		da_handle_event_dlm(lk, all_unlock_dlm);
+		kmem_cache_free(lk_cache, lk);
+	}
+}
+
+static void handle_dlm_ast(void *data, unsigned int our_nodeid, __u32 ls_id,
+			   __u32 lkb_id, __u8 sb_flags, int sb_status,
+			   const char *res_name, size_t res_length, int mode)
+{
+	struct dlm_rv_holder *hl;
+	struct dlm_rv_lock *lk;
+	int rv;
+
+	switch (sb_status) {
+	case -DLM_EUNLOCK:
+		/* handle an unlock of an lock we saw before */
+		spin_lock_bh(&dlm_rv_hash_lock);
+		/* switch to unlock state if there is a lock available
+		 * and check if all locks are in unlock mode, see
+		 * unlock_lock_holder().
+		 */
+		lk = lookup_lock(ls_id, res_name, res_length);
+		if (lk)
+			unlock_lock_holder(lk, our_nodeid);
+		spin_unlock_bh(&dlm_rv_hash_lock);
+		return;
+	case 0:
+		/* successful lock state change */
+		break;
+	default:
+		/* ignored */
+		return;
+	}
+
+	spin_lock_bh(&dlm_rv_hash_lock);
+	lk = lookup_lock(ls_id, res_name, res_length);
+	if (!lk) {
+		/* start to begin tracking DLM cluster lock */
+		lk = kmem_cache_zalloc(lk_cache, GFP_ATOMIC);
+		if (WARN_ON_ONCE(!lk)) {
+			spin_unlock_bh(&dlm_rv_hash_lock);
+			return;
+		}
+
+		lk->key.ls_id = ls_id;
+		memcpy(lk->key.res_name, res_name, res_length);
+		INIT_LIST_HEAD(&lk->holders);
+
+		da_monitor_reset_dlm(da_get_monitor_dlm(lk));
+		da_handle_start_event_dlm(lk, with_others_compatible_dlm);
+		set_holder_state(lk, our_nodeid, mode);
+
+		list_add_tail(&lk->list, &dlm_rv_locks);
+		rv = rhashtable_insert_fast(&dlm_rv_hash, &lk->node,
+					    dlm_rv_hash_params);
+		spin_unlock_bh(&dlm_rv_hash_lock);
+
+		WARN_ON(rv);
+		return;
+	}
+
+	/* lock is known, change it's state and check if it doesn't
+	 * violate the DLM cluster wide compatible lock modes
+	 */
+	set_holder_state(lk, our_nodeid, mode);
+	rv = check_valid_lock_holders(lk, mode, our_nodeid);
+	if (rv) {
+		/* the whole validation process, this event signals
+		 * everything is fine and DLM works correctly there
+		 * are no cluster-wide locks that violates DLM locking.
+		 */
+		da_handle_event_dlm(lk, with_others_compatible_dlm);
+	} else {
+		/* print all holders of the lock when a invalid lock state is entered */
+		console_lock();
+		pr_info("---\n");
+		pr_info("ls_id %u lkb_id: 0x%08x\n", ls_id, lkb_id);
+		pr_info("holders:\n");
+		list_for_each_entry(hl, &lk->holders, list) {
+			pr_info("\tnodeid: %u mode: %d\n", hl->nodeid,
+				hl->mode);
+		}
+		pr_info("---\n");
+		console_unlock();
+
+		/* move into an invalid state change, we don't have a edge for that
+		 * so we just use event_max_dlm.
+		 */
+		da_handle_event_dlm(lk, event_max_dlm);
+	}
+	spin_unlock_bh(&dlm_rv_hash_lock);
+}
+
+/* set the holder to transition state as lock downgrades can issue
+ * grant messages to other nodes we need to ignore if a lock on a
+ * specific node is in state transition. From point of DLM API
+ * the user cannot assume to still hold the lock at this point
+ * anyway.
+ */
+static void set_holder_transition(uint32_t ls_id, const char *res_name,
+				  size_t res_length, uint32_t our_nodeid)
+{
+	struct dlm_rv_holder *hl;
+	struct dlm_rv_lock *lk;
+
+	spin_lock_bh(&dlm_rv_hash_lock);
+	lk = lookup_lock(ls_id, res_name, res_length);
+	if (lk) {
+		hl = lookup_holder(lk, our_nodeid);
+		if (hl)
+			hl->mode = STATE_MODE_IN_TRANSITION;
+	}
+	spin_unlock_bh(&dlm_rv_hash_lock);
+}
+
+/* after a lock request got validated it cannot fail */
+static void handle_dlm_lock_validated(void *data, struct dlm_ls *ls,
+				      struct dlm_lkb *lkb,
+				      struct dlm_args *args,
+				      const char *res_name, size_t res_length)
+{
+	set_holder_transition(ls->ls_global_id, res_name,
+			      res_length, ls->ls_dn->our_node->id);
+}
+
+static void handle_dlm_unlock_validated(void *data, struct dlm_ls *ls,
+					struct dlm_lkb *lkb,
+					struct dlm_args *args)
+{
+	set_holder_transition(ls->ls_global_id,
+			      lkb->lkb_resource->res_name,
+			      lkb->lkb_resource->res_length,
+			      ls->ls_dn->our_node->id);
+}
+
+/* remove all holders, recovery will fast this up and we need to drop them */
+static void handle_dlm_release_lockspace(void *data, unsigned int our_nodeid,
+					 __u32 ls_id)
+{
+	struct dlm_rv_lock *lk, *lk_tmp;
+
+	spin_lock_bh(&dlm_rv_hash_lock);
+	list_for_each_entry_safe(lk, lk_tmp, &dlm_rv_locks, list) {
+		if (lk->key.ls_id != ls_id)
+			continue;
+
+		/* unlock all locks for the node that calls
+		 * dlm_release_lockspace(). It's not necessary
+		 * from the DLM API that a node need to unlock
+		 * all locks before calling dlm_release_lockspace()
+		 * there is even an optimization because each recovery
+		 * will deal with that locally. However we handle a
+		 * dlm_release_lockspace() on a specific node as
+		 * unlock all locks.
+		 */
+		unlock_lock_holder(lk, our_nodeid);
+	}
+	spin_unlock_bh(&dlm_rv_hash_lock);
+}
+
+static void rhash_lock_free(void *ptr, void *arg)
+{
+	struct dlm_rv_lock *lk = ptr;
+
+	list_del(&lk->list);
+	drop_all_lock_holders(lk);
+	kmem_cache_free(lk_cache, lk);
+}
+
+static int enable_dlm(void)
+{
+	int retval;
+
+	retval = rhashtable_init(&dlm_rv_hash, &dlm_rv_hash_params);
+	if (retval)
+		return retval;
+
+	retval = da_monitor_init_dlm();
+	if (retval) {
+		rhashtable_destroy(&dlm_rv_hash);
+		return retval;
+	}
+
+	rv_attach_trace_probe("dlm", dlm_ast, handle_dlm_ast);
+	rv_attach_trace_probe("dlm", dlm_lock_validated, handle_dlm_lock_validated);
+	rv_attach_trace_probe("dlm", dlm_unlock_validated, handle_dlm_unlock_validated);
+	rv_attach_trace_probe("dlm", dlm_release_lockspace, handle_dlm_release_lockspace);
+
+	return 0;
+}
+
+static void disable_dlm(void)
+{
+	rv_dlm.enabled = 0;
+
+	rv_detach_trace_probe("dlm", dlm_ast, handle_dlm_ast);
+	rv_detach_trace_probe("dlm", dlm_lock_validated, handle_dlm_lock_validated);
+	rv_detach_trace_probe("dlm", dlm_unlock_validated, handle_dlm_unlock_validated);
+	rv_detach_trace_probe("dlm", dlm_release_lockspace, handle_dlm_release_lockspace);
+
+	da_monitor_destroy_dlm();
+
+	rhashtable_free_and_destroy(&dlm_rv_hash, rhash_lock_free, NULL);
+}
+
+static struct rv_monitor rv_dlm = {
+	.name = "dlm",
+	.description = "dlm runtime lock compatibility verifier",
+	.enable = enable_dlm,
+	.disable = disable_dlm,
+	.reset = da_monitor_reset_all_dlm,
+	.enabled = 0,
+};
+
+static int __init register_dlm(void)
+{
+	lk_cache = KMEM_CACHE(dlm_rv_lock, 0);
+	if (!lk_cache)
+		return -ENOMEM;
+
+	hl_cache = KMEM_CACHE(dlm_rv_holder, 0);
+	if (!hl_cache) {
+		kmem_cache_destroy(lk_cache);
+		return -ENOMEM;
+	}
+
+	rv_register_monitor(&rv_dlm);
+	return 0;
+}
+
+static void __exit unregister_dlm(void)
+{
+	rv_unregister_monitor(&rv_dlm);
+
+	kmem_cache_destroy(hl_cache);
+	kmem_cache_destroy(lk_cache);
+}
+
+module_init(register_dlm);
+module_exit(unregister_dlm);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Alexander Aring <aahringo@redhat.com>");
+MODULE_DESCRIPTION("dlm: runtime lock compatibility verifier");
diff --git a/kernel/trace/rv/monitors/dlm/dlm.h b/kernel/trace/rv/monitors/dlm/dlm.h
new file mode 100644
index 000000000000..514614be2ca9
--- /dev/null
+++ b/kernel/trace/rv/monitors/dlm/dlm.h
@@ -0,0 +1,38 @@
+enum states_dlm {
+	valid_dlm = 0,
+	free_dlm,
+	state_max_dlm
+};
+
+#define INVALID_STATE state_max_dlm
+
+enum events_dlm {
+	all_unlock_dlm = 0,
+	with_others_compatible_dlm,
+	event_max_dlm
+};
+
+struct automaton_dlm {
+	char *state_names[state_max_dlm];
+	char *event_names[event_max_dlm];
+	unsigned char function[state_max_dlm][event_max_dlm];
+	unsigned char initial_state;
+	bool final_states[state_max_dlm];
+};
+
+static const struct automaton_dlm automaton_dlm = {
+	.state_names = {
+		"valid",
+		"free"
+	},
+	.event_names = {
+		"all_unlock",
+		"with_others_compatible"
+	},
+	.function = {
+		{      free_dlm,     valid_dlm },
+		{ INVALID_STATE, INVALID_STATE },
+	},
+	.initial_state = valid_dlm,
+	.final_states = { 0, 1 },
+};
diff --git a/kernel/trace/rv/monitors/dlm/dlm_da.h b/kernel/trace/rv/monitors/dlm/dlm_da.h
new file mode 100644
index 000000000000..064ed5085b30
--- /dev/null
+++ b/kernel/trace/rv/monitors/dlm/dlm_da.h
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#ifndef __DLM_DA_RV__
+#define __DLM_DA_RV__
+
+/*
+ * Event handler for per_dlm_lock monitors.
+ */
+#define DECLARE_DA_MON_MODEL_HANDLER_PER_DLM_LOCK(name, type)					\
+												\
+static inline bool da_event_##name(struct da_monitor *da_mon, struct dlm_rv_lock *lk,		\
+				   enum events_##name event)					\
+{												\
+	type curr_state = da_monitor_curr_state_##name(da_mon);					\
+	type next_state = model_get_next_state_##name(curr_state, event);			\
+												\
+	if (next_state != INVALID_STATE) {							\
+		da_monitor_set_state_##name(da_mon, next_state);				\
+												\
+		trace_event_##name(model_get_state_name_##name(curr_state),			\
+				   model_get_event_name_##name(event),				\
+				   model_get_state_name_##name(next_state),			\
+				   model_is_final_state_##name(next_state));			\
+												\
+		return true;									\
+	}											\
+												\
+	if (rv_reacting_on_##name())								\
+		cond_react_##name(format_react_msg_##name(curr_state, event));			\
+												\
+	trace_error_##name(model_get_state_name_##name(curr_state),				\
+			   model_get_event_name_##name(event));					\
+												\
+	return false;										\
+}
+
+/*
+ * Functions to define, init and get a per-dlm-lock monitor.
+ */
+#define DECLARE_DA_MON_INIT_PER_DLM_LOCK(name, type)						\
+												\
+/*												\
+ * da_get_monitor_##name - return the monitor in the allocated slot for tsk			\
+ */												\
+static inline struct da_monitor *da_get_monitor_##name(struct dlm_rv_lock *lk)			\
+{												\
+	return &lk->rv.da_mon;									\
+}												\
+												\
+static void da_monitor_reset_all_##name(void)							\
+{												\
+}												\
+												\
+/*												\
+ * da_monitor_init_##name - initialize the per-task monitor					\
+ *												\
+ * Try to allocate a slot in the task's vector of monitors. If there				\
+ * is an available slot, use it and reset all task's monitor.					\
+ */												\
+static int da_monitor_init_##name(void)								\
+{												\
+	da_monitor_reset_all_##name();								\
+	return 0;										\
+}												\
+												\
+/*												\
+ * da_monitor_destroy_##name - return the allocated slot					\
+ */												\
+static inline void da_monitor_destroy_##name(void)						\
+{												\
+	return;											\
+}
+
+/*
+ * Handle event for per task.
+ */
+#define DECLARE_DA_MON_MONITOR_HANDLER_PER_DLM_LOCK(name, type)					\
+												\
+static inline void										\
+__da_handle_event_##name(struct da_monitor *da_mon, struct dlm_rv_lock *lk,			\
+			 enum events_##name event)						\
+{												\
+	bool retval;										\
+												\
+	retval = da_event_##name(da_mon, lk, event);						\
+	if (!retval)										\
+		da_monitor_reset_##name(da_mon);						\
+}												\
+												\
+/*												\
+ * da_handle_event_##name - handle an event							\
+ */												\
+static inline void										\
+da_handle_event_##name(struct dlm_rv_lock *lk, enum events_##name event)			\
+{												\
+	struct da_monitor *da_mon = da_get_monitor_##name(lk);					\
+	bool retval;										\
+												\
+	retval = da_monitor_handling_event_##name(da_mon);					\
+	if (!retval)										\
+		return;										\
+												\
+	__da_handle_event_##name(da_mon, lk, event);						\
+}												\
+												\
+/*												\
+ * da_handle_start_event_##name - start monitoring or handle event				\
+ *												\
+ * This function is used to notify the monitor that the system is returning			\
+ * to the initial state, so the monitor can start monitoring in the next event.			\
+ * Thus:											\
+ *												\
+ * If the monitor already started, handle the event.						\
+ * If the monitor did not start yet, start the monitor but skip the event.			\
+ */												\
+static inline bool										\
+da_handle_start_event_##name(struct dlm_rv_lock *lk, enum events_##name event)			\
+{												\
+	struct da_monitor *da_mon;								\
+												\
+	if (!da_monitor_enabled_##name())							\
+		return 0;									\
+												\
+	da_mon = da_get_monitor_##name(lk);							\
+												\
+	if (unlikely(!da_monitoring_##name(da_mon))) {						\
+		da_monitor_start_##name(da_mon);						\
+		return 0;									\
+	}											\
+												\
+	__da_handle_event_##name(da_mon, lk, event);						\
+												\
+	return 1;										\
+}
+
+/*
+ * Futher monitor types are expected, so make this a union.
+ */
+union rv_dlm_lock_monitor {
+	struct da_monitor da_mon;
+};
+
+#endif /* __DLM_DA_RV__ */
diff --git a/tools/verification/models/dlm.dot b/tools/verification/models/dlm.dot
new file mode 100644
index 000000000000..43092c865e3b
--- /dev/null
+++ b/tools/verification/models/dlm.dot
@@ -0,0 +1,14 @@
+digraph state_automaton {
+	{node [shape = circle] "valid"};
+	{node [shape = plaintext, style=invis, label=""] "__init_valid"};
+	{node [shape = doublecircle] "free"};
+	"__init_valid" -> "valid";
+	"valid" [label = "valid", color = green3]
+	"valid" -> "valid" [ label = "with_others_compatible" ];
+	"free" [label = "free"]
+	"valid" -> "free" [ label = "all_unlock" ];
+	{ rank = min ;
+		"__init_valid";
+		"valid";
+	}
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 7/7] rv: add dlm compatible lock state kernel verifier
  2024-08-27 18:02 ` [RFC 7/7] rv: add dlm compatible lock state " Alexander Aring
@ 2024-08-30 12:55   ` Alexander Aring
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Aring @ 2024-08-30 12:55 UTC (permalink / raw)
  To: teigland
  Cc: gfs2, song, yukuai3, agruenba, mark, jlbec, joseph.qi, gregkh,
	rafael, akpm, linux-kernel, linux-raid, ocfs2-devel, netdev,
	vvidic, heming.zhao, lucien.xin, paulmck, rcu, juri.lelli,
	williams

Hi,

On Tue, Aug 27, 2024 at 2:03 PM Alexander Aring <aahringo@redhat.com> wrote:
...
> +       set_holder_state(lk, our_nodeid, mode);
> +       rv = check_valid_lock_holders(lk, mode, our_nodeid);
> +       if (rv) {
> +               /* the whole validation process, this event signals
> +                * everything is fine and DLM works correctly there
> +                * are no cluster-wide locks that violates DLM locking.
> +                */
> +               da_handle_event_dlm(lk, with_others_compatible_dlm);
> +       } else {
> +               /* print all holders of the lock when a invalid lock state is entered */
> +               console_lock();

I can't hold this lock in some contexts the ast callback can be called from.
I will drop this lock as I don't care.

It would be nice to use this msg callback from the refactor but then I
somehow need to pass the lk pointer to it.

This however works for me that I know at least which nodes/modes are
incompatible if it hits.

> +               pr_info("---\n");
> +               pr_info("ls_id %u lkb_id: 0x%08x\n", ls_id, lkb_id);
> +               pr_info("holders:\n");
> +               list_for_each_entry(hl, &lk->holders, list) {
> +                       pr_info("\tnodeid: %u mode: %d\n", hl->nodeid,
> +                               hl->mode);
> +               }
> +               pr_info("---\n");
> +               console_unlock();
> +
> +               /* move into an invalid state change, we don't have a edge for that
> +                * so we just use event_max_dlm.
> +                */
> +               da_handle_event_dlm(lk, event_max_dlm);
> +       }
> +       spin_unlock_bh(&dlm_rv_hash_lock);
> +}
> +
> +/* set the holder to transition state as lock downgrades can issue
> + * grant messages to other nodes we need to ignore if a lock on a
> + * specific node is in state transition. From point of DLM API
> + * the user cannot assume to still hold the lock at this point
> + * anyway.
> + */
> +static void set_holder_transition(uint32_t ls_id, const char *res_name,
> +                                 size_t res_length, uint32_t our_nodeid)
> +{
> +       struct dlm_rv_holder *hl;
> +       struct dlm_rv_lock *lk;
> +
> +       spin_lock_bh(&dlm_rv_hash_lock);
> +       lk = lookup_lock(ls_id, res_name, res_length);
> +       if (lk) {
> +               hl = lookup_holder(lk, our_nodeid);
> +               if (hl)
> +                       hl->mode = STATE_MODE_IN_TRANSITION;
> +       }
> +       spin_unlock_bh(&dlm_rv_hash_lock);
> +}
> +
> +/* after a lock request got validated it cannot fail */
> +static void handle_dlm_lock_validated(void *data, struct dlm_ls *ls,
> +                                     struct dlm_lkb *lkb,
> +                                     struct dlm_args *args,
> +                                     const char *res_name, size_t res_length)
> +{
> +       set_holder_transition(ls->ls_global_id, res_name,
> +                             res_length, ls->ls_dn->our_node->id);
> +}
> +
> +static void handle_dlm_unlock_validated(void *data, struct dlm_ls *ls,
> +                                       struct dlm_lkb *lkb,
> +                                       struct dlm_args *args)
> +{

we need to ignore unlock(CANCEL) requests.

> +       set_holder_transition(ls->ls_global_id,
> +                             lkb->lkb_resource->res_name,
> +                             lkb->lkb_resource->res_length,
> +                             ls->ls_dn->our_node->id);
> +}
> +

- Alex


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-08-30 12:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-27 18:02 [RFC 0/7] dlm: the ultimate verifier for DLM lock correctness Alexander Aring
2024-08-27 18:02 ` [RFC 1/7] dlm: fix possible lkb_resource null dereference Alexander Aring
2024-08-27 18:02 ` [RFC 2/7] dlm: fix swapped args sb_flags vs sb_status Alexander Aring
2024-08-27 18:02 ` [RFC 3/7] dlm: make add_to_waiters() that is can't fail Alexander Aring
2024-08-27 18:02 ` [RFC 4/7] dlm: add our_nodeid to tracepoints Alexander Aring
2024-08-27 18:02 ` [RFC 5/7] dlm: add lkb rv mode to ast tracepoint Alexander Aring
2024-08-27 18:02 ` [RFC 6/7] dlm: add more tracepoints for DLM kernel verifier Alexander Aring
2024-08-27 18:02 ` [RFC 7/7] rv: add dlm compatible lock state " Alexander Aring
2024-08-30 12:55   ` Alexander Aring

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).