Netdev Archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 00/14] af_unix: Rework GC.
@ 2024-02-16 21:05 Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 01/14] af_unix: Add struct unix_vertex in struct unix_sock Kuniyuki Iwashima
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

When we pass a file descriptor to an AF_UNIX socket via SCM_RIGTHS,
the underlying struct file of the inflight fd gets its refcount bumped.
If the fd is of an AF_UNIX socket, we need to track it in case it forms
cyclic references.

Let's say we send a fd of AF_UNIX socket A to B and vice versa and
close() both sockets.

When created, each socket's struct file initially has one reference.
After the fd exchange, both refcounts are bumped up to 2.  Then, close()
decreases both to 1.  From this point on, no one can touch the file/socket.

However, the struct file has one refcount and thus never calls the
release() function of the AF_UNIX socket.

That's why we need to track all inflight AF_UNIX sockets and run garbage
collection.

This series replaces the current GC implementation that locks each inflight
socket's receive queue and requires trickiness in other places.

The new GC does not lock each socket's queue to minimise its effect and
tries to be lightweight if there is no cyclic reference or no update in
the shape of the inflight fd graph.

The new implementation is based on Tarjan's Strongly Connected Components
algorithm, and we consider each inflight file descriptor of AF_UNIX sockets
as an edge in a directed graph.

For the details, please see each patch.

  patch 1  -  5 : Add struct to express inflight socket graphs
  patch       6 : Optimse inflight fd counting
  patch       7 : Group SCC possibly forming a cycle
  patch 8  - 10 : Make GC lightweight
  patch 11 - 12 : Detect dead cycle references
  patch      13 : Replace GC algorithm
  patch      14 : selftest

After this series is applied, we can remove the two ugly tricks for race,
scm_fp_dup() in unix_attach_fds() and spin_lock dance in unix_peek_fds()
as done in patch 14/15 of v1.


Changes:
  v2:
    * Drop 2 cleanup patches (patch 14/15 in v1)

    * Patch 2
      * Fix build error when CONFIG_UNIX=n
    * Patch 7
      * Fix build warning for using goto label at the end of the loop
    * Patch 13
      * Call kfree_skb() for oob skb
    * Patch 14
      * Add test case for MSG_OOB

  v1: https://lore.kernel.org/netdev/20240203030058.60750-1-kuniyu@amazon.com/


Kuniyuki Iwashima (14):
  af_unix: Add struct unix_vertex in struct unix_sock.
  af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
  af_unix: Link struct unix_edge when queuing skb.
  af_unix: Save listener for embryo socket.
  af_unix: Fix up unix_edge.successor for embryo socket.
  af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
  af_unix: Detect Strongly Connected Components.
  af_unix: Save O(n) setup of Tarjan's algo.
  af_unix: Skip GC if no cycle exists.
  af_unix: Avoid Tarjan's algorithm if unnecessary.
  af_unix: Assign a unique index to SCC.
  af_unix: Detect dead SCC.
  af_unix: Replace garbage collection algorithm.
  selftest: af_unix: Test GC for SCM_RIGHTS.

 include/net/af_unix.h                         |  33 +-
 include/net/scm.h                             |   8 +
 net/core/scm.c                                |   9 +
 net/unix/af_unix.c                            |  27 +-
 net/unix/garbage.c                            | 473 +++++++++++-------
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../selftests/net/af_unix/scm_rights.c        | 286 +++++++++++
 8 files changed, 632 insertions(+), 207 deletions(-)
 create mode 100644 tools/testing/selftests/net/af_unix/scm_rights.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 01/14] af_unix: Add struct unix_vertex in struct unix_sock.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 02/14] af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd Kuniyuki Iwashima
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will replace the garbage collection algorithm for AF_UNIX,
where we will consider each inflight file descriptor of AF_UNIX
sockets as an edge in a directed graph.

Here, we introduce a new struct unix_vertex representing a vertex
in the graph and add it to struct unix_sock.

In the following patch, we will allocate another struct per edge,
which we finally link to the inflight socket's unix_vertex.edges
when sendmsg() succeeds.

Also, we will count the number of edges as unix_vertex.out_degree.

The first time an AF_UNIX socket is passed successfully, we will
link its unix_vertex.entry to a global list.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h | 8 ++++++++
 net/unix/af_unix.c    | 1 +
 net/unix/garbage.c    | 8 ++++++++
 3 files changed, 17 insertions(+)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 627ea8e2d915..664f6bff60ab 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -22,9 +22,16 @@ extern unsigned int unix_tot_inflight;
 
 void unix_inflight(struct user_struct *user, struct file *fp);
 void unix_notinflight(struct user_struct *user, struct file *fp);
+void unix_init_vertex(struct unix_sock *u);
 void unix_gc(void);
 void wait_for_unix_gc(struct scm_fp_list *fpl);
 
+struct unix_vertex {
+	struct list_head edges;
+	struct list_head entry;
+	unsigned long out_degree;
+};
+
 struct sock *unix_peer_get(struct sock *sk);
 
 #define UNIX_HASH_MOD	(256 - 1)
@@ -62,6 +69,7 @@ struct unix_sock {
 	struct path		path;
 	struct mutex		iolock, bindlock;
 	struct sock		*peer;
+	struct unix_vertex	vertex;
 	struct list_head	link;
 	unsigned long		inflight;
 	spinlock_t		lock;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4892e9428c9f..ae145b6f77d8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -996,6 +996,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	u->path.dentry = NULL;
 	u->path.mnt = NULL;
 	spin_lock_init(&u->lock);
+	unix_init_vertex(u);
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->iolock); /* single task reading lock */
 	mutex_init(&u->bindlock); /* single task binding lock */
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 51acf795f096..6a71997ac67a 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -101,6 +101,14 @@ struct unix_sock *unix_get_socket(struct file *filp)
 	return NULL;
 }
 
+void unix_init_vertex(struct unix_sock *u)
+{
+	struct unix_vertex *vertex = &u->vertex;
+
+	vertex->out_degree = 0;
+	INIT_LIST_HEAD(&vertex->edges);
+}
+
 DEFINE_SPINLOCK(unix_gc_lock);
 unsigned int unix_tot_inflight;
 static LIST_HEAD(gc_candidates);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 02/14] af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 01/14] af_unix: Add struct unix_vertex in struct unix_sock Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb Kuniyuki Iwashima
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

When we send a fd using SCM_RIGHTS message, we allocate struct
scm_fp_list to struct scm_cookie in scm_fp_copy().  Then, we bump
each refcount of the inflight fds' struct file and save them in
scm_fp_list.fp.

Later, unix_attach_fds() inexplicably clones scm_fp_list of
scm_cookie and sets it to skb.  (We will remove this part after
replacing GC.)

Now we add a new function call in unix_attach_fds() to preallocate
to skb's scm_fp_list an array of struct unix_edge in the number of
inflight AF_UNIX fds.

There we just preallocate memory and do not use immediately because
sendmsg() could fail after this point.  The actual use will be in
the next patch.

When we queue skb with inflight edges, we will set the inflight
socket's unix_vertex as unix_edge->predecessor and the receiver's
vertex as successor, and then we will link the edge to the inflight
socket's unix_vertex.edges.

Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup()
so that MSG_PEEK does not change the shape of the directed graph.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  8 ++++++++
 include/net/scm.h     |  7 +++++++
 net/core/scm.c        |  7 +++++++
 net/unix/af_unix.c    |  5 +++++
 net/unix/garbage.c    | 18 ++++++++++++++++++
 5 files changed, 45 insertions(+)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 664f6bff60ab..cab9dfb666f3 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -23,6 +23,8 @@ extern unsigned int unix_tot_inflight;
 void unix_inflight(struct user_struct *user, struct file *fp);
 void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_init_vertex(struct unix_sock *u);
+int unix_alloc_edges(struct scm_fp_list *fpl);
+void unix_free_edges(struct scm_fp_list *fpl);
 void unix_gc(void);
 void wait_for_unix_gc(struct scm_fp_list *fpl);
 
@@ -32,6 +34,12 @@ struct unix_vertex {
 	unsigned long out_degree;
 };
 
+struct unix_edge {
+	struct unix_vertex *predecessor;
+	struct unix_vertex *successor;
+	struct list_head entry;
+};
+
 struct sock *unix_peer_get(struct sock *sk);
 
 #define UNIX_HASH_MOD	(256 - 1)
diff --git a/include/net/scm.h b/include/net/scm.h
index 92276a2c5543..a1142dee086c 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -23,10 +23,17 @@ struct scm_creds {
 	kgid_t	gid;
 };
 
+#ifdef CONFIG_UNIX
+struct unix_edge;
+#endif
+
 struct scm_fp_list {
 	short			count;
 	short			count_unix;
 	short			max;
+#ifdef CONFIG_UNIX
+	struct unix_edge	*edges;
+#endif
 	struct user_struct	*user;
 	struct file		*fp[SCM_MAX_FD];
 };
diff --git a/net/core/scm.c b/net/core/scm.c
index 9cd4b0a01cd6..bc75b6927222 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -87,6 +87,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 		*fplp = fpl;
 		fpl->count = 0;
 		fpl->count_unix = 0;
+#if IS_ENABLED(CONFIG_UNIX)
+		fpl->edges = NULL;
+#endif
 		fpl->max = SCM_MAX_FD;
 		fpl->user = NULL;
 	}
@@ -376,6 +379,10 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 	if (new_fpl) {
 		for (i = 0; i < fpl->count; i++)
 			get_file(fpl->fp[i]);
+
+#if IS_ENABLED(CONFIG_UNIX)
+		new_fpl->edges = NULL;
+#endif
 		new_fpl->max = new_fpl->count;
 		new_fpl->user = get_uid(fpl->user);
 	}
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ae145b6f77d8..0391f66546a6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1819,6 +1819,9 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	for (i = scm->fp->count - 1; i >= 0; i--)
 		unix_inflight(scm->fp->user, scm->fp->fp[i]);
 
+	if (unix_alloc_edges(UNIXCB(skb).fp))
+		return -ENOMEM;
+
 	return 0;
 }
 
@@ -1829,6 +1832,8 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	scm->fp = UNIXCB(skb).fp;
 	UNIXCB(skb).fp = NULL;
 
+	unix_free_edges(scm->fp);
+
 	for (i = scm->fp->count - 1; i >= 0; i--)
 		unix_notinflight(scm->fp->user, scm->fp->fp[i]);
 }
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 6a71997ac67a..ec998c7d6b4c 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -109,6 +109,24 @@ void unix_init_vertex(struct unix_sock *u)
 	INIT_LIST_HEAD(&vertex->edges);
 }
 
+int unix_alloc_edges(struct scm_fp_list *fpl)
+{
+	if (!fpl->count_unix)
+		return 0;
+
+	fpl->edges = kvmalloc_array(fpl->count_unix, sizeof(*fpl->edges),
+				    GFP_KERNEL_ACCOUNT);
+	if (!fpl->edges)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void unix_free_edges(struct scm_fp_list *fpl)
+{
+	kvfree(fpl->edges);
+}
+
 DEFINE_SPINLOCK(unix_gc_lock);
 unsigned int unix_tot_inflight;
 static LIST_HEAD(gc_candidates);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 01/14] af_unix: Add struct unix_vertex in struct unix_sock Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 02/14] af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-20 12:11   ` Paolo Abeni
  2024-02-16 21:05 ` [PATCH v2 net-next 04/14] af_unix: Save listener for embryo socket Kuniyuki Iwashima
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Just before queuing skb with inflight fds, we call scm_stat_add(),
which is a good place to set up the preallocated struct unix_edge
in UNIXCB(skb).fp->edges.

Then, we call unix_add_edges() and construct the directed graph
as follows:

  1. Set the inflight socket's unix_vertex to unix_edge.predecessor
  2. Set the receiver's unix_vertex to unix_edge.successor
  3. Link unix_edge.entry to the inflight socket's unix_vertex.edges
  4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices.

Let's say we pass the fd of AF_UNIX socket A to B and the fd of B
to C.  The graph looks like this:

  +-------------------------+
  | unix_unvisited_vertices | <------------------------.
  +-------------------------+                          |
  +                                                    |
  |   +-------------+                +-------------+   |            +-------------+
  |   | unix_sock A |                | unix_sock B |   |            | unix_sock C |
  |   +-------------+                +-------------+   |            +-------------+
  |   | unix_vertex | <----.  .----> | unix_vertex | <-|--.  .----> | unix_vertex |
  |   | +-----------+      |  |      | +-----------+   |  |  |      | +-----------+
  `-> | |   entry   | +------------> | |   entry   | +-'  |  |      | |   entry   |
      | |-----------|      |  |      | |-----------|      |  |      | |-----------|
      | |   edges   | <-.  |  |      | |   edges   | <-.  |  |      | |   edges   |
      +-+-----------+   |  |  |      +-+-----------+   |  |  |      +-+-----------+
                        |  |  |                        |  |  |
  .---------------------'  |  |  .---------------------'  |  |
  |                        |  |  |                        |  |
  |   +-------------+      |  |  |   +-------------+      |  |
  |   |  unix_edge  |      |  |  |   |  unix_edge  |      |  |
  |   +-------------+      |  |  |   +-------------+      |  |
  `-> |    entry    |      |  |  `-> |    entry    |      |  |
      |-------------|      |  |      |-------------|      |  |
      | predecessor | +----'  |      | predecessor | +----'  |
      |-------------|         |      |-------------|         |
      |  successor  | +-------'      |  successor  | +-------'
      +-------------+                +-------------+

Henceforth, we denote such a graph as A -> B (-> C).

Now, we can express all inflight fd graphs that do not contain
embryo sockets.  The following two patches will support the
particular case.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  2 ++
 include/net/scm.h     |  1 +
 net/core/scm.c        |  2 ++
 net/unix/af_unix.c    |  8 +++++--
 net/unix/garbage.c    | 55 ++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index cab9dfb666f3..54d62467a70b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -23,6 +23,8 @@ extern unsigned int unix_tot_inflight;
 void unix_inflight(struct user_struct *user, struct file *fp);
 void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_init_vertex(struct unix_sock *u);
+void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
+void unix_del_edges(struct scm_fp_list *fpl);
 int unix_alloc_edges(struct scm_fp_list *fpl);
 void unix_free_edges(struct scm_fp_list *fpl);
 void unix_gc(void);
diff --git a/include/net/scm.h b/include/net/scm.h
index a1142dee086c..7d807fe466a3 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -32,6 +32,7 @@ struct scm_fp_list {
 	short			count_unix;
 	short			max;
 #ifdef CONFIG_UNIX
+	bool			inflight;
 	struct unix_edge	*edges;
 #endif
 	struct user_struct	*user;
diff --git a/net/core/scm.c b/net/core/scm.c
index bc75b6927222..cad0c153ac93 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -88,6 +88,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 		fpl->count = 0;
 		fpl->count_unix = 0;
 #if IS_ENABLED(CONFIG_UNIX)
+		fpl->inflight = false;
 		fpl->edges = NULL;
 #endif
 		fpl->max = SCM_MAX_FD;
@@ -381,6 +382,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 			get_file(fpl->fp[i]);
 
 #if IS_ENABLED(CONFIG_UNIX)
+		new_fpl->inflight = false;
 		new_fpl->edges = NULL;
 #endif
 		new_fpl->max = new_fpl->count;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0391f66546a6..ea7bac18a781 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1956,8 +1956,10 @@ static void scm_stat_add(struct sock *sk, struct sk_buff *skb)
 	struct scm_fp_list *fp = UNIXCB(skb).fp;
 	struct unix_sock *u = unix_sk(sk);
 
-	if (unlikely(fp && fp->count))
+	if (unlikely(fp && fp->count)) {
 		atomic_add(fp->count, &u->scm_stat.nr_fds);
+		unix_add_edges(fp, u);
+	}
 }
 
 static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
@@ -1965,8 +1967,10 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
 	struct scm_fp_list *fp = UNIXCB(skb).fp;
 	struct unix_sock *u = unix_sk(sk);
 
-	if (unlikely(fp && fp->count))
+	if (unlikely(fp && fp->count)) {
 		atomic_sub(fp->count, &u->scm_stat.nr_fds);
+		unix_del_edges(fp);
+	}
 }
 
 /*
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index ec998c7d6b4c..353416f38738 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -109,6 +109,57 @@ void unix_init_vertex(struct unix_sock *u)
 	INIT_LIST_HEAD(&vertex->edges);
 }
 
+DEFINE_SPINLOCK(unix_gc_lock);
+static LIST_HEAD(unix_unvisited_vertices);
+
+void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
+{
+	int i = 0, j = 0;
+
+	spin_lock(&unix_gc_lock);
+
+	while (i < fpl->count_unix) {
+		struct unix_sock *inflight = unix_get_socket(fpl->fp[j++]);
+		struct unix_edge *edge;
+
+		if (!inflight)
+			continue;
+
+		edge = fpl->edges + i++;
+		edge->predecessor = &inflight->vertex;
+		edge->successor = &receiver->vertex;
+
+		if (!edge->predecessor->out_degree++)
+			list_add_tail(&edge->predecessor->entry, &unix_unvisited_vertices);
+
+		list_add_tail(&edge->entry, &edge->predecessor->edges);
+	}
+
+	spin_unlock(&unix_gc_lock);
+
+	fpl->inflight = true;
+}
+
+void unix_del_edges(struct scm_fp_list *fpl)
+{
+	int i = 0;
+
+	spin_lock(&unix_gc_lock);
+
+	while (i < fpl->count_unix) {
+		struct unix_edge *edge = fpl->edges + i++;
+
+		list_del(&edge->entry);
+
+		if (!--edge->predecessor->out_degree)
+			list_del_init(&edge->predecessor->entry);
+	}
+
+	spin_unlock(&unix_gc_lock);
+
+	fpl->inflight = false;
+}
+
 int unix_alloc_edges(struct scm_fp_list *fpl)
 {
 	if (!fpl->count_unix)
@@ -124,10 +175,12 @@ int unix_alloc_edges(struct scm_fp_list *fpl)
 
 void unix_free_edges(struct scm_fp_list *fpl)
 {
+	if (fpl->inflight)
+		unix_del_edges(fpl);
+
 	kvfree(fpl->edges);
 }
 
-DEFINE_SPINLOCK(unix_gc_lock);
 unsigned int unix_tot_inflight;
 static LIST_HEAD(gc_candidates);
 static LIST_HEAD(gc_inflight_list);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 04/14] af_unix: Save listener for embryo socket.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 05/14] af_unix: Fix up unix_edge.successor " Kuniyuki Iwashima
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

This is a prep patch for the following change, where we need to
fetch the listening socket from the destination embryo socket in
sendmsg().

We add a new field to struct unix_sock to save a pointer to a
listening socket.

We set it when connect() creates a new socket, and clear it when
accept() is called.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h | 1 +
 net/unix/af_unix.c    | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 54d62467a70b..438d2a18ba2e 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -79,6 +79,7 @@ struct unix_sock {
 	struct path		path;
 	struct mutex		iolock, bindlock;
 	struct sock		*peer;
+	struct sock		*listener;
 	struct unix_vertex	vertex;
 	struct list_head	link;
 	unsigned long		inflight;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ea7bac18a781..1ebc3c15f972 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -992,6 +992,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	sk->sk_max_ack_backlog	= net->unx.sysctl_max_dgram_qlen;
 	sk->sk_destruct		= unix_sock_destructor;
 	u = unix_sk(sk);
+	u->listener = NULL;
 	u->inflight = 0;
 	u->path.dentry = NULL;
 	u->path.mnt = NULL;
@@ -1611,6 +1612,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	newsk->sk_type		= sk->sk_type;
 	init_peercred(newsk);
 	newu = unix_sk(newsk);
+	newu->listener = other;
 	RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq);
 	otheru = unix_sk(other);
 
@@ -1706,8 +1708,8 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags,
 		       bool kern)
 {
 	struct sock *sk = sock->sk;
-	struct sock *tsk;
 	struct sk_buff *skb;
+	struct sock *tsk;
 	int err;
 
 	err = -EOPNOTSUPP;
@@ -1732,6 +1734,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags,
 	}
 
 	tsk = skb->sk;
+	unix_sk(tsk)->listener = NULL;
 	skb_free_datagram(sk, skb);
 	wake_up_interruptible(&unix_sk(sk)->peer_wait);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 05/14] af_unix: Fix up unix_edge.successor for embryo socket.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 04/14] af_unix: Save listener for embryo socket Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 06/14] af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb Kuniyuki Iwashima
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

To garbage collect inflight AF_UNIX sockets, we must define the
cyclic reference appropriately.  This is a bit tricky if the loop
consists of embryo sockets.

Suppose that the fd of AF_UNIX socket A is passed to D and the fd B
to C and that C and D are embryo sockets of A and B, respectively.
It may appear that there are two separate graphs, A (-> D) and
B (-> C), but this is not correct.

     A --. .-- B
          X
     C <-' `-> D

Now, D holds A's refcount, and C has B's refcount, so unix_release()
will never be called for A and B when we close() them.  However, no
one can call close() for D and C to free skbs holding refcounts of A
and B because C/D is in A/B's receive queue, which should have been
purged by unix_release() for A and B.

So, here's a new type of cyclic reference.  When a fd of an AF_UNIX
socket is passed to an embryo socket, the reference is indirectly
held by its parent listening socket.

  .-> A                            .-> B
  |   `- sk_receive_queue          |   `- sk_receive_queue
  |      `- skb                    |      `- skb
  |         `- sk == C             |         `- sk == D
  |            `- sk_receive_queue |           `- sk_receive_queue
  |               `- skb +---------'               `- skb +--.
  |                                                          |
  `----------------------------------------------------------'

Technically, the graph must be denoted as A <-> B instead of A (-> D)
and B (-> C) to find such a cyclic reference without touching each
socket's receive queue.

  .-> A --. .-- B <-.
  |        X        |  ==  A <-> B
  `-- C <-' `-> D --'

We apply this fixup in unix_add_edges() if the receiver is an embryo
socket.

We also link such edges to the embryo socket using another list_head
field, embryo_entry, because we need to restore the original separate
graphs A (-> D) and B (-> C) in unix_update_edges() once accept() is
called.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  2 ++
 net/unix/af_unix.c    |  2 +-
 net/unix/garbage.c    | 27 ++++++++++++++++++++++++++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 438d2a18ba2e..2d8e93775e61 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -25,6 +25,7 @@ void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_init_vertex(struct unix_sock *u);
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
 void unix_del_edges(struct scm_fp_list *fpl);
+void unix_update_edges(struct unix_sock *receiver);
 int unix_alloc_edges(struct scm_fp_list *fpl);
 void unix_free_edges(struct scm_fp_list *fpl);
 void unix_gc(void);
@@ -40,6 +41,7 @@ struct unix_edge {
 	struct unix_vertex *predecessor;
 	struct unix_vertex *successor;
 	struct list_head entry;
+	struct list_head embryo_entry;
 };
 
 struct sock *unix_peer_get(struct sock *sk);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1ebc3c15f972..dab5d8d96e87 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1734,7 +1734,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags,
 	}
 
 	tsk = skb->sk;
-	unix_sk(tsk)->listener = NULL;
+	unix_update_edges(unix_sk(tsk));
 	skb_free_datagram(sk, skb);
 	wake_up_interruptible(&unix_sk(sk)->peer_wait);
 
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 353416f38738..97a43f8ec5a5 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -114,10 +114,16 @@ static LIST_HEAD(unix_unvisited_vertices);
 
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 {
+	struct unix_vertex *successor;
 	int i = 0, j = 0;
 
 	spin_lock(&unix_gc_lock);
 
+	if (receiver->listener)
+		successor = &unix_sk(receiver->listener)->vertex;
+	else
+		successor = &receiver->vertex;
+
 	while (i < fpl->count_unix) {
 		struct unix_sock *inflight = unix_get_socket(fpl->fp[j++]);
 		struct unix_edge *edge;
@@ -127,12 +133,15 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 
 		edge = fpl->edges + i++;
 		edge->predecessor = &inflight->vertex;
-		edge->successor = &receiver->vertex;
+		edge->successor = successor;
 
 		if (!edge->predecessor->out_degree++)
 			list_add_tail(&edge->predecessor->entry, &unix_unvisited_vertices);
 
 		list_add_tail(&edge->entry, &edge->predecessor->edges);
+
+		if (receiver->listener)
+			list_add_tail(&edge->embryo_entry, &receiver->vertex.edges);
 	}
 
 	spin_unlock(&unix_gc_lock);
@@ -160,6 +169,22 @@ void unix_del_edges(struct scm_fp_list *fpl)
 	fpl->inflight = false;
 }
 
+void unix_update_edges(struct unix_sock *receiver)
+{
+	struct unix_edge *edge;
+
+	spin_lock(&unix_gc_lock);
+
+	list_for_each_entry(edge, &receiver->vertex.edges, embryo_entry)
+		edge->successor = &receiver->vertex;
+
+	list_del_init(&receiver->vertex.edges);
+
+	receiver->listener = NULL;
+
+	spin_unlock(&unix_gc_lock);
+}
+
 int unix_alloc_edges(struct scm_fp_list *fpl)
 {
 	if (!fpl->count_unix)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 06/14] af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 05/14] af_unix: Fix up unix_edge.successor " Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 07/14] af_unix: Detect Strongly Connected Components Kuniyuki Iwashima
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Currently, we track the number of inflight sockets in two variables.
unix_tot_inflight is the total number of inflight AF_UNIX sockets on
the host, and user->unix_inflight is the number of inflight fds per
user.

We update them one by one in unix_inflight(), which can be done once
in batch.  Also, sendmsg() could fail even after unix_inflight(), then
we need to acquire unix_gc_lock only to decrement the counters.

Let's bulk update the counters in unix_add_edges() and unix_del_edges(),
which is called only for successfully passed fds.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/garbage.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 97a43f8ec5a5..93d7760e4ab1 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -111,6 +111,7 @@ void unix_init_vertex(struct unix_sock *u)
 
 DEFINE_SPINLOCK(unix_gc_lock);
 static LIST_HEAD(unix_unvisited_vertices);
+unsigned int unix_tot_inflight;
 
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 {
@@ -144,6 +145,9 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 			list_add_tail(&edge->embryo_entry, &receiver->vertex.edges);
 	}
 
+	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix);
+	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->count);
+
 	spin_unlock(&unix_gc_lock);
 
 	fpl->inflight = true;
@@ -164,6 +168,9 @@ void unix_del_edges(struct scm_fp_list *fpl)
 			list_del_init(&edge->predecessor->entry);
 	}
 
+	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix);
+	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count);
+
 	spin_unlock(&unix_gc_lock);
 
 	fpl->inflight = false;
@@ -206,7 +213,6 @@ void unix_free_edges(struct scm_fp_list *fpl)
 	kvfree(fpl->edges);
 }
 
-unsigned int unix_tot_inflight;
 static LIST_HEAD(gc_candidates);
 static LIST_HEAD(gc_inflight_list);
 
@@ -227,13 +233,8 @@ void unix_inflight(struct user_struct *user, struct file *filp)
 			WARN_ON_ONCE(list_empty(&u->link));
 		}
 		u->inflight++;
-
-		/* Paired with READ_ONCE() in wait_for_unix_gc() */
-		WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1);
 	}
 
-	WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1);
-
 	spin_unlock(&unix_gc_lock);
 }
 
@@ -250,13 +251,8 @@ void unix_notinflight(struct user_struct *user, struct file *filp)
 		u->inflight--;
 		if (!u->inflight)
 			list_del_init(&u->link);
-
-		/* Paired with READ_ONCE() in wait_for_unix_gc() */
-		WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1);
 	}
 
-	WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1);
-
 	spin_unlock(&unix_gc_lock);
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 07/14] af_unix: Detect Strongly Connected Components.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (5 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 06/14] af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 08/14] af_unix: Save O(n) setup of Tarjan's algo Kuniyuki Iwashima
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

In the new GC, we use a simple graph algorithm, Tarjan's Strongly
Connected Components (SCC) algorithm, to find cyclic references.

The algorithm visits every vertex exactly once using depth-first
search (DFS).  We implement it without recursion so that no one
can abuse it.

There could be multiple graphs, so we iterate unix_unvisited_vertices
in unix_walk_scc() and do DFS in __unix_walk_scc(), where we move
visited vertices to another list, unix_visited_vertices, not to
restart DFS twice on a visited vertex later in unix_walk_scc().

DFS starts by pushing an input vertex to a stack and assigning it
a unique number.  Two fields, index and lowlink, are initialised
with the number, but lowlink could be updated later during DFS.

If a vertex has an edge to an unvisited inflight vertex, we visit
it and do the same processing.  So, we will have vertices in the
stack in the order they appear and number them consecutively in
the same order.

If a vertex has a back-edge to a visited vertex in the stack,
we update the predecessor's lowlink with the successor's index.

After iterating edges from the vertex, we check if its index
equals its lowlink.

If the lowlink is different from the index, it shows there was a
back-edge.  Then, we propagate the lowlink to its predecessor and
go back to the predecessor to resume checking from the next edge
of the back-edge.

If the lowlink is the same as the index, we pop vertices before
and including the vertex from the stack.  Then, the set of vertices
is SCC, possibly forming a cycle.  At the same time, we move the
vertices to unix_visited_vertices.

When we finish the algorithm, all vertices in each SCC will be
linked via unix_vertex.scc_entry.

Let's take an example.  We have a graph including five inflight
vertices (F is not inflight):

  A -> B -> C -> D -> E (-> F)
       ^         |
       `---------'

Suppose that we start DFS from C.  We will visit C, D, and B first
and initialise their index and lowlink.  Then, the stack looks like
this:

  > B = (3, 3)  (index, lowlink)
    D = (2, 2)
    C = (1, 1)

When checking B's edge to C, we update B's lowlink with C's index
and propagate it to D.

    B = (3, 1)  (index, lowlink)
  > D = (2, 1)
    C = (1, 1)

Next, we visit E, which has no edge to an inflight vertex.

  > E = (4, 4)  (index, lowlink)
    B = (3, 1)
    D = (2, 1)
    C = (1, 1)

When we leave from E, its index and lowlink are the same, so we
pop E from the stack as single-vertex SCC.  Next, we leave from
D but do nothing because its lowlink is different from its index.

    B = (3, 1)  (index, lowlink)
    D = (2, 1)
  > C = (1, 1)

Then, we leave from C, whose index and lowlink are the same, so
we pop B, D and C as SCC.

Last, we do DFS for the rest of vertices, A, which is also a
single-vertex SCC.

Finally, each unix_vertex.scc_entry is linked as follows:

  A -.  B -> C -> D  E -.
  ^  |  ^         |  ^  |
  `--'  `---------'  `--'

We use SCC later to decide whether we can garbage-collect the
sockets.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  5 +++
 net/unix/garbage.c    | 77 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 2d8e93775e61..0874f6b41adc 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -34,7 +34,11 @@ void wait_for_unix_gc(struct scm_fp_list *fpl);
 struct unix_vertex {
 	struct list_head edges;
 	struct list_head entry;
+	struct list_head scc_entry;
 	unsigned long out_degree;
+	unsigned long index;
+	unsigned long lowlink;
+	bool on_stack;
 };
 
 struct unix_edge {
@@ -42,6 +46,7 @@ struct unix_edge {
 	struct unix_vertex *successor;
 	struct list_head entry;
 	struct list_head embryo_entry;
+	struct list_head stack_entry;
 };
 
 struct sock *unix_peer_get(struct sock *sk);
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 93d7760e4ab1..88f370307c80 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -213,6 +213,81 @@ void unix_free_edges(struct scm_fp_list *fpl)
 	kvfree(fpl->edges);
 }
 
+enum unix_vertex_index {
+	UNIX_VERTEX_INDEX_UNVISITED,
+	UNIX_VERTEX_INDEX_START,
+};
+
+static LIST_HEAD(unix_visited_vertices);
+
+static void __unix_walk_scc(struct unix_vertex *vertex)
+{
+	unsigned long index = UNIX_VERTEX_INDEX_START;
+	LIST_HEAD(vertex_stack);
+	struct unix_edge *edge;
+	LIST_HEAD(edge_stack);
+
+next_vertex:
+	vertex->index = index;
+	vertex->lowlink = index;
+	index++;
+
+	vertex->on_stack = true;
+	list_add(&vertex->scc_entry, &vertex_stack);
+
+	list_for_each_entry(edge, &vertex->edges, entry) {
+		if (!edge->successor->out_degree)
+			continue;
+
+		if (edge->successor->index == UNIX_VERTEX_INDEX_UNVISITED) {
+			list_add(&edge->stack_entry, &edge_stack);
+
+			vertex = edge->successor;
+			goto next_vertex;
+prev_vertex:
+			edge = list_first_entry(&edge_stack, typeof(*edge), stack_entry);
+			list_del_init(&edge->stack_entry);
+
+			vertex = edge->predecessor;
+			vertex->lowlink = min(vertex->lowlink, edge->successor->lowlink);
+		} else if (edge->successor->on_stack) {
+			vertex->lowlink = min(vertex->lowlink, edge->successor->index);
+		}
+	}
+
+	if (vertex->index == vertex->lowlink) {
+		struct list_head scc;
+
+		__list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
+
+		list_for_each_entry_reverse(vertex, &scc, scc_entry) {
+			list_move_tail(&vertex->entry, &unix_visited_vertices);
+
+			vertex->on_stack = false;
+		}
+
+		list_del(&scc);
+	}
+
+	if (!list_empty(&edge_stack))
+		goto prev_vertex;
+}
+
+static void unix_walk_scc(void)
+{
+	struct unix_vertex *vertex;
+
+	list_for_each_entry(vertex, &unix_unvisited_vertices, entry)
+		vertex->index = UNIX_VERTEX_INDEX_UNVISITED;
+
+	while (!list_empty(&unix_unvisited_vertices)) {
+		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
+		__unix_walk_scc(vertex);
+	}
+
+	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
+}
+
 static LIST_HEAD(gc_candidates);
 static LIST_HEAD(gc_inflight_list);
 
@@ -360,6 +435,8 @@ static void __unix_gc(struct work_struct *work)
 
 	spin_lock(&unix_gc_lock);
 
+	unix_walk_scc();
+
 	/* First, select candidates for garbage collection.  Only
 	 * in-flight sockets are considered, and from those only ones
 	 * which don't have any external reference.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 08/14] af_unix: Save O(n) setup of Tarjan's algo.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (6 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 07/14] af_unix: Detect Strongly Connected Components Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists Kuniyuki Iwashima
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Before starting Tarjan's algorithm, we need to mark all vertices
as unvisited.  We can save this O(n) setup by reserving two special
indices (0, 1) and using two variables.

The first time we link a vertex to unix_unvisited_vertices, we set
unix_vertex_unvisited_index to index.

During DFS, we can see that the index of unvisited vertices is the
same as unix_vertex_unvisited_index.

When we finalise SCC later, we set unix_vertex_grouped_index to each
vertex's index.

Then, we can know (i) that the vertex is on the stack if the index
of a visited vertex is >= 2 and (ii) that it is not on the stack and
belongs to a different SCC if the index is unix_vertex_grouped_index.

After the whole algorithm, all indices of vertices are set as
unix_vertex_grouped_index.

Next time we start DFS, we know that all unvisited vertices have
unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index
as the not-on-stack marker.

To use the same variable in __unix_walk_scc(), we can swap
unix_vertex_(grouped|unvisited)_index at the end of Tarjan's
algorithm.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  1 -
 net/unix/garbage.c    | 34 +++++++++++++++++++---------------
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 0874f6b41adc..b3ba5e949d62 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -38,7 +38,6 @@ struct unix_vertex {
 	unsigned long out_degree;
 	unsigned long index;
 	unsigned long lowlink;
-	bool on_stack;
 };
 
 struct unix_edge {
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 88f370307c80..c4b0cc438c64 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -113,6 +113,14 @@ DEFINE_SPINLOCK(unix_gc_lock);
 static LIST_HEAD(unix_unvisited_vertices);
 unsigned int unix_tot_inflight;
 
+enum unix_vertex_index {
+	UNIX_VERTEX_INDEX_MARK1,
+	UNIX_VERTEX_INDEX_MARK2,
+	UNIX_VERTEX_INDEX_START,
+};
+
+static unsigned long unix_vertex_unvisited_index = UNIX_VERTEX_INDEX_MARK1;
+
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 {
 	struct unix_vertex *successor;
@@ -136,8 +144,11 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 		edge->predecessor = &inflight->vertex;
 		edge->successor = successor;
 
-		if (!edge->predecessor->out_degree++)
+		if (!edge->predecessor->out_degree++) {
+			edge->predecessor->index = unix_vertex_unvisited_index;
+
 			list_add_tail(&edge->predecessor->entry, &unix_unvisited_vertices);
+		}
 
 		list_add_tail(&edge->entry, &edge->predecessor->edges);
 
@@ -213,12 +224,8 @@ void unix_free_edges(struct scm_fp_list *fpl)
 	kvfree(fpl->edges);
 }
 
-enum unix_vertex_index {
-	UNIX_VERTEX_INDEX_UNVISITED,
-	UNIX_VERTEX_INDEX_START,
-};
-
 static LIST_HEAD(unix_visited_vertices);
+static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
 
 static void __unix_walk_scc(struct unix_vertex *vertex)
 {
@@ -232,14 +239,13 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 	vertex->lowlink = index;
 	index++;
 
-	vertex->on_stack = true;
 	list_add(&vertex->scc_entry, &vertex_stack);
 
 	list_for_each_entry(edge, &vertex->edges, entry) {
 		if (!edge->successor->out_degree)
 			continue;
 
-		if (edge->successor->index == UNIX_VERTEX_INDEX_UNVISITED) {
+		if (edge->successor->index == unix_vertex_unvisited_index) {
 			list_add(&edge->stack_entry, &edge_stack);
 
 			vertex = edge->successor;
@@ -250,7 +256,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 
 			vertex = edge->predecessor;
 			vertex->lowlink = min(vertex->lowlink, edge->successor->lowlink);
-		} else if (edge->successor->on_stack) {
+		} else if (edge->successor->index != unix_vertex_grouped_index) {
 			vertex->lowlink = min(vertex->lowlink, edge->successor->index);
 		}
 	}
@@ -263,7 +269,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 		list_for_each_entry_reverse(vertex, &scc, scc_entry) {
 			list_move_tail(&vertex->entry, &unix_visited_vertices);
 
-			vertex->on_stack = false;
+			vertex->index = unix_vertex_grouped_index;
 		}
 
 		list_del(&scc);
@@ -275,17 +281,15 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 
 static void unix_walk_scc(void)
 {
-	struct unix_vertex *vertex;
-
-	list_for_each_entry(vertex, &unix_unvisited_vertices, entry)
-		vertex->index = UNIX_VERTEX_INDEX_UNVISITED;
-
 	while (!list_empty(&unix_unvisited_vertices)) {
+		struct unix_vertex *vertex;
+
 		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
 		__unix_walk_scc(vertex);
 	}
 
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
+	swap(unix_vertex_unvisited_index, unix_vertex_grouped_index);
 }
 
 static LIST_HEAD(gc_candidates);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (7 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 08/14] af_unix: Save O(n) setup of Tarjan's algo Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-20 12:56   ` Paolo Abeni
  2024-02-16 21:05 ` [PATCH v2 net-next 10/14] af_unix: Avoid Tarjan's algorithm if unnecessary Kuniyuki Iwashima
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We do not need to run GC if there is no possible cyclic reference.
We use unix_graph_maybe_cyclic to decide if we should run GC.

If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX
socket, they could form a cyclic reference.  Then, we set true to
unix_graph_maybe_cyclic and later run Tarjan's algorithm to group
them into SCC.

Once we run Tarjan's algorithm, we are 100% sure whether cyclic
references exist or not.  If there is no cycle, we set false to
unix_graph_maybe_cyclic and can skip the entire garbage collection
next time.

When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC
consists of multiple vertices.

Even if SCC is a single vertex, a cycle might exist as self-fd passing.

To detect the corner case, we can check all edges of the vertex, but
instead, we add a new field that counts the number of self-references
to finish the decision in O(1) time.

With this change, __unix_gc() is just a spin_lock() dance in the normal
usage.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  1 +
 net/unix/garbage.c    | 56 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b3ba5e949d62..59ec8d7880ce 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -36,6 +36,7 @@ struct unix_vertex {
 	struct list_head entry;
 	struct list_head scc_entry;
 	unsigned long out_degree;
+	unsigned long self_degree;
 	unsigned long index;
 	unsigned long lowlink;
 };
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index c4b0cc438c64..90f04d786dae 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -106,9 +106,23 @@ void unix_init_vertex(struct unix_sock *u)
 	struct unix_vertex *vertex = &u->vertex;
 
 	vertex->out_degree = 0;
+	vertex->self_degree = 0;
 	INIT_LIST_HEAD(&vertex->edges);
 }
 
+static bool unix_graph_maybe_cyclic;
+
+static void unix_graph_update(struct unix_edge *edge)
+{
+	if (unix_graph_maybe_cyclic)
+		return;
+
+	if (!edge->successor->out_degree)
+		return;
+
+	unix_graph_maybe_cyclic = true;
+}
+
 DEFINE_SPINLOCK(unix_gc_lock);
 static LIST_HEAD(unix_unvisited_vertices);
 unsigned int unix_tot_inflight;
@@ -144,6 +158,9 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 		edge->predecessor = &inflight->vertex;
 		edge->successor = successor;
 
+		if (edge->predecessor == edge->successor)
+			edge->predecessor->self_degree++;
+
 		if (!edge->predecessor->out_degree++) {
 			edge->predecessor->index = unix_vertex_unvisited_index;
 
@@ -154,6 +171,8 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 
 		if (receiver->listener)
 			list_add_tail(&edge->embryo_entry, &receiver->vertex.edges);
+
+		unix_graph_update(edge);
 	}
 
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix);
@@ -173,10 +192,15 @@ void unix_del_edges(struct scm_fp_list *fpl)
 	while (i < fpl->count_unix) {
 		struct unix_edge *edge = fpl->edges + i++;
 
+		unix_graph_update(edge);
+
 		list_del(&edge->entry);
 
 		if (!--edge->predecessor->out_degree)
 			list_del_init(&edge->predecessor->entry);
+
+		if (edge->predecessor == edge->successor)
+			edge->predecessor->self_degree--;
 	}
 
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix);
@@ -193,8 +217,14 @@ void unix_update_edges(struct unix_sock *receiver)
 
 	spin_lock(&unix_gc_lock);
 
-	list_for_each_entry(edge, &receiver->vertex.edges, embryo_entry)
+	list_for_each_entry(edge, &receiver->vertex.edges, embryo_entry) {
+		unix_graph_update(edge);
+
+		if (edge->predecessor == edge->successor)
+			edge->predecessor->self_degree--;
+
 		edge->successor = &receiver->vertex;
+	}
 
 	list_del_init(&receiver->vertex.edges);
 
@@ -224,6 +254,20 @@ void unix_free_edges(struct scm_fp_list *fpl)
 	kvfree(fpl->edges);
 }
 
+static bool unix_scc_cyclic(struct list_head *scc)
+{
+	struct unix_vertex *vertex;
+
+	if (!list_is_singular(scc))
+		return true;
+
+	vertex = list_first_entry(scc, typeof(*vertex), scc_entry);
+	if (vertex->self_degree)
+		return true;
+
+	return false;
+}
+
 static LIST_HEAD(unix_visited_vertices);
 static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
 
@@ -272,6 +316,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 			vertex->index = unix_vertex_grouped_index;
 		}
 
+		if (!unix_graph_maybe_cyclic)
+			unix_graph_maybe_cyclic = unix_scc_cyclic(&scc);
+
 		list_del(&scc);
 	}
 
@@ -281,6 +328,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 
 static void unix_walk_scc(void)
 {
+	unix_graph_maybe_cyclic = false;
+
 	while (!list_empty(&unix_unvisited_vertices)) {
 		struct unix_vertex *vertex;
 
@@ -439,6 +488,9 @@ static void __unix_gc(struct work_struct *work)
 
 	spin_lock(&unix_gc_lock);
 
+	if (!unix_graph_maybe_cyclic)
+		goto skip_gc;
+
 	unix_walk_scc();
 
 	/* First, select candidates for garbage collection.  Only
@@ -536,7 +588,7 @@ static void __unix_gc(struct work_struct *work)
 
 	/* All candidates should have been detached by now. */
 	WARN_ON_ONCE(!list_empty(&gc_candidates));
-
+skip_gc:
 	/* Paired with READ_ONCE() in wait_for_unix_gc(). */
 	WRITE_ONCE(gc_in_progress, false);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 10/14] af_unix: Avoid Tarjan's algorithm if unnecessary.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (8 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 11/14] af_unix: Assign a unique index to SCC Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Once a cyclic reference is formed, we need to run GC to check if
there is dead SCC.

However, we do not need to run Tarjan's algorithm if we know that
the shape of the inflight graph has not been changed.

If an edge is added/updated/deleted and the edge's successor is
inflight, we set false to unix_graph_grouped, which means we need
to re-classify SCC.

Once we finalise SCC, we set false to unix_graph_grouped.

While unix_graph_grouped is false, we can iterate the grouped
SCC using vertex->scc_entry in unix_walk_scc_fast().

list_add() and list_for_each_entry_reverse() uses seem weird, but
they are to keep the vertex order consistent and make writing test
easier.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/garbage.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 90f04d786dae..1e919fe65737 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -108,9 +108,11 @@ void unix_init_vertex(struct unix_sock *u)
 	vertex->out_degree = 0;
 	vertex->self_degree = 0;
 	INIT_LIST_HEAD(&vertex->edges);
+	INIT_LIST_HEAD(&vertex->scc_entry);
 }
 
 static bool unix_graph_maybe_cyclic;
+static bool unix_graph_grouped;
 
 static void unix_graph_update(struct unix_edge *edge)
 {
@@ -121,6 +123,7 @@ static void unix_graph_update(struct unix_edge *edge)
 		return;
 
 	unix_graph_maybe_cyclic = true;
+	unix_graph_grouped = false;
 }
 
 DEFINE_SPINLOCK(unix_gc_lock);
@@ -339,6 +342,25 @@ static void unix_walk_scc(void)
 
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
 	swap(unix_vertex_unvisited_index, unix_vertex_grouped_index);
+	unix_graph_grouped = true;
+}
+
+static void unix_walk_scc_fast(void)
+{
+	while (!list_empty(&unix_unvisited_vertices)) {
+		struct unix_vertex *vertex;
+		struct list_head scc;
+
+		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
+		list_add(&scc, &vertex->scc_entry);
+
+		list_for_each_entry_reverse(vertex, &scc, scc_entry)
+			list_move_tail(&vertex->entry, &unix_visited_vertices);
+
+		list_del(&scc);
+	}
+
+	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
 }
 
 static LIST_HEAD(gc_candidates);
@@ -491,7 +513,10 @@ static void __unix_gc(struct work_struct *work)
 	if (!unix_graph_maybe_cyclic)
 		goto skip_gc;
 
-	unix_walk_scc();
+	if (unix_graph_grouped)
+		unix_walk_scc_fast();
+	else
+		unix_walk_scc();
 
 	/* First, select candidates for garbage collection.  Only
 	 * in-flight sockets are considered, and from those only ones
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 11/14] af_unix: Assign a unique index to SCC.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (9 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 10/14] af_unix: Avoid Tarjan's algorithm if unnecessary Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 12/14] af_unix: Detect dead SCC Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

The definition of the lowlink in Tarjan's algorithm is the
smallest index of a vertex that is reachable with at most one
back-edge in SCC.  This is not useful for a cross-edge.

If we start traversing from A in the following graph, the final
lowlink of D is 3.  The cross-edge here is one between D and C.

  A -> B -> D   D = (4, 3)  (index, lowlink)
  ^    |    |   C = (3, 1)
  |    V    |   B = (2, 1)
  `--- C <--'   A = (1, 1)

This is because the lowlink of D is updated with the index of C.

In the following patch, we detect a dead SCC by checking two
conditions for each vertex.

  1) vertex has no edge directed to another SCC (no bridge)
  2) vertex's out_degree is the same as the refcount of its file

If 1) is false, there is a receiver of all fds of the SCC and
its ancestor SCC.

To evaluate 1), we need to assign a unique index to each SCC and
assign it to all vertices in the SCC.

This patch changes the lowlink update logic for cross-edge so
that in the above example, the lowlink of D is updated with the
lowlink of C.

  A -> B -> D   D = (4, 1)  (index, lowlink)
  ^    |    |   C = (3, 1)
  |    V    |   B = (2, 1)
  `--- C <--'   A = (1, 1)

Then, all vertices in the same SCC have the same lowlink, and we
can quickly find the bridge connecting to different SCC if exists.

However, it is no longer called lowlink, so we rename it to
scc_index.  (It's sometimes called lowpoint.)

Also, we add a global variable to hold the last index used in DFS
so that we do not reset the initial index in each DFS.

This patch can be squashed to the SCC detection patch but is
split deliberately for anyone wondering why lowlink is not used
as used in the original Tarjan's algorithm and many reference
implementations.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  2 +-
 net/unix/garbage.c    | 15 ++++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 59ec8d7880ce..66c8cf835625 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -38,7 +38,7 @@ struct unix_vertex {
 	unsigned long out_degree;
 	unsigned long self_degree;
 	unsigned long index;
-	unsigned long lowlink;
+	unsigned long scc_index;
 };
 
 struct unix_edge {
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 1e919fe65737..0e6d0e96e7cf 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -273,18 +273,18 @@ static bool unix_scc_cyclic(struct list_head *scc)
 
 static LIST_HEAD(unix_visited_vertices);
 static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
+static unsigned long unix_vertex_last_index = UNIX_VERTEX_INDEX_START;
 
 static void __unix_walk_scc(struct unix_vertex *vertex)
 {
-	unsigned long index = UNIX_VERTEX_INDEX_START;
 	LIST_HEAD(vertex_stack);
 	struct unix_edge *edge;
 	LIST_HEAD(edge_stack);
 
 next_vertex:
-	vertex->index = index;
-	vertex->lowlink = index;
-	index++;
+	vertex->index = unix_vertex_last_index;
+	vertex->scc_index = unix_vertex_last_index;
+	unix_vertex_last_index++;
 
 	list_add(&vertex->scc_entry, &vertex_stack);
 
@@ -302,13 +302,13 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 			list_del_init(&edge->stack_entry);
 
 			vertex = edge->predecessor;
-			vertex->lowlink = min(vertex->lowlink, edge->successor->lowlink);
+			vertex->scc_index = min(vertex->scc_index, edge->successor->scc_index);
 		} else if (edge->successor->index != unix_vertex_grouped_index) {
-			vertex->lowlink = min(vertex->lowlink, edge->successor->index);
+			vertex->scc_index = min(vertex->scc_index, edge->successor->scc_index);
 		}
 	}
 
-	if (vertex->index == vertex->lowlink) {
+	if (vertex->index == vertex->scc_index) {
 		struct list_head scc;
 
 		__list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
@@ -331,6 +331,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 
 static void unix_walk_scc(void)
 {
+	unix_vertex_last_index = UNIX_VERTEX_INDEX_START;
 	unix_graph_maybe_cyclic = false;
 
 	while (!list_empty(&unix_unvisited_vertices)) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 12/14] af_unix: Detect dead SCC.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (10 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 11/14] af_unix: Assign a unique index to SCC Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 13/14] af_unix: Replace garbage collection algorithm Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

When iterating SCC, we call unix_vertex_dead() for each vertex
to check if the vertex is close()d and has no bridge to another
SCC.

If both conditions are true for every vertex in SCC, we can
execute garbage collection for all skb in the SCC.

The actual garbage collection is done in the following patch,
replacing the old implementation.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/garbage.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 0e6d0e96e7cf..bdfee2788db5 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -257,6 +257,29 @@ void unix_free_edges(struct scm_fp_list *fpl)
 	kvfree(fpl->edges);
 }
 
+static bool unix_vertex_dead(struct unix_vertex *vertex)
+{
+	struct unix_edge *edge;
+	struct unix_sock *u;
+	long total_ref;
+
+	list_for_each_entry(edge, &vertex->edges, entry) {
+		if (!edge->successor->out_degree)
+			return false;
+
+		if (edge->successor->scc_index != vertex->scc_index)
+			return false;
+	}
+
+	u = container_of(vertex, typeof(*u), vertex);
+	total_ref = file_count(u->sk.sk_socket->file);
+
+	if (total_ref != vertex->out_degree)
+		return false;
+
+	return true;
+}
+
 static bool unix_scc_cyclic(struct list_head *scc)
 {
 	struct unix_vertex *vertex;
@@ -310,6 +333,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 
 	if (vertex->index == vertex->scc_index) {
 		struct list_head scc;
+		bool dead = true;
 
 		__list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
 
@@ -317,6 +341,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 			list_move_tail(&vertex->entry, &unix_visited_vertices);
 
 			vertex->index = unix_vertex_grouped_index;
+
+			if (dead)
+				dead = unix_vertex_dead(vertex);
 		}
 
 		if (!unix_graph_maybe_cyclic)
@@ -351,13 +378,18 @@ static void unix_walk_scc_fast(void)
 	while (!list_empty(&unix_unvisited_vertices)) {
 		struct unix_vertex *vertex;
 		struct list_head scc;
+		bool dead = true;
 
 		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
 		list_add(&scc, &vertex->scc_entry);
 
-		list_for_each_entry_reverse(vertex, &scc, scc_entry)
+		list_for_each_entry_reverse(vertex, &scc, scc_entry) {
 			list_move_tail(&vertex->entry, &unix_visited_vertices);
 
+			if (dead)
+				dead = unix_vertex_dead(vertex);
+		}
+
 		list_del(&scc);
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 13/14] af_unix: Replace garbage collection algorithm.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (11 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 12/14] af_unix: Detect dead SCC Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:05 ` [PATCH v2 net-next 14/14] selftest: af_unix: Test GC for SCM_RIGHTS Kuniyuki Iwashima
  2024-02-16 21:28 ` [PATCH v2 net-next 00/14] af_unix: Rework GC Eric Dumazet
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

If we find a dead SCC during iteration, we call unix_collect_skb()
to splice all skb in the SCC to the global sk_buff_head, hitlist.

After iterating all SCC, we unlock unix_gc_lock and purge the queue.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |   7 --
 net/unix/af_unix.c    |  12 --
 net/unix/garbage.c    | 286 +++++++-----------------------------------
 3 files changed, 48 insertions(+), 257 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 66c8cf835625..78ab1107ffb3 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -20,8 +20,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp)
 extern spinlock_t unix_gc_lock;
 extern unsigned int unix_tot_inflight;
 
-void unix_inflight(struct user_struct *user, struct file *fp);
-void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_init_vertex(struct unix_sock *u);
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
 void unix_del_edges(struct scm_fp_list *fpl);
@@ -88,12 +86,7 @@ struct unix_sock {
 	struct sock		*peer;
 	struct sock		*listener;
 	struct unix_vertex	vertex;
-	struct list_head	link;
-	unsigned long		inflight;
 	spinlock_t		lock;
-	unsigned long		gc_flags;
-#define UNIX_GC_CANDIDATE	0
-#define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
 	wait_queue_entry_t	peer_wake;
 	struct scm_stat		scm_stat;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index dab5d8d96e87..0b70f7b5b5a8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -993,12 +993,10 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	sk->sk_destruct		= unix_sock_destructor;
 	u = unix_sk(sk);
 	u->listener = NULL;
-	u->inflight = 0;
 	u->path.dentry = NULL;
 	u->path.mnt = NULL;
 	spin_lock_init(&u->lock);
 	unix_init_vertex(u);
-	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->iolock); /* single task reading lock */
 	mutex_init(&u->bindlock); /* single task binding lock */
 	init_waitqueue_head(&u->peer_wait);
@@ -1806,8 +1804,6 @@ static inline bool too_many_unix_fds(struct task_struct *p)
 
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
-	int i;
-
 	if (too_many_unix_fds(current))
 		return -ETOOMANYREFS;
 
@@ -1819,9 +1815,6 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	if (!UNIXCB(skb).fp)
 		return -ENOMEM;
 
-	for (i = scm->fp->count - 1; i >= 0; i--)
-		unix_inflight(scm->fp->user, scm->fp->fp[i]);
-
 	if (unix_alloc_edges(UNIXCB(skb).fp))
 		return -ENOMEM;
 
@@ -1830,15 +1823,10 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 
 static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
-	int i;
-
 	scm->fp = UNIXCB(skb).fp;
 	UNIXCB(skb).fp = NULL;
 
 	unix_free_edges(scm->fp);
-
-	for (i = scm->fp->count - 1; i >= 0; i--)
-		unix_notinflight(scm->fp->user, scm->fp->fp[i]);
 }
 
 static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index bdfee2788db5..81f2dbcadc73 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -280,6 +280,43 @@ static bool unix_vertex_dead(struct unix_vertex *vertex)
 	return true;
 }
 
+static struct sk_buff_head hitlist;
+
+static void unix_collect_skb(struct list_head *scc)
+{
+	struct unix_vertex *vertex;
+
+	list_for_each_entry_reverse(vertex, scc, scc_entry) {
+		struct unix_sock *u = container_of(vertex, typeof(*u), vertex);
+		struct sk_buff_head *queue = &u->sk.sk_receive_queue;
+
+		spin_lock(&queue->lock);
+
+		if (u->sk.sk_state == TCP_LISTEN) {
+			struct sk_buff *skb;
+
+			skb_queue_walk(queue, skb) {
+				struct sk_buff_head *embryo_queue = &skb->sk->sk_receive_queue;
+
+				spin_lock(&embryo_queue->lock);
+				skb_queue_splice_init(embryo_queue, &hitlist);
+				spin_unlock(&embryo_queue->lock);
+			}
+		} else {
+			skb_queue_splice_init(queue, &hitlist);
+
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+			if (u->oob_skb) {
+				kfree_skb(u->oob_skb);
+				u->oob_skb = NULL;
+			}
+#endif
+		}
+
+		spin_unlock(&queue->lock);
+	}
+}
+
 static bool unix_scc_cyclic(struct list_head *scc)
 {
 	struct unix_vertex *vertex;
@@ -346,7 +383,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 				dead = unix_vertex_dead(vertex);
 		}
 
-		if (!unix_graph_maybe_cyclic)
+		if (dead)
+			unix_collect_skb(&scc);
+		else if (!unix_graph_maybe_cyclic)
 			unix_graph_maybe_cyclic = unix_scc_cyclic(&scc);
 
 		list_del(&scc);
@@ -390,267 +429,38 @@ static void unix_walk_scc_fast(void)
 				dead = unix_vertex_dead(vertex);
 		}
 
+		if (dead)
+			unix_collect_skb(&scc);
+
 		list_del(&scc);
 	}
 
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
 }
 
-static LIST_HEAD(gc_candidates);
-static LIST_HEAD(gc_inflight_list);
-
-/* Keep the number of times in flight count for the file
- * descriptor if it is for an AF_UNIX socket.
- */
-void unix_inflight(struct user_struct *user, struct file *filp)
-{
-	struct unix_sock *u = unix_get_socket(filp);
-
-	spin_lock(&unix_gc_lock);
-
-	if (u) {
-		if (!u->inflight) {
-			WARN_ON_ONCE(!list_empty(&u->link));
-			list_add_tail(&u->link, &gc_inflight_list);
-		} else {
-			WARN_ON_ONCE(list_empty(&u->link));
-		}
-		u->inflight++;
-	}
-
-	spin_unlock(&unix_gc_lock);
-}
-
-void unix_notinflight(struct user_struct *user, struct file *filp)
-{
-	struct unix_sock *u = unix_get_socket(filp);
-
-	spin_lock(&unix_gc_lock);
-
-	if (u) {
-		WARN_ON_ONCE(!u->inflight);
-		WARN_ON_ONCE(list_empty(&u->link));
-
-		u->inflight--;
-		if (!u->inflight)
-			list_del_init(&u->link);
-	}
-
-	spin_unlock(&unix_gc_lock);
-}
-
-static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
-			  struct sk_buff_head *hitlist)
-{
-	struct sk_buff *skb;
-	struct sk_buff *next;
-
-	spin_lock(&x->sk_receive_queue.lock);
-	skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
-		/* Do we have file descriptors ? */
-		if (UNIXCB(skb).fp) {
-			bool hit = false;
-			/* Process the descriptors of this socket */
-			int nfd = UNIXCB(skb).fp->count;
-			struct file **fp = UNIXCB(skb).fp->fp;
-
-			while (nfd--) {
-				/* Get the socket the fd matches if it indeed does so */
-				struct unix_sock *u = unix_get_socket(*fp++);
-
-				/* Ignore non-candidates, they could have been added
-				 * to the queues after starting the garbage collection
-				 */
-				if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
-					hit = true;
-
-					func(u);
-				}
-			}
-			if (hit && hitlist != NULL) {
-				__skb_unlink(skb, &x->sk_receive_queue);
-				__skb_queue_tail(hitlist, skb);
-			}
-		}
-	}
-	spin_unlock(&x->sk_receive_queue.lock);
-}
-
-static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
-			  struct sk_buff_head *hitlist)
-{
-	if (x->sk_state != TCP_LISTEN) {
-		scan_inflight(x, func, hitlist);
-	} else {
-		struct sk_buff *skb;
-		struct sk_buff *next;
-		struct unix_sock *u;
-		LIST_HEAD(embryos);
-
-		/* For a listening socket collect the queued embryos
-		 * and perform a scan on them as well.
-		 */
-		spin_lock(&x->sk_receive_queue.lock);
-		skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
-			u = unix_sk(skb->sk);
-
-			/* An embryo cannot be in-flight, so it's safe
-			 * to use the list link.
-			 */
-			WARN_ON_ONCE(!list_empty(&u->link));
-			list_add_tail(&u->link, &embryos);
-		}
-		spin_unlock(&x->sk_receive_queue.lock);
-
-		while (!list_empty(&embryos)) {
-			u = list_entry(embryos.next, struct unix_sock, link);
-			scan_inflight(&u->sk, func, hitlist);
-			list_del_init(&u->link);
-		}
-	}
-}
-
-static void dec_inflight(struct unix_sock *usk)
-{
-	usk->inflight--;
-}
-
-static void inc_inflight(struct unix_sock *usk)
-{
-	usk->inflight++;
-}
-
-static void inc_inflight_move_tail(struct unix_sock *u)
-{
-	u->inflight++;
-
-	/* If this still might be part of a cycle, move it to the end
-	 * of the list, so that it's checked even if it was already
-	 * passed over
-	 */
-	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
-		list_move_tail(&u->link, &gc_candidates);
-}
-
 static bool gc_in_progress;
 
 static void __unix_gc(struct work_struct *work)
 {
-	struct sk_buff_head hitlist;
-	struct unix_sock *u, *next;
-	LIST_HEAD(not_cycle_list);
-	struct list_head cursor;
-
 	spin_lock(&unix_gc_lock);
 
-	if (!unix_graph_maybe_cyclic)
+	if (!unix_graph_maybe_cyclic) {
+		spin_unlock(&unix_gc_lock);
 		goto skip_gc;
+	}
+
+	__skb_queue_head_init(&hitlist);
 
 	if (unix_graph_grouped)
 		unix_walk_scc_fast();
 	else
 		unix_walk_scc();
 
-	/* First, select candidates for garbage collection.  Only
-	 * in-flight sockets are considered, and from those only ones
-	 * which don't have any external reference.
-	 *
-	 * Holding unix_gc_lock will protect these candidates from
-	 * being detached, and hence from gaining an external
-	 * reference.  Since there are no possible receivers, all
-	 * buffers currently on the candidates' queues stay there
-	 * during the garbage collection.
-	 *
-	 * We also know that no new candidate can be added onto the
-	 * receive queues.  Other, non candidate sockets _can_ be
-	 * added to queue, so we must make sure only to touch
-	 * candidates.
-	 */
-	list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
-		long total_refs;
-
-		total_refs = file_count(u->sk.sk_socket->file);
-
-		WARN_ON_ONCE(!u->inflight);
-		WARN_ON_ONCE(total_refs < u->inflight);
-		if (total_refs == u->inflight) {
-			list_move_tail(&u->link, &gc_candidates);
-			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
-			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
-		}
-	}
-
-	/* Now remove all internal in-flight reference to children of
-	 * the candidates.
-	 */
-	list_for_each_entry(u, &gc_candidates, link)
-		scan_children(&u->sk, dec_inflight, NULL);
-
-	/* Restore the references for children of all candidates,
-	 * which have remaining references.  Do this recursively, so
-	 * only those remain, which form cyclic references.
-	 *
-	 * Use a "cursor" link, to make the list traversal safe, even
-	 * though elements might be moved about.
-	 */
-	list_add(&cursor, &gc_candidates);
-	while (cursor.next != &gc_candidates) {
-		u = list_entry(cursor.next, struct unix_sock, link);
-
-		/* Move cursor to after the current position. */
-		list_move(&cursor, &u->link);
-
-		if (u->inflight) {
-			list_move_tail(&u->link, &not_cycle_list);
-			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
-			scan_children(&u->sk, inc_inflight_move_tail, NULL);
-		}
-	}
-	list_del(&cursor);
-
-	/* Now gc_candidates contains only garbage.  Restore original
-	 * inflight counters for these as well, and remove the skbuffs
-	 * which are creating the cycle(s).
-	 */
-	skb_queue_head_init(&hitlist);
-	list_for_each_entry(u, &gc_candidates, link)
-		scan_children(&u->sk, inc_inflight, &hitlist);
-
-	/* not_cycle_list contains those sockets which do not make up a
-	 * cycle.  Restore these to the inflight list.
-	 */
-	while (!list_empty(&not_cycle_list)) {
-		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
-		list_move_tail(&u->link, &gc_inflight_list);
-	}
-
 	spin_unlock(&unix_gc_lock);
 
-	/* Here we are. Hitlist is filled. Die. */
 	__skb_queue_purge(&hitlist);
-
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
-	while (!list_empty(&gc_candidates)) {
-		u = list_entry(gc_candidates.next, struct unix_sock, link);
-		if (u->oob_skb) {
-			struct sk_buff *skb = u->oob_skb;
-
-			u->oob_skb = NULL;
-			kfree_skb(skb);
-		}
-	}
-#endif
-
-	spin_lock(&unix_gc_lock);
-
-	/* All candidates should have been detached by now. */
-	WARN_ON_ONCE(!list_empty(&gc_candidates));
 skip_gc:
-	/* Paired with READ_ONCE() in wait_for_unix_gc(). */
 	WRITE_ONCE(gc_in_progress, false);
-
-	spin_unlock(&unix_gc_lock);
 }
 
 static DECLARE_WORK(unix_gc_work, __unix_gc);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 net-next 14/14] selftest: af_unix: Test GC for SCM_RIGHTS.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (12 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 13/14] af_unix: Replace garbage collection algorithm Kuniyuki Iwashima
@ 2024-02-16 21:05 ` Kuniyuki Iwashima
  2024-02-16 21:28 ` [PATCH v2 net-next 00/14] af_unix: Rework GC Eric Dumazet
  14 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:05 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

This patch adds test cases to verify the new GC.

We run each test for the following cases:

  * SOCK_DGRAM
  * SOCK_STREAM without embryo socket
  * SOCK_STREAM without embryo socket + MSG_OOB
  * SOCK_STREAM with embryo sockets
  * SOCK_STREAM with embryo sockets + MSG_OOB

Before and after running each test case, we ensure that there is
no AF_UNIX socket left in the netns by reading /proc/net/protocols.

We cannot use /proc/net/unix and UNIX_DIAG because the embryo socket
does not show up there.

Each test creates multiple sockets in an array.  We pass sockets in
the even index using the peer sockets in the odd index.

So, send_fd(0, 1) actually sends fd[0] to fd[2] via fd[0 + 1].

  Test 1 : A <-> A
  Test 2 : A <-> B
  Test 3 : A -> B -> C <- D
           ^.___|___.'    ^
                `---------'

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../selftests/net/af_unix/scm_rights.c        | 286 ++++++++++++++++++
 3 files changed, 288 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/af_unix/scm_rights.c

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 2f9d378edec3..d996a0ab0765 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -31,6 +31,7 @@ reuseport_dualstack
 rxtimestamp
 sctp_hello
 scm_pidfd
+scm_rights
 sk_bind_sendto_listen
 sk_connect_zero_addr
 socket
diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile
index 221c387a7d7f..3b83c797650d 100644
--- a/tools/testing/selftests/net/af_unix/Makefile
+++ b/tools/testing/selftests/net/af_unix/Makefile
@@ -1,4 +1,4 @@
 CFLAGS += $(KHDR_INCLUDES)
-TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect scm_pidfd
+TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect scm_pidfd scm_rights
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/net/af_unix/scm_rights.c b/tools/testing/selftests/net/af_unix/scm_rights.c
new file mode 100644
index 000000000000..bab606c9f1eb
--- /dev/null
+++ b/tools/testing/selftests/net/af_unix/scm_rights.c
@@ -0,0 +1,286 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+#define _GNU_SOURCE
+#include <sched.h>
+
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include "../../kselftest_harness.h"
+
+FIXTURE(scm_rights)
+{
+	int fd[16];
+};
+
+FIXTURE_VARIANT(scm_rights)
+{
+	char name[16];
+	int type;
+	int flags;
+	bool test_listener;
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, dgram)
+{
+	.name = "UNIX ",
+	.type = SOCK_DGRAM,
+	.flags = 0,
+	.test_listener = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = 0,
+	.test_listener = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_oob)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = MSG_OOB,
+	.test_listener = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_listener)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = 0,
+	.test_listener = true,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_listener_oob)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = MSG_OOB,
+	.test_listener = true,
+};
+
+static int count_sockets(struct __test_metadata *_metadata,
+			 const FIXTURE_VARIANT(scm_rights) *variant)
+{
+	int sockets = -1, len, ret;
+	char *line = NULL;
+	size_t unused;
+	FILE *f;
+
+	f = fopen("/proc/net/protocols", "r");
+	ASSERT_NE(NULL, f);
+
+	len = strlen(variant->name);
+
+	while (getline(&line, &unused, f) != -1) {
+		int unused2;
+
+		if (strncmp(line, variant->name, len))
+			continue;
+
+		ret = sscanf(line + len, "%d %d", &unused2, &sockets);
+		ASSERT_EQ(2, ret);
+
+		break;
+	}
+
+	free(line);
+
+	ret = fclose(f);
+	ASSERT_EQ(0, ret);
+
+	return sockets;
+}
+
+FIXTURE_SETUP(scm_rights)
+{
+	int ret;
+
+	ret = unshare(CLONE_NEWNET);
+	ASSERT_EQ(0, ret);
+
+	ret = count_sockets(_metadata, variant);
+	ASSERT_EQ(0, ret);
+}
+
+FIXTURE_TEARDOWN(scm_rights)
+{
+	int ret;
+
+	sleep(1);
+
+	ret = count_sockets(_metadata, variant);
+	ASSERT_EQ(0, ret);
+}
+
+static void create_listeners(struct __test_metadata *_metadata,
+			     FIXTURE_DATA(scm_rights) *self,
+			     int n)
+{
+	struct sockaddr_un addr = {
+		.sun_family = AF_UNIX,
+	};
+	socklen_t addrlen;
+	int i, ret;
+
+	for (i = 0; i < n * 2; i += 2) {
+		self->fd[i] = socket(AF_UNIX, SOCK_STREAM, 0);
+		ASSERT_LE(0, self->fd[i]);
+
+		addrlen = sizeof(addr.sun_family);
+		ret = bind(self->fd[i], (struct sockaddr *)&addr, addrlen);
+		ASSERT_EQ(0, ret);
+
+		ret = listen(self->fd[i], -1);
+		ASSERT_EQ(0, ret);
+
+		addrlen = sizeof(addr);
+		ret = getsockname(self->fd[i], (struct sockaddr *)&addr, &addrlen);
+		ASSERT_EQ(0, ret);
+
+		self->fd[i + 1] = socket(AF_UNIX, SOCK_STREAM, 0);
+		ASSERT_LE(0, self->fd[i + 1]);
+
+		ret = connect(self->fd[i + 1], (struct sockaddr *)&addr, addrlen);
+		ASSERT_EQ(0, ret);
+	}
+}
+
+static void create_socketpairs(struct __test_metadata *_metadata,
+			       FIXTURE_DATA(scm_rights) *self,
+			       const FIXTURE_VARIANT(scm_rights) *variant,
+			       int n)
+{
+	int i, ret;
+
+	ASSERT_GE(sizeof(self->fd) / sizeof(int), n);
+
+	for (i = 0; i < n * 2; i += 2) {
+		ret = socketpair(AF_UNIX, variant->type, 0, self->fd + i);
+		ASSERT_EQ(0, ret);
+	}
+}
+
+static void __create_sockets(struct __test_metadata *_metadata,
+			     FIXTURE_DATA(scm_rights) *self,
+			     const FIXTURE_VARIANT(scm_rights) *variant,
+			     int n)
+{
+	if (variant->test_listener)
+		create_listeners(_metadata, self, n);
+	else
+		create_socketpairs(_metadata, self, variant, n);
+}
+
+static void __close_sockets(struct __test_metadata *_metadata,
+			    FIXTURE_DATA(scm_rights) *self,
+			    int n)
+{
+	int i, ret;
+
+	ASSERT_GE(sizeof(self->fd) / sizeof(int), n);
+
+	for (i = 0; i < n * 2; i++) {
+		ret = close(self->fd[i]);
+		ASSERT_EQ(0, ret);
+	}
+}
+
+void __send_fd(struct __test_metadata *_metadata,
+	       const FIXTURE_DATA(scm_rights) *self,
+	       const FIXTURE_VARIANT(scm_rights) *variant,
+	       int inflight, int receiver)
+{
+#define MSG "nop"
+#define MSGLEN 3
+	struct {
+		struct cmsghdr cmsghdr;
+		int fd[2];
+	} cmsg = {
+		.cmsghdr = {
+			.cmsg_len = CMSG_LEN(sizeof(cmsg.fd)),
+			.cmsg_level = SOL_SOCKET,
+			.cmsg_type = SCM_RIGHTS,
+		},
+		.fd = {
+			self->fd[inflight * 2],
+			self->fd[inflight * 2],
+		},
+	};
+	struct iovec iov = {
+		.iov_base = MSG,
+		.iov_len = MSGLEN,
+	};
+	struct msghdr msg = {
+		.msg_name = NULL,
+		.msg_namelen = 0,
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+		.msg_control = &cmsg,
+		.msg_controllen = CMSG_SPACE(sizeof(cmsg.fd)),
+	};
+	int ret;
+
+	ret = sendmsg(self->fd[receiver * 2 + 1], &msg, variant->flags);
+	ASSERT_EQ(MSGLEN, ret);
+}
+
+#define create_sockets(n)					\
+	__create_sockets(_metadata, self, variant, n)
+#define close_sockets(n)					\
+	__close_sockets(_metadata, self, n)
+#define send_fd(inflight, receiver)				\
+	__send_fd(_metadata, self, variant, inflight, receiver)
+
+TEST_F(scm_rights, self_ref)
+{
+	create_sockets(2);
+
+	send_fd(0, 0);
+
+	send_fd(1, 1);
+
+	close_sockets(2);
+}
+
+TEST_F(scm_rights, triangle)
+{
+	create_sockets(6);
+
+	send_fd(0, 1);
+	send_fd(1, 2);
+	send_fd(2, 0);
+
+	send_fd(3, 4);
+	send_fd(4, 5);
+	send_fd(5, 3);
+
+	close_sockets(6);
+}
+
+TEST_F(scm_rights, cross_edge)
+{
+	create_sockets(8);
+
+	send_fd(0, 1);
+	send_fd(1, 2);
+	send_fd(2, 0);
+	send_fd(1, 3);
+	send_fd(3, 2);
+
+	send_fd(4, 5);
+	send_fd(5, 6);
+	send_fd(6, 4);
+	send_fd(5, 7);
+	send_fd(7, 6);
+
+	close_sockets(8);
+}
+
+TEST_HARNESS_MAIN
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 00/14] af_unix: Rework GC.
  2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
                   ` (13 preceding siblings ...)
  2024-02-16 21:05 ` [PATCH v2 net-next 14/14] selftest: af_unix: Test GC for SCM_RIGHTS Kuniyuki Iwashima
@ 2024-02-16 21:28 ` Eric Dumazet
  2024-02-16 21:42   ` Kuniyuki Iwashima
  14 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2024-02-16 21:28 UTC (permalink / raw
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Fri, Feb 16, 2024 at 10:06 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> When we pass a file descriptor to an AF_UNIX socket via SCM_RIGTHS,
> the underlying struct file of the inflight fd gets its refcount bumped.
> If the fd is of an AF_UNIX socket, we need to track it in case it forms
> cyclic references.
>
> Let's say we send a fd of AF_UNIX socket A to B and vice versa and
> close() both sockets.
>
> When created, each socket's struct file initially has one reference.
> After the fd exchange, both refcounts are bumped up to 2.  Then, close()
> decreases both to 1.  From this point on, no one can touch the file/socket.

Note that I have pending syzbot reports about af_unix. (apparently
syzbot found its way with SO_PEEK_OFF)

I would prefer we wait a bit before reworking the whole thing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 00/14] af_unix: Rework GC.
  2024-02-16 21:28 ` [PATCH v2 net-next 00/14] af_unix: Rework GC Eric Dumazet
@ 2024-02-16 21:42   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-16 21:42 UTC (permalink / raw
  To: edumazet; +Cc: davem, kuba, kuni1840, kuniyu, netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 16 Feb 2024 22:28:11 +0100
> On Fri, Feb 16, 2024 at 10:06 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > When we pass a file descriptor to an AF_UNIX socket via SCM_RIGTHS,
> > the underlying struct file of the inflight fd gets its refcount bumped.
> > If the fd is of an AF_UNIX socket, we need to track it in case it forms
> > cyclic references.
> >
> > Let's say we send a fd of AF_UNIX socket A to B and vice versa and
> > close() both sockets.
> >
> > When created, each socket's struct file initially has one reference.
> > After the fd exchange, both refcounts are bumped up to 2.  Then, close()
> > decreases both to 1.  From this point on, no one can touch the file/socket.
> 
> Note that I have pending syzbot reports about af_unix. (apparently
> syzbot found its way with SO_PEEK_OFF)

Interesting :)

> 
> I would prefer we wait a bit before reworking the whole thing.

Sure, I'll wait to avoid a sad situation that large refactoring
happens to fix the bug and we need to backport or cook another
fix for stable.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb.
  2024-02-16 21:05 ` [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb Kuniyuki Iwashima
@ 2024-02-20 12:11   ` Paolo Abeni
  2024-02-20 17:44     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-20 12:11 UTC (permalink / raw
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski
  Cc: Kuniyuki Iwashima, netdev

On Fri, 2024-02-16 at 13:05 -0800, Kuniyuki Iwashima wrote:
> Just before queuing skb with inflight fds, we call scm_stat_add(),
> which is a good place to set up the preallocated struct unix_edge
> in UNIXCB(skb).fp->edges.
> 
> Then, we call unix_add_edges() and construct the directed graph
> as follows:
> 
>   1. Set the inflight socket's unix_vertex to unix_edge.predecessor
>   2. Set the receiver's unix_vertex to unix_edge.successor
>   3. Link unix_edge.entry to the inflight socket's unix_vertex.edges
>   4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices.
> 
> Let's say we pass the fd of AF_UNIX socket A to B and the fd of B
> to C.  The graph looks like this:
> 
>   +-------------------------+
>   | unix_unvisited_vertices | <------------------------.
>   +-------------------------+                          |
>   +                                                    |
>   |   +-------------+                +-------------+   |            +-------------+
>   |   | unix_sock A |                | unix_sock B |   |            | unix_sock C |
>   |   +-------------+                +-------------+   |            +-------------+
>   |   | unix_vertex | <----.  .----> | unix_vertex | <-|--.  .----> | unix_vertex |
>   |   | +-----------+      |  |      | +-----------+   |  |  |      | +-----------+
>   `-> | |   entry   | +------------> | |   entry   | +-'  |  |      | |   entry   |
>       | |-----------|      |  |      | |-----------|      |  |      | |-----------|
>       | |   edges   | <-.  |  |      | |   edges   | <-.  |  |      | |   edges   |
>       +-+-----------+   |  |  |      +-+-----------+   |  |  |      +-+-----------+
>                         |  |  |                        |  |  |
>   .---------------------'  |  |  .---------------------'  |  |
>   |                        |  |  |                        |  |
>   |   +-------------+      |  |  |   +-------------+      |  |
>   |   |  unix_edge  |      |  |  |   |  unix_edge  |      |  |
>   |   +-------------+      |  |  |   +-------------+      |  |
>   `-> |    entry    |      |  |  `-> |    entry    |      |  |
>       |-------------|      |  |      |-------------|      |  |
>       | predecessor | +----'  |      | predecessor | +----'  |
>       |-------------|         |      |-------------|         |
>       |  successor  | +-------'      |  successor  | +-------'
>       +-------------+                +-------------+
> 
> Henceforth, we denote such a graph as A -> B (-> C).
> 
> Now, we can express all inflight fd graphs that do not contain
> embryo sockets.  The following two patches will support the
> particular case.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  include/net/af_unix.h |  2 ++
>  include/net/scm.h     |  1 +
>  net/core/scm.c        |  2 ++
>  net/unix/af_unix.c    |  8 +++++--
>  net/unix/garbage.c    | 55 ++++++++++++++++++++++++++++++++++++++++++-
>  5 files changed, 65 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index cab9dfb666f3..54d62467a70b 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -23,6 +23,8 @@ extern unsigned int unix_tot_inflight;
>  void unix_inflight(struct user_struct *user, struct file *fp);
>  void unix_notinflight(struct user_struct *user, struct file *fp);
>  void unix_init_vertex(struct unix_sock *u);
> +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
> +void unix_del_edges(struct scm_fp_list *fpl);
>  int unix_alloc_edges(struct scm_fp_list *fpl);
>  void unix_free_edges(struct scm_fp_list *fpl);
>  void unix_gc(void);
> diff --git a/include/net/scm.h b/include/net/scm.h
> index a1142dee086c..7d807fe466a3 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -32,6 +32,7 @@ struct scm_fp_list {
>  	short			count_unix;
>  	short			max;
>  #ifdef CONFIG_UNIX
> +	bool			inflight;
>  	struct unix_edge	*edges;
>  #endif
>  	struct user_struct	*user;
> diff --git a/net/core/scm.c b/net/core/scm.c
> index bc75b6927222..cad0c153ac93 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -88,6 +88,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
>  		fpl->count = 0;
>  		fpl->count_unix = 0;
>  #if IS_ENABLED(CONFIG_UNIX)
> +		fpl->inflight = false;
>  		fpl->edges = NULL;
>  #endif
>  		fpl->max = SCM_MAX_FD;
> @@ -381,6 +382,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
>  			get_file(fpl->fp[i]);
>  
>  #if IS_ENABLED(CONFIG_UNIX)
> +		new_fpl->inflight = false;
>  		new_fpl->edges = NULL;
>  #endif
>  		new_fpl->max = new_fpl->count;
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 0391f66546a6..ea7bac18a781 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1956,8 +1956,10 @@ static void scm_stat_add(struct sock *sk, struct sk_buff *skb)
>  	struct scm_fp_list *fp = UNIXCB(skb).fp;
>  	struct unix_sock *u = unix_sk(sk);
>  
> -	if (unlikely(fp && fp->count))
> +	if (unlikely(fp && fp->count)) {
>  		atomic_add(fp->count, &u->scm_stat.nr_fds);
> +		unix_add_edges(fp, u);
> +	}
>  }
>  
>  static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
> @@ -1965,8 +1967,10 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
>  	struct scm_fp_list *fp = UNIXCB(skb).fp;
>  	struct unix_sock *u = unix_sk(sk);
>  
> -	if (unlikely(fp && fp->count))
> +	if (unlikely(fp && fp->count)) {
>  		atomic_sub(fp->count, &u->scm_stat.nr_fds);
> +		unix_del_edges(fp);
> +	}
>  }
>  
>  /*
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index ec998c7d6b4c..353416f38738 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -109,6 +109,57 @@ void unix_init_vertex(struct unix_sock *u)
>  	INIT_LIST_HEAD(&vertex->edges);
>  }
>  
> +DEFINE_SPINLOCK(unix_gc_lock);
> +static LIST_HEAD(unix_unvisited_vertices);
> +
> +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
> +{
> +	int i = 0, j = 0;
> +
> +	spin_lock(&unix_gc_lock);
> +
> +	while (i < fpl->count_unix) {
> +		struct unix_sock *inflight = unix_get_socket(fpl->fp[j++]);
> +		struct unix_edge *edge;
> +
> +		if (!inflight)
> +			continue;
> +
> +		edge = fpl->edges + i++;
> +		edge->predecessor = &inflight->vertex;
> +		edge->successor = &receiver->vertex;
> +
> +		if (!edge->predecessor->out_degree++)
> +			list_add_tail(&edge->predecessor->entry, &unix_unvisited_vertices);
> +
> +		list_add_tail(&edge->entry, &edge->predecessor->edges);

Note that I confusingly replied to the previous revision of this patch,
but I think the points still stand.

		INIT_LIST_HEAD(&edge->entry);	

disappeared from the above, but I can't find where it landed?!? 

Thanks,

Paolo	


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists.
  2024-02-16 21:05 ` [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists Kuniyuki Iwashima
@ 2024-02-20 12:56   ` Paolo Abeni
  2024-02-20 17:47     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-20 12:56 UTC (permalink / raw
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski
  Cc: Kuniyuki Iwashima, netdev

On Fri, 2024-02-16 at 13:05 -0800, Kuniyuki Iwashima wrote:
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index b3ba5e949d62..59ec8d7880ce 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -36,6 +36,7 @@ struct unix_vertex {
>  	struct list_head entry;
>  	struct list_head scc_entry;
>  	unsigned long out_degree;
> +	unsigned long self_degree;
>  	unsigned long index;
>  	unsigned long lowlink;
>  };

The series increases considerably struct unix_sock size. Have you
considered dynamically allocating unix_vertex only when using
SCM_RIGTHS?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb.
  2024-02-20 12:11   ` Paolo Abeni
@ 2024-02-20 17:44     ` Kuniyuki Iwashima
  2024-02-20 17:53       ` Paolo Abeni
  0 siblings, 1 reply; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-20 17:44 UTC (permalink / raw
  To: pabeni; +Cc: davem, edumazet, kuba, kuni1840, kuniyu, netdev

From: Paolo Abeni <pabeni@redhat.com>
Date: Tue, 20 Feb 2024 13:11:09 +0100
> On Fri, 2024-02-16 at 13:05 -0800, Kuniyuki Iwashima wrote:
> > Just before queuing skb with inflight fds, we call scm_stat_add(),
> > which is a good place to set up the preallocated struct unix_edge
> > in UNIXCB(skb).fp->edges.
> > 
> > Then, we call unix_add_edges() and construct the directed graph
> > as follows:
> > 
> >   1. Set the inflight socket's unix_vertex to unix_edge.predecessor
> >   2. Set the receiver's unix_vertex to unix_edge.successor
> >   3. Link unix_edge.entry to the inflight socket's unix_vertex.edges
> >   4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices.
> > 
> > Let's say we pass the fd of AF_UNIX socket A to B and the fd of B
> > to C.  The graph looks like this:
> > 
> >   +-------------------------+
> >   | unix_unvisited_vertices | <------------------------.
> >   +-------------------------+                          |
> >   +                                                    |
> >   |   +-------------+                +-------------+   |            +-------------+
> >   |   | unix_sock A |                | unix_sock B |   |            | unix_sock C |
> >   |   +-------------+                +-------------+   |            +-------------+
> >   |   | unix_vertex | <----.  .----> | unix_vertex | <-|--.  .----> | unix_vertex |
> >   |   | +-----------+      |  |      | +-----------+   |  |  |      | +-----------+
> >   `-> | |   entry   | +------------> | |   entry   | +-'  |  |      | |   entry   |
> >       | |-----------|      |  |      | |-----------|      |  |      | |-----------|
> >       | |   edges   | <-.  |  |      | |   edges   | <-.  |  |      | |   edges   |
> >       +-+-----------+   |  |  |      +-+-----------+   |  |  |      +-+-----------+
> >                         |  |  |                        |  |  |
> >   .---------------------'  |  |  .---------------------'  |  |
> >   |                        |  |  |                        |  |
> >   |   +-------------+      |  |  |   +-------------+      |  |
> >   |   |  unix_edge  |      |  |  |   |  unix_edge  |      |  |
> >   |   +-------------+      |  |  |   +-------------+      |  |
> >   `-> |    entry    |      |  |  `-> |    entry    |      |  |
> >       |-------------|      |  |      |-------------|      |  |
> >       | predecessor | +----'  |      | predecessor | +----'  |
> >       |-------------|         |      |-------------|         |
> >       |  successor  | +-------'      |  successor  | +-------'
> >       +-------------+                +-------------+
> > 
> > Henceforth, we denote such a graph as A -> B (-> C).
> > 
> > Now, we can express all inflight fd graphs that do not contain
> > embryo sockets.  The following two patches will support the
> > particular case.
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  include/net/af_unix.h |  2 ++
> >  include/net/scm.h     |  1 +
> >  net/core/scm.c        |  2 ++
> >  net/unix/af_unix.c    |  8 +++++--
> >  net/unix/garbage.c    | 55 ++++++++++++++++++++++++++++++++++++++++++-
> >  5 files changed, 65 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> > index cab9dfb666f3..54d62467a70b 100644
> > --- a/include/net/af_unix.h
> > +++ b/include/net/af_unix.h
> > @@ -23,6 +23,8 @@ extern unsigned int unix_tot_inflight;
> >  void unix_inflight(struct user_struct *user, struct file *fp);
> >  void unix_notinflight(struct user_struct *user, struct file *fp);
> >  void unix_init_vertex(struct unix_sock *u);
> > +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
> > +void unix_del_edges(struct scm_fp_list *fpl);
> >  int unix_alloc_edges(struct scm_fp_list *fpl);
> >  void unix_free_edges(struct scm_fp_list *fpl);
> >  void unix_gc(void);
> > diff --git a/include/net/scm.h b/include/net/scm.h
> > index a1142dee086c..7d807fe466a3 100644
> > --- a/include/net/scm.h
> > +++ b/include/net/scm.h
> > @@ -32,6 +32,7 @@ struct scm_fp_list {
> >  	short			count_unix;
> >  	short			max;
> >  #ifdef CONFIG_UNIX
> > +	bool			inflight;
> >  	struct unix_edge	*edges;
> >  #endif
> >  	struct user_struct	*user;
> > diff --git a/net/core/scm.c b/net/core/scm.c
> > index bc75b6927222..cad0c153ac93 100644
> > --- a/net/core/scm.c
> > +++ b/net/core/scm.c
> > @@ -88,6 +88,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
> >  		fpl->count = 0;
> >  		fpl->count_unix = 0;
> >  #if IS_ENABLED(CONFIG_UNIX)
> > +		fpl->inflight = false;
> >  		fpl->edges = NULL;
> >  #endif
> >  		fpl->max = SCM_MAX_FD;
> > @@ -381,6 +382,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
> >  			get_file(fpl->fp[i]);
> >  
> >  #if IS_ENABLED(CONFIG_UNIX)
> > +		new_fpl->inflight = false;
> >  		new_fpl->edges = NULL;
> >  #endif
> >  		new_fpl->max = new_fpl->count;
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index 0391f66546a6..ea7bac18a781 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -1956,8 +1956,10 @@ static void scm_stat_add(struct sock *sk, struct sk_buff *skb)
> >  	struct scm_fp_list *fp = UNIXCB(skb).fp;
> >  	struct unix_sock *u = unix_sk(sk);
> >  
> > -	if (unlikely(fp && fp->count))
> > +	if (unlikely(fp && fp->count)) {
> >  		atomic_add(fp->count, &u->scm_stat.nr_fds);
> > +		unix_add_edges(fp, u);
> > +	}
> >  }
> >  
> >  static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
> > @@ -1965,8 +1967,10 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
> >  	struct scm_fp_list *fp = UNIXCB(skb).fp;
> >  	struct unix_sock *u = unix_sk(sk);
> >  
> > -	if (unlikely(fp && fp->count))
> > +	if (unlikely(fp && fp->count)) {
> >  		atomic_sub(fp->count, &u->scm_stat.nr_fds);
> > +		unix_del_edges(fp);
> > +	}
> >  }
> >  
> >  /*
> > diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> > index ec998c7d6b4c..353416f38738 100644
> > --- a/net/unix/garbage.c
> > +++ b/net/unix/garbage.c
> > @@ -109,6 +109,57 @@ void unix_init_vertex(struct unix_sock *u)
> >  	INIT_LIST_HEAD(&vertex->edges);
> >  }
> >  
> > +DEFINE_SPINLOCK(unix_gc_lock);
> > +static LIST_HEAD(unix_unvisited_vertices);
> > +
> > +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
> > +{
> > +	int i = 0, j = 0;
> > +
> > +	spin_lock(&unix_gc_lock);
> > +
> > +	while (i < fpl->count_unix) {
> > +		struct unix_sock *inflight = unix_get_socket(fpl->fp[j++]);
> > +		struct unix_edge *edge;
> > +
> > +		if (!inflight)
> > +			continue;
> > +
> > +		edge = fpl->edges + i++;
> > +		edge->predecessor = &inflight->vertex;
> > +		edge->successor = &receiver->vertex;
> > +
> > +		if (!edge->predecessor->out_degree++)
> > +			list_add_tail(&edge->predecessor->entry, &unix_unvisited_vertices);
> > +
> > +		list_add_tail(&edge->entry, &edge->predecessor->edges);
> 
> Note that I confusingly replied to the previous revision of this patch,
> but I think the points still stand.
> 
> 		INIT_LIST_HEAD(&edge->entry);	
> 
> disappeared from the above, but I can't find where it landed?!? 

Sorry, I forgot to mention this change in the coverletter.

Initially, I placed INIT_LIST_HEAD() in unix_alloc_edges(), but in v1,
I moved it to unix_add_edges(), and later I noticed it's unnecessary
in the first place because edge->entry is not used as head of a list,
so I removed it completely in v2.


From: Paolo Abeni <pabeni@redhat.com>
Date: Tue, 20 Feb 2024 13:06:18 +0100
> Here  'edge->predecessor->entry' and 'edge->entry' refer to different
> object types right ? edge vs vertices. Perhaps using different field
> names could clarify the code a bit? 

Regarding the name of edge->entry, I agree a diffrent name would be
easy to understand.  I'll rename it to edge->vertex_entry unless there
is a better name :)

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists.
  2024-02-20 12:56   ` Paolo Abeni
@ 2024-02-20 17:47     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 22+ messages in thread
From: Kuniyuki Iwashima @ 2024-02-20 17:47 UTC (permalink / raw
  To: pabeni; +Cc: davem, edumazet, kuba, kuni1840, kuniyu, netdev

From: Paolo Abeni <pabeni@redhat.com>
Date: Tue, 20 Feb 2024 13:56:13 +0100
> On Fri, 2024-02-16 at 13:05 -0800, Kuniyuki Iwashima wrote:
> > diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> > index b3ba5e949d62..59ec8d7880ce 100644
> > --- a/include/net/af_unix.h
> > +++ b/include/net/af_unix.h
> > @@ -36,6 +36,7 @@ struct unix_vertex {
> >  	struct list_head entry;
> >  	struct list_head scc_entry;
> >  	unsigned long out_degree;
> > +	unsigned long self_degree;
> >  	unsigned long index;
> >  	unsigned long lowlink;
> >  };
> 
> The series increases considerably struct unix_sock size. Have you
> considered dynamically allocating unix_vertex only when using
> SCM_RIGTHS?

Ah good point.
Will try to do that in v3.

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb.
  2024-02-20 17:44     ` Kuniyuki Iwashima
@ 2024-02-20 17:53       ` Paolo Abeni
  0 siblings, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2024-02-20 17:53 UTC (permalink / raw
  To: Kuniyuki Iwashima; +Cc: davem, edumazet, kuba, kuni1840, netdev

On Tue, 2024-02-20 at 09:44 -0800, Kuniyuki Iwashima wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Tue, 20 Feb 2024 13:06:18 +0100
> > Here  'edge->predecessor->entry' and 'edge->entry' refer to different
> > object types right ? edge vs vertices. Perhaps using different field
> > names could clarify the code a bit? 
> 
> Regarding the name of edge->entry, I agree a diffrent name would be
> easy to understand.  I'll rename it to edge->vertex_entry unless there
> is a better name :)

[side note: thanks for copying with my reply mistake!] 

vertex_entry WFM! but please note that I'm quite bad ad peeking
variables names ;)

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-02-20 17:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-16 21:05 [PATCH v2 net-next 00/14] af_unix: Rework GC Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 01/14] af_unix: Add struct unix_vertex in struct unix_sock Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 02/14] af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 03/14] af_unix: Link struct unix_edge when queuing skb Kuniyuki Iwashima
2024-02-20 12:11   ` Paolo Abeni
2024-02-20 17:44     ` Kuniyuki Iwashima
2024-02-20 17:53       ` Paolo Abeni
2024-02-16 21:05 ` [PATCH v2 net-next 04/14] af_unix: Save listener for embryo socket Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 05/14] af_unix: Fix up unix_edge.successor " Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 06/14] af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 07/14] af_unix: Detect Strongly Connected Components Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 08/14] af_unix: Save O(n) setup of Tarjan's algo Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 09/14] af_unix: Skip GC if no cycle exists Kuniyuki Iwashima
2024-02-20 12:56   ` Paolo Abeni
2024-02-20 17:47     ` Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 10/14] af_unix: Avoid Tarjan's algorithm if unnecessary Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 11/14] af_unix: Assign a unique index to SCC Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 12/14] af_unix: Detect dead SCC Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 13/14] af_unix: Replace garbage collection algorithm Kuniyuki Iwashima
2024-02-16 21:05 ` [PATCH v2 net-next 14/14] selftest: af_unix: Test GC for SCM_RIGHTS Kuniyuki Iwashima
2024-02-16 21:28 ` [PATCH v2 net-next 00/14] af_unix: Rework GC Eric Dumazet
2024-02-16 21:42   ` Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).