[PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload

* [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload
@ 2024-05-06  1:16 Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 01/24] netlink: add NLA_POLICY_MAX_LEN macro Antonio Quartulli
                   ` (24 more replies)
  0 siblings, 25 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Hi all!

I am finally back with version 3 of the ovpn patchset.
It took a while to address all comments I have received on v2, but I
am happy to say that I addressed 99% of the feedback I collected.

The remaining 1% is the use of BQL for handling the packet queue.

Although such a change looks pretty simple in terms of code, I need to
spend some more time understanding the concept behind it. I therefore
decided to postpone this change to the (near) future in order not to
slow down the whole review/merge process.

Major changes from v2 are:
* added YAML documentation for the netlink uAPI
** uapi/linux/ovpn.h, drivers/net/ovpn/netlink-gen.{c,h} are now
   auto-generated by the tools/net/ynl/ynl-regen.sh script
* the first patch now also modifies the ynl script to account for the
  new MAX_LEN() policy macro
* added more doxygen documentation
* added kselftest unit for ovpn in tools/testing/selftests/ovpn with
  some basic tests
* fixed various typos in documentation
* moved includes of local headers last
* wrapped code at 80 chars
* rearranged includes a bit to reduce double inclusions
* set default ifname to ovpn%d and allowed users to not specify any
* now sending reply to NEW_IFACE NL command containing actual new ifname
* used GENL_REQ_ATTR_CHECK() when possible
* turned carrier off in iface create function
* turned carrier on in open function and clearly explain why we keep it
  always on (new patch)
* left ethtool info ->version empty
* removed internal driver version
* checked return value of alloc_netdev
* renamed _lookup() functions to _get()
* removed memset-zero from init function as netdev is already zero'd
* added missing TCP component initialization in ovpn_init
* .. included various small fixes as requested by reviewers

The latest code can also be found at:

https://github.com/OpenVPN/linux-kernel-ovpn

Thanks to the new kselftest component, it is now possible to run
basic ovpn tests. Peers are emulated by using multiple network
namespaces which are interconnected by means of veth pairs.
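
As a rough illustration of the namespace/veth idea, here is a small
Python sketch that only builds the iproute2 command lines for a star
topology. The namespace names (peer0, peerN), interface names and the
10.10.N.0/24 addressing are assumptions made up for this example; they
are not taken from the selftest scripts.

```python
def topology_commands(num_clients):
    """Build iproute2 commands linking peer0 to num_clients namespaces.

    Hypothetical layout: peer0 acts as the hub and gets one veth pair
    per client namespace. Nothing is executed here; the caller decides
    how (and whether) to run the returned command strings.
    """
    cmds = ['ip netns add peer0']
    for i in range(1, num_clients + 1):
        cmds.append(f'ip netns add peer{i}')
        # One end of the veth pair lives in peer0, the other in peerN.
        cmds.append(
            f'ip link add veth{i} netns peer0 type veth '
            f'peer name veth{i} netns peer{i}'
        )
        cmds.append(f'ip -n peer0 addr add 10.10.{i}.1/24 dev veth{i}')
        cmds.append(f'ip -n peer{i} addr add 10.10.{i}.2/24 dev veth{i}')
        cmds.append(f'ip -n peer0 link set veth{i} up')
        cmds.append(f'ip -n peer{i} link set veth{i} up')
    return cmds
```

Printing the result of topology_commands(2) gives a list that can be
reviewed before being run as root.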

Please note that patches have been split for easier review, but if
required, I can send a long 1/1 with all courses and dishes in one go :)

Thanks so far!

Below is the original description posted with the first patchset:
===================================================================

`ovpn` is essentially a device driver that allows creating a virtual
network interface to handle the OpenVPN data channel. Any traffic
entering the interface is encrypted, encapsulated and sent to the
appropriate destination.

`ovpn` requires OpenVPN in userspace to run alongside it in order to
be properly configured and maintained during its life cycle.

The `ovpn` interface can be created/destroyed and then
configured via Netlink API.

Specifically OpenVPN in userspace will:
* create the `ovpn` interface
* establish the connection with one or more peers
* perform TLS handshake and negotiate any protocol parameter
* configure the `ovpn` interface with peer data (ip/port, keys, etc.)
* handle any subsequent control channel communication

I'd like to point out the control channel is fully handled in userspace.
The idea is to keep the `ovpn` kernel module as simple as possible and
let userspace handle all the non-data (non-fast-path) features.

NOTE: some of you may already know `ovpn-dco`, the out-of-tree predecessor
of `ovpn`. However, be aware that the two are not API compatible and
therefore OpenVPN 2.6 will not work with this new `ovpn` module.
More adjustments are required.

For more technical details please refer to the actual patches.

Any comment, concern or statement will be appreciated!
Thanks a lot!!

Best Regards,

Antonio Quartulli
OpenVPN Inc.

======================

Antonio Quartulli (24):
  netlink: add NLA_POLICY_MAX_LEN macro
  net: introduce OpenVPN Data Channel Offload (ovpn)
  ovpn: add basic netlink support
  ovpn: add basic interface creation/destruction/management routines
  ovpn: implement interface creation/destruction via netlink
  ovpn: keep carrier always on
  ovpn: introduce the ovpn_peer object
  ovpn: introduce the ovpn_socket object
  ovpn: implement basic TX path (UDP)
  ovpn: implement basic RX path (UDP)
  ovpn: implement packet processing
  ovpn: store tunnel and transport statistics
  ovpn: implement TCP transport
  ovpn: implement multi-peer support
  ovpn: implement peer lookup logic
  ovpn: implement keepalive mechanism
  ovpn: add support for updating local UDP endpoint
  ovpn: add support for peer floating
  ovpn: implement peer add/dump/delete via netlink
  ovpn: implement key add/del/swap via netlink
  ovpn: kill key and notify userspace in case of IV exhaustion
  ovpn: notify userspace when a peer is deleted
  ovpn: add basic ethtool support
  testing/selftest: add test tool and scripts for ovpn module

 Documentation/netlink/specs/ovpn.yaml      |  331 ++++
 MAINTAINERS                                |    8 +
 drivers/net/Kconfig                        |   13 +
 drivers/net/Makefile                       |    1 +
 drivers/net/ovpn/Makefile                  |   22 +
 drivers/net/ovpn/bind.c                    |   61 +
 drivers/net/ovpn/bind.h                    |  130 ++
 drivers/net/ovpn/crypto.c                  |  162 ++
 drivers/net/ovpn/crypto.h                  |  138 ++
 drivers/net/ovpn/crypto_aead.c             |  378 +++++
 drivers/net/ovpn/crypto_aead.h             |   30 +
 drivers/net/ovpn/io.c                      |  566 +++++++
 drivers/net/ovpn/io.h                      |   35 +
 drivers/net/ovpn/main.c                    |  320 ++++
 drivers/net/ovpn/main.h                    |   56 +
 drivers/net/ovpn/netlink-gen.c             |  206 +++
 drivers/net/ovpn/netlink-gen.h             |   41 +
 drivers/net/ovpn/netlink.c                 |  993 ++++++++++++
 drivers/net/ovpn/netlink.h                 |   46 +
 drivers/net/ovpn/ovpnstruct.h              |   48 +
 drivers/net/ovpn/packet.h                  |   40 +
 drivers/net/ovpn/peer.c                    | 1077 +++++++++++++
 drivers/net/ovpn/peer.h                    |  303 ++++
 drivers/net/ovpn/pktid.c                   |  132 ++
 drivers/net/ovpn/pktid.h                   |   85 +
 drivers/net/ovpn/proto.h                   |  115 ++
 drivers/net/ovpn/skb.h                     |   51 +
 drivers/net/ovpn/socket.c                  |  150 ++
 drivers/net/ovpn/socket.h                  |   81 +
 drivers/net/ovpn/stats.c                   |   21 +
 drivers/net/ovpn/stats.h                   |   52 +
 drivers/net/ovpn/tcp.c                     |  511 ++++++
 drivers/net/ovpn/tcp.h                     |   42 +
 drivers/net/ovpn/udp.c                     |  393 +++++
 drivers/net/ovpn/udp.h                     |   47 +
 include/net/netlink.h                      |    1 +
 include/uapi/linux/ovpn.h                  |  109 ++
 include/uapi/linux/udp.h                   |    1 +
 tools/net/ynl/ynl-gen-c.py                 |    2 +
 tools/testing/selftests/Makefile           |    1 +
 tools/testing/selftests/ovpn/Makefile      |   15 +
 tools/testing/selftests/ovpn/config        |    8 +
 tools/testing/selftests/ovpn/data64.key    |    5 +
 tools/testing/selftests/ovpn/float-test.sh |  113 ++
 tools/testing/selftests/ovpn/netns-test.sh |  118 ++
 tools/testing/selftests/ovpn/ovpn-cli.c    | 1640 ++++++++++++++++++++
 tools/testing/selftests/ovpn/run.sh        |   12 +
 tools/testing/selftests/ovpn/tcp_peers.txt |    1 +
 tools/testing/selftests/ovpn/udp_peers.txt |    5 +
 49 files changed, 8716 insertions(+)
 create mode 100644 Documentation/netlink/specs/ovpn.yaml
 create mode 100644 drivers/net/ovpn/Makefile
 create mode 100644 drivers/net/ovpn/bind.c
 create mode 100644 drivers/net/ovpn/bind.h
 create mode 100644 drivers/net/ovpn/crypto.c
 create mode 100644 drivers/net/ovpn/crypto.h
 create mode 100644 drivers/net/ovpn/crypto_aead.c
 create mode 100644 drivers/net/ovpn/crypto_aead.h
 create mode 100644 drivers/net/ovpn/io.c
 create mode 100644 drivers/net/ovpn/io.h
 create mode 100644 drivers/net/ovpn/main.c
 create mode 100644 drivers/net/ovpn/main.h
 create mode 100644 drivers/net/ovpn/netlink-gen.c
 create mode 100644 drivers/net/ovpn/netlink-gen.h
 create mode 100644 drivers/net/ovpn/netlink.c
 create mode 100644 drivers/net/ovpn/netlink.h
 create mode 100644 drivers/net/ovpn/ovpnstruct.h
 create mode 100644 drivers/net/ovpn/packet.h
 create mode 100644 drivers/net/ovpn/peer.c
 create mode 100644 drivers/net/ovpn/peer.h
 create mode 100644 drivers/net/ovpn/pktid.c
 create mode 100644 drivers/net/ovpn/pktid.h
 create mode 100644 drivers/net/ovpn/proto.h
 create mode 100644 drivers/net/ovpn/skb.h
 create mode 100644 drivers/net/ovpn/socket.c
 create mode 100644 drivers/net/ovpn/socket.h
 create mode 100644 drivers/net/ovpn/stats.c
 create mode 100644 drivers/net/ovpn/stats.h
 create mode 100644 drivers/net/ovpn/tcp.c
 create mode 100644 drivers/net/ovpn/tcp.h
 create mode 100644 drivers/net/ovpn/udp.c
 create mode 100644 drivers/net/ovpn/udp.h
 create mode 100644 include/uapi/linux/ovpn.h
 create mode 100644 tools/testing/selftests/ovpn/Makefile
 create mode 100644 tools/testing/selftests/ovpn/config
 create mode 100644 tools/testing/selftests/ovpn/data64.key
 create mode 100644 tools/testing/selftests/ovpn/float-test.sh
 create mode 100644 tools/testing/selftests/ovpn/netns-test.sh
 create mode 100644 tools/testing/selftests/ovpn/ovpn-cli.c
 create mode 100644 tools/testing/selftests/ovpn/run.sh
 create mode 100644 tools/testing/selftests/ovpn/tcp_peers.txt
 create mode 100644 tools/testing/selftests/ovpn/udp_peers.txt

-- 
2.43.2

* [PATCH net-next v3 01/24] netlink: add NLA_POLICY_MAX_LEN macro
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 02/24] net: introduce OpenVPN Data Channel Offload (ovpn) Antonio Quartulli
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy
with a maximum length value.

The netlink generator for YAML specs has been extended accordingly.
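
To illustrate the effect on the generator, here is a minimal Python
sketch that mirrors the branch order of TypeBinary._attr_policy() after
this patch. binary_policy() is a hypothetical standalone helper, and
its final fallback is a simplified placeholder for the generator's
remaining branches, not a copy of them.

```python
def binary_policy(checks):
    """Return the C policy initializer chosen for a binary attribute.

    checks is the 'checks' mapping of the attribute in the YAML spec.
    The order matters: exact-len wins over max-len, as in the patch.
    """
    if 'exact-len' in checks:
        return 'NLA_POLICY_EXACT_LEN(' + str(checks['exact-len']) + ')'
    if 'max-len' in checks:
        return 'NLA_POLICY_MAX_LEN(' + str(checks['max-len']) + ')'
    # Simplified placeholder (assumption): the real generator handles
    # min-len and the no-checks case in more detail.
    return '{ .type = NLA_BINARY, }'
```

For example, an attribute declared with "max-len: 16" would end up with
NLA_POLICY_MAX_LEN(16) in the generated policy table.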

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 include/net/netlink.h      | 1 +
 tools/net/ynl/ynl-gen-c.py | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 61cef3bd2d31..24b23547b0af 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -469,6 +469,7 @@ struct nla_policy {
 	.max = _len						\
 }
 #define NLA_POLICY_MIN_LEN(_len)	NLA_POLICY_MIN(NLA_BINARY, _len)
+#define NLA_POLICY_MAX_LEN(_len)	NLA_POLICY_MAX(NLA_BINARY, _len)
 
 /**
  * struct nl_info - netlink source information
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py
index c0b90c104d92..dd60c51617fd 100755
--- a/tools/net/ynl/ynl-gen-c.py
+++ b/tools/net/ynl/ynl-gen-c.py
@@ -466,6 +466,8 @@ class TypeBinary(Type):
     def _attr_policy(self, policy):
         if 'exact-len' in self.checks:
             mem = 'NLA_POLICY_EXACT_LEN(' + str(self.checks['exact-len']) + ')'
+        elif 'max-len' in self.checks:
+            mem = 'NLA_POLICY_MAX_LEN(' + str(self.checks['max-len']) + ')'
         else:
             mem = '{ '
             if len(self.checks) == 1 and 'min-len' in self.checks:
-- 
2.43.2


* [PATCH net-next v3 02/24] net: introduce OpenVPN Data Channel Offload (ovpn)
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 01/24] netlink: add NLA_POLICY_MAX_LEN macro Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 03/24] ovpn: add basic netlink support Antonio Quartulli
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

OpenVPN is userspace software, in existence since around 2005, that
allows users to create secure tunnels.

So far OpenVPN has implemented all operations in userspace, which
requires several round trips between kernel and userspace in order to
process packets (encapsulate/decapsulate, encrypt/decrypt, rerouting...).

With `ovpn` we intend to move the fast path (data channel) entirely
into kernel space and thus improve user-measured throughput over the
tunnel.

`ovpn` is implemented as a simple virtual network device driver, that
can be manipulated by means of the standard RTNL APIs. A device of kind
`ovpn` allows only IPv4/6 traffic and can be of type:
* P2P (peer-to-peer): any packet sent over the interface will be
  encapsulated and transmitted to the other side (typical OpenVPN
  client or peer-to-peer behaviour);
* P2MP (point-to-multipoint): packets sent over the interface are
  transmitted to peers based on existing routes (typical OpenVPN
  server behaviour).
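
The difference between the two modes can be sketched in Python as
follows. This is only a model under stated assumptions: select_peer(),
the peers dictionary and the routes list are hypothetical stand-ins
(the driver relies on the system routing table, not on a Python list),
and the longest-prefix match is written out by hand for clarity.

```python
import ipaddress


def select_peer(mode, dst, peers, routes):
    """Pick the peer id that should receive a packet addressed to dst.

    mode   -- 'p2p' or 'mp'
    peers  -- dict mapping a peer's VPN IP (str) to its peer id
    routes -- list of (prefix, gateway) tuples modelling the routing
              table; a gateway of None means the prefix is on-link
    """
    if mode == 'p2p':
        # A single peer exists: every packet goes to it.
        return next(iter(peers.values()), None)

    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            # Keep the most specific (longest) matching prefix.
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, gateway)
    if best is None:
        return None
    # On-link routes use the destination itself as next hop.
    nexthop = best[1] if best[1] is not None else dst
    return peers.get(nexthop)
```

With peers {'10.8.0.2': 1, '10.8.0.3': 2} and a route for
192.168.50.0/24 via 10.8.0.3, a packet to 192.168.50.7 is handed to
peer 2 in 'mp' mode.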

After the interface has been created, OpenVPN in userspace can
configure it using a new Netlink API. Specifically it is possible
to manage peers and their keys.

The OpenVPN control channel is multiplexed over the same transport
socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN
OP code for data traffic) is sent to userspace and handled there.
This way the `ovpn` codebase is kept as compact as possible while
focusing on handling data traffic only (fast path).
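
The demultiplexing step can be sketched in Python as below. The
constants reflect my understanding of the OpenVPN wire format (opcode
in the upper 5 bits of the first byte, key id in the lower 3 bits,
DATA_V2 equal to 9 and followed by a 24-bit peer id); classify() is a
hypothetical helper, and the sketch assumes UDP framing, i.e. any TCP
length prefix has already been removed.

```python
DATA_V2 = 9          # opcode of data packets that carry a peer id
OPCODE_SHIFT = 3     # the opcode sits in the upper 5 bits of byte 0
KEY_ID_MASK = 0x07   # the key id sits in the lower 3 bits of byte 0


def classify(packet):
    """Decide who handles a received transport packet.

    Returns ('kernel', peer_id, key_id) for DATA_V2 packets and
    ('userspace', opcode) for everything else.
    """
    if not packet:
        raise ValueError('empty packet')
    opcode = packet[0] >> OPCODE_SHIFT
    if opcode != DATA_V2:
        # Control traffic is left to the userspace process.
        return ('userspace', opcode)
    if len(packet) < 4:
        raise ValueError('truncated DATA_V2 header')
    key_id = packet[0] & KEY_ID_MASK
    peer_id = int.from_bytes(packet[1:4], 'big')
    return ('kernel', peer_id, key_id)
```

The 24-bit peer id is consistent with the 0xFFFFFF upper bound placed
on the peer id attribute in the netlink spec.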

Any OpenVPN control feature (like cipher negotiation, TLS handshake,
rekeying, etc.) is still fully handled by the userspace process.

When userspace establishes a new connection with a peer, it first
performs the handshake and then passes the socket to the `ovpn` kernel
module, which takes ownership. From this moment on `ovpn` will handle
data traffic for the new peer.
When control packets are received on the link, they are forwarded to
userspace through the same transport socket they were received on, as
userspace is still listening to them.

Some events (like peer deletion) are sent to a Netlink multicast group.

Although it wasn't easy to convince the community, `ovpn` implements
only a limited number of the data-channel features supported by the
userspace program.

Each feature that made it to `ovpn` was attentively vetted to
avoid carrying too much legacy along with us (and to give a clear cut to
old and probably-not-so-useful features).

Notably, only encryption using AEAD ciphers (specifically
ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other
cipher out there was not deemed useful.
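
Both ciphers take a 96-bit nonce. The netlink spec in the next patch
describes an 8-byte nonce tail that is concatenated to the packet ID to
obtain the cipher IV, which can be sketched in Python as follows. The
4-byte width and network byte order of the packet ID are assumptions of
this example, and build_iv() is a hypothetical helper.

```python
import struct

NONCE_TAIL_SIZE = 8   # same value as nonce_tail_size in the spec
PKTID_SIZE = 4        # assumption: 32-bit packet ID


def build_iv(packet_id, nonce_tail):
    """Concatenate packet ID and nonce tail into a 12-byte IV."""
    if len(nonce_tail) != NONCE_TAIL_SIZE:
        raise ValueError('nonce tail must be %d bytes' % NONCE_TAIL_SIZE)
    if not 0 <= packet_id <= 0xFFFFFFFF:
        # A packet ID outside 32 bits cannot be encoded: the key
        # has run out of usable IVs.
        raise ValueError('packet ID out of range')
    # Assumption: the packet ID is encoded in network byte order.
    return struct.pack('!I', packet_id) + nonce_tail
```

Since the tail is fixed per key, the IV is unique only as long as the
packet ID never repeats for that key.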

Both UDP and TCP sockets are supported.

As explained above, in case of P2MP mode, OpenVPN will use the main system
routing table to decide which packet goes to which peer. This implies
that no routing table was re-implemented in the `ovpn` kernel module.

This kernel module can be enabled by selecting the CONFIG_OVPN entry
in the networking drivers section.

NOTE: this first patch introduces the very basic framework only.
Features are then added patch by patch. Although each patch will
compile and is not expected to break at runtime, the ovpn module is
expected to be fully working only after the full set has been applied.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 MAINTAINERS               |  8 ++++
 drivers/net/Kconfig       | 13 ++++++
 drivers/net/Makefile      |  1 +
 drivers/net/ovpn/Makefile | 11 ++++++
 drivers/net/ovpn/io.c     | 22 +++++++++++
 drivers/net/ovpn/io.h     | 15 +++++++
 drivers/net/ovpn/main.c   | 83 +++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/main.h   | 21 ++++++++++
 include/uapi/linux/udp.h  |  1 +
 9 files changed, 175 insertions(+)
 create mode 100644 drivers/net/ovpn/Makefile
 create mode 100644 drivers/net/ovpn/io.c
 create mode 100644 drivers/net/ovpn/io.h
 create mode 100644 drivers/net/ovpn/main.c
 create mode 100644 drivers/net/ovpn/main.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 294e472d7de8..5de52e983e86 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16660,6 +16660,14 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs.git
 F:	Documentation/filesystems/overlayfs.rst
 F:	fs/overlayfs/
 
+OPENVPN DATA CHANNEL OFFLOAD
+M:	Antonio Quartulli <antonio@openvpn.net>
+L:	openvpn-devel@lists.sourceforge.net (moderated for non-subscribers)
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ovpn/
+F:	include/uapi/linux/ovpn.h
+
 P54 WIRELESS DRIVER
 M:	Christian Lamparter <chunkeey@googlemail.com>
 L:	linux-wireless@vger.kernel.org
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9920b3a68ed1..c5743288242d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -115,6 +115,19 @@ config WIREGUARD_DEBUG
 
 	  Say N here unless you know what you're doing.
 
+config OVPN
+	tristate "OpenVPN data channel offload"
+	depends on NET && INET
+	select NET_UDP_TUNNEL
+	select DST_CACHE
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	select CRYPTO_CHACHA20POLY1305
+	help
+	  This module enhances the performance of the OpenVPN userspace software
+	  by offloading the data channel processing to kernelspace.
+
 config EQUALIZER
 	tristate "EQL (serial line load balancing) support"
 	help
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 9c053673d6b2..4981cb7ffc03 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_IPVLAN) += ipvlan/
 obj-$(CONFIG_IPVTAP) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_WIREGUARD) += wireguard/
+obj-$(CONFIG_OVPN) += ovpn/
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
 obj-$(CONFIG_MACSEC) += macsec.o
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
new file mode 100644
index 000000000000..53fb197027d7
--- /dev/null
+++ b/drivers/net/ovpn/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# ovpn -- OpenVPN data channel offload in kernel space
+#
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+# Author:	Antonio Quartulli <antonio@openvpn.net>
+
+obj-$(CONFIG_OVPN) := ovpn.o
+ovpn-y += main.o
+ovpn-y += io.o
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
new file mode 100644
index 000000000000..ad3813419c33
--- /dev/null
+++ b/drivers/net/ovpn/io.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+
+#include "io.h"
+
+/* Send user data to the network
+ */
+netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	skb_tx_error(skb);
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+}
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
new file mode 100644
index 000000000000..aa259be66441
--- /dev/null
+++ b/drivers/net/ovpn/io.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPN_H_
+#define _NET_OVPN_OVPN_H_
+
+netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
+
+#endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
new file mode 100644
index 000000000000..47d9ed0d9ff0
--- /dev/null
+++ b/drivers/net/ovpn/main.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+
+#include "main.h"
+#include "io.h"
+
+/* Driver info */
+#define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
+#define DRV_COPYRIGHT	"(C) 2020-2024 OpenVPN, Inc."
+
+bool ovpn_dev_is_valid(const struct net_device *dev)
+{
+	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
+}
+
+static int ovpn_netdev_notifier_call(struct notifier_block *nb,
+				     unsigned long state, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+	if (!ovpn_dev_is_valid(dev))
+		return NOTIFY_DONE;
+
+	switch (state) {
+	case NETDEV_REGISTER:
+		/* add device to internal list for later destruction upon
+		 * unregistration
+		 */
+		break;
+	case NETDEV_UNREGISTER:
+		/* can be delivered multiple times, so check registered flag,
+		 * then destroy the interface
+		 */
+		break;
+	case NETDEV_POST_INIT:
+	case NETDEV_GOING_DOWN:
+	case NETDEV_DOWN:
+	case NETDEV_UP:
+	case NETDEV_PRE_UP:
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ovpn_netdev_notifier = {
+	.notifier_call = ovpn_netdev_notifier_call,
+};
+
+static int __init ovpn_init(void)
+{
+	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
+
+	if (err) {
+		pr_err("ovpn: can't register netdevice notifier: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static __exit void ovpn_cleanup(void)
+{
+	unregister_netdevice_notifier(&ovpn_netdev_notifier);
+}
+
+module_init(ovpn_init);
+module_exit(ovpn_cleanup);
+
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
+MODULE_AUTHOR(DRV_COPYRIGHT);
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
new file mode 100644
index 000000000000..380adb593d0c
--- /dev/null
+++ b/drivers/net/ovpn/main.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_MAIN_H_
+#define _NET_OVPN_MAIN_H_
+
+/**
+ * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
+ * @dev: the interface to check
+ *
+ * Return: whether the netdevice is of type 'ovpn'
+ */
+bool ovpn_dev_is_valid(const struct net_device *dev);
+
+#endif /* _NET_OVPN_MAIN_H_ */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 4828794efcf8..0dd94757127f 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -43,5 +43,6 @@ struct udphdr {
 #define UDP_ENCAP_GTP1U		5 /* 3GPP TS 29.060 */
 #define UDP_ENCAP_RXRPC		6
 #define TCP_ENCAP_ESPINTCP	7 /* Yikes, this is really xfrm encap types. */
+#define UDP_ENCAP_OVPNINUDP	8 /* OpenVPN traffic */
 
 #endif /* _UAPI_LINUX_UDP_H */
-- 
2.43.2


* [PATCH net-next v3 03/24] ovpn: add basic netlink support
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 01/24] netlink: add NLA_POLICY_MAX_LEN macro Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 02/24] net: introduce OpenVPN Data Channel Offload (ovpn) Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-08  0:10   ` Jakub Kicinski
  2024-05-08 14:42   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines Antonio Quartulli
                   ` (21 subsequent siblings)
  24 siblings, 2 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

This commit introduces basic netlink support with family
registration/unregistration functionalities and stub pre/post-doit.

More importantly it introduces the YAML uAPI description along
with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h
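
To give an idea of what the "checks" entries in the spec express, here
is a small Python sketch that applies a few of them to a dictionary of
peer attributes. PEER_CHECKS copies four entries from the peer
attribute set below; validate_peer() is a hypothetical helper that only
models the semantics and is unrelated to the generated C policy code.

```python
U16_MAX = 0xFFFF

# Subset of the checks declared for the 'peer' attribute set.
PEER_CHECKS = {
    'id': {'max': 0xFFFFFF},
    'vpn_ipv6': {'exact-len': 16},
    'local_ip': {'max-len': 16},
    'local_port': {'min': 1, 'max': U16_MAX},
}


def validate_peer(attrs):
    """Return a list of violations; an empty list means attrs pass."""
    errors = []
    for name, value in attrs.items():
        checks = PEER_CHECKS.get(name, {})
        if 'min' in checks and value < checks['min']:
            errors.append(f'{name}: below minimum {checks["min"]}')
        if 'max' in checks and value > checks['max']:
            errors.append(f'{name}: above maximum {checks["max"]}')
        if 'exact-len' in checks and len(value) != checks['exact-len']:
            errors.append(f'{name}: length must be {checks["exact-len"]}')
        if 'max-len' in checks and len(value) > checks['max-len']:
            errors.append(f'{name}: length above {checks["max-len"]}')
    return errors
```

For instance, validate_peer({'local_port': 0}) reports one violation,
because the spec requires the local port to be at least 1.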

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 Documentation/netlink/specs/ovpn.yaml | 331 ++++++++++++++++++++++++++
 drivers/net/ovpn/Makefile             |   2 +
 drivers/net/ovpn/main.c               |  15 ++
 drivers/net/ovpn/netlink-gen.c        | 206 ++++++++++++++++
 drivers/net/ovpn/netlink-gen.h        |  41 ++++
 drivers/net/ovpn/netlink.c            | 154 ++++++++++++
 drivers/net/ovpn/netlink.h            |  30 +++
 drivers/net/ovpn/ovpnstruct.h         |  21 ++
 include/uapi/linux/ovpn.h             | 109 +++++++++
 9 files changed, 909 insertions(+)
 create mode 100644 Documentation/netlink/specs/ovpn.yaml
 create mode 100644 drivers/net/ovpn/netlink-gen.c
 create mode 100644 drivers/net/ovpn/netlink-gen.h
 create mode 100644 drivers/net/ovpn/netlink.c
 create mode 100644 drivers/net/ovpn/netlink.h
 create mode 100644 drivers/net/ovpn/ovpnstruct.h
 create mode 100644 include/uapi/linux/ovpn.h

diff --git a/Documentation/netlink/specs/ovpn.yaml b/Documentation/netlink/specs/ovpn.yaml
new file mode 100644
index 000000000000..aa474250ed6c
--- /dev/null
+++ b/Documentation/netlink/specs/ovpn.yaml
@@ -0,0 +1,331 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+#
+# Author: Antonio Quartulli <antonio@openvpn.net>
+#
+# Copyright (c) 2024, OpenVPN Inc.
+#
+
+name: ovpn
+
+protocol: genetlink
+
+doc: Netlink protocol to control OpenVPN network devices
+
+definitions:
+  -
+    type: const
+    name: nonce_tail_size
+    value: 8
+  -
+    type: enum
+    name: cipher_alg
+    value-start: 0
+    entries: [ none, aes_gcm, chacha20_poly1305 ]
+  -
+    type: enum
+    name: del_peer_reason
+    value-start: 0
+    entries: [ teardown, userspace, expired, transport_error, transport_disconnect ]
+  -
+    type: enum
+    name: key_slot
+    value-start: 0
+    entries: [ primary, secondary ]
+  -
+    type: enum
+    name: mode
+    value-start: 0
+    entries: [ p2p, mp ]
+
+attribute-sets:
+  -
+    name: peer
+    attributes:
+      -
+        name: id
+        type: u32
+        doc: |
+          The unique Id of the peer. To be used to identify peers during
+          operations
+        checks:
+          max: 0xFFFFFF
+      -
+        name: sockaddr_remote
+        type: binary
+        doc: |
+          The sockaddr_in/in6 object identifying the remote address/port of the
+          peer
+      -
+        name: socket
+        type: u32
+        doc: The socket to be used to communicate with the peer
+      -
+        name: vpn_ipv4
+        type: u32
+        doc: The IPv4 assigned to the peer by the server
+        display-hint: ipv4
+      -
+        name: vpn_ipv6
+        type: binary
+        doc: The IPv6 assigned to the peer by the server
+        display-hint: ipv6
+        checks:
+          exact-len: 16
+      -
+        name: local_ip
+        type: binary
+        doc: The local IP to be used to send packets to the peer (UDP only)
+        checks:
+          max-len: 16
+      -
+        name: local_port
+        type: u32
+        doc: The local port to be used to send packets to the peer (UDP only)
+        checks:
+          min: 1
+          max: u16-max
+      -
+        name: keepalive_interval
+        type: u32
+        doc: |
+          The number of seconds after which a keep alive message is sent to the
+          peer
+      -
+        name: keepalive_timeout
+        type: u32
+        doc: |
+          The number of seconds from the last activity after which the peer is
+          assumed dead
+      -
+        name: del_reason
+        type: u32
+        doc: The reason why a peer was deleted
+        enum: del_peer_reason
+      -
+        name: keyconf
+        type: nest
+        doc: Peer specific cipher configuration
+        nested-attributes: keyconf
+      -
+        name: vpn_rx_bytes
+        type: uint
+        doc: Number of bytes received over the tunnel
+      -
+        name: vpn_tx_bytes
+        type: uint
+        doc: Number of bytes transmitted over the tunnel
+      -
+        name: vpn_rx_packets
+        type: u32
+        doc: Number of packets received over the tunnel
+      -
+        name: vpn_tx_packets
+        type: u32
+        doc: Number of packets transmitted over the tunnel
+      -
+        name: link_rx_bytes
+        type: uint
+        doc: Number of bytes received at the transport level
+      -
+        name: link_tx_bytes
+        type: uint
+        doc: Number of bytes transmitted at the transport level
+      -
+        name: link_rx_packets
+        type: u32
+        doc: Number of packets received at the transport level
+      -
+        name: link_tx_packets
+        type: u32
+        doc: Number of packets transmitted at the transport level
+  -
+    name: keyconf
+    attributes:
+      -
+        name: slot
+        type: u32
+        doc: The slot where the key should be stored
+        enum: key_slot
+      -
+        name: key_id
+        doc: |
+          The unique ID for the key. Used to fetch the correct key upon
+          decryption
+        type: u32
+        checks:
+          max: 2
+      -
+        name: cipher_alg
+        type: u32
+        doc: The cipher to be used when communicating with the peer
+        enum: cipher_alg
+      -
+        name: encrypt_dir
+        type: nest
+        doc: Key material for encrypt direction
+        nested-attributes: keydir
+      -
+        name: decrypt_dir
+        type: nest
+        doc: Key material for decrypt direction
+        nested-attributes: keydir
+  -
+    name: keydir
+    attributes:
+      -
+        name: cipher_key
+        type: binary
+        doc: The actual key to be used by the cipher
+        checks:
+         max-len: 256
+      -
+        name: nonce_tail
+        type: binary
+        doc: |
+          Random nonce to be concatenated to the packet ID, in order to
+          obtain the actual cipher IV
+        checks:
+         exact-len: OVPN_NONCE_TAIL_SIZE
+  -
+    name: ovpn
+    attributes:
+      -
+        name: ifindex
+        type: u32
+        doc: Index of the ovpn interface to operate on
+      -
+        name: ifname
+        type: string
+        doc: Name of the ovpn interface that is being created
+      -
+        name: mode
+        type: u32
+        enum: mode
+        doc: |
+          Oper mode instructing an interface to act as Point2Point or
+          MultiPoint
+      -
+        name: peer
+        type: nest
+        doc: |
+          The peer object containing the attributes of interest for the specific
+          operation
+        nested-attributes: peer
+
+      -
+        name: pad
+        type: pad
+
+operations:
+  list:
+    -
+      name: new_iface
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Create a new interface
+      do:
+        request:
+          attributes:
+            - ifname
+            - mode
+        reply:
+          attributes:
+            - ifname
+    -
+      name: del_iface
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Delete existing interface
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+    -
+      name: set_peer
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Add or modify a remote peer
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+            - peer
+    -
+      name: get_peer
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Retrieve data about existing remote peers (or a specific one)
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+            - peer
+        reply:
+          attributes:
+            - peer
+      dump:
+        request:
+          attributes:
+            - ifindex
+        reply:
+          attributes:
+            - peer
+    -
+      name: del_peer
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Delete existing remote peer
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+            - peer
+    -
+      name: set_key
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Add or modify a cipher key for a specific peer
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+            - peer
+    -
+      name: swap_keys
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Swap primary and secondary session keys for a specific peer
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+            - peer
+    -
+      name: del_key
+      attribute-set: ovpn
+      flags: [ admin-perm ]
+      doc: Delete cipher key for a specific peer
+      do:
+        pre: ovpn-nl-pre-doit
+        post: ovpn-nl-post-doit
+        request:
+          attributes:
+            - ifindex
+            - peer
+
+mcast-groups:
+  list:
+    -
+      name: peers
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 53fb197027d7..201dc001419f 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -9,3 +9,5 @@
 obj-$(CONFIG_OVPN) := ovpn.o
 ovpn-y += main.o
 ovpn-y += io.o
+ovpn-y += netlink.o
+ovpn-y += netlink-gen.o
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 47d9ed0d9ff0..33c0b004ce16 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -7,11 +7,15 @@
  *		James Yonan <james@openvpn.net>
  */
 
+#include <linux/genetlink.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
 #include <linux/version.h>
+#include <uapi/linux/ovpn.h>
 
+#include "ovpnstruct.h"
 #include "main.h"
+#include "netlink.h"
 #include "io.h"
 
 /* Driver info */
@@ -67,11 +71,22 @@ static int __init ovpn_init(void)
 		return err;
 	}
 
+	err = ovpn_nl_register();
+	if (err) {
+		pr_err("ovpn: can't register netlink family: %d\n", err);
+		goto unreg_netdev;
+	}
+
 	return 0;
+
+unreg_netdev:
+	unregister_netdevice_notifier(&ovpn_netdev_notifier);
+	return err;
 }
 
 static __exit void ovpn_cleanup(void)
 {
+	ovpn_nl_unregister();
 	unregister_netdevice_notifier(&ovpn_netdev_notifier);
 }
 
diff --git a/drivers/net/ovpn/netlink-gen.c b/drivers/net/ovpn/netlink-gen.c
new file mode 100644
index 000000000000..931638d88fd5
--- /dev/null
+++ b/drivers/net/ovpn/netlink-gen.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/ovpn.yaml */
+/* YNL-GEN kernel source */
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "netlink-gen.h"
+
+#include <uapi/linux/ovpn.h>
+
+/* Integer value ranges */
+static const struct netlink_range_validation ovpn_a_peer_id_range = {
+	.max	= 16777215ULL,
+};
+
+static const struct netlink_range_validation ovpn_a_peer_local_port_range = {
+	.min	= 1ULL,
+	.max	= 65535ULL,
+};
+
+/* Common nested types */
+const struct nla_policy ovpn_keyconf_nl_policy[OVPN_A_KEYCONF_DECRYPT_DIR + 1] = {
+	[OVPN_A_KEYCONF_SLOT] = NLA_POLICY_MAX(NLA_U32, 1),
+	[OVPN_A_KEYCONF_KEY_ID] = NLA_POLICY_MAX(NLA_U32, 2),
+	[OVPN_A_KEYCONF_CIPHER_ALG] = NLA_POLICY_MAX(NLA_U32, 2),
+	[OVPN_A_KEYCONF_ENCRYPT_DIR] = NLA_POLICY_NESTED(ovpn_keydir_nl_policy),
+	[OVPN_A_KEYCONF_DECRYPT_DIR] = NLA_POLICY_NESTED(ovpn_keydir_nl_policy),
+};
+
+const struct nla_policy ovpn_keydir_nl_policy[OVPN_A_KEYDIR_NONCE_TAIL + 1] = {
+	[OVPN_A_KEYDIR_CIPHER_KEY] = NLA_POLICY_MAX_LEN(256),
+	[OVPN_A_KEYDIR_NONCE_TAIL] = NLA_POLICY_EXACT_LEN(OVPN_NONCE_TAIL_SIZE),
+};
+
+const struct nla_policy ovpn_peer_nl_policy[OVPN_A_PEER_LINK_TX_PACKETS + 1] = {
+	[OVPN_A_PEER_ID] = NLA_POLICY_FULL_RANGE(NLA_U32, &ovpn_a_peer_id_range),
+	[OVPN_A_PEER_SOCKADDR_REMOTE] = { .type = NLA_BINARY, },
+	[OVPN_A_PEER_SOCKET] = { .type = NLA_U32, },
+	[OVPN_A_PEER_VPN_IPV4] = { .type = NLA_U32, },
+	[OVPN_A_PEER_VPN_IPV6] = NLA_POLICY_EXACT_LEN(16),
+	[OVPN_A_PEER_LOCAL_IP] = NLA_POLICY_MAX_LEN(16),
+	[OVPN_A_PEER_LOCAL_PORT] = NLA_POLICY_FULL_RANGE(NLA_U32, &ovpn_a_peer_local_port_range),
+	[OVPN_A_PEER_KEEPALIVE_INTERVAL] = { .type = NLA_U32, },
+	[OVPN_A_PEER_KEEPALIVE_TIMEOUT] = { .type = NLA_U32, },
+	[OVPN_A_PEER_DEL_REASON] = NLA_POLICY_MAX(NLA_U32, 4),
+	[OVPN_A_PEER_KEYCONF] = NLA_POLICY_NESTED(ovpn_keyconf_nl_policy),
+	[OVPN_A_PEER_VPN_RX_BYTES] = { .type = NLA_UINT, },
+	[OVPN_A_PEER_VPN_TX_BYTES] = { .type = NLA_UINT, },
+	[OVPN_A_PEER_VPN_RX_PACKETS] = { .type = NLA_U32, },
+	[OVPN_A_PEER_VPN_TX_PACKETS] = { .type = NLA_U32, },
+	[OVPN_A_PEER_LINK_RX_BYTES] = { .type = NLA_UINT, },
+	[OVPN_A_PEER_LINK_TX_BYTES] = { .type = NLA_UINT, },
+	[OVPN_A_PEER_LINK_RX_PACKETS] = { .type = NLA_U32, },
+	[OVPN_A_PEER_LINK_TX_PACKETS] = { .type = NLA_U32, },
+};
+
+/* OVPN_CMD_NEW_IFACE - do */
+static const struct nla_policy ovpn_new_iface_nl_policy[OVPN_A_MODE + 1] = {
+	[OVPN_A_IFNAME] = { .type = NLA_NUL_STRING, },
+	[OVPN_A_MODE] = NLA_POLICY_MAX(NLA_U32, 1),
+};
+
+/* OVPN_CMD_DEL_IFACE - do */
+static const struct nla_policy ovpn_del_iface_nl_policy[OVPN_A_IFINDEX + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+};
+
+/* OVPN_CMD_SET_PEER - do */
+static const struct nla_policy ovpn_set_peer_nl_policy[OVPN_A_PEER + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+	[OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy),
+};
+
+/* OVPN_CMD_GET_PEER - do */
+static const struct nla_policy ovpn_get_peer_do_nl_policy[OVPN_A_PEER + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+	[OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy),
+};
+
+/* OVPN_CMD_GET_PEER - dump */
+static const struct nla_policy ovpn_get_peer_dump_nl_policy[OVPN_A_IFINDEX + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+};
+
+/* OVPN_CMD_DEL_PEER - do */
+static const struct nla_policy ovpn_del_peer_nl_policy[OVPN_A_PEER + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+	[OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy),
+};
+
+/* OVPN_CMD_SET_KEY - do */
+static const struct nla_policy ovpn_set_key_nl_policy[OVPN_A_PEER + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+	[OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy),
+};
+
+/* OVPN_CMD_SWAP_KEYS - do */
+static const struct nla_policy ovpn_swap_keys_nl_policy[OVPN_A_PEER + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+	[OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy),
+};
+
+/* OVPN_CMD_DEL_KEY - do */
+static const struct nla_policy ovpn_del_key_nl_policy[OVPN_A_PEER + 1] = {
+	[OVPN_A_IFINDEX] = { .type = NLA_U32, },
+	[OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy),
+};
+
+/* Ops table for ovpn */
+static const struct genl_split_ops ovpn_nl_ops[] = {
+	{
+		.cmd		= OVPN_CMD_NEW_IFACE,
+		.doit		= ovpn_nl_new_iface_doit,
+		.policy		= ovpn_new_iface_nl_policy,
+		.maxattr	= OVPN_A_MODE,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_DEL_IFACE,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_del_iface_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_del_iface_nl_policy,
+		.maxattr	= OVPN_A_IFINDEX,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_SET_PEER,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_set_peer_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_set_peer_nl_policy,
+		.maxattr	= OVPN_A_PEER,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_GET_PEER,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_get_peer_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_get_peer_do_nl_policy,
+		.maxattr	= OVPN_A_PEER,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_GET_PEER,
+		.dumpit		= ovpn_nl_get_peer_dumpit,
+		.policy		= ovpn_get_peer_dump_nl_policy,
+		.maxattr	= OVPN_A_IFINDEX,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
+	},
+	{
+		.cmd		= OVPN_CMD_DEL_PEER,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_del_peer_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_del_peer_nl_policy,
+		.maxattr	= OVPN_A_PEER,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_SET_KEY,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_set_key_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_set_key_nl_policy,
+		.maxattr	= OVPN_A_PEER,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_SWAP_KEYS,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_swap_keys_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_swap_keys_nl_policy,
+		.maxattr	= OVPN_A_PEER,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= OVPN_CMD_DEL_KEY,
+		.pre_doit	= ovpn_nl_pre_doit,
+		.doit		= ovpn_nl_del_key_doit,
+		.post_doit	= ovpn_nl_post_doit,
+		.policy		= ovpn_del_key_nl_policy,
+		.maxattr	= OVPN_A_PEER,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+};
+
+static const struct genl_multicast_group ovpn_nl_mcgrps[] = {
+	[OVPN_NLGRP_PEERS] = { "peers", },
+};
+
+struct genl_family ovpn_nl_family __ro_after_init = {
+	.name		= OVPN_FAMILY_NAME,
+	.version	= OVPN_FAMILY_VERSION,
+	.netnsok	= true,
+	.parallel_ops	= true,
+	.module		= THIS_MODULE,
+	.split_ops	= ovpn_nl_ops,
+	.n_split_ops	= ARRAY_SIZE(ovpn_nl_ops),
+	.mcgrps		= ovpn_nl_mcgrps,
+	.n_mcgrps	= ARRAY_SIZE(ovpn_nl_mcgrps),
+};
diff --git a/drivers/net/ovpn/netlink-gen.h b/drivers/net/ovpn/netlink-gen.h
new file mode 100644
index 000000000000..ce11f74e1b56
--- /dev/null
+++ b/drivers/net/ovpn/netlink-gen.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/ovpn.yaml */
+/* YNL-GEN kernel header */
+
+#ifndef _LINUX_OVPN_GEN_H
+#define _LINUX_OVPN_GEN_H
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include <uapi/linux/ovpn.h>
+
+/* Common nested types */
+extern const struct nla_policy ovpn_keyconf_nl_policy[OVPN_A_KEYCONF_DECRYPT_DIR + 1];
+extern const struct nla_policy ovpn_keydir_nl_policy[OVPN_A_KEYDIR_NONCE_TAIL + 1];
+extern const struct nla_policy ovpn_peer_nl_policy[OVPN_A_PEER_LINK_TX_PACKETS + 1];
+
+int ovpn_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+		     struct genl_info *info);
+void
+ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+		  struct genl_info *info);
+
+int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_set_peer_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_get_peer_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_get_peer_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int ovpn_nl_del_peer_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_set_key_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_swap_keys_doit(struct sk_buff *skb, struct genl_info *info);
+int ovpn_nl_del_key_doit(struct sk_buff *skb, struct genl_info *info);
+
+enum {
+	OVPN_NLGRP_PEERS,
+};
+
+extern struct genl_family ovpn_nl_family;
+
+#endif /* _LINUX_OVPN_GEN_H */
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
new file mode 100644
index 000000000000..c0a9f58e0e87
--- /dev/null
+++ b/drivers/net/ovpn/netlink.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/netdevice.h>
+#include <net/genetlink.h>
+
+#include <uapi/linux/ovpn.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "io.h"
+#include "netlink.h"
+#include "netlink-gen.h"
+
+MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME);
+
+/**
+ * ovpn_get_dev_from_attrs - retrieve the netdevice a netlink message is
+ *                           targeting
+ * @net: network namespace where to look for the interface
+ * @attrs: attributes of the received message
+ *
+ * Return: the netdevice, if found, or an error otherwise
+ */
+static struct net_device *
+ovpn_get_dev_from_attrs(struct net *net, struct nlattr **attrs)
+{
+	struct net_device *dev;
+	int ifindex;
+
+	if (!attrs[OVPN_A_IFINDEX])
+		return ERR_PTR(-EINVAL);
+
+	ifindex = nla_get_u32(attrs[OVPN_A_IFINDEX]);
+
+	dev = dev_get_by_index(net, ifindex);
+	if (!dev)
+		return ERR_PTR(-ENODEV);
+
+	if (!ovpn_dev_is_valid(dev))
+		goto err_put_dev;
+
+	return dev;
+
+err_put_dev:
+	dev_put(dev);
+
+	return ERR_PTR(-EINVAL);
+}
+
+int ovpn_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+		     struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct net_device *dev = ovpn_get_dev_from_attrs(net, info->attrs);
+
+	if (IS_ERR(dev))
+		return PTR_ERR(dev);
+
+	info->user_ptr[0] = netdev_priv(dev);
+
+	return 0;
+}
+
+void ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
+		       struct genl_info *info)
+{
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+
+	if (ovpn)
+		dev_put(ovpn->dev);
+}
+
+int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_set_peer_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_get_peer_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_get_peer_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_del_peer_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_set_key_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_swap_keys_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int ovpn_nl_del_key_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+/**
+ * ovpn_nl_init - perform any ovpn specific netlink initialization
+ * @ovpn: the openvpn instance object
+ */
+int ovpn_nl_init(struct ovpn_struct *ovpn)
+{
+	return 0;
+}
+
+/**
+ * ovpn_nl_register - register the ovpn genl nl family
+ */
+int __init ovpn_nl_register(void)
+{
+	int ret = genl_register_family(&ovpn_nl_family);
+
+	if (ret) {
+		pr_err("ovpn: genl_register_family failed: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ovpn_nl_unregister - unregister the ovpn genl netlink family
+ */
+void ovpn_nl_unregister(void)
+{
+	genl_unregister_family(&ovpn_nl_family);
+}
diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h
new file mode 100644
index 000000000000..d79f3ca604b0
--- /dev/null
+++ b/drivers/net/ovpn/netlink.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_NETLINK_H_
+#define _NET_OVPN_NETLINK_H_
+
+/**
+ * ovpn_nl_init - initialize netlink specific members
+ * @ovpn: the openvpn instance to initialize
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_init(struct ovpn_struct *ovpn);
+
+/**
+ * ovpn_nl_register - perform any needed registration in the NL subsystem
+ */
+int ovpn_nl_register(void);
+
+/**
+ * ovpn_nl_unregister - undo any module wide netlink registration
+ */
+void ovpn_nl_unregister(void);
+
+#endif /* _NET_OVPN_NETLINK_H_ */
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
new file mode 100644
index 000000000000..ff248cad1401
--- /dev/null
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNSTRUCT_H_
+#define _NET_OVPN_OVPNSTRUCT_H_
+
+/**
+ * struct ovpn_struct - per ovpn interface state
+ * @dev: the actual netdev representing the tunnel
+ */
+struct ovpn_struct {
+	struct net_device *dev;
+};
+
+#endif /* _NET_OVPN_OVPNSTRUCT_H_ */
diff --git a/include/uapi/linux/ovpn.h b/include/uapi/linux/ovpn.h
new file mode 100644
index 000000000000..3c89e83450d3
--- /dev/null
+++ b/include/uapi/linux/ovpn.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/ovpn.yaml */
+/* YNL-GEN uapi header */
+
+#ifndef _UAPI_LINUX_OVPN_H
+#define _UAPI_LINUX_OVPN_H
+
+#define OVPN_FAMILY_NAME	"ovpn"
+#define OVPN_FAMILY_VERSION	1
+
+#define OVPN_NONCE_TAIL_SIZE	8
+
+enum ovpn_cipher_alg {
+	OVPN_CIPHER_ALG_NONE,
+	OVPN_CIPHER_ALG_AES_GCM,
+	OVPN_CIPHER_ALG_CHACHA20_POLY1305,
+};
+
+enum ovpn_del_peer_reason {
+	OVPN_DEL_PEER_REASON_TEARDOWN,
+	OVPN_DEL_PEER_REASON_USERSPACE,
+	OVPN_DEL_PEER_REASON_EXPIRED,
+	OVPN_DEL_PEER_REASON_TRANSPORT_ERROR,
+	OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT,
+};
+
+enum ovpn_key_slot {
+	OVPN_KEY_SLOT_PRIMARY,
+	OVPN_KEY_SLOT_SECONDARY,
+};
+
+enum ovpn_mode {
+	OVPN_MODE_P2P,
+	OVPN_MODE_MP,
+};
+
+enum {
+	OVPN_A_PEER_ID = 1,
+	OVPN_A_PEER_SOCKADDR_REMOTE,
+	OVPN_A_PEER_SOCKET,
+	OVPN_A_PEER_VPN_IPV4,
+	OVPN_A_PEER_VPN_IPV6,
+	OVPN_A_PEER_LOCAL_IP,
+	OVPN_A_PEER_LOCAL_PORT,
+	OVPN_A_PEER_KEEPALIVE_INTERVAL,
+	OVPN_A_PEER_KEEPALIVE_TIMEOUT,
+	OVPN_A_PEER_DEL_REASON,
+	OVPN_A_PEER_KEYCONF,
+	OVPN_A_PEER_VPN_RX_BYTES,
+	OVPN_A_PEER_VPN_TX_BYTES,
+	OVPN_A_PEER_VPN_RX_PACKETS,
+	OVPN_A_PEER_VPN_TX_PACKETS,
+	OVPN_A_PEER_LINK_RX_BYTES,
+	OVPN_A_PEER_LINK_TX_BYTES,
+	OVPN_A_PEER_LINK_RX_PACKETS,
+	OVPN_A_PEER_LINK_TX_PACKETS,
+
+	__OVPN_A_PEER_MAX,
+	OVPN_A_PEER_MAX = (__OVPN_A_PEER_MAX - 1)
+};
+
+enum {
+	OVPN_A_KEYCONF_SLOT = 1,
+	OVPN_A_KEYCONF_KEY_ID,
+	OVPN_A_KEYCONF_CIPHER_ALG,
+	OVPN_A_KEYCONF_ENCRYPT_DIR,
+	OVPN_A_KEYCONF_DECRYPT_DIR,
+
+	__OVPN_A_KEYCONF_MAX,
+	OVPN_A_KEYCONF_MAX = (__OVPN_A_KEYCONF_MAX - 1)
+};
+
+enum {
+	OVPN_A_KEYDIR_CIPHER_KEY = 1,
+	OVPN_A_KEYDIR_NONCE_TAIL,
+
+	__OVPN_A_KEYDIR_MAX,
+	OVPN_A_KEYDIR_MAX = (__OVPN_A_KEYDIR_MAX - 1)
+};
+
+enum {
+	OVPN_A_IFINDEX = 1,
+	OVPN_A_IFNAME,
+	OVPN_A_MODE,
+	OVPN_A_PEER,
+	OVPN_A_PAD,
+
+	__OVPN_A_MAX,
+	OVPN_A_MAX = (__OVPN_A_MAX - 1)
+};
+
+enum {
+	OVPN_CMD_NEW_IFACE = 1,
+	OVPN_CMD_DEL_IFACE,
+	OVPN_CMD_SET_PEER,
+	OVPN_CMD_GET_PEER,
+	OVPN_CMD_DEL_PEER,
+	OVPN_CMD_SET_KEY,
+	OVPN_CMD_SWAP_KEYS,
+	OVPN_CMD_DEL_KEY,
+
+	__OVPN_CMD_MAX,
+	OVPN_CMD_MAX = (__OVPN_CMD_MAX - 1)
+};
+
+#define OVPN_MCGRP_PEERS	"peers"
+
+#endif /* _UAPI_LINUX_OVPN_H */
-- 
2.43.2



* [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (2 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 03/24] ovpn: add basic netlink support Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-08  0:18   ` Jakub Kicinski
  2024-05-08 14:52   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink Antonio Quartulli
                   ` (20 subsequent siblings)
  24 siblings, 2 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Add basic infrastructure for handling ovpn interfaces.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c         |  20 ++++
 drivers/net/ovpn/io.h         |   7 ++
 drivers/net/ovpn/main.c       | 183 +++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/main.h       |  31 ++++++
 drivers/net/ovpn/ovpnstruct.h |   8 ++
 drivers/net/ovpn/packet.h     |  40 ++++++++
 6 files changed, 285 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ovpn/packet.h

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index ad3813419c33..338e99dfe886 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -11,6 +11,26 @@
 #include <linux/skbuff.h>
 
 #include "io.h"
+#include "ovpnstruct.h"
+#include "netlink.h"
+
+int ovpn_struct_init(struct net_device *dev)
+{
+	struct ovpn_struct *ovpn = netdev_priv(dev);
+	int err;
+
+	ovpn->dev = dev;
+
+	err = ovpn_nl_init(ovpn);
+	if (err < 0)
+		return err;
+
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats)
+		return -ENOMEM;
+
+	return 0;
+}
 
 /* Send user data to the network
  */
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index aa259be66441..61a2485e16b5 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -10,6 +10,13 @@
 #ifndef _NET_OVPN_OVPN_H_
 #define _NET_OVPN_OVPN_H_
 
+/**
+ * ovpn_struct_init - Initialize the netdevice private area
+ * @dev: the device to initialize
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_struct_init(struct net_device *dev);
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 
 #endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 33c0b004ce16..584cd7286aff 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -10,47 +10,195 @@
 #include <linux/genetlink.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
+#include <linux/inetdevice.h>
 #include <linux/version.h>
+#include <net/ip.h>
+#include <uapi/linux/if_arp.h>
 #include <uapi/linux/ovpn.h>
 
 #include "ovpnstruct.h"
 #include "main.h"
 #include "netlink.h"
 #include "io.h"
+#include "packet.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
 #define DRV_COPYRIGHT	"(C) 2020-2024 OpenVPN, Inc."
 
+static LIST_HEAD(dev_list);
+
+static void ovpn_struct_free(struct net_device *net)
+{
+	struct ovpn_struct *ovpn = netdev_priv(net);
+
+	rtnl_lock();
+	list_del(&ovpn->dev_list);
+	rtnl_unlock();
+
+	free_percpu(net->tstats);
+}
+
+static int ovpn_net_open(struct net_device *dev)
+{
+	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
+
+	if (dev_v4) {
+		/* disable redirects as Linux gets confused by ovpn handling
+		 * same-LAN routing
+		 */
+		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
+		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
+	}
+
+	netif_tx_start_all_queues(dev);
+	return 0;
+}
+
+static int ovpn_net_stop(struct net_device *dev)
+{
+	netif_tx_stop_all_queues(dev);
+	return 0;
+}
+
+static const struct net_device_ops ovpn_netdev_ops = {
+	.ndo_open		= ovpn_net_open,
+	.ndo_stop		= ovpn_net_stop,
+	.ndo_start_xmit		= ovpn_net_xmit,
+	.ndo_get_stats64        = dev_get_tstats64,
+};
+
 bool ovpn_dev_is_valid(const struct net_device *dev)
 {
 	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
 }
 
+static void ovpn_setup(struct net_device *dev)
+{
+	/* compute the overhead considering AEAD encryption */
+	const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +
+			     sizeof(struct udphdr) +
+			     max(sizeof(struct ipv6hdr), sizeof(struct iphdr));
+
+	netdev_features_t feat = NETIF_F_SG | NETIF_F_LLTX |
+				 NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+				 NETIF_F_GSO | NETIF_F_GSO_SOFTWARE |
+				 NETIF_F_HIGHDMA;
+
+	dev->needs_free_netdev = true;
+
+	dev->netdev_ops = &ovpn_netdev_ops;
+
+	dev->priv_destructor = ovpn_struct_free;
+
+	dev->hard_header_len = 0;
+	dev->addr_len = 0;
+	dev->mtu = ETH_DATA_LEN - overhead;
+	dev->min_mtu = IPV4_MIN_MTU;
+	dev->max_mtu = IP_MAX_MTU - overhead;
+
+	dev->type = ARPHRD_NONE;
+	dev->flags = IFF_POINTOPOINT | IFF_NOARP;
+
+	dev->features |= feat;
+	dev->hw_features |= feat;
+	dev->hw_enc_features |= feat;
+
+	dev->needed_headroom = OVPN_HEAD_ROOM;
+	dev->needed_tailroom = OVPN_MAX_PADDING;
+}
+
+struct net_device *ovpn_iface_create(const char *name, enum ovpn_mode mode,
+				     struct net *net)
+{
+	struct ovpn_struct *ovpn;
+	struct net_device *dev;
+	int ret;
+
+	dev = alloc_netdev(sizeof(struct ovpn_struct), name, NET_NAME_USER,
+			   ovpn_setup);
+	if (!dev)
+		return ERR_PTR(-ENOMEM);
+
+	dev_net_set(dev, net);
+
+	ret = ovpn_struct_init(dev);
+	if (ret < 0)
+		goto err;
+
+	ovpn = netdev_priv(dev);
+	ovpn->mode = mode;
+
+	rtnl_lock();
+
+	ret = register_netdevice(dev);
+	if (ret < 0) {
+		netdev_err(dev, "cannot register interface: %d\n", ret);
+		rtnl_unlock();
+		goto err;
+	}
+
+	list_add(&ovpn->dev_list, &dev_list);
+	rtnl_unlock();
+
+	/* turn carrier explicitly off after registration, this way state is
+	 * clearly defined
+	 */
+	netif_carrier_off(dev);
+
+	return dev;
+
+err:
+	free_netdev(dev);
+	return ERR_PTR(ret);
+}
+
+void ovpn_iface_destruct(struct ovpn_struct *ovpn)
+{
+	ASSERT_RTNL();
+
+	netif_carrier_off(ovpn->dev);
+
+	ovpn->registered = false;
+
+	unregister_netdevice(ovpn->dev);
+	synchronize_net();
+}
+
 static int ovpn_netdev_notifier_call(struct notifier_block *nb,
 				     unsigned long state, void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct ovpn_struct *ovpn;
 
 	if (!ovpn_dev_is_valid(dev))
 		return NOTIFY_DONE;
 
+	ovpn = netdev_priv(dev);
+
 	switch (state) {
 	case NETDEV_REGISTER:
-		/* add device to internal list for later destruction upon
-		 * unregistration
-		 */
+		ovpn->registered = true;
 		break;
 	case NETDEV_UNREGISTER:
+		/* twiddle thumbs on netns device moves */
+		if (dev->reg_state != NETREG_UNREGISTERING)
+			break;
+
 		/* can be delivered multiple times, so check registered flag,
 		 * then destroy the interface
 		 */
+		if (!ovpn->registered)
+			return NOTIFY_DONE;
+
+		ovpn_iface_destruct(ovpn);
 		break;
 	case NETDEV_POST_INIT:
 	case NETDEV_GOING_DOWN:
 	case NETDEV_DOWN:
 	case NETDEV_UP:
 	case NETDEV_PRE_UP:
+		break;
 	default:
 		return NOTIFY_DONE;
 	}
@@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
 	.notifier_call = ovpn_netdev_notifier_call,
 };
 
+static void ovpn_netns_pre_exit(struct net *net)
+{
+	struct ovpn_struct *ovpn;
+
+	rtnl_lock();
+	list_for_each_entry(ovpn, &dev_list, dev_list) {
+		if (dev_net(ovpn->dev) != net)
+			continue;
+
+		ovpn_iface_destruct(ovpn);
+	}
+	rtnl_unlock();
+}
+
+static struct pernet_operations ovpn_pernet_ops = {
+	.pre_exit = ovpn_netns_pre_exit
+};
+
 static int __init ovpn_init(void)
 {
 	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
@@ -71,14 +237,22 @@ static int __init ovpn_init(void)
 		return err;
 	}
 
+	err = register_pernet_device(&ovpn_pernet_ops);
+	if (err) {
+		pr_err("ovpn: can't register pernet ops: %d\n", err);
+		goto unreg_netdev;
+	}
+
 	err = ovpn_nl_register();
 	if (err) {
 		pr_err("ovpn: can't register netlink family: %d\n", err);
-		goto unreg_netdev;
+		goto unreg_pernet;
 	}
 
 	return 0;
 
+unreg_pernet:
+	unregister_pernet_device(&ovpn_pernet_ops);
 unreg_netdev:
 	unregister_netdevice_notifier(&ovpn_netdev_notifier);
 	return err;
@@ -87,6 +261,7 @@ static int __init ovpn_init(void)
 static __exit void ovpn_cleanup(void)
 {
 	ovpn_nl_unregister();
+	unregister_pernet_device(&ovpn_pernet_ops);
 	unregister_netdevice_notifier(&ovpn_netdev_notifier);
 }
 
diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
index 380adb593d0c..21d6bfb27d67 100644
--- a/drivers/net/ovpn/main.h
+++ b/drivers/net/ovpn/main.h
@@ -10,6 +10,30 @@
 #ifndef _NET_OVPN_MAIN_H_
 #define _NET_OVPN_MAIN_H_
 
+/**
+ * ovpn_iface_create - create and initialize a new 'ovpn' netdevice
+ * @name: the name of the new device
+ * @mode: the OpenVPN mode to set this device to
+ * @net: the netns this device should be created in
+ *
+ * A new netdevice is created and registered.
+ * Its private area is initialized with an empty ovpn_struct object.
+ *
+ * Return: a pointer to the new device on success or a negative error code
+ *         otherwise
+ */
+struct net_device *ovpn_iface_create(const char *name, enum ovpn_mode mode,
+				     struct net *net);
+
+/**
+ * ovpn_iface_destruct - tear down netdevice
+ * @ovpn: the ovpn instance object related to the interface to tear down
+ *
+ * This function takes care of tearing down an ovpn device and can be invoked
+ * internally or upon UNREGISTER netdev event
+ */
+void ovpn_iface_destruct(struct ovpn_struct *ovpn);
+
 /**
  * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
  * @dev: the interface to check
@@ -18,4 +42,11 @@
  */
 bool ovpn_dev_is_valid(const struct net_device *dev);
 
+#define SKB_HEADER_LEN                                       \
+	(max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \
+	 sizeof(struct udphdr) + NET_SKB_PAD)
+
+#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)
+#define OVPN_MAX_PADDING 16
+
 #endif /* _NET_OVPN_MAIN_H_ */
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
index ff248cad1401..ee05b8a2c61d 100644
--- a/drivers/net/ovpn/ovpnstruct.h
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -10,12 +10,20 @@
 #ifndef _NET_OVPN_OVPNSTRUCT_H_
 #define _NET_OVPN_OVPNSTRUCT_H_
 
+#include <uapi/linux/ovpn.h>
+
 /**
  * struct ovpn_struct - per ovpn interface state
  * @dev: the actual netdev representing the tunnel
+ * @registered: whether dev is still registered with netdev or not
+ * @mode: device operation mode (i.e. p2p, mp, ..)
+ * @dev_list: entry for the module wide device list
  */
 struct ovpn_struct {
 	struct net_device *dev;
+	bool registered;
+	enum ovpn_mode mode;
+	struct list_head dev_list;
 };
 
 #endif /* _NET_OVPN_OVPNSTRUCT_H_ */
diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h
new file mode 100644
index 000000000000..7ed146f5932a
--- /dev/null
+++ b/drivers/net/ovpn/packet.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_PACKET_H_
+#define _NET_OVPN_PACKET_H_
+
+/* When the OpenVPN protocol is run in AEAD mode, use
+ * the OpenVPN packet ID as the AEAD nonce:
+ *
+ *    00000005 521c3b01 4308c041
+ *    [seq # ] [  nonce_tail   ]
+ *    [     12-byte full IV    ] -> NONCE_SIZE
+ *    [4-bytes                   -> NONCE_WIRE_SIZE
+ *    on wire]
+ */
+
+/* OpenVPN nonce size */
+#define NONCE_SIZE 12
+
+/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the
+ * size of the AEAD Associated Data (AD) sent over the wire
+ * and is normally the head of the IV
+ */
+#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail))
+
+/* Last 8 bytes of AEAD nonce
+ * Provided by userspace and usually derived from
+ * key material generated during TLS handshake
+ */
+struct ovpn_nonce_tail {
+	u8 u8[OVPN_NONCE_TAIL_SIZE];
+};
+
+#endif /* _NET_OVPN_PACKET_H_ */
-- 
2.43.2
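
The nonce layout described in the packet.h comment can be tried out in plain
userspace C. This is only a sketch under stated assumptions: build_nonce() is
a hypothetical helper that is not part of this patch, OVPN_NONCE_TAIL_SIZE is
assumed to be 8 (as the "8-byte nonce tail" comment implies), and the 32-bit
packet ID is assumed to be written in network byte order, as in the diagram.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NONCE_SIZE 12
#define NONCE_TAIL_SIZE 8 /* assumed value of OVPN_NONCE_TAIL_SIZE */
#define NONCE_WIRE_SIZE (NONCE_SIZE - NONCE_TAIL_SIZE)

/* Hypothetical helper (not in the patch): build the 12-byte IV from the
 * 32-bit packet ID, which is the only part sent on the wire, followed by
 * the 8-byte tail that both sides already share.
 */
static void build_nonce(uint8_t iv[NONCE_SIZE], uint32_t pkt_id,
			const uint8_t tail[NONCE_TAIL_SIZE])
{
	assert(iv && tail);

	/* packet ID in big-endian (network) byte order */
	iv[0] = (uint8_t)(pkt_id >> 24);
	iv[1] = (uint8_t)(pkt_id >> 16);
	iv[2] = (uint8_t)(pkt_id >> 8);
	iv[3] = (uint8_t)pkt_id;

	/* the tail fills the remaining bytes of the IV */
	memcpy(iv + NONCE_WIRE_SIZE, tail, NONCE_TAIL_SIZE);
}
```

Calling build_nonce() with packet ID 5 and the tail 52 1c 3b 01 43 08 c0 41
reproduces the example IV shown in the comment.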


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (3 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-08  0:21   ` Jakub Kicinski
  2024-05-06  1:16 ` [PATCH net-next v3 06/24] ovpn: keep carrier always on Antonio Quartulli
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Allow userspace to create and destroy an interface using netlink
commands.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
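
As a rough illustration of the naming behaviour (requested name if
OVPN_A_IFNAME is present, "ovpn%d" otherwise), here is a small userspace
sketch. pick_ifname(), name_taken() and SKETCH_IFNAMSIZ are hypothetical names
used only for this example; the real expansion of the "%d" template is done by
the networking core, and this sketch merely assumes it resolves to the lowest
free index.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16 /* same size as the kernel's IFNAMSIZ */

/* Return 1 if name is already in the taken[] list, 0 otherwise */
static int name_taken(const char *name, const char *const *taken, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, taken[i]) == 0)
			return 1;
	return 0;
}

/* Hypothetical helper: use the requested name if given, otherwise expand
 * the "ovpn%d" default to the lowest free index.
 * Returns 0 on success, -1 if the name is too long, taken, or none is free.
 */
static int pick_ifname(char out[SKETCH_IFNAMSIZ], const char *requested,
		       const char *const *taken, size_t n)
{
	assert(out);

	if (requested) {
		if (strlen(requested) >= SKETCH_IFNAMSIZ ||
		    name_taken(requested, taken, n))
			return -1;
		strcpy(out, requested);
		return 0;
	}

	for (int unit = 0; unit < 1000; unit++) {
		snprintf(out, SKETCH_IFNAMSIZ, "ovpn%d", unit);
		if (!name_taken(out, taken, n))
			return 0;
	}
	return -1;
}
```

The name that ends up in the output buffer plays the role of dev->name, which
is what the NEW_IFACE reply reports back to userspace.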
 drivers/net/ovpn/main.h    |  2 ++
 drivers/net/ovpn/netlink.c | 55 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
index 21d6bfb27d67..12b8d7e4a0fe 100644
--- a/drivers/net/ovpn/main.h
+++ b/drivers/net/ovpn/main.h
@@ -10,6 +10,8 @@
 #ifndef _NET_OVPN_MAIN_H_
 #define _NET_OVPN_MAIN_H_
 
+#define OVPN_DEFAULT_IFNAME "ovpn%d"
+
 /**
  * ovpn_iface_create - create and initialize a new 'ovpn' netdevice
  * @name: the name of the new device
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index c0a9f58e0e87..66f5c6fbe8e4 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
 #include <net/genetlink.h>
 
 #include <uapi/linux/ovpn.h>
@@ -78,12 +79,62 @@ void ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
 
 int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	const char *ifname = OVPN_DEFAULT_IFNAME;
+	enum ovpn_mode mode = OVPN_MODE_P2P;
+	struct net_device *dev;
+	struct sk_buff *msg;
+	void *hdr;
+
+	if (info->attrs[OVPN_A_IFNAME])
+		ifname = nla_data(info->attrs[OVPN_A_IFNAME]);
+
+	if (info->attrs[OVPN_A_MODE]) {
+		mode = nla_get_u32(info->attrs[OVPN_A_MODE]);
+		pr_debug("ovpn: setting device (%s) mode: %u\n", ifname, mode);
+	}
+
+	dev = ovpn_iface_create(ifname, mode, genl_info_net(info));
+	if (IS_ERR(dev)) {
+		pr_err("ovpn: error while creating interface %s: %ld\n", ifname,
+		       PTR_ERR(dev));
+		return PTR_ERR(dev);
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &ovpn_nl_family,
+			  0, OVPN_CMD_NEW_IFACE);
+	if (!hdr) {
+		netdev_err(dev, "%s: cannot create message header\n", __func__);
+		return -EMSGSIZE;
+	}
+
+	if (nla_put(msg, OVPN_A_IFNAME, strlen(dev->name) + 1, dev->name)) {
+		netdev_err(dev, "%s: cannot add ifname to reply\n", __func__);
+		genlmsg_cancel(msg, hdr);
+		nlmsg_free(msg);
+		return -EMSGSIZE;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	return genlmsg_reply(msg, info);
 }
 
 int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+
+	rtnl_lock();
+	ovpn_iface_destruct(ovpn);
+	dev_put(ovpn->dev);
+	rtnl_unlock();
+
+	synchronize_net();
+
+	return 0;
 }
 
 int ovpn_nl_set_peer_doit(struct sk_buff *skb, struct genl_info *info)
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 06/24] ovpn: keep carrier always on
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (4 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object Antonio Quartulli
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

An ovpn interface will keep carrier always on and let the user
decide when an interface should be considered disconnected.

This way, even if an ovpn interface is not connected to any peer,
it can still retain all IPs and routes and thus prevent any data
leak.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 584cd7286aff..cc8a97a1a189 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -51,6 +51,13 @@ static int ovpn_net_open(struct net_device *dev)
 		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
 	}
 
+	/* ovpn keeps the carrier always on to avoid losing IP or route
+	 * configuration upon disconnection. This way it can prevent leaks
+	 * of traffic outside of the VPN tunnel.
+	 * The user may override this behaviour by tearing down the interface
+	 * manually.
+	 */
+	netif_carrier_on(dev);
 	netif_tx_start_all_queues(dev);
 	return 0;
 }
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (5 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 06/24] ovpn: keep carrier always on Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-08 16:06   ` Sabrina Dubroca
  2024-05-13 10:09   ` Simon Horman
  2024-05-06  1:16 ` [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object Antonio Quartulli
                   ` (17 subsequent siblings)
  24 siblings, 2 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

An ovpn_peer object holds the whole status of a remote peer
(regardless of whether it is a server or a client).

This includes status for crypto, tx/rx buffers, napi, etc.

Only support for one peer is introduced (P2P mode).
Multi peer support is introduced with a later patch.

Along with the ovpn_peer, the ovpn_bind object is also introduced,
as the two are strictly related.
An ovpn_bind object wraps the sockaddr of the remote peer together
with the local address being used to talk to that specific peer.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
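
The peer lifetime relies on a "take a reference only if the object is still
alive" primitive (ovpn_peer_hold() wraps kref_get_unless_zero()). The idea can
be sketched in userspace with C11 atomics. This is an illustration only, not
the kernel implementation; struct ref and the ref_*() helpers are hypothetical
names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kref */
struct ref {
	atomic_uint count;
};

/* A new object starts with one reference, owned by its creator */
static void ref_init(struct ref *r)
{
	assert(r);
	atomic_init(&r->count, 1);
}

/* Take a reference only if the counter has not dropped to zero yet.
 * Returns false if the object is already on its way to being freed.
 */
static bool ref_get_unless_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one and
 * is therefore responsible for scheduling the release.
 */
static bool ref_put(struct ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

A lookup that finds a peer whose counter is already zero must treat it as not
found, which is why the lookups in peer.c only return the peer when the hold
succeeds.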
 drivers/net/ovpn/Makefile     |   2 +
 drivers/net/ovpn/bind.c       |  60 ++++++
 drivers/net/ovpn/bind.h       | 130 ++++++++++++
 drivers/net/ovpn/io.c         |   8 +
 drivers/net/ovpn/main.c       |  10 +
 drivers/net/ovpn/main.h       |   2 +
 drivers/net/ovpn/ovpnstruct.h |   7 +
 drivers/net/ovpn/peer.c       | 379 ++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/peer.h       | 152 ++++++++++++++
 9 files changed, 750 insertions(+)
 create mode 100644 drivers/net/ovpn/bind.c
 create mode 100644 drivers/net/ovpn/bind.h
 create mode 100644 drivers/net/ovpn/peer.c
 create mode 100644 drivers/net/ovpn/peer.h

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 201dc001419f..ce13499b3e17 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -7,7 +7,9 @@
 # Author:	Antonio Quartulli <antonio@openvpn.net>
 
 obj-$(CONFIG_OVPN) := ovpn.o
+ovpn-y += bind.o
 ovpn-y += main.o
 ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
+ovpn-y += peer.o
diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
new file mode 100644
index 000000000000..c1f842c06e32
--- /dev/null
+++ b/drivers/net/ovpn/bind.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2012-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+
+#include "ovpnstruct.h"
+#include "io.h"
+#include "bind.h"
+#include "peer.h"
+
+struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss)
+{
+	struct ovpn_bind *bind;
+	size_t sa_len;
+
+	if (ss->ss_family == AF_INET)
+		sa_len = sizeof(struct sockaddr_in);
+	else if (ss->ss_family == AF_INET6)
+		sa_len = sizeof(struct sockaddr_in6);
+	else
+		return ERR_PTR(-EAFNOSUPPORT);
+
+	bind = kzalloc(sizeof(*bind), GFP_ATOMIC);
+	if (unlikely(!bind))
+		return ERR_PTR(-ENOMEM);
+
+	memcpy(&bind->sa, ss, sa_len);
+
+	return bind;
+}
+
+/**
+ * ovpn_bind_release_rcu - RCU callback for releasing binding
+ * @head: the RCU head member
+ */
+static void ovpn_bind_release_rcu(struct rcu_head *head)
+{
+	struct ovpn_bind *bind = container_of(head, struct ovpn_bind, rcu);
+
+	kfree(bind);
+}
+
+void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
+{
+	struct ovpn_bind *old;
+
+	spin_lock_bh(&peer->lock);
+	old = rcu_replace_pointer(peer->bind, new, true);
+	spin_unlock_bh(&peer->lock);
+
+	if (old)
+		call_rcu(&old->rcu, ovpn_bind_release_rcu);
+}
diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h
new file mode 100644
index 000000000000..61433550a961
--- /dev/null
+++ b/drivers/net/ovpn/bind.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2012-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNBIND_H_
+#define _NET_OVPN_OVPNBIND_H_
+
+#include <net/ip.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/rcupdate.h>
+#include <linux/skbuff.h>
+#include <linux/spinlock.h>
+
+struct ovpn_peer;
+
+/**
+ * struct ovpn_sockaddr - basic transport layer address
+ * @in4: IPv4 address
+ * @in6: IPv6 address
+ */
+struct ovpn_sockaddr {
+	union {
+		struct sockaddr_in in4;
+		struct sockaddr_in6 in6;
+	};
+};
+
+/**
+ * struct ovpn_bind - remote peer binding
+ * @sa: the remote peer sockaddress
+ * @local.ipv4: local IPv4 used to talk to the peer
+ * @local.ipv6: local IPv6 used to talk to the peer
+ * @rcu: used to schedule RCU cleanup job
+ */
+struct ovpn_bind {
+	struct ovpn_sockaddr sa;  /* remote sockaddr */
+
+	union {
+		struct in_addr ipv4;
+		struct in6_addr ipv6;
+	} local;
+
+	struct rcu_head rcu;
+};
+
+/**
+ * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6
+ * @skb: the packet sk_buff to inspect
+ *
+ * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol
+ */
+static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb)
+{
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		return AF_INET;
+	case htons(ETH_P_IPV6):
+		return AF_INET6;
+	default:
+		return 0;
+	}
+}
+
+/**
+ * ovpn_bind_skb_src_match - match packet source with binding
+ * @bind: the binding to match
+ * @skb: the packet to match
+ *
+ * Return: true if the packet source matches the remote peer sockaddr
+ * in the binding
+ */
+static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind,
+					   struct sk_buff *skb)
+{
+	const unsigned short family = skb_protocol_to_family(skb);
+	const struct ovpn_sockaddr *sa;
+
+	if (unlikely(!bind))
+		return false;
+
+	sa = &bind->sa;
+
+	if (unlikely(sa->in4.sin_family != family))
+		return false;
+
+	switch (family) {
+	case AF_INET:
+		if (unlikely(sa->in4.sin_addr.s_addr != ip_hdr(skb)->saddr))
+			return false;
+
+		if (unlikely(sa->in4.sin_port != udp_hdr(skb)->source))
+			return false;
+		break;
+	case AF_INET6:
+		if (unlikely(!ipv6_addr_equal(&sa->in6.sin6_addr,
+					      &ipv6_hdr(skb)->saddr)))
+			return false;
+
+		if (unlikely(sa->in6.sin6_port != udp_hdr(skb)->source))
+			return false;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * ovpn_bind_from_sockaddr - allocate a new binding from a sockaddr
+ * @sa: the sockaddr to copy into the binding (AF_INET or AF_INET6)
+ *
+ * Return: the newly allocated bind on success, an ERR_PTR() otherwise
+ */
+struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *sa);
+
+/**
+ * ovpn_bind_reset - assign new binding to peer
+ * @peer: the peer whose binding has to be replaced
+ * @bind: the new bind to assign
+ */
+void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *bind);
+
+#endif /* _NET_OVPN_OVPNBIND_H_ */
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 338e99dfe886..a420bb45f25f 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -13,6 +13,7 @@
 #include "io.h"
 #include "ovpnstruct.h"
 #include "netlink.h"
+#include "peer.h"
 
 int ovpn_struct_init(struct net_device *dev)
 {
@@ -25,6 +26,13 @@ int ovpn_struct_init(struct net_device *dev)
 	if (err < 0)
 		return err;
 
+	spin_lock_init(&ovpn->lock);
+
+	ovpn->events_wq = alloc_workqueue("ovpn-events-wq-%s", WQ_MEM_RECLAIM,
+					  0, dev->name);
+	if (!ovpn->events_wq)
+		return -ENOMEM;
+
 	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
 	if (!dev->tstats)
 		return -ENOMEM;
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index cc8a97a1a189..dba35ecb236b 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/netdevice.h>
 #include <linux/inetdevice.h>
+#include <linux/rcupdate.h>
 #include <linux/version.h>
 #include <net/ip.h>
 #include <uapi/linux/if_arp.h>
@@ -21,6 +22,7 @@
 #include "netlink.h"
 #include "io.h"
 #include "packet.h"
+#include "peer.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
@@ -37,6 +39,9 @@ static void ovpn_struct_free(struct net_device *net)
 	rtnl_unlock();
 
 	free_percpu(net->tstats);
+	flush_workqueue(ovpn->events_wq);
+	destroy_workqueue(ovpn->events_wq);
+	rcu_barrier();
 }
 
 static int ovpn_net_open(struct net_device *dev)
@@ -168,6 +173,9 @@ void ovpn_iface_destruct(struct ovpn_struct *ovpn)
 
 	ovpn->registered = false;
 
+	if (ovpn->mode == OVPN_MODE_P2P)
+		ovpn_peer_release_p2p(ovpn);
+
 	unregister_netdevice(ovpn->dev);
 	synchronize_net();
 }
@@ -270,6 +278,8 @@ static __exit void ovpn_cleanup(void)
 	ovpn_nl_unregister();
 	unregister_pernet_device(&ovpn_pernet_ops);
 	unregister_netdevice_notifier(&ovpn_netdev_notifier);
+
+	rcu_barrier();
 }
 
 module_init(ovpn_init);
diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
index 12b8d7e4a0fe..c08354e3ac8d 100644
--- a/drivers/net/ovpn/main.h
+++ b/drivers/net/ovpn/main.h
@@ -51,4 +51,6 @@ bool ovpn_dev_is_valid(const struct net_device *dev);
 #define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)
 #define OVPN_MAX_PADDING 16
 
+#define OVPN_QUEUE_LEN 1024
+
 #endif /* _NET_OVPN_MAIN_H_ */
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
index ee05b8a2c61d..b79d4f0474b0 100644
--- a/drivers/net/ovpn/ovpnstruct.h
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -17,12 +17,19 @@
  * @dev: the actual netdev representing the tunnel
  * @registered: whether dev is still registered with netdev or not
  * @mode: device operation mode (i.e. p2p, mp, ..)
+ * @lock: protect this object
+ * @events_wq: used to schedule generic events that may sleep and that need to
+ *             be performed outside of softirq context
+ * @peer: in P2P mode, this is the only remote peer
  * @dev_list: entry for the module wide device list
  */
 struct ovpn_struct {
 	struct net_device *dev;
 	bool registered;
 	enum ovpn_mode mode;
+	spinlock_t lock; /* protect writing to the ovpn_struct object */
+	struct workqueue_struct *events_wq;
+	struct ovpn_peer __rcu *peer;
 	struct list_head dev_list;
 };
 
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
new file mode 100644
index 000000000000..2948b7320d47
--- /dev/null
+++ b/drivers/net/ovpn/peer.c
@@ -0,0 +1,379 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/skbuff.h>
+#include <linux/list.h>
+#include <linux/workqueue.h>
+
+#include "ovpnstruct.h"
+#include "bind.h"
+#include "io.h"
+#include "main.h"
+#include "netlink.h"
+#include "peer.h"
+
+struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
+{
+	struct ovpn_peer *peer;
+	int ret;
+
+	/* alloc and init peer object */
+	peer = kzalloc(sizeof(*peer), GFP_KERNEL);
+	if (!peer)
+		return ERR_PTR(-ENOMEM);
+
+	peer->id = id;
+	peer->halt = false;
+	peer->ovpn = ovpn;
+
+	peer->vpn_addrs.ipv4.s_addr = htonl(INADDR_ANY);
+	peer->vpn_addrs.ipv6 = in6addr_any;
+
+	RCU_INIT_POINTER(peer->bind, NULL);
+	spin_lock_init(&peer->lock);
+	kref_init(&peer->refcount);
+
+	ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
+	if (ret < 0) {
+		netdev_err(ovpn->dev, "%s: cannot initialize dst cache\n",
+			   __func__);
+		goto err;
+	}
+
+	ret = ptr_ring_init(&peer->tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
+	if (ret < 0) {
+		netdev_err(ovpn->dev, "%s: cannot allocate TX ring\n",
+			   __func__);
+		goto err_dst_cache;
+	}
+
+	ret = ptr_ring_init(&peer->rx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
+	if (ret < 0) {
+		netdev_err(ovpn->dev, "%s: cannot allocate RX ring\n",
+			   __func__);
+		goto err_tx_ring;
+	}
+
+	ret = ptr_ring_init(&peer->netif_rx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
+	if (ret < 0) {
+		netdev_err(ovpn->dev, "%s: cannot allocate NETIF RX ring\n",
+			   __func__);
+		goto err_rx_ring;
+	}
+
+	dev_hold(ovpn->dev);
+
+	return peer;
+err_rx_ring:
+	ptr_ring_cleanup(&peer->rx_ring, NULL);
+err_tx_ring:
+	ptr_ring_cleanup(&peer->tx_ring, NULL);
+err_dst_cache:
+	dst_cache_destroy(&peer->dst_cache);
+err:
+	kfree(peer);
+	return ERR_PTR(ret);
+}
+
+#define ovpn_peer_index(_tbl, _key, _key_len)		\
+	(jhash(_key, _key_len, 0) % HASH_SIZE(_tbl))
+
+/**
+ * ovpn_peer_free - release private members and free peer object
+ * @peer: the peer to free
+ */
+static void ovpn_peer_free(struct ovpn_peer *peer)
+{
+	ovpn_bind_reset(peer, NULL);
+
+	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
+	ptr_ring_cleanup(&peer->tx_ring, NULL);
+	WARN_ON(!__ptr_ring_empty(&peer->rx_ring));
+	ptr_ring_cleanup(&peer->rx_ring, NULL);
+	WARN_ON(!__ptr_ring_empty(&peer->netif_rx_ring));
+	ptr_ring_cleanup(&peer->netif_rx_ring, NULL);
+
+	dst_cache_destroy(&peer->dst_cache);
+
+	dev_put(peer->ovpn->dev);
+
+	kfree(peer);
+}
+
+/**
+ * ovpn_peer_release_rcu - RCU callback for releasing peer
+ * @head: the RCU head member
+ */
+static void ovpn_peer_release_rcu(struct rcu_head *head)
+{
+	struct ovpn_peer *peer = container_of(head, struct ovpn_peer, rcu);
+
+	ovpn_peer_free(peer);
+}
+
+void ovpn_peer_release(struct ovpn_peer *peer)
+{
+	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
+}
+
+/**
+ * ovpn_peer_delete_work - work scheduled to release peer in process context
+ * @work: the work object
+ */
+static void ovpn_peer_delete_work(struct work_struct *work)
+{
+	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
+					      delete_work);
+	ovpn_peer_release(peer);
+}
+
+void ovpn_peer_release_kref(struct kref *kref)
+{
+	struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount);
+
+	INIT_WORK(&peer->delete_work, ovpn_peer_delete_work);
+	queue_work(peer->ovpn->events_wq, &peer->delete_work);
+}
+
+/**
+ * ovpn_peer_skb_to_sockaddr - fill sockaddr with skb source address
+ * @skb: the packet to extract data from
+ * @ss: the sockaddr to fill
+ *
+ * Return: true on success or false otherwise
+ */
+static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
+				      struct sockaddr_storage *ss)
+{
+	struct sockaddr_in6 *sa6;
+	struct sockaddr_in *sa4;
+
+	ss->ss_family = skb_protocol_to_family(skb);
+	switch (ss->ss_family) {
+	case AF_INET:
+		sa4 = (struct sockaddr_in *)ss;
+		sa4->sin_family = AF_INET;
+		sa4->sin_addr.s_addr = ip_hdr(skb)->saddr;
+		sa4->sin_port = udp_hdr(skb)->source;
+		break;
+	case AF_INET6:
+		sa6 = (struct sockaddr_in6 *)ss;
+		sa6->sin6_family = AF_INET6;
+		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
+		sa6->sin6_port = udp_hdr(skb)->source;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * ovpn_peer_transp_match - check if sockaddr and peer binding match
+ * @peer: the peer to get the binding from
+ * @ss: the sockaddr to match
+ *
+ * Return: true if sockaddr and binding match or false otherwise
+ */
+static bool ovpn_peer_transp_match(struct ovpn_peer *peer,
+				   struct sockaddr_storage *ss)
+{
+	struct ovpn_bind *bind = rcu_dereference(peer->bind);
+	struct sockaddr_in6 *sa6;
+	struct sockaddr_in *sa4;
+
+	if (unlikely(!bind))
+		return false;
+
+	if (ss->ss_family != bind->sa.in4.sin_family)
+		return false;
+
+	switch (ss->ss_family) {
+	case AF_INET:
+		sa4 = (struct sockaddr_in *)ss;
+		if (sa4->sin_addr.s_addr != bind->sa.in4.sin_addr.s_addr)
+			return false;
+		if (sa4->sin_port != bind->sa.in4.sin_port)
+			return false;
+		break;
+	case AF_INET6:
+		sa6 = (struct sockaddr_in6 *)ss;
+		if (memcmp(&sa6->sin6_addr, &bind->sa.in6.sin6_addr,
+			   sizeof(struct in6_addr)))
+			return false;
+		if (sa6->sin6_port != bind->sa.in6.sin6_port)
+			return false;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * ovpn_peer_get_by_transp_addr_p2p - get peer by transport address in a P2P
+ *                                    instance
+ * @ovpn: the openvpn instance to search
+ * @ss: the transport socket address
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *
+ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn,
+				 struct sockaddr_storage *ss)
+{
+	struct ovpn_peer *tmp, *peer = NULL;
+
+	rcu_read_lock();
+	tmp = rcu_dereference(ovpn->peer);
+	if (likely(tmp && ovpn_peer_transp_match(tmp, ss) &&
+		   ovpn_peer_hold(tmp)))
+		peer = tmp;
+	rcu_read_unlock();
+
+	return peer;
+}
+
+struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
+					       struct sk_buff *skb)
+{
+	struct ovpn_peer *peer = NULL;
+	struct sockaddr_storage ss = { 0 };
+
+	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
+		return NULL;
+
+	if (ovpn->mode == OVPN_MODE_P2P)
+		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+
+	return peer;
+}
+
+/**
+ * ovpn_peer_get_by_id_p2p - get peer by ID in a P2P instance
+ * @ovpn: the openvpn instance to search
+ * @peer_id: the ID of the peer to find
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
+						 u32 peer_id)
+{
+	struct ovpn_peer *tmp, *peer = NULL;
+
+	rcu_read_lock();
+	tmp = rcu_dereference(ovpn->peer);
+	if (likely(tmp && tmp->id == peer_id && ovpn_peer_hold(tmp)))
+		peer = tmp;
+	rcu_read_unlock();
+
+	return peer;
+}
+
+struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
+{
+	struct ovpn_peer *peer = NULL;
+
+	if (ovpn->mode == OVPN_MODE_P2P)
+		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
+
+	return peer;
+}
+
+/**
+ * ovpn_peer_add_p2p - add peer to related tables in a P2P instance
+ * @ovpn: the instance to add the peer to
+ * @peer: the peer to add
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
+{
+	struct ovpn_peer *tmp;
+
+	spin_lock_bh(&ovpn->lock);
+	/* in p2p mode it is possible to have a single peer only, therefore the
+	 * old one is released and substituted by the new one
+	 */
+	tmp = rcu_dereference(ovpn->peer);
+	if (tmp) {
+		tmp->delete_reason = OVPN_DEL_PEER_REASON_TEARDOWN;
+		ovpn_peer_put(tmp);
+	}
+
+	rcu_assign_pointer(ovpn->peer, peer);
+	spin_unlock_bh(&ovpn->lock);
+
+	return 0;
+}
+
+int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
+{
+	switch (ovpn->mode) {
+	case OVPN_MODE_P2P:
+		return ovpn_peer_add_p2p(ovpn, peer);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+/**
+ * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
+ * @peer: the peer to delete
+ * @reason: reason why the peer was deleted (sent to userspace)
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
+			     enum ovpn_del_peer_reason reason)
+{
+	struct ovpn_peer *tmp;
+	int ret = -ENOENT;
+
+	spin_lock_bh(&peer->ovpn->lock);
+	tmp = rcu_dereference(peer->ovpn->peer);
+	if (tmp != peer)
+		goto unlock;
+
+	tmp->delete_reason = reason;
+	ovpn_peer_put(tmp);
+	RCU_INIT_POINTER(peer->ovpn->peer, NULL);
+	ret = 0;
+
+unlock:
+	spin_unlock_bh(&peer->ovpn->lock);
+
+	return ret;
+}
+
+void ovpn_peer_release_p2p(struct ovpn_struct *ovpn)
+{
+	struct ovpn_peer *tmp;
+
+	rcu_read_lock();
+	tmp = rcu_dereference(ovpn->peer);
+	if (!tmp)
+		goto unlock;
+
+	ovpn_peer_del_p2p(tmp, OVPN_DEL_PEER_REASON_TEARDOWN);
+unlock:
+	rcu_read_unlock();
+}
+
+int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason)
+{
+	switch (peer->ovpn->mode) {
+	case OVPN_MODE_P2P:
+		return ovpn_peer_del_p2p(peer, reason);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
new file mode 100644
index 000000000000..659df320525c
--- /dev/null
+++ b/drivers/net/ovpn/peer.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNPEER_H_
+#define _NET_OVPN_OVPNPEER_H_
+
+#include "bind.h"
+
+#include <linux/ptr_ring.h>
+#include <net/dst_cache.h>
+#include <uapi/linux/ovpn.h>
+
+/**
+ * struct ovpn_peer - the main remote peer object
+ * @ovpn: main openvpn instance this peer belongs to
+ * @id: unique identifier
+ * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
+ * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
+ * @tx_ring: queue of outgoing packets to this peer
+ * @rx_ring: queue of incoming packets from this peer
+ * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
+ * @dst_cache: cache for dst_entry used to send to peer
+ * @bind: remote peer binding
+ * @halt: true if ovpn_peer_mark_delete was called
+ * @delete_reason: why peer was deleted (e.g. timeout, transport error, ..)
+ * @lock: protects binding to peer (bind)
+ * @refcount: reference counter
+ * @rcu: used to free peer in an RCU safe way
+ * @delete_work: deferred cleanup work, used to notify userspace
+ */
+struct ovpn_peer {
+	struct ovpn_struct *ovpn;
+	u32 id;
+	struct {
+		struct in_addr ipv4;
+		struct in6_addr ipv6;
+	} vpn_addrs;
+	struct ptr_ring tx_ring;
+	struct ptr_ring rx_ring;
+	struct ptr_ring netif_rx_ring;
+	struct dst_cache dst_cache;
+	struct ovpn_bind __rcu *bind;
+	bool halt;
+	enum ovpn_del_peer_reason delete_reason;
+	spinlock_t lock; /* protects bind */
+	struct kref refcount;
+	struct rcu_head rcu;
+	struct work_struct delete_work;
+};
+
+/**
+ * ovpn_peer_release_kref - callback for kref_put
+ * @kref: the kref object belonging to the peer
+ */
+void ovpn_peer_release_kref(struct kref *kref);
+
+/**
+ * ovpn_peer_release - schedule RCU cleanup work
+ * @peer: the peer to release
+ */
+void ovpn_peer_release(struct ovpn_peer *peer);
+
+/**
+ * ovpn_peer_hold - increase reference counter
+ * @peer: the peer whose counter should be increased
+ *
+ * Return: true if the counter was increased or false if it was zero already
+ */
+static inline bool ovpn_peer_hold(struct ovpn_peer *peer)
+{
+	return kref_get_unless_zero(&peer->refcount);
+}
+
+/**
+ * ovpn_peer_put - decrease reference counter
+ * @peer: the peer whose counter should be decreased
+ */
+static inline void ovpn_peer_put(struct ovpn_peer *peer)
+{
+	kref_put(&peer->refcount, ovpn_peer_release_kref);
+}
+
+/**
+ * ovpn_peer_new - allocate and initialize a new peer object
+ * @ovpn: the openvpn instance inside which the peer should be created
+ * @id: the ID assigned to this peer
+ *
+ * Return: a pointer to the new peer on success or an error code otherwise
+ */
+struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id);
+
+/**
+ * ovpn_peer_add - add peer to the related tables
+ * @ovpn: the openvpn instance the peer belongs to
+ * @peer: the peer object to add
+ *
+ * Assume refcounter was increased by caller
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer);
+
+/**
+ * ovpn_peer_del - delete peer from related tables
+ * @peer: the peer object to delete
+ * @reason: reason for deleting peer (will be sent to userspace)
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason);
+
+/**
+ * ovpn_peer_find - find peer having the specified ID
+ * @ovpn: the openvpn instance to search
+ * @peer_id: the ID of the peer to find
+ *
+ * Return: a pointer to the peer if found or NULL otherwise
+ */
+struct ovpn_peer *ovpn_peer_find(struct ovpn_struct *ovpn, u32 peer_id);
+
+/**
+ * ovpn_peer_release_p2p - release peer upon P2P device teardown
+ * @ovpn: the instance being torn down
+ */
+void ovpn_peer_release_p2p(struct ovpn_struct *ovpn);
+
+/**
+ * ovpn_peer_get_by_transp_addr - retrieve peer by transport address
+ * @ovpn: the openvpn instance to search
+ * @skb: the skb to retrieve the source transport address from
+ *
+ * Return: a pointer to the peer if found or NULL otherwise
+ */
+struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
+					       struct sk_buff *skb);
+
+/**
+ * ovpn_peer_get_by_id - retrieve peer by ID
+ * @ovpn: the openvpn instance to search
+ * @peer_id: the unique peer identifier to match
+ *
+ * Return: a pointer to the peer if found or NULL otherwise
+ */
+struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id);
+
+#endif /* _NET_OVPN_OVPNPEER_H_ */
-- 
2.43.2
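
The transport-address matching rule used by the lookups in peer.c (same
family, same address, same port) can be exercised in userspace with ordinary
socket headers. The sketch below is an approximation for experimenting with
the rule, not the kernel code; transp_match() and make_v4() are hypothetical
helpers introduced for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Return true if both sockaddrs have the same family, address and port */
static bool transp_match(const struct sockaddr_storage *a,
			 const struct sockaddr_storage *b)
{
	if (a->ss_family != b->ss_family)
		return false;

	switch (a->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
		const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

		return a4->sin_addr.s_addr == b4->sin_addr.s_addr &&
		       a4->sin_port == b4->sin_port;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
		const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

		return memcmp(&a6->sin6_addr, &b6->sin6_addr,
			      sizeof(struct in6_addr)) == 0 &&
		       a6->sin6_port == b6->sin6_port;
	}
	default:
		/* unknown families never match */
		return false;
	}
}

/* Test helper: fill ss with an IPv4 address and port (host byte order) */
static void make_v4(struct sockaddr_storage *ss, const char *ip,
		    uint16_t port)
{
	struct sockaddr_in *sa4 = (struct sockaddr_in *)ss;
	int ret;

	memset(ss, 0, sizeof(*ss));
	sa4->sin_family = AF_INET;
	sa4->sin_port = htons(port);
	ret = inet_pton(AF_INET, ip, &sa4->sin_addr);
	assert(ret == 1);
}
```

Two sockaddrs that differ only in the source port do not match, so a peer that
changes its port is not found by this lookup until its binding is replaced.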


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (6 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-08 17:10   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP) Antonio Quartulli
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

This specific structure is used in the ovpn kernel module
to wrap and carry around a standard kernel socket.

ovpn takes ownership of passed sockets and therefore an ovpn
specific object is attached to them for status tracking
purposes.

Initially only UDP support is introduced. TCP will come in a later
patch.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
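
The release callbacks in this series recover the enclosing object from an
embedded member (the kref or the rcu_head) via container_of(). For readers
less familiar with the idiom, a minimal userspace sketch follows. The macro
and the fake_* types are hypothetical stand-ins written for this example, not
the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical embedded reference counter */
struct fake_ref {
	int count;
};

/* Hypothetical wrapper object that embeds the counter */
struct fake_socket {
	int fd;
	struct fake_ref refcount;
};

/* Given a pointer to the embedded member, return the enclosing object */
static struct fake_socket *socket_from_ref(struct fake_ref *ref)
{
	assert(ref);
	return sketch_container_of(ref, struct fake_socket, refcount);
}
```

This is why a release callback can receive only the embedded member and still
reach every other field of the wrapper.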
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/socket.c | 105 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/socket.h |  68 ++++++++++++++++++++++++
 drivers/net/ovpn/udp.c    |  45 ++++++++++++++++
 drivers/net/ovpn/udp.h    |  27 ++++++++++
 5 files changed, 247 insertions(+)
 create mode 100644 drivers/net/ovpn/socket.c
 create mode 100644 drivers/net/ovpn/socket.h
 create mode 100644 drivers/net/ovpn/udp.c
 create mode 100644 drivers/net/ovpn/udp.h

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index ce13499b3e17..56bddc9bef83 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -13,3 +13,5 @@ ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += socket.o
+ovpn-y += udp.o
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
new file mode 100644
index 000000000000..a4a4d69162f0
--- /dev/null
+++ b/drivers/net/ovpn/socket.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/net.h>
+#include <linux/netdevice.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "io.h"
+#include "peer.h"
+#include "socket.h"
+#include "udp.h"
+
+/* Finalize release of socket, called after RCU grace period */
+static void ovpn_socket_detach(struct socket *sock)
+{
+	if (!sock)
+		return;
+
+	sockfd_put(sock);
+}
+
+void ovpn_socket_release_kref(struct kref *kref)
+{
+	struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
+						refcount);
+
+	ovpn_socket_detach(sock->sock);
+	kfree_rcu(sock, rcu);
+}
+
+static bool ovpn_socket_hold(struct ovpn_socket *sock)
+{
+	return kref_get_unless_zero(&sock->refcount);
+}
+
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
+{
+	struct ovpn_socket *ovpn_sock;
+
+	rcu_read_lock();
+	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
+	if (!ovpn_socket_hold(ovpn_sock)) {
+		pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
+		ovpn_sock = NULL;
+	}
+	rcu_read_unlock();
+
+	return ovpn_sock;
+}
+
+/* Attach socket to the peer's ovpn instance; only UDP is supported so far */
+static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
+{
+	int ret = -EOPNOTSUPP;
+
+	if (!sock || !peer)
+		return -EINVAL;
+
+	if (sock->sk->sk_protocol == IPPROTO_UDP)
+		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
+
+	return ret;
+}
+
+struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
+{
+	struct ovpn_socket *ovpn_sock;
+	int ret;
+
+	ret = ovpn_socket_attach(sock, peer);
+	if (ret < 0 && ret != -EALREADY)
+		return ERR_PTR(ret);
+
+	/* if this socket is already owned by this interface, just increase the
+	 * refcounter
+	 */
+	if (ret == -EALREADY) {
+		/* caller is expected to increase the sock refcounter before
+		 * passing it to this function. For this reason we drop it if
+		 * not needed, like when this socket is already owned.
+		 */
+		ovpn_sock = ovpn_socket_get(sock);
+		sockfd_put(sock);
+		return ovpn_sock;
+	}
+
+	ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
+	if (!ovpn_sock)
+		return ERR_PTR(-ENOMEM);
+
+	ovpn_sock->ovpn = peer->ovpn;
+	ovpn_sock->sock = sock;
+	kref_init(&ovpn_sock->refcount);
+
+	rcu_assign_sk_user_data(sock->sk, ovpn_sock);
+
+	return ovpn_sock;
+}
diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
new file mode 100644
index 000000000000..0d23de5a9344
--- /dev/null
+++ b/drivers/net/ovpn/socket.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_SOCK_H_
+#define _NET_OVPN_SOCK_H_
+
+#include <linux/net.h>
+#include <linux/kref.h>
+#include <linux/ptr_ring.h>
+#include <net/sock.h>
+
+struct ovpn_struct;
+struct ovpn_peer;
+
+/**
+ * struct ovpn_socket - a kernel socket referenced in the ovpn code
+ * @ovpn: ovpn instance owning this socket (UDP only)
+ * @sock: the low level sock object
+ * @refcount: number of contexts currently referencing this object
+ * @rcu: member used to schedule RCU destructor callback
+ */
+struct ovpn_socket {
+	struct ovpn_struct *ovpn;
+	struct socket *sock;
+	struct kref refcount;
+	struct rcu_head rcu;
+};
+
+/**
+ * ovpn_from_udp_sock - retrieve ovpn instance object from UDP sock
+ * @sk: the sock to retrieve the instance from
+ *
+ * Return: the ovpn instance that this sock is bound to
+ */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk);
+
+/**
+ * ovpn_socket_release_kref - kref_put callback
+ * @kref: the kref object
+ */
+void ovpn_socket_release_kref(struct kref *kref);
+
+/**
+ * ovpn_socket_put - decrease reference counter
+ * @sock: the socket whose reference counter should be decreased
+ */
+static inline void ovpn_socket_put(struct ovpn_socket *sock)
+{
+	kref_put(&sock->refcount, ovpn_socket_release_kref);
+}
+
+/**
+ * ovpn_socket_new - create a new socket and initialize it
+ * @sock: the kernel socket to embed
+ * @peer: the peer reachable via this socket
+ *
+ * Return: an openvpn socket on success or a negative error code otherwise
+ */
+struct ovpn_socket *ovpn_socket_new(struct socket *sock,
+				    struct ovpn_peer *peer);
+
+#endif /* _NET_OVPN_SOCK_H_ */
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
new file mode 100644
index 000000000000..4b7d96a13df0
--- /dev/null
+++ b/drivers/net/ovpn/udp.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "socket.h"
+#include "udp.h"
+
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
+{
+	struct ovpn_socket *old_data;
+
+	/* sanity check */
+	if (sock->sk->sk_protocol != IPPROTO_UDP) {
+		netdev_err(ovpn->dev, "%s: expected UDP socket\n", __func__);
+		return -EINVAL;
+	}
+
+	/* make sure no pre-existing encapsulation handler exists */
+	rcu_read_lock();
+	old_data = rcu_dereference_sk_user_data(sock->sk);
+	rcu_read_unlock();
+	if (old_data) {
+		if (old_data->ovpn == ovpn) {
+			netdev_dbg(ovpn->dev,
+				   "%s: provided socket already owned by this interface\n",
+				   __func__);
+			return -EALREADY;
+		}
+
+		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
+			   __func__);
+		return -EBUSY;
+	}
+
+	return 0;
+}
diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h
new file mode 100644
index 000000000000..16422a649cb9
--- /dev/null
+++ b/drivers/net/ovpn/udp.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_UDP_H_
+#define _NET_OVPN_UDP_H_
+
+struct ovpn_struct;
+struct socket;
+
+/**
+ * ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn
+ * @sock: socket to configure
+ * @ovpn: the openvpn instance to link
+ *
+ * After invoking this function, the sock will be controlled by ovpn so that
+ * any incoming packet may be processed by ovpn first.
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn);
+
+#endif /* _NET_OVPN_UDP_H_ */
-- 
2.43.2



* [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (7 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-10 13:01   ` Sabrina Dubroca
  2024-05-12 21:35   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 10/24] ovpn: implement basic RX " Antonio Quartulli
                   ` (15 subsequent siblings)
  24 siblings, 2 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Packets sent over the ovpn interface are processed and transmitted to the
connected peer, if any.

Implementation is UDP only. TCP will be added by a later patch.

Note: no crypto/encapsulation exists yet; packets are just captured and
sent.
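
To make the flow above easier to follow, here is a small userspace sketch
of the two steps the TX path performs before handing packets to the
transport: checking the IP version of the payload and queueing it on a
pointer ring that a worker later drains. This is only a model under stated
assumptions: the demo_* names, the ring size and the 20-byte minimum header
length are illustrative and not taken from the patch, and no locking or
bottom-half handling is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 4  /* hypothetical; the real ring size is not shown */
#define DEMO_MIN_HDR   20 /* assumed minimum: size of a basic IPv4 header */

/* Hypothetical userspace model of a pointer ring: a NULL slot is empty. */
struct demo_ring {
	void *slots[DEMO_RING_SIZE];
	size_t head; /* next slot to produce into */
	size_t tail; /* next slot to consume from */
};

/* Queue one packet; fails with -ENOSPC when the ring is full. */
static int demo_ring_produce(struct demo_ring *r, void *pkt)
{
	assert(r);

	if (!pkt)
		return -EINVAL;
	if (r->slots[r->head])
		return -ENOSPC;

	r->slots[r->head] = pkt;
	r->head = (r->head + 1) % DEMO_RING_SIZE;
	return 0;
}

/* Dequeue the oldest packet, or return NULL when the ring is empty. */
static void *demo_ring_consume(struct demo_ring *r)
{
	void *pkt = r->slots[r->tail];

	if (pkt) {
		r->slots[r->tail] = NULL;
		r->tail = (r->tail + 1) % DEMO_RING_SIZE;
	}
	return pkt;
}

/* Return 4 or 6 from the first header nibble, 0 if neither or too short. */
static int demo_ip_version(const uint8_t *data, size_t len)
{
	int version;

	if (len < DEMO_MIN_HDR)
		return 0;

	version = data[0] >> 4;
	return (version == 4 || version == 6) ? version : 0;
}

/* Worker side: drain the ring and count the packets that would be sent. */
static int demo_tx_worker(struct demo_ring *r)
{
	int sent = 0;

	while (demo_ring_consume(r))
		sent++;
	return sent;
}
```

A full ring makes the producer fail, which corresponds to the point where
the xmit path drops the packet instead of blocking.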

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c         | 174 ++++++++++++++++++++++++++-
 drivers/net/ovpn/io.h         |   2 +
 drivers/net/ovpn/main.c       |   2 +
 drivers/net/ovpn/ovpnstruct.h |   2 +
 drivers/net/ovpn/peer.c       |  37 ++++++
 drivers/net/ovpn/peer.h       |   9 ++
 drivers/net/ovpn/udp.c        | 219 ++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/udp.h        |  14 +++
 8 files changed, 458 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index a420bb45f25f..36cfb95edbf4 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -9,11 +9,13 @@
 
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
+#include <net/gso.h>
 
 #include "io.h"
 #include "ovpnstruct.h"
 #include "netlink.h"
 #include "peer.h"
+#include "udp.h"
 
 int ovpn_struct_init(struct net_device *dev)
 {
@@ -28,6 +30,12 @@ int ovpn_struct_init(struct net_device *dev)
 
 	spin_lock_init(&ovpn->lock);
 
+	ovpn->crypto_wq = alloc_workqueue("ovpn-crypto-wq-%s",
+					  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0,
+					  dev->name);
+	if (!ovpn->crypto_wq)
+		return -ENOMEM;
+
 	ovpn->events_wq = alloc_workqueue("ovpn-events-wq-%s", WQ_MEM_RECLAIM,
 					  0, dev->name);
 	if (!ovpn->events_wq)
@@ -40,11 +48,175 @@ int ovpn_struct_init(struct net_device *dev)
 	return 0;
 }
 
+static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	return true;
+}
+
+/* Process packets in TX queue in a transport-specific way.
+ *
+ * UDP transport - encrypt and send across the tunnel.
+ */
+void ovpn_encrypt_work(struct work_struct *work)
+{
+	struct sk_buff *skb, *curr, *next;
+	struct ovpn_peer *peer;
+
+	peer = container_of(work, struct ovpn_peer, encrypt_work);
+	while ((skb = ptr_ring_consume_bh(&peer->tx_ring))) {
+		/* this might be a GSO-segmented skb list: process each skb
+		 * independently
+		 */
+		skb_list_walk_safe(skb, curr, next) {
+			/* if one segment fails encryption, we drop the entire
+			 * packet, because it does not really make sense to send
+			 * only part of it at this point
+			 */
+			if (unlikely(!ovpn_encrypt_one(peer, curr))) {
+				kfree_skb_list(skb);
+				skb = NULL;
+				break;
+			}
+		}
+
+		/* successful encryption */
+		if (likely(skb)) {
+			skb_list_walk_safe(skb, curr, next) {
+				skb_mark_not_on_list(curr);
+
+				switch (peer->sock->sock->sk->sk_protocol) {
+				case IPPROTO_UDP:
+					ovpn_udp_send_skb(peer->ovpn, peer,
+							  curr);
+					break;
+				default:
+					/* no transport configured yet */
+					consume_skb(curr);
+					break;
+				}
+			}
+		}
+
+		/* give a chance to be rescheduled if needed */
+		cond_resched();
+	}
+	ovpn_peer_put(peer);
+}
+
+/* send skb to connected peer, if any */
+static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
+			   struct ovpn_peer *peer)
+{
+	int ret;
+
+	if (likely(!peer))
+		/* retrieve peer serving the destination IP of this packet */
+		peer = ovpn_peer_get_by_dst(ovpn, skb);
+	if (unlikely(!peer)) {
+		net_dbg_ratelimited("%s: no peer to send data to\n",
+				    ovpn->dev->name);
+		goto drop;
+	}
+
+	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
+	if (unlikely(ret < 0)) {
+		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
+				    peer->ovpn->dev->name);
+		goto drop;
+	}
+
+	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
+		ovpn_peer_put(peer);
+
+	return;
+drop:
+	if (peer)
+		ovpn_peer_put(peer);
+	kfree_skb_list(skb);
+}
+
+/* Return the ethertype matching the IP version found in the skb header.
+ * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
+ */
+static __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
+{
+	__be16 proto = 0;
+
+	/* skb could be non-linear, make sure IP header is in non-fragmented
+	 * part
+	 */
+	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
+		return 0;
+
+	if (ip_hdr(skb)->version == 4)
+		proto = htons(ETH_P_IP);
+	else if (ip_hdr(skb)->version == 6)
+		proto = htons(ETH_P_IPV6);
+
+	return proto;
+}
+
 /* Send user data to the network
  */
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+	struct ovpn_struct *ovpn = netdev_priv(dev);
+	struct sk_buff *segments, *tmp, *curr, *next;
+	struct sk_buff_head skb_list;
+	__be16 proto;
+	int ret;
+
+	/* reset netfilter state */
+	nf_reset_ct(skb);
+
+	/* verify IP header size in network packet */
+	proto = ovpn_ip_check_protocol(skb);
+	if (unlikely(!proto || skb->protocol != proto)) {
+		net_err_ratelimited("%s: dropping malformed payload packet\n",
+				    dev->name);
+		goto drop;
+	}
+
+	if (skb_is_gso(skb)) {
+		segments = skb_gso_segment(skb, 0);
+		if (IS_ERR(segments)) {
+			ret = PTR_ERR(segments);
+			net_err_ratelimited("%s: cannot segment packet: %d\n",
+					    dev->name, ret);
+			goto drop;
+		}
+
+		consume_skb(skb);
+		skb = segments;
+	}
+
+	/* from this moment on, "skb" might be a list */
+
+	__skb_queue_head_init(&skb_list);
+	skb_list_walk_safe(skb, curr, next) {
+		skb_mark_not_on_list(curr);
+
+		tmp = skb_share_check(curr, GFP_ATOMIC);
+		if (unlikely(!tmp)) {
+			kfree_skb_list(next);
+			net_err_ratelimited("%s: skb_share_check failed\n",
+					    dev->name);
+			goto drop_list;
+		}
+
+		__skb_queue_tail(&skb_list, tmp);
+	}
+	skb_list.prev->next = NULL;
+
+	ovpn_queue_skb(ovpn, skb_list.next, NULL);
+
+	return NETDEV_TX_OK;
+
+drop_list:
+	skb_queue_walk_safe(&skb_list, curr, next)
+		kfree_skb(curr);
+drop:
 	skb_tx_error(skb);
-	kfree_skb(skb);
+	kfree_skb_list(skb);
 	return NET_XMIT_DROP;
 }
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index 61a2485e16b5..171e87f584b6 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -19,4 +19,6 @@
 int ovpn_struct_init(struct net_device *dev);
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 
+void ovpn_encrypt_work(struct work_struct *work);
+
 #endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index dba35ecb236b..9ae9844dd281 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -39,7 +39,9 @@ static void ovpn_struct_free(struct net_device *net)
 	rtnl_unlock();
 
 	free_percpu(net->tstats);
+	flush_workqueue(ovpn->crypto_wq);
 	flush_workqueue(ovpn->events_wq);
+	destroy_workqueue(ovpn->crypto_wq);
 	destroy_workqueue(ovpn->events_wq);
 	rcu_barrier();
 }
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
index b79d4f0474b0..7414c2459fb9 100644
--- a/drivers/net/ovpn/ovpnstruct.h
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -18,6 +18,7 @@
  * @registered: whether dev is still registered with netdev or not
  * @mode: device operation mode (i.e. p2p, mp, ..)
  * @lock: protect this object
+ * @crypto_wq: used to schedule crypto work that may sleep during TX/RX
  * @event_wq: used to schedule generic events that may sleep and that need to be
  *            performed outside of softirq context
  * @peer: in P2P mode, this is the only remote peer
@@ -28,6 +29,7 @@ struct ovpn_struct {
 	bool registered;
 	enum ovpn_mode mode;
 	spinlock_t lock; /* protect writing to the ovpn_struct object */
+	struct workqueue_struct *crypto_wq;
 	struct workqueue_struct *events_wq;
 	struct ovpn_peer __rcu *peer;
 	struct list_head dev_list;
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 2948b7320d47..f023f919b75d 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -39,6 +39,8 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	spin_lock_init(&peer->lock);
 	kref_init(&peer->refcount);
 
+	INIT_WORK(&peer->encrypt_work, ovpn_encrypt_work);
+
 	ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
 	if (ret < 0) {
 		netdev_err(ovpn->dev, "%s: cannot initialize dst cache\n",
@@ -119,6 +121,9 @@ static void ovpn_peer_release_rcu(struct rcu_head *head)
 
 void ovpn_peer_release(struct ovpn_peer *peer)
 {
+	if (peer->sock)
+		ovpn_socket_put(peer->sock);
+
 	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
 }
 
@@ -288,6 +293,38 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
 	return peer;
 }
 
+/**
+ * ovpn_peer_get_by_dst - Lookup peer to send skb to
+ * @ovpn: the private data representing the current VPN session
+ * @skb: the skb to extract the destination address from
+ *
+ * This function takes a tunnel packet and looks up the peer to send it to
+ * after encapsulation. The skb is expected to be the in-tunnel packet, without
+ * any OpenVPN related header.
+ *
+ * Assume that the IP header is accessible in the skb data.
+ *
+ * Return: the peer if found or NULL otherwise.
+ */
+struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
+				       struct sk_buff *skb)
+{
+	struct ovpn_peer *tmp, *peer = NULL;
+
+	/* in P2P mode, no matter the destination, packets are always sent to
+	 * the single peer listening on the other side
+	 */
+	if (ovpn->mode == OVPN_MODE_P2P) {
+		rcu_read_lock();
+		tmp = rcu_dereference(ovpn->peer);
+		if (likely(tmp && ovpn_peer_hold(tmp)))
+			peer = tmp;
+		rcu_read_unlock();
+	}
+
+	return peer;
+}
+
 /**
  * ovpn_peer_add_p2p - add per to related tables in a P2P instance
  * @ovpn: the instance to add the peer to
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 659df320525c..f915afa260c3 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -11,6 +11,7 @@
 #define _NET_OVPN_OVPNPEER_H_
 
 #include "bind.h"
+#include "socket.h"
 
 #include <linux/ptr_ring.h>
 #include <net/dst_cache.h>
@@ -22,9 +23,12 @@
  * @id: unique identifier
  * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
  * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
+ * @encrypt_work: work used to process outgoing packets
+ * @decrypt_work: work used to process incoming packets
  * @tx_ring: queue of outgoing poackets to this peer
  * @rx_ring: queue of incoming packets from this peer
  * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
+ * @sock: the socket being used to talk to this peer
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
  * @halt: true if ovpn_peer_mark_delete was called
@@ -41,9 +45,12 @@ struct ovpn_peer {
 		struct in_addr ipv4;
 		struct in6_addr ipv6;
 	} vpn_addrs;
+	struct work_struct encrypt_work;
+	struct work_struct decrypt_work;
 	struct ptr_ring tx_ring;
 	struct ptr_ring rx_ring;
 	struct ptr_ring netif_rx_ring;
+	struct ovpn_socket *sock;
 	struct dst_cache dst_cache;
 	struct ovpn_bind __rcu *bind;
 	bool halt;
@@ -148,5 +155,7 @@ struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
  * Return: a pointer to the peer if found or NULL otherwise
  */
 struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id);
+struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
+				       struct sk_buff *skb);
 
 #endif /* _NET_OVPN_OVPNPEER_H_ */
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index 4b7d96a13df0..f434da76dc0a 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -7,13 +7,232 @@
  */
 
 #include <linux/netdevice.h>
+#include <linux/inetdevice.h>
 #include <linux/socket.h>
+#include <net/addrconf.h>
+#include <net/dst_cache.h>
+#include <net/route.h>
+#include <net/ipv6_stubs.h>
+#include <net/udp_tunnel.h>
 
 #include "ovpnstruct.h"
 #include "main.h"
+#include "bind.h"
+#include "io.h"
+#include "peer.h"
 #include "socket.h"
 #include "udp.h"
 
+/**
+ * ovpn_udp4_output - send IPv4 packet over udp socket
+ * @ovpn: the openvpn instance
+ * @bind: the binding related to the destination peer
+ * @cache: dst cache
+ * @sk: the socket to send the packet over
+ * @skb: the packet to send
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
+			    struct dst_cache *cache, struct sock *sk,
+			    struct sk_buff *skb)
+{
+	struct rtable *rt;
+	struct flowi4 fl = {
+		.saddr = bind->local.ipv4.s_addr,
+		.daddr = bind->sa.in4.sin_addr.s_addr,
+		.fl4_sport = inet_sk(sk)->inet_sport,
+		.fl4_dport = bind->sa.in4.sin_port,
+		.flowi4_proto = sk->sk_protocol,
+		.flowi4_mark = sk->sk_mark,
+	};
+	int ret;
+
+	local_bh_disable();
+	rt = dst_cache_get_ip4(cache, &fl.saddr);
+	if (rt)
+		goto transmit;
+
+	if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
+					RT_SCOPE_HOST))) {
+		/* we may end up here when the cached address is not usable
+		 * anymore. In this case we reset address/cache and perform a
+		 * new look up
+		 */
+		fl.saddr = 0;
+		bind->local.ipv4.s_addr = 0;
+		dst_cache_reset(cache);
+	}
+
+	rt = ip_route_output_flow(sock_net(sk), &fl, sk);
+	if (IS_ERR(rt) && PTR_ERR(rt) == -EINVAL) {
+		fl.saddr = 0;
+		bind->local.ipv4.s_addr = 0;
+		dst_cache_reset(cache);
+
+		rt = ip_route_output_flow(sock_net(sk), &fl, sk);
+	}
+
+	if (IS_ERR(rt)) {
+		ret = PTR_ERR(rt);
+		net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
+				    ovpn->dev->name, &bind->sa.in4, ret);
+		goto err;
+	}
+	dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+
+transmit:
+	udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0,
+			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
+			    fl.fl4_dport, false, sk->sk_no_check_tx);
+	ret = 0;
+err:
+	local_bh_enable();
+	return ret;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+/**
+ * ovpn_udp6_output - send IPv6 packet over udp socket
+ * @ovpn: the openvpn instance
+ * @bind: the binding related to the destination peer
+ * @cache: dst cache
+ * @sk: the socket to send the packet over
+ * @skb: the packet to send
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_udp6_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
+			    struct dst_cache *cache, struct sock *sk,
+			    struct sk_buff *skb)
+{
+	struct dst_entry *dst;
+	int ret;
+
+	struct flowi6 fl = {
+		.saddr = bind->local.ipv6,
+		.daddr = bind->sa.in6.sin6_addr,
+		.fl6_sport = inet_sk(sk)->inet_sport,
+		.fl6_dport = bind->sa.in6.sin6_port,
+		.flowi6_proto = sk->sk_protocol,
+		.flowi6_mark = sk->sk_mark,
+		.flowi6_oif = bind->sa.in6.sin6_scope_id,
+	};
+
+	local_bh_disable();
+	dst = dst_cache_get_ip6(cache, &fl.saddr);
+	if (dst)
+		goto transmit;
+
+	if (unlikely(!ipv6_chk_addr(sock_net(sk), &fl.saddr, NULL, 0))) {
+		/* we may end up here when the cached address is not usable
+		 * anymore. In this case we reset address/cache and perform a
+		 * new look up
+		 */
+		fl.saddr = in6addr_any;
+		bind->local.ipv6 = in6addr_any;
+		dst_cache_reset(cache);
+	}
+
+	dst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(sk), sk, &fl, NULL);
+	if (IS_ERR(dst)) {
+		ret = PTR_ERR(dst);
+		net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
+				    ovpn->dev->name, &bind->sa.in6, ret);
+		goto err;
+	}
+	dst_cache_set_ip6(cache, dst, &fl.saddr);
+
+transmit:
+	udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
+			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
+			     fl.fl6_dport, udp_get_no_check6_tx(sk));
+	ret = 0;
+err:
+	local_bh_enable();
+	return ret;
+}
+#endif
+
+/**
+ * ovpn_udp_output - transmit skb using udp-tunnel
+ * @ovpn: the openvpn instance
+ * @bind: the binding related to the destination peer
+ * @cache: dst cache
+ * @sk: the socket to send the packet over
+ * @skb: the packet to send
+ *
+ * rcu_read_lock should be held on entry.
+ * On return, the skb is consumed.
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_udp_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
+			   struct dst_cache *cache, struct sock *sk,
+			   struct sk_buff *skb)
+{
+	int ret;
+
+	/* set sk to null if skb is already orphaned */
+	if (!skb->destructor)
+		skb->sk = NULL;
+
+	/* always permit openvpn-created packets to be (outside) fragmented */
+	skb->ignore_df = 1;
+
+	switch (bind->sa.in4.sin_family) {
+	case AF_INET:
+		ret = ovpn_udp4_output(ovpn, bind, cache, sk, skb);
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		ret = ovpn_udp6_output(ovpn, bind, cache, sk, skb);
+		break;
+#endif
+	default:
+		ret = -EAFNOSUPPORT;
+		break;
+	}
+
+	return ret;
+}
+
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
+		       struct sk_buff *skb)
+{
+	struct ovpn_bind *bind;
+	struct socket *sock;
+	int ret = -1;
+
+	skb->dev = ovpn->dev;
+	/* no checksum performed at this layer */
+	skb->ip_summed = CHECKSUM_NONE;
+
+	/* get socket info */
+	sock = peer->sock->sock;
+	if (unlikely(!sock)) {
+		net_warn_ratelimited("%s: no sock for remote peer\n", __func__);
+		goto out;
+	}
+
+	rcu_read_lock();
+	/* get binding */
+	bind = rcu_dereference(peer->bind);
+	if (unlikely(!bind)) {
+		net_warn_ratelimited("%s: no bind for remote peer\n", __func__);
+		goto out_unlock;
+	}
+
+	/* crypto layer -> transport (UDP) */
+	ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	if (ret < 0)
+		kfree_skb(skb);
+}
+
 int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
 {
 	struct ovpn_socket *old_data;
diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h
index 16422a649cb9..f4eb1e63e103 100644
--- a/drivers/net/ovpn/udp.h
+++ b/drivers/net/ovpn/udp.h
@@ -9,7 +9,12 @@
 #ifndef _NET_OVPN_UDP_H_
 #define _NET_OVPN_UDP_H_
 
+#include <linux/skbuff.h>
+#include <net/sock.h>
+
+struct ovpn_peer;
 struct ovpn_struct;
+struct sk_buff;
 struct socket;
 
 /**
@@ -24,4 +29,13 @@ struct socket;
  */
 int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn);
 
+/**
+ * ovpn_udp_send_skb - prepare skb and send it via UDP
+ * @ovpn: the openvpn instance
+ * @peer: the destination peer
+ * @skb: the packet to send
+ */
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
+		       struct sk_buff *skb);
+
 #endif /* _NET_OVPN_UDP_H_ */
-- 
2.43.2



* [PATCH net-next v3 10/24] ovpn: implement basic RX path (UDP)
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (8 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP) Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-10 13:45   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 11/24] ovpn: implement packet processing Antonio Quartulli
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Packets received over the socket are forwarded to the user device.

Implementation is UDP only. TCP will be added by a later patch.

Note: no decryption/decapsulation exists yet; packets are forwarded as
they arrive without much processing.
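
The delivery to the device is budget-driven. The following userspace
sketch shows that polling pattern only: deliver at most 'budget' packets
per call and stop polling once fewer than 'budget' packets were found. The
demo_* names and the counters are hypothetical stand-ins for the per-peer
queue and NAPI state; nothing here calls into the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-peer RX queue and polling state. */
struct demo_rx {
	int pending;    /* packets waiting to be delivered */
	int delivered;  /* packets handed to the stack so far */
	bool scheduled; /* true while polling should continue */
};

/* Deliver at most 'budget' packets; stop polling once the queue drains. */
static int demo_poll(struct demo_rx *rx, int budget)
{
	int work_done = 0;

	assert(rx && rx->pending >= 0);

	if (budget <= 0)
		return 0;

	while (work_done < budget && rx->pending > 0) {
		rx->pending--;
		rx->delivered++;
		work_done++;
	}

	/* using less than the full budget means the queue is empty */
	if (work_done < budget)
		rx->scheduled = false;

	return work_done;
}
```

With five queued packets and a budget of three, the first call delivers
three packets and stays scheduled; the second delivers the remaining two
and completes.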

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c     | 114 +++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/io.h     |   5 ++
 drivers/net/ovpn/peer.c   |   9 +++
 drivers/net/ovpn/peer.h   |   2 +
 drivers/net/ovpn/proto.h  | 115 +++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/socket.c |  24 ++++++++
 drivers/net/ovpn/udp.c    | 125 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/udp.h    |   6 ++
 8 files changed, 397 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ovpn/proto.h

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 36cfb95edbf4..9935a863bffe 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -11,10 +11,10 @@
 #include <linux/skbuff.h>
 #include <net/gso.h>
 
-#include "io.h"
 #include "ovpnstruct.h"
-#include "netlink.h"
 #include "peer.h"
+#include "io.h"
+#include "netlink.h"
 #include "udp.h"
 
 int ovpn_struct_init(struct net_device *dev)
@@ -48,6 +48,116 @@ int ovpn_struct_init(struct net_device *dev)
 	return 0;
 }
 
+/* Called after decrypt to write the IP packet to the device.
+ * This method is expected to manage/free the skb.
+ */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	/* packet integrity was verified on the VPN layer - no need to perform
+	 * any additional check along the stack
+	 */
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+	skb->csum_level = ~0;
+
+	/* skb hash for transport packet no longer valid after decapsulation */
+	skb_clear_hash(skb);
+
+	/* post-decrypt scrub -- prepare to inject encapsulated packet onto the
+	 * interface, based on __skb_tunnel_rx() in dst.h
+	 */
+	skb->dev = peer->ovpn->dev;
+	skb_set_queue_mapping(skb, 0);
+	skb_scrub_packet(skb, true);
+
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb_probe_transport_header(skb);
+	skb_reset_inner_headers(skb);
+
+	/* update per-cpu RX stats with the stored size of encrypted packet */
+
+	/* we are in softirq context - hence no locking nor disable preemption
+	 * needed
+	 */
+	dev_sw_netstats_rx_add(peer->ovpn->dev, skb->len);
+
+	/* cause packet to be "received" by the interface */
+	napi_gro_receive(&peer->napi, skb);
+}
+
+int ovpn_napi_poll(struct napi_struct *napi, int budget)
+{
+	struct ovpn_peer *peer = container_of(napi, struct ovpn_peer, napi);
+	struct sk_buff *skb;
+	int work_done = 0;
+
+	if (unlikely(budget <= 0))
+		return 0;
+	/* this function should schedule at most 'budget' number of
+	 * packets for delivery to the interface.
+	 * If the queue holds more packets than the budget allows, the next
+	 * poll will take care of them
+	 */
+	while ((work_done < budget) &&
+	       (skb = ptr_ring_consume_bh(&peer->netif_rx_ring))) {
+		ovpn_netdev_write(peer, skb);
+		work_done++;
+	}
+
+	if (work_done < budget)
+		napi_complete_done(napi, work_done);
+
+	return work_done;
+}
+
+/* Entry point for processing an incoming packet (in skb form)
+ *
+ * Enqueue the packet and schedule RX consumer.
+ * Reference to peer is dropped only in case of success.
+ *
+ * Return 0  if the packet was handled (and consumed)
+ * Return <0 in case of error (return value is error code)
+ */
+int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
+	      struct sk_buff *skb)
+{
+	if (unlikely(ptr_ring_produce_bh(&peer->rx_ring, skb) < 0))
+		return -ENOSPC;
+
+	if (!queue_work(ovpn->crypto_wq, &peer->decrypt_work))
+		ovpn_peer_put(peer);
+
+	return 0;
+}
+
+static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	return 0;
+}
+
+/* pick next packet from RX queue, decrypt and forward it to the device */
+void ovpn_decrypt_work(struct work_struct *work)
+{
+	struct ovpn_peer *peer;
+	struct sk_buff *skb;
+
+	peer = container_of(work, struct ovpn_peer, decrypt_work);
+	while ((skb = ptr_ring_consume_bh(&peer->rx_ring))) {
+		if (likely(ovpn_decrypt_one(peer, skb) == 0)) {
+			/* if a packet has been enqueued for NAPI, signal
+			 * availability to the networking stack
+			 */
+			local_bh_disable();
+			napi_schedule(&peer->napi);
+			local_bh_enable();
+		}
+
+		/* give a chance to be rescheduled if needed */
+		cond_resched();
+	}
+	ovpn_peer_put(peer);
+}
+
 static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 {
 	return true;
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index 171e87f584b6..63d549c8c53b 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -18,7 +18,12 @@
  */
 int ovpn_struct_init(struct net_device *dev);
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
+int ovpn_napi_poll(struct napi_struct *napi, int budget);
+
+int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
+	      struct sk_buff *skb);
 
 void ovpn_encrypt_work(struct work_struct *work);
+void ovpn_decrypt_work(struct work_struct *work);
 
 #endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index f023f919b75d..4e5bb659f169 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -40,6 +40,7 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	kref_init(&peer->refcount);
 
 	INIT_WORK(&peer->encrypt_work, ovpn_encrypt_work);
+	INIT_WORK(&peer->decrypt_work, ovpn_decrypt_work);
 
 	ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
 	if (ret < 0) {
@@ -69,6 +70,11 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 		goto err_rx_ring;
 	}
 
+	/* configure and start NAPI */
+	netif_napi_add_tx_weight(ovpn->dev, &peer->napi, ovpn_napi_poll,
+				 NAPI_POLL_WEIGHT);
+	napi_enable(&peer->napi);
+
 	dev_hold(ovpn->dev);
 
 	return peer;
@@ -121,6 +127,9 @@ static void ovpn_peer_release_rcu(struct rcu_head *head)
 
 void ovpn_peer_release(struct ovpn_peer *peer)
 {
+	napi_disable(&peer->napi);
+	netif_napi_del(&peer->napi);
+
 	if (peer->sock)
 		ovpn_socket_put(peer->sock);
 
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index f915afa260c3..f8b2157b416f 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -28,6 +28,7 @@
  * @tx_ring: queue of outgoing poackets to this peer
  * @rx_ring: queue of incoming packets from this peer
  * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
+ * @napi: NAPI object
  * @sock: the socket being used to talk to this peer
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
@@ -50,6 +51,7 @@ struct ovpn_peer {
 	struct ptr_ring tx_ring;
 	struct ptr_ring rx_ring;
 	struct ptr_ring netif_rx_ring;
+	struct napi_struct napi;
 	struct ovpn_socket *sock;
 	struct dst_cache dst_cache;
 	struct ovpn_bind __rcu *bind;
diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h
new file mode 100644
index 000000000000..0a51104ed931
--- /dev/null
+++ b/drivers/net/ovpn/proto.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNPROTO_H_
+#define _NET_OVPN_OVPNPROTO_H_
+
+#include "main.h"
+
+#include <linux/skbuff.h>
+
+/* Methods for operating on the initial command
+ * byte of the OpenVPN protocol.
+ */
+
+/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in
+ * one byte
+ */
+#define OVPN_KEY_ID_MASK 0x07
+#define OVPN_OPCODE_SHIFT 3
+#define OVPN_OPCODE_MASK 0x1F
+/* upper bounds on opcode and key ID */
+#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1)
+#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1)
+/* packet opcodes of interest to us */
+#define OVPN_DATA_V1 6 /* data channel V1 packet */
+#define OVPN_DATA_V2 9 /* data channel V2 packet */
+/* size of initial packet opcode */
+#define OVPN_OP_SIZE_V1 1
+#define OVPN_OP_SIZE_V2	4
+#define OVPN_PEER_ID_MASK 0x00FFFFFF
+#define OVPN_PEER_ID_UNDEF 0x00FFFFFF
+/* first byte of keepalive message */
+#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a
+/* first byte of exit message */
+#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28
+
+/**
+ * ovpn_opcode_from_byte - extract OP code from the specified byte
+ * @byte: the byte in wire format to extract the OP code from
+ *
+ * Return: the OP code
+ */
+static inline u8 ovpn_opcode_from_byte(u8 byte)
+{
+	return byte >> OVPN_OPCODE_SHIFT;
+}
+
+/**
+ * ovpn_opcode_from_skb - extract OP code from skb at specified offset
+ * @skb: the packet to extract the OP code from
+ * @offset: the offset in the data buffer where the OP code is located
+ *
+ * Note: this function assumes that the skb head was pulled enough
+ * to access the first byte.
+ *
+ * Return: the OP code
+ */
+static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset)
+{
+	return ovpn_opcode_from_byte(*(skb->data + offset));
+}
+
+/**
+ * ovpn_key_id_from_skb - extract key ID from the skb head
+ * @skb: the packet to extract the key ID from
+ *
+ * Note: this function assumes that the skb head was pulled enough
+ * to access the first byte.
+ *
+ * Return: the key ID
+ */
+static inline u8 ovpn_key_id_from_skb(const struct sk_buff *skb)
+{
+	return *skb->data & OVPN_KEY_ID_MASK;
+}
+
+/**
+ * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset
+ * @skb: the packet to extract the peer ID from
+ * @offset: the offset in the data buffer where the OP code is located
+ *
+ * Note: this function assumes that the skb head was pulled enough
+ * to access the first 4 bytes.
+ *
+ * Return: the peer ID.
+ */
+static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset)
+{
+	return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK;
+}
+
+/**
+ * ovpn_opcode_compose - combine OP code, key ID and peer ID to wire format
+ * @opcode: the OP code
+ * @key_id: the key ID
+ * @peer_id: the peer ID
+ *
+ * Return: a 4 bytes integer obtained combining all input values following the
+ * OpenVPN wire format. This integer can then be written to the packet header.
+ */
+static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
+{
+	const u8 op = (opcode << OVPN_OPCODE_SHIFT) |
+		      (key_id & OVPN_KEY_ID_MASK);
+
+	return ((u32)op << 24) | (peer_id & OVPN_PEER_ID_MASK);
+}
+
+#endif /* _NET_OVPN_OVPNPROTO_H_ */
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
index a4a4d69162f0..2ae04e883e13 100644
--- a/drivers/net/ovpn/socket.c
+++ b/drivers/net/ovpn/socket.c
@@ -23,6 +23,9 @@ static void ovpn_socket_detach(struct socket *sock)
 	if (!sock)
 		return;
 
+	if (sock->sk->sk_protocol == IPPROTO_UDP)
+		ovpn_udp_socket_detach(sock);
+
 	sockfd_put(sock);
 }
 
@@ -69,6 +72,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
 	return ret;
 }
 
+/* Retrieve the corresponding ovpn object from a UDP socket
+ * rcu_read_lock must be held on entry
+ */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
+{
+	struct ovpn_socket *ovpn_sock;
+
+	if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP))
+		return NULL;
+
+	ovpn_sock = rcu_dereference_sk_user_data(sk);
+	if (unlikely(!ovpn_sock))
+		return NULL;
+
+	/* make sure that sk matches our stored transport socket */
+	if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
+		return NULL;
+
+	return ovpn_sock->ovpn;
+}
+
 struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
 {
 	struct ovpn_socket *ovpn_sock;
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index f434da76dc0a..07182703e598 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -20,9 +20,117 @@
 #include "bind.h"
 #include "io.h"
 #include "peer.h"
+#include "proto.h"
 #include "socket.h"
 #include "udp.h"
 
+/**
+ * ovpn_udp_encap_recv - Start processing a received UDP packet.
+ * @sk: socket over which the packet was received
+ * @skb: the received packet
+ *
+ * If the first byte of the payload is DATA_V2, the packet is further processed,
+ * otherwise it is forwarded to the UDP stack for delivery to user space.
+ *
+ * Return:
+ *  0 if skb was consumed or dropped
+ * >0 if skb should be passed up to userspace as UDP (packet not consumed)
+ * <0 if skb should be resubmitted as proto -N (packet not consumed)
+ */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
+{
+	struct ovpn_peer *peer = NULL;
+	struct ovpn_struct *ovpn;
+	u32 peer_id;
+	u8 opcode;
+	int ret;
+
+	ovpn = ovpn_from_udp_sock(sk);
+	if (unlikely(!ovpn)) {
+		net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
+				    __func__);
+		goto drop;
+	}
+
+	/* Make sure the first 4 bytes of the skb data buffer after the UDP
+	 * header are accessible.
+	 * They are required to fetch the OP code, the key ID and the peer ID.
+	 */
+	if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + 4))) {
+		net_dbg_ratelimited("%s: packet too small\n", __func__);
+		goto drop;
+	}
+
+	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
+	if (unlikely(opcode != OVPN_DATA_V2)) {
+		/* DATA_V1 is not supported */
+		if (opcode == OVPN_DATA_V1)
+			goto drop;
+
+		/* unknown or control packet: let it bubble up to userspace */
+		return 1;
+	}
+
+	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
+	/* some OpenVPN server implementations send data packets with the
+	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
+	 * and we try with the transport address
+	 */
+	if (peer_id != OVPN_PEER_ID_UNDEF) {
+		peer = ovpn_peer_get_by_id(ovpn, peer_id);
+		if (!peer) {
+			net_err_ratelimited("%s: received data from unknown peer (id: %u)\n",
+					    __func__, peer_id);
+			goto drop;
+		}
+	}
+
+	if (!peer) {
+		/* data packet with undef peer-id */
+		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
+		if (unlikely(!peer)) {
+			netdev_dbg(ovpn->dev,
+				   "%s: received data with undef peer-id from unknown source\n",
+				   __func__);
+			goto drop;
+		}
+	}
+
+	/* At this point we know the packet is from a configured peer.
+	 * DATA_V2 packets are handled in kernel space, the rest goes to user
+	 * space.
+	 *
+	 * Return 1 to instruct the stack to let the packet bubble up to
+	 * userspace
+	 */
+	if (unlikely(opcode != OVPN_DATA_V2)) {
+		ovpn_peer_put(peer);
+		return 1;
+	}
+
+	/* pop off outer UDP header */
+	__skb_pull(skb, sizeof(struct udphdr));
+
+	ret = ovpn_recv(ovpn, peer, skb);
+	if (unlikely(ret < 0)) {
+		net_err_ratelimited("%s: cannot handle incoming packet from peer %u: %d\n",
+				    __func__, peer->id, ret);
+		goto drop;
+	}
+
+	/* ovpn_recv() returned 0: the skb has been queued for decryption and
+	 * is now owned by the RX path. Returning 0 tells the UDP stack that
+	 * the packet was consumed
+	 */
+	return ret;
+
+drop:
+	if (peer)
+		ovpn_peer_put(peer);
+	kfree_skb(skb);
+	return 0;
+}
+
 /**
  * ovpn_udp4_output - send IPv4 packet over udp socket
  * @ovpn: the openvpn instance
@@ -235,6 +343,11 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
 
 int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
 {
+	struct udp_tunnel_sock_cfg cfg = {
+		.sk_user_data = ovpn,
+		.encap_type = UDP_ENCAP_OVPNINUDP,
+		.encap_rcv = ovpn_udp_encap_recv,
+	};
 	struct ovpn_socket *old_data;
 
 	/* sanity check */
@@ -255,10 +368,20 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
 			return -EALREADY;
 		}
 
-		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
+		netdev_err(ovpn->dev,
+			   "%s: provided socket already taken by other user\n",
 			   __func__);
 		return -EBUSY;
 	}
 
+	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
+
 	return 0;
 }
+
+void ovpn_udp_socket_detach(struct socket *sock)
+{
+	struct udp_tunnel_sock_cfg cfg = { };
+
+	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
+}
diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h
index f4eb1e63e103..46329aefd052 100644
--- a/drivers/net/ovpn/udp.h
+++ b/drivers/net/ovpn/udp.h
@@ -29,6 +29,12 @@ struct socket;
  */
 int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn);
 
+/**
+ * ovpn_udp_socket_detach - clean udp-tunnel status for this socket
+ * @sock: the socket to clean
+ */
+void ovpn_udp_socket_detach(struct socket *sock);
+
 /**
  * ovpn_udp_send_skb - prepare skb and send it over via UDP
  * @ovpn: the openvpn instance
-- 
2.43.2
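
Reader note on the wire format handled by proto.h above: the first 32 bits
of a DATA_V2 packet pack the opcode (high 5 bits of byte 0), the key ID (low
3 bits of byte 0) and a 24-bit peer ID, sent in network byte order. What
follows is a minimal userspace sketch of that layout, not the kernel code.
The hdr_* helper names are hypothetical; only the constants are copied from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from proto.h in this patch */
#define OVPN_KEY_ID_MASK	0x07
#define OVPN_OPCODE_SHIFT	3
#define OVPN_OPCODE_MASK	0x1F
#define OVPN_PEER_ID_MASK	0x00FFFFFF
#define OVPN_DATA_V2		9

/* Compose [ opcode:5 | key_id:3 | peer_id:24 ] in host byte order
 * (hypothetical userspace twin of ovpn_opcode_compose())
 */
static inline uint32_t hdr_compose(uint8_t opcode, uint8_t key_id,
				   uint32_t peer_id)
{
	uint8_t op;

	/* the opcode must fit in 5 bits */
	assert(opcode <= OVPN_OPCODE_MASK);

	op = (uint8_t)((opcode << OVPN_OPCODE_SHIFT) |
		       (key_id & OVPN_KEY_ID_MASK));

	/* widen before shifting so bit 31 is never set on a signed int */
	return ((uint32_t)op << 24) | (peer_id & OVPN_PEER_ID_MASK);
}

/* Serialize the header in network byte order (what htonl() achieves) */
static inline void hdr_write(uint8_t wire[4], uint32_t hdr)
{
	wire[0] = (uint8_t)(hdr >> 24);
	wire[1] = (uint8_t)(hdr >> 16);
	wire[2] = (uint8_t)(hdr >> 8);
	wire[3] = (uint8_t)hdr;
}

/* Opcode: high 5 bits of the first byte */
static inline uint8_t hdr_opcode(const uint8_t wire[4])
{
	return wire[0] >> OVPN_OPCODE_SHIFT;
}

/* Key ID: low 3 bits of the first byte */
static inline uint8_t hdr_key_id(const uint8_t wire[4])
{
	return wire[0] & OVPN_KEY_ID_MASK;
}

/* Peer ID: low 24 bits of the big-endian 32-bit word */
static inline uint32_t hdr_peer_id(const uint8_t wire[4])
{
	uint32_t v = ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
		     ((uint32_t)wire[2] << 8) | (uint32_t)wire[3];

	return v & OVPN_PEER_ID_MASK;
}
```

With these helpers, DATA_V2 with key ID 0 and peer ID 1 serializes to the
bytes 48 00 00 01.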



* [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (9 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 10/24] ovpn: implement basic RX " Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-12  8:46   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics Antonio Quartulli
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

This change implements encryption/decryption and
encapsulation/decapsulation of OpenVPN packets.

Support for generic crypto state is added along with
a wrapper for the AEAD crypto kernel API.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
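Reader note, not part of the commit message: a small userspace sketch of the
packet layout and nonce construction that ovpn_aead_encrypt() and
ovpn_aead_decrypt() below rely on. The on-wire order is OP header (4 bytes),
packet ID (4 bytes), auth tag (16 bytes), then the encrypted payload. The
12-byte AEAD nonce is the packet ID in network byte order followed by the
8-byte per-key nonce tail. The size macros mirror what the code comments
state; their real definitions live in pktid.h, which is not quoted in this
part of the thread, so treat them as assumptions. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes assumed from the comments in crypto_aead.c */
#define OP_SIZE_V2	4	/* OVPN_OP_SIZE_V2 */
#define NONCE_WIRE_SIZE	4	/* packet ID, the part of the nonce on the wire */
#define NONCE_TAIL_SIZE	8	/* per-key nonce tail, never sent on the wire */
#define NONCE_SIZE	(NONCE_WIRE_SIZE + NONCE_TAIL_SIZE)
#define AUTH_TAG_SIZE	16

static_assert(NONCE_SIZE == 12, "AEAD nonce is expected to be 12 bytes");

/* Build the AEAD nonce: big-endian packet ID followed by the nonce tail */
static inline void nonce_build(uint32_t pktid,
			       const uint8_t tail[NONCE_TAIL_SIZE],
			       uint8_t iv[NONCE_SIZE])
{
	iv[0] = (uint8_t)(pktid >> 24);
	iv[1] = (uint8_t)(pktid >> 16);
	iv[2] = (uint8_t)(pktid >> 8);
	iv[3] = (uint8_t)pktid;
	memcpy(iv + NONCE_WIRE_SIZE, tail, NONCE_TAIL_SIZE);
}

/* Offset of the encrypted payload inside a received DATA_V2 packet */
static inline size_t payload_offset(void)
{
	return OP_SIZE_V2 + NONCE_WIRE_SIZE + AUTH_TAG_SIZE;
}

/* Payload length for a packet of pkt_len bytes, or -1 if it is too short
 * to hold the OP header, the packet ID and the auth tag
 */
static inline long payload_len(size_t pkt_len)
{
	if (pkt_len < payload_offset())
		return -1;

	return (long)(pkt_len - payload_offset());
}
```

Under these assumptions the encapsulation overhead is 24 bytes per packet,
and the associated data covers the first 8 bytes (OP header plus packet ID).
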
 drivers/net/ovpn/Makefile      |   3 +
 drivers/net/ovpn/bind.c        |   1 +
 drivers/net/ovpn/crypto.c      | 162 ++++++++++++++
 drivers/net/ovpn/crypto.h      | 138 ++++++++++++
 drivers/net/ovpn/crypto_aead.c | 378 +++++++++++++++++++++++++++++++++
 drivers/net/ovpn/crypto_aead.h |  30 +++
 drivers/net/ovpn/io.c          | 158 +++++++++++---
 drivers/net/ovpn/packet.h      |   2 +-
 drivers/net/ovpn/peer.c        |  24 +++
 drivers/net/ovpn/peer.h        |  14 ++
 drivers/net/ovpn/pktid.c       | 132 ++++++++++++
 drivers/net/ovpn/pktid.h       |  85 ++++++++
 drivers/net/ovpn/socket.c      |   1 +
 drivers/net/ovpn/udp.c         |   1 +
 14 files changed, 1104 insertions(+), 25 deletions(-)
 create mode 100644 drivers/net/ovpn/crypto.c
 create mode 100644 drivers/net/ovpn/crypto.h
 create mode 100644 drivers/net/ovpn/crypto_aead.c
 create mode 100644 drivers/net/ovpn/crypto_aead.h
 create mode 100644 drivers/net/ovpn/pktid.c
 create mode 100644 drivers/net/ovpn/pktid.h

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 56bddc9bef83..ccdaeced1982 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -8,10 +8,13 @@
 
 obj-$(CONFIG_OVPN) := ovpn.o
 ovpn-y += bind.o
+ovpn-y += crypto.o
+ovpn-y += crypto_aead.o
 ovpn-y += main.o
 ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += pktid.o
 ovpn-y += socket.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
index c1f842c06e32..7240d1036fb7 100644
--- a/drivers/net/ovpn/bind.c
+++ b/drivers/net/ovpn/bind.c
@@ -13,6 +13,7 @@
 #include "ovpnstruct.h"
 #include "io.h"
 #include "bind.h"
+#include "packet.h"
 #include "peer.h"
 
 struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss)
diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
new file mode 100644
index 000000000000..98ef1ceb75e0
--- /dev/null
+++ b/drivers/net/ovpn/crypto.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/types.h>
+#include <linux/net.h>
+#include <linux/netdevice.h>
+//#include <linux/skbuff.h>
+#include <uapi/linux/ovpn.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "packet.h"
+#include "pktid.h"
+#include "crypto_aead.h"
+#include "crypto.h"
+
+static void ovpn_ks_destroy_rcu(struct rcu_head *head)
+{
+	struct ovpn_crypto_key_slot *ks;
+
+	ks = container_of(head, struct ovpn_crypto_key_slot, rcu);
+	ovpn_aead_crypto_key_slot_destroy(ks);
+}
+
+void ovpn_crypto_key_slot_release(struct kref *kref)
+{
+	struct ovpn_crypto_key_slot *ks;
+
+	ks = container_of(kref, struct ovpn_crypto_key_slot, refcount);
+	call_rcu(&ks->rcu, ovpn_ks_destroy_rcu);
+}
+
+/* can only be invoked when all peer references have been dropped (i.e. RCU
+ * release routine)
+ */
+void ovpn_crypto_state_release(struct ovpn_crypto_state *cs)
+{
+	struct ovpn_crypto_key_slot *ks;
+
+	ks = rcu_access_pointer(cs->primary);
+	if (ks) {
+		RCU_INIT_POINTER(cs->primary, NULL);
+		ovpn_crypto_key_slot_put(ks);
+	}
+
+	ks = rcu_access_pointer(cs->secondary);
+	if (ks) {
+		RCU_INIT_POINTER(cs->secondary, NULL);
+		ovpn_crypto_key_slot_put(ks);
+	}
+
+	mutex_destroy(&cs->mutex);
+}
+
+/* removes the primary key from the crypto context */
+void ovpn_crypto_kill_primary(struct ovpn_crypto_state *cs)
+{
+	struct ovpn_crypto_key_slot *ks;
+
+	mutex_lock(&cs->mutex);
+	ks = rcu_replace_pointer(cs->primary, NULL,
+				 lockdep_is_held(&cs->mutex));
+	ovpn_crypto_key_slot_put(ks);
+	mutex_unlock(&cs->mutex);
+}
+
+/* Reset the ovpn_crypto_state object in a way that is atomic
+ * to RCU readers.
+ */
+int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs,
+			    const struct ovpn_peer_key_reset *pkr)
+	__must_hold(cs->mutex)
+{
+	struct ovpn_crypto_key_slot *old = NULL;
+	struct ovpn_crypto_key_slot *new;
+
+	lockdep_assert_held(&cs->mutex);
+
+	new = ovpn_aead_crypto_key_slot_new(&pkr->key);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	switch (pkr->slot) {
+	case OVPN_KEY_SLOT_PRIMARY:
+		old = rcu_replace_pointer(cs->primary, new,
+					  lockdep_is_held(&cs->mutex));
+		break;
+	case OVPN_KEY_SLOT_SECONDARY:
+		old = rcu_replace_pointer(cs->secondary, new,
+					  lockdep_is_held(&cs->mutex));
+		break;
+	default:
+		goto free_key;
+	}
+
+	if (old)
+		ovpn_crypto_key_slot_put(old);
+
+	return 0;
+free_key:
+	ovpn_crypto_key_slot_put(new);
+	return -EINVAL;
+}
+
+void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs,
+				 enum ovpn_key_slot slot)
+{
+	struct ovpn_crypto_key_slot *ks = NULL;
+
+	mutex_lock(&cs->mutex);
+	switch (slot) {
+	case OVPN_KEY_SLOT_PRIMARY:
+		ks = rcu_replace_pointer(cs->primary, NULL,
+					 lockdep_is_held(&cs->mutex));
+		break;
+	case OVPN_KEY_SLOT_SECONDARY:
+		ks = rcu_replace_pointer(cs->secondary, NULL,
+					 lockdep_is_held(&cs->mutex));
+		break;
+	default:
+		pr_warn("Invalid slot to release: %u\n", slot);
+		break;
+	}
+	mutex_unlock(&cs->mutex);
+
+	if (!ks) {
+		pr_debug("Key slot already released: %u\n", slot);
+		return;
+	}
+	pr_debug("deleting key slot %u, key_id=%u\n", slot, ks->key_id);
+
+	ovpn_crypto_key_slot_put(ks);
+}
+
+/* this swap is not atomic, but there will be a very short time frame where the
+ * old_secondary key won't be available. This should not be a big deal as most
+ * likely both peers are already using the new primary at this point.
+ */
+void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs)
+{
+	const struct ovpn_crypto_key_slot *old_primary, *old_secondary;
+
+	mutex_lock(&cs->mutex);
+
+	old_secondary = rcu_dereference_protected(cs->secondary,
+						  lockdep_is_held(&cs->mutex));
+	old_primary = rcu_replace_pointer(cs->primary, old_secondary,
+					  lockdep_is_held(&cs->mutex));
+	rcu_assign_pointer(cs->secondary, old_primary);
+
+	pr_debug("key swapped: %u <-> %u\n",
+		 old_primary ? old_primary->key_id : 0,
+		 old_secondary ? old_secondary->key_id : 0);
+
+	mutex_unlock(&cs->mutex);
+}
diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h
new file mode 100644
index 000000000000..0b6796850e60
--- /dev/null
+++ b/drivers/net/ovpn/crypto.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNCRYPTO_H_
+#define _NET_OVPN_OVPNCRYPTO_H_
+
+struct ovpn_peer;
+struct ovpn_crypto_key_slot;
+
+/* info needed for both encrypt and decrypt directions */
+struct ovpn_key_direction {
+	const u8 *cipher_key;
+	size_t cipher_key_size;
+	const u8 *nonce_tail; /* only needed for GCM modes */
+	size_t nonce_tail_size; /* only needed for GCM modes */
+};
+
+/* all info for a particular symmetric key (primary or secondary) */
+struct ovpn_key_config {
+	enum ovpn_cipher_alg cipher_alg;
+	u8 key_id;
+	struct ovpn_key_direction encrypt;
+	struct ovpn_key_direction decrypt;
+};
+
+/* used to pass settings from netlink to the crypto engine */
+struct ovpn_peer_key_reset {
+	enum ovpn_key_slot slot;
+	struct ovpn_key_config key;
+};
+
+struct ovpn_crypto_key_slot {
+	u8 key_id;
+
+	struct crypto_aead *encrypt;
+	struct crypto_aead *decrypt;
+	struct ovpn_nonce_tail nonce_tail_xmit;
+	struct ovpn_nonce_tail nonce_tail_recv;
+
+	struct ovpn_pktid_recv pid_recv ____cacheline_aligned_in_smp;
+	struct ovpn_pktid_xmit pid_xmit ____cacheline_aligned_in_smp;
+	struct kref refcount;
+	struct rcu_head rcu;
+};
+
+struct ovpn_crypto_state {
+	struct ovpn_crypto_key_slot __rcu *primary;
+	struct ovpn_crypto_key_slot __rcu *secondary;
+
+	/* protects primary and secondary slots */
+	struct mutex mutex;
+};
+
+static inline bool ovpn_crypto_key_slot_hold(struct ovpn_crypto_key_slot *ks)
+{
+	return kref_get_unless_zero(&ks->refcount);
+}
+
+static inline void ovpn_crypto_state_init(struct ovpn_crypto_state *cs)
+{
+	RCU_INIT_POINTER(cs->primary, NULL);
+	RCU_INIT_POINTER(cs->secondary, NULL);
+	mutex_init(&cs->mutex);
+}
+
+static inline struct ovpn_crypto_key_slot *
+ovpn_crypto_key_id_to_slot(const struct ovpn_crypto_state *cs, u8 key_id)
+{
+	struct ovpn_crypto_key_slot *ks;
+
+	if (unlikely(!cs))
+		return NULL;
+
+	rcu_read_lock();
+	ks = rcu_dereference(cs->primary);
+	if (ks && ks->key_id == key_id) {
+		if (unlikely(!ovpn_crypto_key_slot_hold(ks)))
+			ks = NULL;
+		goto out;
+	}
+
+	ks = rcu_dereference(cs->secondary);
+	if (ks && ks->key_id == key_id) {
+		if (unlikely(!ovpn_crypto_key_slot_hold(ks)))
+			ks = NULL;
+		goto out;
+	}
+
+	/* when both key slots are occupied but no matching key ID is found, ks
+	 * has to be reset to NULL to avoid carrying a stale pointer
+	 */
+	ks = NULL;
+out:
+	rcu_read_unlock();
+
+	return ks;
+}
+
+static inline struct ovpn_crypto_key_slot *
+ovpn_crypto_key_slot_primary(const struct ovpn_crypto_state *cs)
+{
+	struct ovpn_crypto_key_slot *ks;
+
+	rcu_read_lock();
+	ks = rcu_dereference(cs->primary);
+	if (unlikely(ks && !ovpn_crypto_key_slot_hold(ks)))
+		ks = NULL;
+	rcu_read_unlock();
+
+	return ks;
+}
+
+void ovpn_crypto_key_slot_release(struct kref *kref);
+
+static inline void ovpn_crypto_key_slot_put(struct ovpn_crypto_key_slot *ks)
+{
+	kref_put(&ks->refcount, ovpn_crypto_key_slot_release);
+}
+
+int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs,
+			    const struct ovpn_peer_key_reset *pkr);
+
+void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs,
+				 enum ovpn_key_slot slot);
+
+void ovpn_crypto_state_release(struct ovpn_crypto_state *cs);
+
+void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs);
+
+void ovpn_crypto_kill_primary(struct ovpn_crypto_state *cs);
+
+#endif /* _NET_OVPN_OVPNCRYPTO_H_ */
diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c
new file mode 100644
index 000000000000..bb6c2a17d5b1
--- /dev/null
+++ b/drivers/net/ovpn/crypto_aead.c
@@ -0,0 +1,378 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <crypto/aead.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/udp.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "packet.h"
+#include "pktid.h"
+#include "crypto_aead.h"
+#include "crypto.h"
+#include "proto.h"
+
+#define AUTH_TAG_SIZE	16
+
+static int ovpn_aead_encap_overhead(const struct ovpn_crypto_key_slot *ks)
+{
+	return  OVPN_OP_SIZE_V2 +			/* OP header size */
+		4 +					/* Packet ID */
+		crypto_aead_authsize(ks->encrypt);	/* Auth Tag */
+}
+
+int ovpn_aead_encrypt(struct ovpn_crypto_key_slot *ks, struct sk_buff *skb,
+		      u32 peer_id)
+{
+	const unsigned int tag_size = crypto_aead_authsize(ks->encrypt);
+	const unsigned int head_size = ovpn_aead_encap_overhead(ks);
+	struct scatterlist sg[MAX_SKB_FRAGS + 2];
+	DECLARE_CRYPTO_WAIT(wait);
+	struct aead_request *req;
+	struct sk_buff *trailer;
+	u8 iv[NONCE_SIZE];
+	int nfrags, ret;
+	u32 pktid, op;
+
+	/* Sample AEAD header format:
+	 * 48000001 00000005 7e7046bd 444a7e28 cc6387b1 64a4d6c1 380275a...
+	 * [ OP32 ] [seq # ] [             auth tag            ] [ payload ... ]
+	 *          [4-byte
+	 *          IV head]
+	 */
+
+	/* check that there's enough headroom in the skb for packet
+	 * encapsulation, after adding network header and encryption overhead
+	 */
+	if (unlikely(skb_cow_head(skb, OVPN_HEAD_ROOM + head_size)))
+		return -ENOBUFS;
+
+	/* get number of skb frags and ensure that packet data is writable */
+	nfrags = skb_cow_data(skb, 0, &trailer);
+	if (unlikely(nfrags < 0))
+		return nfrags;
+
+	if (unlikely(nfrags + 2 > ARRAY_SIZE(sg)))
+		return -ENOSPC;
+
+	req = aead_request_alloc(ks->encrypt, GFP_KERNEL);
+	if (unlikely(!req))
+		return -ENOMEM;
+
+	/* sg table:
+	 * 0: op, wire nonce (AD, len=OVPN_OP_SIZE_V2+NONCE_WIRE_SIZE),
+	 * 1, 2, 3, ..., n: payload,
+	 * n+1: auth_tag (len=tag_size)
+	 */
+	sg_init_table(sg, nfrags + 2);
+
+	/* build scatterlist to encrypt packet payload */
+	ret = skb_to_sgvec_nomark(skb, sg + 1, 0, skb->len);
+	if (unlikely(nfrags != ret)) {
+		ret = -EINVAL;
+		goto free_req;
+	}
+
+	/* append auth_tag onto scatterlist */
+	__skb_push(skb, tag_size);
+	sg_set_buf(sg + nfrags + 1, skb->data, tag_size);
+
+	/* obtain packet ID, which is used both as the first
+	 * 4 bytes of the nonce and the last 4 bytes of the associated data.
+	 */
+	ret = ovpn_pktid_xmit_next(&ks->pid_xmit, &pktid);
+	if (unlikely(ret < 0))
+		goto free_req;
+
+	/* concat 4 bytes packet id and 8 bytes nonce tail into 12 bytes
+	 * nonce
+	 */
+	ovpn_pktid_aead_write(pktid, &ks->nonce_tail_xmit, iv);
+
+	/* make space for packet id and push it to the front */
+	__skb_push(skb, NONCE_WIRE_SIZE);
+	memcpy(skb->data, iv, NONCE_WIRE_SIZE);
+
+	/* add packet op as head of additional data */
+	op = ovpn_opcode_compose(OVPN_DATA_V2, ks->key_id, peer_id);
+	__skb_push(skb, OVPN_OP_SIZE_V2);
+	BUILD_BUG_ON(sizeof(op) != OVPN_OP_SIZE_V2);
+	*((__force __be32 *)skb->data) = htonl(op);
+
+	/* AEAD Additional data */
+	sg_set_buf(sg, skb->data, OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE);
+
+	/* setup async crypto operation */
+	aead_request_set_tfm(req, ks->encrypt);
+	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				       CRYPTO_TFM_REQ_MAY_SLEEP,
+				  crypto_req_done, &wait);
+	aead_request_set_crypt(req, sg, sg, skb->len - head_size, iv);
+	aead_request_set_ad(req, OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE);
+
+	/* encrypt it */
+	ret = crypto_wait_req(crypto_aead_encrypt(req), &wait);
+	if (ret < 0)
+		net_err_ratelimited("%s: encrypt failed: %d\n", __func__, ret);
+
+free_req:
+	aead_request_free(req);
+	return ret;
+}
+
+int ovpn_aead_decrypt(struct ovpn_crypto_key_slot *ks, struct sk_buff *skb)
+{
+	const unsigned int tag_size = crypto_aead_authsize(ks->decrypt);
+	struct scatterlist sg[MAX_SKB_FRAGS + 2];
+	int ret, payload_len, nfrags;
+	u8 *sg_data, iv[NONCE_SIZE];
+	unsigned int payload_offset;
+	DECLARE_CRYPTO_WAIT(wait);
+	struct aead_request *req;
+	struct sk_buff *trailer;
+	unsigned int sg_len;
+	__be32 *pid;
+
+	payload_offset = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE + tag_size;
+	payload_len = skb->len - payload_offset;
+
+	/* sanity check on packet size, payload size must be >= 0 */
+	if (unlikely(payload_len < 0))
+		return -EINVAL;
+
+	/* Prepare the skb data buffer to be accessed up until the auth tag.
+	 * This is required because this area is directly mapped into the sg
+	 * list.
+	 */
+	if (unlikely(!pskb_may_pull(skb, payload_offset)))
+		return -ENODATA;
+
+	/* get number of skb frags and ensure that packet data is writable */
+	nfrags = skb_cow_data(skb, 0, &trailer);
+	if (unlikely(nfrags < 0))
+		return nfrags;
+
+	if (unlikely(nfrags + 2 > ARRAY_SIZE(sg)))
+		return -ENOSPC;
+
+	req = aead_request_alloc(ks->decrypt, GFP_KERNEL);
+	if (unlikely(!req))
+		return -ENOMEM;
+
+	/* sg table:
+	 * 0: op, wire nonce (AD, len=OVPN_OP_SIZE_V2+NONCE_WIRE_SIZE),
+	 * 1, 2, 3, ..., n: payload,
+	 * n+1: auth_tag (len=tag_size)
+	 */
+	sg_init_table(sg, nfrags + 2);
+
+	/* packet op is head of additional data */
+	sg_data = skb->data;
+	sg_len = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE;
+	sg_set_buf(sg, sg_data, sg_len);
+
+	/* build scatterlist to decrypt packet payload */
+	ret = skb_to_sgvec_nomark(skb, sg + 1, payload_offset, payload_len);
+	if (unlikely(nfrags != ret)) {
+		ret = -EINVAL;
+		goto free_req;
+	}
+
+	/* append auth_tag onto scatterlist */
+	sg_set_buf(sg + nfrags + 1, skb->data + sg_len, tag_size);
+
+	/* copy nonce into IV buffer */
+	memcpy(iv, skb->data + OVPN_OP_SIZE_V2, NONCE_WIRE_SIZE);
+	memcpy(iv + NONCE_WIRE_SIZE, ks->nonce_tail_recv.u8,
+	       sizeof(struct ovpn_nonce_tail));
+
+	/* setup async crypto operation */
+	aead_request_set_tfm(req, ks->decrypt);
+	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+				       CRYPTO_TFM_REQ_MAY_SLEEP,
+				  crypto_req_done, &wait);
+	aead_request_set_crypt(req, sg, sg, payload_len + tag_size, iv);
+
+	aead_request_set_ad(req, NONCE_WIRE_SIZE + OVPN_OP_SIZE_V2);
+
+	/* decrypt it */
+	ret = crypto_wait_req(crypto_aead_decrypt(req), &wait);
+	if (ret < 0) {
+		net_err_ratelimited("%s: decrypt failed: %d\n", __func__, ret);
+		goto free_req;
+	}
+
+	/* PID sits after the op */
+	pid = (__force __be32 *)(skb->data + OVPN_OP_SIZE_V2);
+	ret = ovpn_pktid_recv(&ks->pid_recv, ntohl(*pid), 0);
+	if (unlikely(ret < 0))
+		goto free_req;
+
+	/* point to encapsulated IP packet */
+	__skb_pull(skb, payload_offset);
+
+free_req:
+	aead_request_free(req);
+	return ret;
+}
+
+/* Initialize a struct crypto_aead object */
+struct crypto_aead *ovpn_aead_init(const char *title, const char *alg_name,
+				   const unsigned char *key,
+				   unsigned int keylen)
+{
+	struct crypto_aead *aead;
+	int ret;
+
+	aead = crypto_alloc_aead(alg_name, 0, 0);
+	if (IS_ERR(aead)) {
+		ret = PTR_ERR(aead);
+		pr_err("%s crypto_alloc_aead failed, err=%d\n", title, ret);
+		aead = NULL;
+		goto error;
+	}
+
+	ret = crypto_aead_setkey(aead, key, keylen);
+	if (ret) {
+		pr_err("%s crypto_aead_setkey size=%u failed, err=%d\n", title,
+		       keylen, ret);
+		goto error;
+	}
+
+	ret = crypto_aead_setauthsize(aead, AUTH_TAG_SIZE);
+	if (ret) {
+		pr_err("%s crypto_aead_setauthsize failed, err=%d\n", title,
+		       ret);
+		goto error;
+	}
+
+	/* basic AEAD assumption */
+	if (crypto_aead_ivsize(aead) != NONCE_SIZE) {
+		pr_err("%s IV size must be %d\n", title, NONCE_SIZE);
+		ret = -EINVAL;
+		goto error;
+	}
+
+	pr_debug("********* Cipher %s (%s)\n", alg_name, title);
+	pr_debug("*** IV size=%u\n", crypto_aead_ivsize(aead));
+	pr_debug("*** req size=%u\n", crypto_aead_reqsize(aead));
+	pr_debug("*** block size=%u\n", crypto_aead_blocksize(aead));
+	pr_debug("*** auth size=%u\n", crypto_aead_authsize(aead));
+	pr_debug("*** alignmask=0x%x\n", crypto_aead_alignmask(aead));
+
+	return aead;
+
+error:
+	crypto_free_aead(aead);
+	return ERR_PTR(ret);
+}
+
+void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks)
+{
+	if (!ks)
+		return;
+
+	crypto_free_aead(ks->encrypt);
+	crypto_free_aead(ks->decrypt);
+	kfree(ks);
+}
+
+static struct ovpn_crypto_key_slot *
+ovpn_aead_crypto_key_slot_init(enum ovpn_cipher_alg alg,
+			       const unsigned char *encrypt_key,
+			       unsigned int encrypt_keylen,
+			       const unsigned char *decrypt_key,
+			       unsigned int decrypt_keylen,
+			       const unsigned char *encrypt_nonce_tail,
+			       unsigned int encrypt_nonce_tail_len,
+			       const unsigned char *decrypt_nonce_tail,
+			       unsigned int decrypt_nonce_tail_len,
+			       u8 key_id)
+{
+	struct ovpn_crypto_key_slot *ks = NULL;
+	const char *alg_name;
+	int ret;
+
+	/* validate crypto alg */
+	switch (alg) {
+	case OVPN_CIPHER_ALG_AES_GCM:
+		alg_name = "gcm(aes)";
+		break;
+	case OVPN_CIPHER_ALG_CHACHA20_POLY1305:
+		alg_name = "rfc7539(chacha20,poly1305)";
+		break;
+	default:
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	/* build the key slot */
+	ks = kmalloc(sizeof(*ks), GFP_KERNEL);
+	if (!ks)
+		return ERR_PTR(-ENOMEM);
+
+	ks->encrypt = NULL;
+	ks->decrypt = NULL;
+	kref_init(&ks->refcount);
+	ks->key_id = key_id;
+
+	ks->encrypt = ovpn_aead_init("encrypt", alg_name, encrypt_key,
+				     encrypt_keylen);
+	if (IS_ERR(ks->encrypt)) {
+		ret = PTR_ERR(ks->encrypt);
+		ks->encrypt = NULL;
+		goto destroy_ks;
+	}
+
+	ks->decrypt = ovpn_aead_init("decrypt", alg_name, decrypt_key,
+				     decrypt_keylen);
+	if (IS_ERR(ks->decrypt)) {
+		ret = PTR_ERR(ks->decrypt);
+		ks->decrypt = NULL;
+		goto destroy_ks;
+	}
+
+	if (sizeof(struct ovpn_nonce_tail) != encrypt_nonce_tail_len ||
+	    sizeof(struct ovpn_nonce_tail) != decrypt_nonce_tail_len) {
+		ret = -EINVAL;
+		goto destroy_ks;
+	}
+
+	memcpy(ks->nonce_tail_xmit.u8, encrypt_nonce_tail,
+	       sizeof(struct ovpn_nonce_tail));
+	memcpy(ks->nonce_tail_recv.u8, decrypt_nonce_tail,
+	       sizeof(struct ovpn_nonce_tail));
+
+	/* init packet ID generation/validation */
+	ovpn_pktid_xmit_init(&ks->pid_xmit);
+	ovpn_pktid_recv_init(&ks->pid_recv);
+
+	return ks;
+
+destroy_ks:
+	ovpn_aead_crypto_key_slot_destroy(ks);
+	return ERR_PTR(ret);
+}
+
+struct ovpn_crypto_key_slot *
+ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc)
+{
+	return ovpn_aead_crypto_key_slot_init(kc->cipher_alg,
+					      kc->encrypt.cipher_key,
+					      kc->encrypt.cipher_key_size,
+					      kc->decrypt.cipher_key,
+					      kc->decrypt.cipher_key_size,
+					      kc->encrypt.nonce_tail,
+					      kc->encrypt.nonce_tail_size,
+					      kc->decrypt.nonce_tail,
+					      kc->decrypt.nonce_tail_size,
+					      kc->key_id);
+}
diff --git a/drivers/net/ovpn/crypto_aead.h b/drivers/net/ovpn/crypto_aead.h
new file mode 100644
index 000000000000..c876e6a711cd
--- /dev/null
+++ b/drivers/net/ovpn/crypto_aead.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNAEAD_H_
+#define _NET_OVPN_OVPNAEAD_H_
+
+#include "crypto.h"
+
+#include <asm/types.h>
+#include <linux/skbuff.h>
+
+struct crypto_aead *ovpn_aead_init(const char *title, const char *alg_name,
+				   const unsigned char *key,
+				   unsigned int keylen);
+
+int ovpn_aead_encrypt(struct ovpn_crypto_key_slot *ks, struct sk_buff *skb,
+		      u32 peer_id);
+int ovpn_aead_decrypt(struct ovpn_crypto_key_slot *ks, struct sk_buff *skb);
+
+struct ovpn_crypto_key_slot *
+ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc);
+void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks);
+
+#endif /* _NET_OVPN_OVPNAEAD_H_ */
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 9935a863bffe..66a4c551c191 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -12,9 +12,13 @@
 #include <net/gso.h>
 
 #include "ovpnstruct.h"
-#include "peer.h"
 #include "io.h"
+#include "packet.h"
+#include "peer.h"
+#include "crypto.h"
+#include "crypto_aead.h"
 #include "netlink.h"
+#include "proto.h"
 #include "udp.h"
 
 int ovpn_struct_init(struct net_device *dev)
@@ -110,6 +114,27 @@ int ovpn_napi_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
+/* Return IP protocol version from skb header.
+ * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
+ */
+static __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
+{
+	__be16 proto = 0;
+
+	/* skb could be non-linear, make sure IP header is in non-fragmented
+	 * part
+	 */
+	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
+		return 0;
+
+	if (ip_hdr(skb)->version == 4)
+		proto = htons(ETH_P_IP);
+	else if (ip_hdr(skb)->version == 6)
+		proto = htons(ETH_P_IPV6);
+
+	return proto;
+}
+
 /* Entry point for processing an incoming packet (in skb form)
  *
  * Enqueue the packet and schedule RX consumer.
@@ -132,7 +157,81 @@ int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
 
 static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 {
-	return true;
+	struct ovpn_peer *allowed_peer = NULL;
+	struct ovpn_crypto_key_slot *ks;
+	__be16 proto;
+	int ret = -1;
+	u8 key_id;
+
+	/* get the key slot matching the key Id in the received packet */
+	key_id = ovpn_key_id_from_skb(skb);
+	ks = ovpn_crypto_key_id_to_slot(&peer->crypto, key_id);
+	if (unlikely(!ks)) {
+		net_info_ratelimited("%s: no available key for peer %u, key-id: %u\n",
+				     peer->ovpn->dev->name, peer->id, key_id);
+		goto drop;
+	}
+
+	/* decrypt */
+	ret = ovpn_aead_decrypt(ks, skb);
+
+	ovpn_crypto_key_slot_put(ks);
+
+	if (unlikely(ret < 0)) {
+		net_err_ratelimited("%s: error during decryption for peer %u, key-id %u: %d\n",
+				    peer->ovpn->dev->name, peer->id, key_id,
+				    ret);
+		goto drop;
+	}
+
+	/* check if this is a valid datapacket that has to be delivered to the
+	 * tun interface
+	 */
+	skb_reset_network_header(skb);
+	proto = ovpn_ip_check_protocol(skb);
+	if (unlikely(!proto)) {
+		/* check if null packet */
+		if (unlikely(!pskb_may_pull(skb, 1))) {
+			netdev_dbg(peer->ovpn->dev,
+				   "NULL packet received from peer %u\n",
+				   peer->id);
+			ret = -EINVAL;
+			goto drop;
+		}
+
+		netdev_dbg(peer->ovpn->dev,
+			   "unsupported protocol received from peer %u\n",
+			   peer->id);
+
+		ret = -EPROTONOSUPPORT;
+		goto drop;
+	}
+	skb->protocol = proto;
+
+	/* perform Reverse Path Filtering (RPF) */
+	allowed_peer = ovpn_peer_get_by_src(peer->ovpn, skb);
+	if (unlikely(allowed_peer != peer)) {
+		if (skb_protocol_to_family(skb) == AF_INET6)
+			net_warn_ratelimited("%s: RPF dropped packet from peer %u, src: %pI6c\n",
+					     peer->ovpn->dev->name, peer->id,
+					     &ipv6_hdr(skb)->saddr);
+		else
+			net_warn_ratelimited("%s: RPF dropped packet from peer %u, src: %pI4\n",
+					     peer->ovpn->dev->name, peer->id,
+					     &ip_hdr(skb)->saddr);
+		ret = -EPERM;
+		goto drop;
+	}
+
+	ret = ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
+drop:
+	if (likely(allowed_peer))
+		ovpn_peer_put(allowed_peer);
+
+	if (unlikely(ret < 0))
+		kfree_skb(skb);
+
+	return ret;
 }
 
 /* pick next packet from RX queue, decrypt and forward it to the device */
@@ -160,7 +259,39 @@ void ovpn_decrypt_work(struct work_struct *work)
 
 static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 {
-	return true;
+	struct ovpn_crypto_key_slot *ks;
+	bool success = false;
+	int ret;
+
+	/* get primary key to be used for encrypting data */
+	ks = ovpn_crypto_key_slot_primary(&peer->crypto);
+	if (unlikely(!ks)) {
+		net_warn_ratelimited("%s: error while retrieving primary key slot for peer %u\n",
+				     peer->ovpn->dev->name, peer->id);
+		return false;
+	}
+
+	if (unlikely(skb->ip_summed == CHECKSUM_PARTIAL &&
+		     skb_checksum_help(skb))) {
+		net_err_ratelimited("%s: cannot compute checksum for outgoing packet\n",
+				    peer->ovpn->dev->name);
+		goto err;
+	}
+
+	/* encrypt */
+	ret = ovpn_aead_encrypt(ks, skb, peer->id);
+	if (unlikely(ret < 0)) {
+		net_err_ratelimited("%s: error during encryption for peer %u, key-id %u: %d\n",
+				    peer->ovpn->dev->name, peer->id, ks->key_id,
+				    ret);
+		goto err;
+	}
+
+	success = true;
+
+err:
+	ovpn_crypto_key_slot_put(ks);
+	return success;
 }
 
 /* Process packets in TX queue in a transport-specific way.
@@ -245,27 +376,6 @@ static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
 	kfree_skb_list(skb);
 }
 
-/* Return IP protocol version from skb header.
- * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
- */
-static __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
-{
-	__be16 proto = 0;
-
-	/* skb could be non-linear, make sure IP header is in non-fragmented
-	 * part
-	 */
-	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
-		return 0;
-
-	if (ip_hdr(skb)->version == 4)
-		proto = htons(ETH_P_IP);
-	else if (ip_hdr(skb)->version == 6)
-		proto = htons(ETH_P_IPV6);
-
-	return proto;
-}
-
 /* Send user data to the network
  */
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h
index 7ed146f5932a..e14c9bf464f7 100644
--- a/drivers/net/ovpn/packet.h
+++ b/drivers/net/ovpn/packet.h
@@ -10,7 +10,7 @@
 #ifndef _NET_OVPN_PACKET_H_
 #define _NET_OVPN_PACKET_H_
 
-/* When the OpenVPN protocol is ran in AEAD mode, use
+/* When the OpenVPN protocol is run in AEAD mode, use
  * the OpenVPN packet ID as the AEAD nonce:
  *
  *    00000005 521c3b01 4308c041
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 4e5bb659f169..1b941deeede0 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -13,6 +13,9 @@
 
 #include "ovpnstruct.h"
 #include "bind.h"
+#include "packet.h"
+#include "pktid.h"
+#include "crypto.h"
 #include "io.h"
 #include "main.h"
 #include "netlink.h"
@@ -36,6 +39,7 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	peer->vpn_addrs.ipv6 = in6addr_any;
 
 	RCU_INIT_POINTER(peer->bind, NULL);
+	ovpn_crypto_state_init(&peer->crypto);
 	spin_lock_init(&peer->lock);
 	kref_init(&peer->refcount);
 
@@ -122,6 +126,7 @@ static void ovpn_peer_release_rcu(struct rcu_head *head)
 {
 	struct ovpn_peer *peer = container_of(head, struct ovpn_peer, rcu);
 
+	ovpn_crypto_state_release(&peer->crypto);
 	ovpn_peer_free(peer);
 }
 
@@ -334,6 +339,25 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
 	return peer;
 }
 
+struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
+				       struct sk_buff *skb)
+{
+	struct ovpn_peer *tmp, *peer = NULL;
+
+	/* in P2P mode, no matter the source, packets can only come from the
+	 * single peer on the other side
+	 */
+	if (ovpn->mode == OVPN_MODE_P2P) {
+		rcu_read_lock();
+		tmp = rcu_dereference(ovpn->peer);
+		if (likely(tmp && ovpn_peer_hold(tmp)))
+			peer = tmp;
+		rcu_read_unlock();
+	}
+
+	return peer;
+}
+
 /**
  * ovpn_peer_add_p2p - add per to related tables in a P2P instance
  * @ovpn: the instance to add the peer to
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index f8b2157b416f..da41d711745c 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -11,6 +11,8 @@
 #define _NET_OVPN_OVPNPEER_H_
 
 #include "bind.h"
+#include "pktid.h"
+#include "crypto.h"
 #include "socket.h"
 
 #include <linux/ptr_ring.h>
@@ -30,6 +32,7 @@
  * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
  * @napi: NAPI object
  * @sock: the socket being used to talk to this peer
+ * @crypto: the crypto configuration (ciphers, keys, etc..)
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
  * @halt: true if ovpn_peer_mark_delete was called
@@ -53,6 +56,7 @@ struct ovpn_peer {
 	struct ptr_ring netif_rx_ring;
 	struct napi_struct napi;
 	struct ovpn_socket *sock;
+	struct ovpn_crypto_state crypto;
 	struct dst_cache dst_cache;
 	struct ovpn_bind __rcu *bind;
 	bool halt;
@@ -160,4 +164,14 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id);
 struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
 				       struct sk_buff *skb);
 
+/**
+ * ovpn_peer_get_by_src - retrieve peer by matching skb source address
+ * @ovpn: the openvpn instance to search
+ * @skb: the packet to use for matching
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
+				       struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */
diff --git a/drivers/net/ovpn/pktid.c b/drivers/net/ovpn/pktid.c
new file mode 100644
index 000000000000..f1fc4ead3336
--- /dev/null
+++ b/drivers/net/ovpn/pktid.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#include <linux/atomic.h>
+#include <linux/jiffies.h>
+#include <linux/net.h>
+#include <linux/netdevice.h>
+#include <linux/types.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "packet.h"
+#include "pktid.h"
+
+void ovpn_pktid_xmit_init(struct ovpn_pktid_xmit *pid)
+{
+	atomic64_set(&pid->seq_num, 1);
+}
+
+void ovpn_pktid_recv_init(struct ovpn_pktid_recv *pr)
+{
+	memset(pr, 0, sizeof(*pr));
+	spin_lock_init(&pr->lock);
+}
+
+/* Packet replay detection.
+ * Allows ID backtrack of up to REPLAY_WINDOW_SIZE - 1.
+ */
+int ovpn_pktid_recv(struct ovpn_pktid_recv *pr, u32 pkt_id, u32 pkt_time)
+{
+	const unsigned long now = jiffies;
+	int ret;
+
+	spin_lock(&pr->lock);
+
+	/* expire backtracks at or below pr->id after PKTID_RECV_EXPIRE time */
+	if (unlikely(time_after_eq(now, pr->expire)))
+		pr->id_floor = pr->id;
+
+	/* ID must not be zero */
+	if (unlikely(pkt_id == 0)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* time changed? */
+	if (unlikely(pkt_time != pr->time)) {
+		if (pkt_time > pr->time) {
+			/* time moved forward, accept */
+			pr->base = 0;
+			pr->extent = 0;
+			pr->id = 0;
+			pr->time = pkt_time;
+			pr->id_floor = 0;
+		} else {
+			/* time moved backward, reject */
+			ret = -ETIME;
+			goto out;
+		}
+	}
+
+	if (likely(pkt_id == pr->id + 1)) {
+		/* well-formed ID sequence (incremented by 1) */
+		pr->base = REPLAY_INDEX(pr->base, -1);
+		pr->history[pr->base / 8] |= BIT(pr->base % 8);
+		if (pr->extent < REPLAY_WINDOW_SIZE)
+			++pr->extent;
+		pr->id = pkt_id;
+	} else if (pkt_id > pr->id) {
+		/* ID jumped forward by more than one */
+		const unsigned int delta = pkt_id - pr->id;
+
+		if (delta < REPLAY_WINDOW_SIZE) {
+			unsigned int i;
+
+			pr->base = REPLAY_INDEX(pr->base, -delta);
+			pr->history[pr->base / 8] |= BIT(pr->base % 8);
+			pr->extent += delta;
+			if (pr->extent > REPLAY_WINDOW_SIZE)
+				pr->extent = REPLAY_WINDOW_SIZE;
+			for (i = 1; i < delta; ++i) {
+				unsigned int newb = REPLAY_INDEX(pr->base, i);
+
+				pr->history[newb / 8] &= ~BIT(newb % 8);
+			}
+		} else {
+			pr->base = 0;
+			pr->extent = REPLAY_WINDOW_SIZE;
+			memset(pr->history, 0, sizeof(pr->history));
+			pr->history[0] = 1;
+		}
+		pr->id = pkt_id;
+	} else {
+		/* ID backtrack */
+		const unsigned int delta = pr->id - pkt_id;
+
+		if (delta > pr->max_backtrack)
+			pr->max_backtrack = delta;
+		if (delta < pr->extent) {
+			if (pkt_id > pr->id_floor) {
+				const unsigned int ri = REPLAY_INDEX(pr->base,
+								     delta);
+				u8 *p = &pr->history[ri / 8];
+				const u8 mask = BIT(ri % 8);
+
+				if (*p & mask) {
+					ret = -EINVAL;
+					goto out;
+				}
+				*p |= mask;
+			} else {
+				ret = -EINVAL;
+				goto out;
+			}
+		} else {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	pr->expire = now + PKTID_RECV_EXPIRE;
+	ret = 0;
+out:
+	spin_unlock(&pr->lock);
+	return ret;
+}
diff --git a/drivers/net/ovpn/pktid.h b/drivers/net/ovpn/pktid.h
new file mode 100644
index 000000000000..c7356f5cb12b
--- /dev/null
+++ b/drivers/net/ovpn/pktid.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNPKTID_H_
+#define _NET_OVPN_OVPNPKTID_H_
+
+/* If no packets received for this length of time, set a backtrack floor
+ * at highest received packet ID thus far.
+ */
+#define PKTID_RECV_EXPIRE (30 * HZ)
+
+/* Packet-ID state for transmitter */
+struct ovpn_pktid_xmit {
+	atomic64_t seq_num;
+};
+
+/* replay window sizing in bytes = 2^REPLAY_WINDOW_ORDER */
+#define REPLAY_WINDOW_ORDER 8
+
+#define REPLAY_WINDOW_BYTES BIT(REPLAY_WINDOW_ORDER)
+#define REPLAY_WINDOW_SIZE  (REPLAY_WINDOW_BYTES * 8)
+#define REPLAY_INDEX(base, i) (((base) + (i)) & (REPLAY_WINDOW_SIZE - 1))
+
+/* Packet-ID state for receiver.
+ * Other than lock member, can be zeroed to initialize.
+ */
+struct ovpn_pktid_recv {
+	/* "sliding window" bitmask of recent packet IDs received */
+	u8 history[REPLAY_WINDOW_BYTES];
+	/* bit position of deque base in history */
+	unsigned int base;
+	/* extent (in bits) of deque in history */
+	unsigned int extent;
+	/* expiration of history in jiffies */
+	unsigned long expire;
+	/* highest sequence number received */
+	u32 id;
+	/* highest time stamp received */
+	u32 time;
+	/* we will only accept backtrack IDs > id_floor */
+	u32 id_floor;
+	unsigned int max_backtrack;
+	/* protects entire packet ID state */
+	spinlock_t lock;
+};
+
+/* Get the next packet ID for xmit */
+static inline int ovpn_pktid_xmit_next(struct ovpn_pktid_xmit *pid, u32 *pktid)
+{
+	const s64 seq_num = atomic64_fetch_add_unless(&pid->seq_num, 1,
+						      0x100000000LL);
+	/* when the 32bit space is over, we return an error because the packet
+	 * ID is used to create the cipher IV and we do not want to reuse the
+	 * same value more than once
+	 */
+	if (unlikely(seq_num == 0x100000000LL))
+		return -ERANGE;
+
+	*pktid = (u32)seq_num;
+
+	return 0;
+}
+
+/* Write 12-byte AEAD IV to dest */
+static inline void ovpn_pktid_aead_write(const u32 pktid,
+					 const struct ovpn_nonce_tail *nt,
+					 unsigned char *dest)
+{
+	*(__force __be32 *)(dest) = htonl(pktid);
+	BUILD_BUG_ON(4 + sizeof(struct ovpn_nonce_tail) != NONCE_SIZE);
+	memcpy(dest + 4, nt->u8, sizeof(struct ovpn_nonce_tail));
+}
+
+void ovpn_pktid_xmit_init(struct ovpn_pktid_xmit *pid);
+void ovpn_pktid_recv_init(struct ovpn_pktid_recv *pr);
+
+int ovpn_pktid_recv(struct ovpn_pktid_recv *pr, u32 pkt_id, u32 pkt_time);
+
+#endif /* _NET_OVPN_OVPNPKTID_H_ */
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
index 2ae04e883e13..e099a61b03fa 100644
--- a/drivers/net/ovpn/socket.c
+++ b/drivers/net/ovpn/socket.c
@@ -13,6 +13,7 @@
 #include "ovpnstruct.h"
 #include "main.h"
 #include "io.h"
+#include "packet.h"
 #include "peer.h"
 #include "socket.h"
 #include "udp.h"
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index 07182703e598..c2a88d26defd 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -19,6 +19,7 @@
 #include "main.h"
 #include "bind.h"
 #include "io.h"
+#include "packet.h"
 #include "peer.h"
 #include "proto.h"
 #include "socket.h"
-- 
2.43.2
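
Note on the replay protection added in pktid.c: the receiver keeps a
circular bitmap with one bit per recently accepted packet ID. Below is a
minimal userspace sketch of that idea, not the driver code. It assumes a
much smaller window (64 bits instead of 2048) and leaves out the
time-based id_floor, the pkt_time check and the spinlock. The names
replay_win, replay_check and the WIN_* macros are made up for this
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical window size, far smaller than the driver's 2048 bits. */
#define WIN_BYTES 8u
#define WIN_BITS  (WIN_BYTES * 8u)
#define WIN_INDEX(base, i) (((base) + (i)) & (WIN_BITS - 1u))

struct replay_win {
	uint8_t history[WIN_BYTES]; /* one bit per recently seen ID */
	unsigned int base;          /* bit position of the newest ID */
	unsigned int extent;        /* number of valid bits in history */
	uint32_t id;                /* highest ID accepted so far */
};

/* Return 0 if pkt_id is new, -1 if it is zero, replayed or too old. */
int replay_check(struct replay_win *w, uint32_t pkt_id)
{
	unsigned int delta, i, bit;

	assert(w != NULL);

	if (pkt_id == 0)
		return -1;

	if (pkt_id > w->id) {
		delta = pkt_id - w->id;
		if (delta < WIN_BITS) {
			/* slide the window: base moves back by delta bits */
			w->base = WIN_INDEX(w->base, 0u - delta);
			/* IDs that were skipped have not been seen yet */
			for (i = 1; i < delta; i++) {
				bit = WIN_INDEX(w->base, i);
				w->history[bit / 8] &= (uint8_t)~(1u << (bit % 8));
			}
			w->history[w->base / 8] |= (uint8_t)(1u << (w->base % 8));
			w->extent += delta;
			if (w->extent > WIN_BITS)
				w->extent = WIN_BITS;
		} else {
			/* jump larger than the window: start over */
			memset(w->history, 0, sizeof(w->history));
			w->base = 0;
			w->history[0] = 1;
			w->extent = WIN_BITS;
		}
		w->id = pkt_id;
		return 0;
	}

	/* backtrack: only IDs still inside the window can be accepted */
	delta = w->id - pkt_id;
	if (delta >= w->extent)
		return -1;

	bit = WIN_INDEX(w->base, delta);
	if (w->history[bit / 8] & (1u << (bit % 8)))
		return -1; /* bit already set: replay */

	w->history[bit / 8] |= (uint8_t)(1u << (bit % 8));
	return 0;
}
```

Starting from a zeroed struct replay_win, IDs 1, 3 and 2 are accepted in
that order, while a second 2 is rejected as a replay.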

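Note on pktid.h: the transmit side must never reuse a packet ID under
the same key, because the ID forms the first 4 bytes of the AEAD IV.
The sketch below models that in portable C11. It is an illustration
only: it assumes an 8-byte nonce tail (the 12-byte IV minus the 4-byte
packet ID) and emulates atomic64_fetch_add_unless() with a
compare-and-swap loop. pktid_next, nonce_write and the constants are
hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NONCE_LEN      12u            /* 12-byte AEAD IV, as in the patch */
#define NONCE_TAIL_LEN 8u             /* assumed: NONCE_LEN - 4 */
#define PKTID_LIMIT    0x100000000ULL /* first value outside the u32 space */

/* Hand out the next 32-bit packet ID; fail once the space is used up. */
int pktid_next(_Atomic uint64_t *seq, uint32_t *pktid)
{
	uint64_t cur;

	assert(seq != NULL && pktid != NULL);
	cur = atomic_load(seq);

	/* CAS loop: add 1 unless the counter already sits at the limit */
	do {
		if (cur == PKTID_LIMIT)
			return -1;
	} while (!atomic_compare_exchange_weak(seq, &cur, cur + 1));

	*pktid = (uint32_t)cur;
	return 0;
}

/* Build the IV: packet ID in network byte order, then the nonce tail. */
void nonce_write(uint32_t pktid, const uint8_t tail[NONCE_TAIL_LEN],
		 uint8_t dest[NONCE_LEN])
{
	dest[0] = (uint8_t)(pktid >> 24);
	dest[1] = (uint8_t)(pktid >> 16);
	dest[2] = (uint8_t)(pktid >> 8);
	dest[3] = (uint8_t)pktid;
	memcpy(dest + 4, tail, NONCE_TAIL_LEN);
}
```

For packet ID 5 and nonce tail 52 1c 3b 01 43 08 c0 41 the resulting IV
is 00000005 521c3b01 4308c041, the layout shown in the packet.h comment.
Once the counter reaches 0x100000000 every further call fails, so the
key has to be replaced before more data can be sent.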


* [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (10 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 11/24] ovpn: implement packet processing Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-12  8:47   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 13/24] ovpn: implement TCP transport Antonio Quartulli
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Byte/packet counters for in-tunnel and transport streams
are now initialized and updated as needed.

These counters are to be exported via netlink.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/Makefile |  1 +
 drivers/net/ovpn/io.c     | 10 ++++++++
 drivers/net/ovpn/peer.c   |  3 +++
 drivers/net/ovpn/peer.h   | 13 +++++++---
 drivers/net/ovpn/stats.c  | 21 ++++++++++++++++
 drivers/net/ovpn/stats.h  | 52 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 96 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ovpn/stats.c
 create mode 100644 drivers/net/ovpn/stats.h

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index ccdaeced1982..d43fda72646b 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -17,4 +17,5 @@ ovpn-y += netlink-gen.o
 ovpn-y += peer.o
 ovpn-y += pktid.o
 ovpn-y += socket.o
+ovpn-y += stats.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 66a4c551c191..699e7f1274db 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -10,6 +10,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <net/gso.h>
+#include <net/ip.h>
 
 #include "ovpnstruct.h"
 #include "io.h"
@@ -19,6 +20,7 @@
 #include "crypto_aead.h"
 #include "netlink.h"
 #include "proto.h"
+#include "socket.h"
 #include "udp.h"
 
 int ovpn_struct_init(struct net_device *dev)
@@ -163,6 +165,8 @@ static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 	int ret = -1;
 	u8 key_id;
 
+	ovpn_peer_stats_increment_rx(&peer->link_stats, skb->len);
+
 	/* get the key slot matching the key Id in the received packet */
 	key_id = ovpn_key_id_from_skb(skb);
 	ks = ovpn_crypto_key_id_to_slot(&peer->crypto, key_id);
@@ -184,6 +188,9 @@ static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 		goto drop;
 	}
 
+	/* increment RX stats */
+	ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
+
 	/* check if this is a valid datapacket that has to be delivered to the
 	 * tun interface
 	 */
@@ -278,6 +285,8 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 		goto err;
 	}
 
+	ovpn_peer_stats_increment_tx(&peer->vpn_stats, skb->len);
+
 	/* encrypt */
 	ret = ovpn_aead_encrypt(ks, skb, peer->id);
 	if (unlikely(ret < 0)) {
@@ -289,6 +298,7 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 
 	success = true;
 
+	ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
 err:
 	ovpn_crypto_key_slot_put(ks);
 	return success;
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 1b941deeede0..99a2ae42a332 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -20,6 +20,7 @@
 #include "main.h"
 #include "netlink.h"
 #include "peer.h"
+#include "socket.h"
 
 struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 {
@@ -42,6 +43,8 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	ovpn_crypto_state_init(&peer->crypto);
 	spin_lock_init(&peer->lock);
 	kref_init(&peer->refcount);
+	ovpn_peer_stats_init(&peer->vpn_stats);
+	ovpn_peer_stats_init(&peer->link_stats);
 
 	INIT_WORK(&peer->encrypt_work, ovpn_encrypt_work);
 	INIT_WORK(&peer->decrypt_work, ovpn_decrypt_work);
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index da41d711745c..b5ff59a4b40f 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -10,14 +10,15 @@
 #ifndef _NET_OVPN_OVPNPEER_H_
 #define _NET_OVPN_OVPNPEER_H_
 
+#include <linux/ptr_ring.h>
+#include <net/dst_cache.h>
+#include <uapi/linux/ovpn.h>
+
 #include "bind.h"
 #include "pktid.h"
 #include "crypto.h"
 #include "socket.h"
-
-#include <linux/ptr_ring.h>
-#include <net/dst_cache.h>
-#include <uapi/linux/ovpn.h>
+#include "stats.h"
 
 /**
  * struct ovpn_peer - the main remote peer object
@@ -36,6 +37,8 @@
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
  * @halt: true if ovpn_peer_mark_delete was called
+ * @vpn_stats: per-peer in-VPN TX/RX stats
+ * @link_stats: per-peer link/transport TX/RX stats
  * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
  * @lock: protects binding to peer (bind)
  * @refcount: reference counter
@@ -60,6 +63,8 @@ struct ovpn_peer {
 	struct dst_cache dst_cache;
 	struct ovpn_bind __rcu *bind;
 	bool halt;
+	struct ovpn_peer_stats vpn_stats;
+	struct ovpn_peer_stats link_stats;
 	enum ovpn_del_peer_reason delete_reason;
 	spinlock_t lock; /* protects bind */
 	struct kref refcount;
diff --git a/drivers/net/ovpn/stats.c b/drivers/net/ovpn/stats.c
new file mode 100644
index 000000000000..78cd030fa26e
--- /dev/null
+++ b/drivers/net/ovpn/stats.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/atomic.h>
+
+#include "stats.h"
+
+void ovpn_peer_stats_init(struct ovpn_peer_stats *ps)
+{
+	atomic64_set(&ps->rx.bytes, 0);
+	atomic_set(&ps->rx.packets, 0);
+
+	atomic64_set(&ps->tx.bytes, 0);
+	atomic_set(&ps->tx.packets, 0);
+}
diff --git a/drivers/net/ovpn/stats.h b/drivers/net/ovpn/stats.h
new file mode 100644
index 000000000000..5134e49c0458
--- /dev/null
+++ b/drivers/net/ovpn/stats.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ *		Lev Stipakov <lev@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNSTATS_H_
+#define _NET_OVPN_OVPNSTATS_H_
+
+#include <linux/atomic.h>
+#include <linux/types.h>
+
+/* per-peer stats, measured on transport layer */
+
+/* one stat */
+struct ovpn_peer_stat {
+	atomic64_t bytes;
+	atomic_t packets;
+};
+
+/* rx and tx stats for one peer */
+struct ovpn_peer_stats {
+	struct ovpn_peer_stat rx;
+	struct ovpn_peer_stat tx;
+};
+
+void ovpn_peer_stats_init(struct ovpn_peer_stats *ps);
+
+static inline void ovpn_peer_stats_increment(struct ovpn_peer_stat *stat,
+					     const unsigned int n)
+{
+	atomic64_add(n, &stat->bytes);
+	atomic_inc(&stat->packets);
+}
+
+static inline void ovpn_peer_stats_increment_rx(struct ovpn_peer_stats *stats,
+						const unsigned int n)
+{
+	ovpn_peer_stats_increment(&stats->rx, n);
+}
+
+static inline void ovpn_peer_stats_increment_tx(struct ovpn_peer_stats *stats,
+						const unsigned int n)
+{
+	ovpn_peer_stats_increment(&stats->tx, n);
+}
+
+#endif /* _NET_OVPN_OVPNSTATS_H_ */
-- 
2.43.2



* [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (11 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-13 13:37   ` Antonio Quartulli
  2024-05-13 14:50   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 14/24] ovpn: implement multi-peer support Antonio Quartulli
                   ` (11 subsequent siblings)
  24 siblings, 2 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

With this change, ovpn can also communicate with peers via TCP.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
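Note on the per-peer tcp state added to struct ovpn_peer: the raw_len,
data_len and offset fields suggest that each packet travels on the TCP
stream behind a 16-bit length prefix in network byte order, and that
both the prefix and the payload may arrive in several pieces. Below is
a minimal userspace sketch of such a reassembly step. It is written
under those assumptions and is not the code in tcp.c. MAX_PKT,
hdr_have, tcp_rx_feed and tcp_rx_reset are invented for this sketch,
and rejecting a zero length is a choice made here, not necessarily by
the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PKT 2048u /* hypothetical upper bound for one packet */

struct tcp_rx_state {
	uint8_t raw_len[2];   /* length prefix bytes as read from the stream */
	uint8_t hdr_have;     /* how many prefix bytes were collected so far */
	uint16_t data_len;    /* packet length converted to host order */
	uint16_t offset;      /* position of the next payload byte in buf */
	uint8_t buf[MAX_PKT]; /* packet being filled */
};

/* Forget the current packet and wait for the next length prefix. */
void tcp_rx_reset(struct tcp_rx_state *st)
{
	st->hdr_have = 0;
	st->data_len = 0;
	st->offset = 0;
}

/* Feed len stream bytes. Returns the number of bytes consumed, or -1 on
 * an invalid length. *done is set to 1 once st->buf holds a full packet
 * of st->data_len bytes; the caller then calls tcp_rx_reset().
 */
int tcp_rx_feed(struct tcp_rx_state *st, const uint8_t *in, size_t len,
		int *done)
{
	size_t used = 0, want, n;

	assert(st != NULL && in != NULL && done != NULL);
	*done = 0;

	/* step 1: collect the two prefix bytes, possibly across calls */
	while (st->hdr_have < 2 && used < len)
		st->raw_len[st->hdr_have++] = in[used++];
	if (st->hdr_have < 2)
		return (int)used;

	/* step 2: convert the prefix from network to host order */
	st->data_len = (uint16_t)((st->raw_len[0] << 8) | st->raw_len[1]);
	if (st->data_len == 0 || st->data_len > MAX_PKT)
		return -1;

	/* step 3: copy payload until data_len bytes have been gathered */
	want = (size_t)(st->data_len - st->offset);
	n = (len - used < want) ? len - used : want;
	memcpy(st->buf + st->offset, in + used, n);
	st->offset = (uint16_t)(st->offset + n);
	used += n;

	if (st->offset == st->data_len)
		*done = 1;
	return (int)used;
}
```

Bytes left over after a completed packet belong to the next one, so the
caller resets the state and feeds the remainder again.
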
 drivers/net/ovpn/Makefile |   1 +
 drivers/net/ovpn/io.c     |   5 +
 drivers/net/ovpn/main.c   |   9 +-
 drivers/net/ovpn/peer.h   |  29 +++
 drivers/net/ovpn/skb.h    |  51 ++++
 drivers/net/ovpn/socket.c |  20 ++
 drivers/net/ovpn/socket.h |  15 +-
 drivers/net/ovpn/tcp.c    | 511 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/tcp.h    |  42 ++++
 9 files changed, 681 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ovpn/skb.h
 create mode 100644 drivers/net/ovpn/tcp.c
 create mode 100644 drivers/net/ovpn/tcp.h

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index d43fda72646b..f4d4bd87c851 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -18,4 +18,5 @@ ovpn-y += peer.o
 ovpn-y += pktid.o
 ovpn-y += socket.o
 ovpn-y += stats.o
+ovpn-y += tcp.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 699e7f1274db..49efcfff963c 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -21,6 +21,7 @@
 #include "netlink.h"
 #include "proto.h"
 #include "socket.h"
+#include "tcp.h"
 #include "udp.h"
 
 int ovpn_struct_init(struct net_device *dev)
@@ -307,6 +308,7 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 /* Process packets in TX queue in a transport-specific way.
  *
  * UDP transport - encrypt and send across the tunnel.
+ * TCP transport - encrypt and put into TCP TX queue.
  */
 void ovpn_encrypt_work(struct work_struct *work)
 {
@@ -340,6 +342,9 @@ void ovpn_encrypt_work(struct work_struct *work)
 					ovpn_udp_send_skb(peer->ovpn, peer,
 							  curr);
 					break;
+				case IPPROTO_TCP:
+					ovpn_tcp_send_skb(peer, curr);
+					break;
 				default:
 					/* no transport configured yet */
 					consume_skb(skb);
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 9ae9844dd281..a04d6e55a473 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -23,6 +23,7 @@
 #include "io.h"
 #include "packet.h"
 #include "peer.h"
+#include "tcp.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
@@ -247,8 +248,14 @@ static struct pernet_operations ovpn_pernet_ops = {
 
 static int __init ovpn_init(void)
 {
-	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
+	int err = ovpn_tcp_init();
 
+	if (err) {
+		pr_err("ovpn: cannot initialize TCP component: %d\n", err);
+		return err;
+	}
+
+	err = register_netdevice_notifier(&ovpn_netdev_notifier);
 	if (err) {
 		pr_err("ovpn: can't register netdevice notifier: %d\n", err);
 		return err;
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index b5ff59a4b40f..ac4907705d98 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -33,6 +33,16 @@
  * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
  * @napi: NAPI object
  * @sock: the socket being used to talk to this peer
+ * @tcp.tx_ring: queue for packets to be forwarded to userspace (TCP only)
+ * @tcp.tx_work: work for processing outgoing socket data (TCP only)
+ * @tcp.rx_work: work for processing incoming socket data (TCP only)
+ * @tcp.raw_len: next packet length as read from the stream (TCP only)
+ * @tcp.skb: next packet being filled with data from the stream (TCP only)
+ * @tcp.offset: position of the next byte to write in the skb (TCP only)
+ * @tcp.data_len: next packet length converted to host order (TCP only)
+ * @tcp.sk_cb.sk_data_ready: pointer to original cb
+ * @tcp.sk_cb.sk_write_space: pointer to original cb
+ * @tcp.sk_cb.prot: pointer to original prot object
  * @crypto: the crypto configuration (ciphers, keys, etc..)
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
@@ -59,6 +69,25 @@ struct ovpn_peer {
 	struct ptr_ring netif_rx_ring;
 	struct napi_struct napi;
 	struct ovpn_socket *sock;
+	/* state of the TCP reading. Needed to keep track of how much of a
+	 * single packet has already been read from the stream and how much is
+	 * missing
+	 */
+	struct {
+		struct ptr_ring tx_ring;
+		struct work_struct tx_work;
+		struct work_struct rx_work;
+
+		u8 raw_len[sizeof(u16)];
+		struct sk_buff *skb;
+		u16 offset;
+		u16 data_len;
+		struct {
+			void (*sk_data_ready)(struct sock *sk);
+			void (*sk_write_space)(struct sock *sk);
+			struct proto *prot;
+		} sk_cb;
+	} tcp;
 	struct ovpn_crypto_state crypto;
 	struct dst_cache dst_cache;
 	struct ovpn_bind __rcu *bind;
diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h
new file mode 100644
index 000000000000..ba92811e12ff
--- /dev/null
+++ b/drivers/net/ovpn/skb.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_SKB_H_
+#define _NET_OVPN_SKB_H_
+
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/ip.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+
+#define OVPN_SKB_CB(skb) ((struct ovpn_skb_cb *)&((skb)->cb))
+
+struct ovpn_skb_cb {
+	union {
+		struct in_addr ipv4;
+		struct in6_addr ipv6;
+	} local;
+	sa_family_t sa_fam;
+};
+
+/* Return IP protocol version from skb header.
+ * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
+ */
+static inline __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
+{
+	__be16 proto = 0;
+
+	/* skb could be non-linear,
+	 * make sure IP header is in non-fragmented part
+	 */
+	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
+		return 0;
+
+	if (ip_hdr(skb)->version == 4)
+		proto = htons(ETH_P_IP);
+	else if (ip_hdr(skb)->version == 6)
+		proto = htons(ETH_P_IPV6);
+
+	return proto;
+}
+
+#endif /* _NET_OVPN_SKB_H_ */
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
index e099a61b03fa..004db5b13663 100644
--- a/drivers/net/ovpn/socket.c
+++ b/drivers/net/ovpn/socket.c
@@ -16,6 +16,7 @@
 #include "packet.h"
 #include "peer.h"
 #include "socket.h"
+#include "tcp.h"
 #include "udp.h"
 
 /* Finalize release of socket, called after RCU grace period */
@@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
 
 	if (sock->sk->sk_protocol == IPPROTO_UDP)
 		ovpn_udp_socket_detach(sock);
+	else if (sock->sk->sk_protocol == IPPROTO_TCP)
+		ovpn_tcp_socket_detach(sock);
 
 	sockfd_put(sock);
 }
@@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
 
 	if (sock->sk->sk_protocol == IPPROTO_UDP)
 		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
+	else if (sock->sk->sk_protocol == IPPROTO_TCP)
+		ret = ovpn_tcp_socket_attach(sock, peer);
 
 	return ret;
 }
@@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
 	ovpn_sock->sock = sock;
 	kref_init(&ovpn_sock->refcount);
 
+	/* TCP sockets are per-peer, therefore they are linked to their unique
+	 * peer
+	 */
+	if (sock->sk->sk_protocol == IPPROTO_TCP) {
+		ovpn_sock->peer = peer;
+		ret = ptr_ring_init(&ovpn_sock->recv_ring, OVPN_QUEUE_LEN,
+				    GFP_KERNEL);
+		if (ret < 0) {
+			netdev_err(peer->ovpn->dev, "%s: cannot allocate TCP recv ring\n",
+				   __func__);
+			kfree(ovpn_sock);
+			return ERR_PTR(ret);
+		}
+	}
+
 	rcu_assign_sk_user_data(sock->sk, ovpn_sock);
 
 	return ovpn_sock;
diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
index 0d23de5a9344..88c6271ba5c7 100644
--- a/drivers/net/ovpn/socket.h
+++ b/drivers/net/ovpn/socket.h
@@ -21,12 +21,25 @@ struct ovpn_peer;
 /**
  * struct ovpn_socket - a kernel socket referenced in the ovpn code
  * @ovpn: ovpn instance owning this socket (UDP only)
+ * @peer: unique peer transmitting over this socket (TCP only)
+ * @recv_ring: queue where non-data packets directed to userspace are stored
  * @sock: the low level sock object
  * @refcount: amount of contexts currently referencing this object
  * @rcu: member used to schedule RCU destructor callback
  */
 struct ovpn_socket {
-	struct ovpn_struct *ovpn;
+	union {
+		/* the VPN session object owning this socket (UDP only) */
+		struct ovpn_struct *ovpn;
+
+		/* TCP only */
+		struct {
+			/** @peer: unique peer transmitting over this socket */
+			struct ovpn_peer *peer;
+			struct ptr_ring recv_ring;
+		};
+	};
+
 	struct socket *sock;
 	struct kref refcount;
 	struct rcu_head rcu;
diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
new file mode 100644
index 000000000000..84ad7cd4fc4f
--- /dev/null
+++ b/drivers/net/ovpn/tcp.c
@@ -0,0 +1,511 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/ptr_ring.h>
+#include <linux/skbuff.h>
+#include <net/tcp.h>
+#include <net/route.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "io.h"
+#include "packet.h"
+#include "peer.h"
+#include "proto.h"
+#include "skb.h"
+#include "socket.h"
+#include "tcp.h"
+
+static struct proto ovpn_tcp_prot;
+
+static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
+			      unsigned int in_offset, size_t in_len)
+{
+	struct sock *sk = desc->arg.data;
+	struct ovpn_socket *sock;
+	struct ovpn_skb_cb *cb;
+	struct ovpn_peer *peer;
+	size_t chunk, copied = 0;
+	void *data;
+	u16 len;
+	int st;
+
+	rcu_read_lock();
+	sock = rcu_dereference_sk_user_data(sk);
+	rcu_read_unlock();
+
+	if (unlikely(!sock || !sock->peer)) {
+		pr_err("ovpn: read_sock triggered for socket with no metadata\n");
+		desc->error = -EINVAL;
+		return 0;
+	}
+
+	peer = sock->peer;
+
+	while (in_len > 0) {
+		/* no skb allocated means that we have to read (or finish
+		 * reading) the 2 bytes prefix containing the actual packet
+		 * size.
+		 */
+		if (!peer->tcp.skb) {
+			chunk = min_t(size_t, in_len,
+				      sizeof(u16) - peer->tcp.offset);
+			WARN_ON(skb_copy_bits(in_skb, in_offset,
+					      peer->tcp.raw_len +
+					      peer->tcp.offset, chunk) < 0);
+			peer->tcp.offset += chunk;
+
+			/* keep on reading until we got the whole packet size */
+			if (peer->tcp.offset != sizeof(u16))
+				goto next_read;
+
+			len = ntohs(*(__be16 *)peer->tcp.raw_len);
+			/* invalid packet length: this is a fatal TCP error */
+			if (!len) {
+				netdev_err(peer->ovpn->dev,
+					   "%s: received invalid packet length: %d\n",
+					   __func__, len);
+				desc->error = -EINVAL;
+				goto err;
+			}
+
+			/* add 2 bytes to allocated space (and immediately
+			 * reserve them) for packet length prepending, in case
+			 * the skb has to be forwarded to userspace
+			 */
+			peer->tcp.skb =
+				netdev_alloc_skb_ip_align(peer->ovpn->dev,
+							  len + sizeof(u16));
+			if (!peer->tcp.skb) {
+				desc->error = -ENOMEM;
+				goto err;
+			}
+			skb_reserve(peer->tcp.skb, sizeof(u16));
+
+			peer->tcp.offset = 0;
+			peer->tcp.data_len = len;
+		} else {
+			chunk = min_t(size_t, in_len,
+				      peer->tcp.data_len - peer->tcp.offset);
+
+			/* extend skb to accommodate the new chunk and copy it
+			 * from the input skb
+			 */
+			data = skb_put(peer->tcp.skb, chunk);
+			WARN_ON(skb_copy_bits(in_skb, in_offset, data,
+					      chunk) < 0);
+			peer->tcp.offset += chunk;
+
+			/* keep on reading until we get the full packet */
+			if (peer->tcp.offset != peer->tcp.data_len)
+				goto next_read;
+
+			/* do not perform IP caching for TCP connections */
+			cb = OVPN_SKB_CB(peer->tcp.skb);
+			cb->sa_fam = AF_UNSPEC;
+
+			/* At this point we know the packet is from a configured
+			 * peer.
+			 * DATA_V2 packets are handled in kernel space, the rest
+			 * goes to user space.
+			 *
+			 * Queue skb for sending to userspace via recvmsg on the
+			 * socket
+			 */
+			if (likely(ovpn_opcode_from_skb(peer->tcp.skb, 0) ==
+				   OVPN_DATA_V2)) {
+				/* hold reference to peer as required by
+				 * ovpn_recv().
+				 *
+				 * NOTE: in this context we should already be
+				 * holding a reference to this peer, therefore
+				 * ovpn_peer_hold() is not expected to fail
+				 */
+				WARN_ON(!ovpn_peer_hold(peer));
+				st = ovpn_recv(peer->ovpn, peer, peer->tcp.skb);
+				if (unlikely(st < 0))
+					ovpn_peer_put(peer);
+
+			} else {
+				/* prepend skb with packet len. this way
+				 * userspace can parse the packet as if it just
+				 * arrived from the remote endpoint
+				 */
+				void *raw_len = __skb_push(peer->tcp.skb,
+							   sizeof(u16));
+
+				memcpy(raw_len, peer->tcp.raw_len, sizeof(u16));
+
+				st = ptr_ring_produce_bh(&peer->sock->recv_ring,
+							 peer->tcp.skb);
+				if (likely(!st))
+					peer->tcp.sk_cb.sk_data_ready(sk);
+			}
+
+			/* skb not consumed - free it now */
+			if (unlikely(st < 0))
+				kfree_skb(peer->tcp.skb);
+
+			peer->tcp.skb = NULL;
+			peer->tcp.offset = 0;
+			peer->tcp.data_len = 0;
+		}
+next_read:
+		in_len -= chunk;
+		in_offset += chunk;
+		copied += chunk;
+	}
+
+	return copied;
+err:
+	netdev_err(peer->ovpn->dev, "cannot process incoming TCP data: %d\n",
+		   desc->error);
+	ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
+	return 0;
+}
+
+static void ovpn_tcp_data_ready(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+	read_descriptor_t desc;
+
+	if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
+		return;
+
+	desc.arg.data = sk;
+	desc.error = 0;
+	desc.count = 1;
+
+	sock->ops->read_sock(sk, &desc, ovpn_tcp_read_sock);
+}
+
+static void ovpn_tcp_write_space(struct sock *sk)
+{
+	struct ovpn_socket *sock;
+
+	rcu_read_lock();
+	sock = rcu_dereference_sk_user_data(sk);
+	rcu_read_unlock();
+
+	if (!sock || !sock->peer)
+		return;
+
+	queue_work(sock->peer->ovpn->events_wq, &sock->peer->tcp.tx_work);
+}
+
+static bool ovpn_tcp_sock_is_readable(struct sock *sk)
+
+{
+	struct ovpn_socket *sock;
+
+	rcu_read_lock();
+	sock = rcu_dereference_sk_user_data(sk);
+	rcu_read_unlock();
+
+	if (!sock || !sock->peer)
+		return false;
+
+	return !ptr_ring_empty_bh(&sock->recv_ring);
+}
+
+static int ovpn_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+			    int flags, int *addr_len)
+{
+	bool tmp = flags & MSG_DONTWAIT;
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int ret, chunk, copied = 0;
+	struct ovpn_socket *sock;
+	struct sk_buff *skb;
+	long timeo;
+
+	if (unlikely(flags & MSG_ERRQUEUE))
+		return sock_recv_errqueue(sk, msg, len, SOL_IP, IP_RECVERR);
+
+	timeo = sock_rcvtimeo(sk, tmp);
+
+	rcu_read_lock();
+	sock = rcu_dereference_sk_user_data(sk);
+	rcu_read_unlock();
+
+	if (!sock || !sock->peer) {
+		ret = -EBADF;
+		goto unlock;
+	}
+
+	while (ptr_ring_empty_bh(&sock->recv_ring)) {
+		if (sk->sk_shutdown & RCV_SHUTDOWN)
+			return 0;
+
+		if (sock_flag(sk, SOCK_DONE))
+			return 0;
+
+		if (!timeo) {
+			ret = -EAGAIN;
+			goto unlock;
+		}
+
+		add_wait_queue(sk_sleep(sk), &wait);
+		sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+		sk_wait_event(sk, &timeo, !ptr_ring_empty_bh(&sock->recv_ring),
+			      &wait);
+		sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+		remove_wait_queue(sk_sleep(sk), &wait);
+
+		/* take care of signals */
+		if (signal_pending(current)) {
+			ret = sock_intr_errno(timeo);
+			goto unlock;
+		}
+	}
+
+	while (len && (skb = __ptr_ring_peek(&sock->recv_ring))) {
+		chunk = min_t(size_t, len, skb->len);
+		ret = skb_copy_datagram_msg(skb, 0, msg, chunk);
+		if (ret < 0) {
+			pr_err("ovpn: cannot copy TCP data to userspace: %d\n",
+			       ret);
+			kfree_skb(skb);
+			goto unlock;
+		}
+
+		__skb_pull(skb, chunk);
+
+		if (!skb->len) {
+			/* skb was entirely consumed and can now be removed from
+			 * the ring
+			 */
+			__ptr_ring_discard_one(&sock->recv_ring);
+			consume_skb(skb);
+		}
+
+		len -= chunk;
+		copied += chunk;
+	}
+	ret = copied;
+
+unlock:
+	return ret ? : -EAGAIN;
+}
+
+static void ovpn_destroy_skb(void *skb)
+{
+	consume_skb(skb);
+}
+
+void ovpn_tcp_socket_detach(struct socket *sock)
+{
+	struct ovpn_socket *ovpn_sock;
+	struct ovpn_peer *peer;
+
+	if (!sock)
+		return;
+
+	rcu_read_lock();
+	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
+	rcu_read_unlock();
+
+	if (!ovpn_sock->peer)
+		return;
+
+	peer = ovpn_sock->peer;
+
+	/* restore CBs that were saved in ovpn_tcp_socket_attach() */
+	write_lock_bh(&sock->sk->sk_callback_lock);
+	sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
+	sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
+	sock->sk->sk_prot = peer->tcp.sk_cb.prot;
+	rcu_assign_sk_user_data(sock->sk, NULL);
+	write_unlock_bh(&sock->sk->sk_callback_lock);
+
+	/* cancel any ongoing work. Done after removing the CBs so that these
+	 * workers cannot be re-armed
+	 */
+	cancel_work_sync(&peer->tcp.tx_work);
+
+	ptr_ring_cleanup(&ovpn_sock->recv_ring, ovpn_destroy_skb);
+	ptr_ring_cleanup(&peer->tcp.tx_ring, ovpn_destroy_skb);
+}
+
+/* Try to send one skb (or part of it) over the TCP stream.
+ *
+ * Return 0 on success or a negative error code otherwise.
+ *
+ * Note that the skb is modified by putting away the data being sent, therefore
+ * the caller should check if skb->len is zero to understand if the full skb was
+ * sent or not.
+ */
+static int ovpn_tcp_send_one(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
+	struct kvec iv = { 0 };
+	int ret;
+
+	if (skb_linearize(skb) < 0) {
+		net_err_ratelimited("%s: can't linearize packet\n", __func__);
+		return -ENOMEM;
+	}
+
+	/* initialize iv structure now as skb_linearize() may have changed
+	 * skb->data
+	 */
+	iv.iov_base = skb->data;
+	iv.iov_len = skb->len;
+
+	ret = kernel_sendmsg(peer->sock->sock, &msg, &iv, 1, iv.iov_len);
+	if (ret > 0) {
+		__skb_pull(skb, ret);
+
+		/* since we update per-cpu stats in process context,
+		 * we need to disable softirqs
+		 */
+		local_bh_disable();
+		dev_sw_netstats_tx_add(peer->ovpn->dev, 1, ret);
+		local_bh_enable();
+
+		return 0;
+	}
+
+	return ret;
+}
+
+/* Process packets in TCP TX queue */
+static void ovpn_tcp_tx_work(struct work_struct *work)
+{
+	struct ovpn_peer *peer;
+	struct sk_buff *skb;
+	int ret;
+
+	peer = container_of(work, struct ovpn_peer, tcp.tx_work);
+	while ((skb = __ptr_ring_peek(&peer->tcp.tx_ring))) {
+		ret = ovpn_tcp_send_one(peer, skb);
+		if (ret < 0 && ret != -EAGAIN) {
+			net_warn_ratelimited("%s: cannot send TCP packet to peer %u: %d\n",
+					     __func__, peer->id, ret);
+			/* in case of TCP error stop sending loop and delete
+			 * peer
+			 */
+			ovpn_peer_del(peer,
+				      OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
+			break;
+		} else if (!skb->len) {
+			/* skb was entirely consumed and can now be removed from
+			 * the ring
+			 */
+			__ptr_ring_discard_one(&peer->tcp.tx_ring);
+			consume_skb(skb);
+		}
+
+		/* give a chance to be rescheduled if needed */
+		cond_resched();
+	}
+}
+
+/* Put packet into TCP TX queue and schedule a consumer */
+void ovpn_queue_tcp_skb(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	int ret;
+
+	ret = ptr_ring_produce_bh(&peer->tcp.tx_ring, skb);
+	if (ret < 0) {
+		kfree_skb_list(skb);
+		return;
+	}
+
+	queue_work(peer->ovpn->events_wq, &peer->tcp.tx_work);
+}
+
+/* Set TCP encapsulation callbacks */
+int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
+{
+	void *old_data;
+	int ret;
+
+	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
+
+	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
+	if (ret < 0) {
+		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
+		return ret;
+	}
+
+	peer->tcp.skb = NULL;
+	peer->tcp.offset = 0;
+	peer->tcp.data_len = 0;
+
+	write_lock_bh(&sock->sk->sk_callback_lock);
+
+	/* make sure no pre-existing encapsulation handler exists */
+	rcu_read_lock();
+	old_data = rcu_dereference_sk_user_data(sock->sk);
+	rcu_read_unlock();
+	if (old_data) {
+		netdev_err(peer->ovpn->dev,
+			   "provided socket already taken by other user\n");
+		ret = -EBUSY;
+		goto err;
+	}
+
+	/* sanity check */
+	if (sock->sk->sk_protocol != IPPROTO_TCP) {
+		netdev_err(peer->ovpn->dev,
+			   "provided socket is UDP but expected TCP\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	/* only fully connected sockets are expected. Connection should be
+	 * handled in userspace
+	 */
+	if (sock->sk->sk_state != TCP_ESTABLISHED) {
+		netdev_err(peer->ovpn->dev,
+			   "provided TCP socket is not in ESTABLISHED state: %d\n",
+			   sock->sk->sk_state);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	/* save current CBs so that they can be restored upon socket release */
+	peer->tcp.sk_cb.sk_data_ready = sock->sk->sk_data_ready;
+	peer->tcp.sk_cb.sk_write_space = sock->sk->sk_write_space;
+	peer->tcp.sk_cb.prot = sock->sk->sk_prot;
+
+	/* assign our static CBs */
+	sock->sk->sk_data_ready = ovpn_tcp_data_ready;
+	sock->sk->sk_write_space = ovpn_tcp_write_space;
+	sock->sk->sk_prot = &ovpn_tcp_prot;
+
+	write_unlock_bh(&sock->sk->sk_callback_lock);
+
+	return 0;
+err:
+	write_unlock_bh(&sock->sk->sk_callback_lock);
+	ptr_ring_cleanup(&peer->tcp.tx_ring, NULL);
+
+	return ret;
+}
+
+int __init ovpn_tcp_init(void)
+{
+	/* We need to substitute the recvmsg and the sock_is_readable
+	 * callbacks in the sk_prot member of the sock object for TCP
+	 * sockets.
+	 *
+	 * However sock->sk_prot is a pointer to a static variable and
+	 * therefore we can't directly modify it, otherwise every socket
+	 * pointing to it will be affected.
+	 *
+	 * For this reason we create our own static copy and modify what
+	 * we need. Then we make sk_prot point to this copy
+	 * (in ovpn_tcp_socket_attach())
+	 */
+	ovpn_tcp_prot = tcp_prot;
+	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
+	ovpn_tcp_prot.sock_is_readable = ovpn_tcp_sock_is_readable;
+
+	return 0;
+}
diff --git a/drivers/net/ovpn/tcp.h b/drivers/net/ovpn/tcp.h
new file mode 100644
index 000000000000..7e73f6e76e6c
--- /dev/null
+++ b/drivers/net/ovpn/tcp.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_TCP_H_
+#define _NET_OVPN_TCP_H_
+
+#include <linux/net.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+
+#include "peer.h"
+
+/* Initialize TCP static objects */
+int __init ovpn_tcp_init(void);
+
+void ovpn_queue_tcp_skb(struct ovpn_peer *peer, struct sk_buff *skb);
+
+int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer);
+void ovpn_tcp_socket_detach(struct socket *sock);
+
+/* Prepare skb and enqueue it for sending to peer.
+ *
+ * Preparation consists in prepending the skb payload with its size.
+ * Required by the OpenVPN protocol in order to extract packets from
+ * the TCP stream on the receiver side.
+ */
+static inline void ovpn_tcp_send_skb(struct ovpn_peer *peer,
+				     struct sk_buff *skb)
+{
+	u16 len = skb->len;
+
+	*(__be16 *)__skb_push(skb, sizeof(u16)) = htons(len);
+	ovpn_queue_tcp_skb(peer, skb);
+}
+
+#endif /* _NET_OVPN_TCP_H_ */
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 14/24] ovpn: implement multi-peer support
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (12 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 13/24] ovpn: implement TCP transport Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-28 14:44   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 15/24] ovpn: implement peer lookup logic Antonio Quartulli
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

With this change an ovpn instance will be able to stay connected to
multiple remote endpoints.

This functionality is strictly required when running ovpn on an
OpenVPN server.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c         |   1 +
 drivers/net/ovpn/main.c       |   8 +-
 drivers/net/ovpn/ovpnstruct.h |  10 +++
 drivers/net/ovpn/peer.c       | 149 ++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/peer.h       |  14 ++++
 5 files changed, 181 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 49efcfff963c..8ccf2700a370 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -36,6 +36,7 @@ int ovpn_struct_init(struct net_device *dev)
 		return err;
 
 	spin_lock_init(&ovpn->lock);
+	spin_lock_init(&ovpn->peers.lock);
 
 	ovpn->crypto_wq = alloc_workqueue("ovpn-crypto-wq-%s",
 					  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0,
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index a04d6e55a473..d6ba91c6571f 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -176,8 +176,14 @@ void ovpn_iface_destruct(struct ovpn_struct *ovpn)
 
 	ovpn->registered = false;
 
-	if (ovpn->mode == OVPN_MODE_P2P)
+	switch (ovpn->mode) {
+	case OVPN_MODE_P2P:
 		ovpn_peer_release_p2p(ovpn);
+		break;
+	default:
+		ovpn_peers_free(ovpn);
+		break;
+	}
 
 	unregister_netdevice(ovpn->dev);
 	synchronize_net();
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
index 7414c2459fb9..58166fdeac63 100644
--- a/drivers/net/ovpn/ovpnstruct.h
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -21,6 +21,10 @@
  * @crypto_wq: used to schedule crypto work that may sleep during TX/RX
  * @event_wq: used to schedule generic events that may sleep and that need to be
  *            performed outside of softirq context
+ * @peers.by_id: table of peers indexed by ID
+ * @peers.by_transp_addr: table of peers indexed by transport address
+ * @peers.by_vpn_addr: table of peers indexed by VPN IP address
+ * @peers.lock: protects writes to peers tables
  * @peer: in P2P mode, this is the only remote peer
  * @dev_list: entry for the module wide device list
  */
@@ -31,6 +35,12 @@ struct ovpn_struct {
 	spinlock_t lock; /* protect writing to the ovpn_struct object */
 	struct workqueue_struct *crypto_wq;
 	struct workqueue_struct *events_wq;
+	struct {
+		DECLARE_HASHTABLE(by_id, 12);
+		DECLARE_HASHTABLE(by_transp_addr, 12);
+		DECLARE_HASHTABLE(by_vpn_addr, 12);
+		spinlock_t lock; /* protects writes to peers tables */
+	} peers;
 	struct ovpn_peer __rcu *peer;
 	struct list_head dev_list;
 };
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 99a2ae42a332..38a89595dade 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -9,6 +9,7 @@
 
 #include <linux/skbuff.h>
 #include <linux/list.h>
+#include <linux/hashtable.h>
 #include <linux/workqueue.h>
 
 #include "ovpnstruct.h"
@@ -361,6 +362,91 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
 	return peer;
 }
 
+/**
+ * ovpn_peer_add_mp - add peer to related tables in a MP instance
+ * @ovpn: the instance to add the peer to
+ * @peer: the peer to add
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_add_mp(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
+{
+	struct sockaddr_storage sa = { 0 };
+	struct sockaddr_in6 *sa6;
+	struct sockaddr_in *sa4;
+	struct ovpn_bind *bind;
+	struct ovpn_peer *tmp;
+	size_t salen;
+	int ret = 0;
+	u32 index;
+
+	spin_lock_bh(&ovpn->peers.lock);
+	/* do not add duplicates */
+	tmp = ovpn_peer_get_by_id(ovpn, peer->id);
+	if (tmp) {
+		ovpn_peer_put(tmp);
+		ret = -EEXIST;
+		goto unlock;
+	}
+
+	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
+	bind = rcu_dereference_protected(peer->bind, true);
+	/* peers connected via TCP have bind == NULL */
+	if (bind) {
+		switch (bind->sa.in4.sin_family) {
+		case AF_INET:
+			sa4 = (struct sockaddr_in *)&sa;
+
+			sa4->sin_family = AF_INET;
+			sa4->sin_addr.s_addr = bind->sa.in4.sin_addr.s_addr;
+			sa4->sin_port = bind->sa.in4.sin_port;
+			salen = sizeof(*sa4);
+			break;
+		case AF_INET6:
+			sa6 = (struct sockaddr_in6 *)&sa;
+
+			sa6->sin6_family = AF_INET6;
+			sa6->sin6_addr = bind->sa.in6.sin6_addr;
+			sa6->sin6_port = bind->sa.in6.sin6_port;
+			salen = sizeof(*sa6);
+			break;
+		default:
+			ret = -EPROTONOSUPPORT;
+			goto unlock;
+		}
+
+		index = ovpn_peer_index(ovpn->peers.by_transp_addr, &sa, salen);
+		hlist_add_head_rcu(&peer->hash_entry_transp_addr,
+				   &ovpn->peers.by_transp_addr[index]);
+	}
+
+	index = ovpn_peer_index(ovpn->peers.by_id, &peer->id, sizeof(peer->id));
+	hlist_add_head_rcu(&peer->hash_entry_id, &ovpn->peers.by_id[index]);
+
+	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
+		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
+					&peer->vpn_addrs.ipv4,
+					sizeof(peer->vpn_addrs.ipv4));
+		hlist_add_head_rcu(&peer->hash_entry_addr4,
+				   &ovpn->peers.by_vpn_addr[index]);
+	}
+
+	hlist_del_init_rcu(&peer->hash_entry_addr6);
+	if (memcmp(&peer->vpn_addrs.ipv6, &in6addr_any,
+		   sizeof(peer->vpn_addrs.ipv6))) {
+		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
+					&peer->vpn_addrs.ipv6,
+					sizeof(peer->vpn_addrs.ipv6));
+		hlist_add_head_rcu(&peer->hash_entry_addr6,
+				   &ovpn->peers.by_vpn_addr[index]);
+	}
+
+unlock:
+	spin_unlock_bh(&ovpn->peers.lock);
+
+	return ret;
+}
+
 /**
  * ovpn_peer_add_p2p - add peer to related tables in a P2P instance
  * @ovpn: the instance to add the peer to
@@ -391,6 +477,8 @@ static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
 int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
 {
 	switch (ovpn->mode) {
+	case OVPN_MODE_MP:
+		return ovpn_peer_add_mp(ovpn, peer);
 	case OVPN_MODE_P2P:
 		return ovpn_peer_add_p2p(ovpn, peer);
 	default:
@@ -398,6 +486,53 @@ int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
 	}
 }
 
+/**
+ * ovpn_peer_unhash - remove peer reference from all hashtables
+ * @peer: the peer to remove
+ * @reason: the delete reason to attach to the peer
+ */
+static void ovpn_peer_unhash(struct ovpn_peer *peer,
+			     enum ovpn_del_peer_reason reason)
+{
+	hlist_del_init_rcu(&peer->hash_entry_id);
+	hlist_del_init_rcu(&peer->hash_entry_addr4);
+	hlist_del_init_rcu(&peer->hash_entry_addr6);
+	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
+
+	ovpn_peer_put(peer);
+	peer->delete_reason = reason;
+}
+
+/**
+ * ovpn_peer_del_mp - delete peer from related tables in a MP instance
+ * @peer: the peer to delete
+ * @reason: reason why the peer was deleted (sent to userspace)
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_del_mp(struct ovpn_peer *peer,
+			    enum ovpn_del_peer_reason reason)
+{
+	struct ovpn_peer *tmp;
+	int ret = 0;
+
+	spin_lock_bh(&peer->ovpn->peers.lock);
+	tmp = ovpn_peer_get_by_id(peer->ovpn, peer->id);
+	if (tmp != peer) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+	ovpn_peer_unhash(peer, reason);
+
+unlock:
+	spin_unlock_bh(&peer->ovpn->peers.lock);
+
+	if (tmp)
+		ovpn_peer_put(tmp);
+
+	return ret;
+}
+
 /**
  * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
  * @peer: the peer to delete
@@ -444,9 +579,23 @@ void ovpn_peer_release_p2p(struct ovpn_struct *ovpn)
 int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason)
 {
 	switch (peer->ovpn->mode) {
+	case OVPN_MODE_MP:
+		return ovpn_peer_del_mp(peer, reason);
 	case OVPN_MODE_P2P:
 		return ovpn_peer_del_p2p(peer, reason);
 	default:
 		return -EOPNOTSUPP;
 	}
 }
+
+void ovpn_peers_free(struct ovpn_struct *ovpn)
+{
+	struct hlist_node *tmp;
+	struct ovpn_peer *peer;
+	int bkt;
+
+	spin_lock_bh(&ovpn->peers.lock);
+	hash_for_each_safe(ovpn->peers.by_id, bkt, tmp, peer, hash_entry_id)
+		ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN);
+	spin_unlock_bh(&ovpn->peers.lock);
+}
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index ac4907705d98..10f4153f7c8f 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -26,6 +26,10 @@
  * @id: unique identifier
  * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
  * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
+ * @hash_entry_id: entry in the peer ID hashtable
+ * @hash_entry_addr4: entry in the peer IPv4 hashtable
+ * @hash_entry_addr6: entry in the peer IPv6 hashtable
+ * @hash_entry_transp_addr: entry in the peer transport address hashtable
  * @encrypt_work: work used to process outgoing packets
  * @decrypt_work: work used to process incoming packets
  * @tx_ring: queue of outgoing packets to this peer
@@ -62,6 +66,10 @@ struct ovpn_peer {
 		struct in_addr ipv4;
 		struct in6_addr ipv6;
 	} vpn_addrs;
+	struct hlist_node hash_entry_id;
+	struct hlist_node hash_entry_addr4;
+	struct hlist_node hash_entry_addr6;
+	struct hlist_node hash_entry_transp_addr;
 	struct work_struct encrypt_work;
 	struct work_struct decrypt_work;
 	struct ptr_ring tx_ring;
@@ -208,4 +216,10 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
 struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
 				       struct sk_buff *skb);
 
+/**
+ * ovpn_peers_free - free all peers in the instance
+ * @ovpn: the instance whose peers should be released
+ */
+void ovpn_peers_free(struct ovpn_struct *ovpn);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 15/24] ovpn: implement peer lookup logic
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (13 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 14/24] ovpn: implement multi-peer support Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-28 16:42   ` Sabrina Dubroca
  2024-05-06  1:16 ` [PATCH net-next v3 16/24] ovpn: implement keepalive mechanism Antonio Quartulli
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

In a multi-peer scenario there are a number of situations when a
specific peer needs to be looked up.

We may want to look up a peer by:
1. its ID
2. its VPN destination IP
3. its transport IP/port pair

For each of the above, there is a specific hash table referencing all
peers for fast lookup.
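
The per-key tables can be pictured with a small userspace sketch. It is
not the driver code: struct peer, peer_table_add() and peer_get_by_id()
are hypothetical names, the hash is a stand-in, and locking, RCU and
reference counting are left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 4
#define TABLE_SIZE (1u << TABLE_BITS)

/* hypothetical, simplified peer: only what an ID lookup needs */
struct peer {
	uint32_t id;
	struct peer *next;	/* next peer in the same bucket */
};

struct peer_table {
	struct peer *buckets[TABLE_SIZE];
};

/* stand-in multiplicative hash; not the hash used by the driver */
static unsigned int peer_id_hash(uint32_t id)
{
	return (uint32_t)(id * UINT32_C(2654435761)) >> (32 - TABLE_BITS);
}

/* return the peer with the given ID, or NULL if it is not in the table */
static struct peer *peer_get_by_id(struct peer_table *tbl, uint32_t id)
{
	struct peer *tmp;

	for (tmp = tbl->buckets[peer_id_hash(id)]; tmp; tmp = tmp->next)
		if (tmp->id == id)
			return tmp;

	return NULL;
}

/* link the peer at the head of its bucket; duplicate IDs are refused */
static int peer_table_add(struct peer_table *tbl, struct peer *peer)
{
	unsigned int index;

	assert(tbl && peer);

	if (peer_get_by_id(tbl, peer->id))
		return -EEXIST;

	index = peer_id_hash(peer->id);
	peer->next = tbl->buckets[index];
	tbl->buckets[index] = peer;

	return 0;
}
```

The same peer object would be linked into the other two tables in the
same way, each through its own link member, which is what the separate
hash_entry_* members of struct ovpn_peer are for.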

Case 2. is a bit special in the sense that an outgoing packet may not be
sent to the peer VPN IP directly, but rather to a network behind it. For
this reason we first perform a nexthop lookup in the system routing
table and then we use the retrieved nexthop as peer search key.
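
As a rough illustration of the nexthop step for IPv4 (a sketch only:
struct route4 and nexthop4() are hypothetical stand-ins for the route
attached to the skb, and addresses are plain integers here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for the route attached to an outgoing packet */
struct route4 {
	bool uses_gateway;	/* true if the route goes through a gateway */
	uint32_t gateway;	/* gateway address, valid if uses_gateway */
};

/* pick the address to use as peer search key for an outgoing packet:
 * the gateway if the route has one, the packet destination otherwise
 */
static uint32_t nexthop4(const struct route4 *rt, uint32_t daddr)
{
	if (rt && rt->uses_gateway)
		return rt->gateway;

	return daddr;
}
```

The returned address, rather than the packet destination itself, is
then the key searched in the table of peers indexed by VPN address.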

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/peer.c | 285 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 281 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 38a89595dade..31d7fb718b6b 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -198,6 +198,98 @@ static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
 	return true;
 }
 
+/**
+ * ovpn_nexthop_from_skb4 - retrieve IPv4 nexthop for outgoing skb
+ * @skb: the outgoing packet
+ *
+ * Return: the IPv4 of the nexthop
+ */
+static __be32 ovpn_nexthop_from_skb4(struct sk_buff *skb)
+{
+	struct rtable *rt = skb_rtable(skb);
+
+	if (rt && rt->rt_uses_gateway)
+		return rt->rt_gw4;
+
+	return ip_hdr(skb)->daddr;
+}
+
+/**
+ * ovpn_nexthop_from_skb6 - retrieve IPv6 nexthop for outgoing skb
+ * @skb: the outgoing packet
+ *
+ * Return: the IPv6 of the nexthop
+ */
+static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
+{
+	struct rt6_info *rt = (struct rt6_info *)skb_rtable(skb);
+
+	if (!rt || !(rt->rt6i_flags & RTF_GATEWAY))
+		return ipv6_hdr(skb)->daddr;
+
+	return rt->rt6i_gateway;
+}
+
+/**
+ * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
+ * @head: list head to search
+ * @addr: VPN IPv4 to use as search key
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_vpn_addr4(struct hlist_head *head,
+						    __be32 *addr)
+{
+	struct ovpn_peer *tmp, *peer = NULL;
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(tmp, head, hash_entry_addr4) {
+		if (*addr != tmp->vpn_addrs.ipv4.s_addr)
+			continue;
+
+		if (!ovpn_peer_hold(tmp))
+			continue;
+
+		peer = tmp;
+		break;
+	}
+	rcu_read_unlock();
+
+	return peer;
+}
+
+/**
+ * ovpn_peer_get_by_vpn_addr6 - retrieve peer by its VPN IPv6 address
+ * @head: list head to search
+ * @addr: VPN IPv6 to use as search key
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct hlist_head *head,
+						    struct in6_addr *addr)
+{
+	struct ovpn_peer *tmp, *peer = NULL;
+	int i;
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(tmp, head, hash_entry_addr6) {
+		for (i = 0; i < 4; i++) {
+			if (addr->s6_addr32[i] !=
+			    tmp->vpn_addrs.ipv6.s6_addr32[i])
+				continue;
+		}
+
+		if (!ovpn_peer_hold(tmp))
+			continue;
+
+		peer = tmp;
+		break;
+	}
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /**
  * ovpn_peer_transp_match - check if sockaddr and peer binding match
  * @peer: the peer to get the binding from
@@ -268,14 +360,46 @@ ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn,
 struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
 					       struct sk_buff *skb)
 {
-	struct ovpn_peer *peer = NULL;
+	struct ovpn_peer *tmp, *peer = NULL;
 	struct sockaddr_storage ss = { 0 };
+	struct hlist_head *head;
+	size_t sa_len;
+	bool found;
+	u32 index;
 
 	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
 		return NULL;
 
 	if (ovpn->mode == OVPN_MODE_P2P)
-		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+		return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+
+	switch (ss.ss_family) {
+	case AF_INET:
+		sa_len = sizeof(struct sockaddr_in);
+		break;
+	case AF_INET6:
+		sa_len = sizeof(struct sockaddr_in6);
+		break;
+	default:
+		return NULL;
+	}
+
+	index = ovpn_peer_index(ovpn->peers.by_transp_addr, &ss, sa_len);
+	head = &ovpn->peers.by_transp_addr[index];
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(tmp, head, hash_entry_transp_addr) {
+		found = ovpn_peer_transp_match(tmp, &ss);
+		if (!found)
+			continue;
+
+		if (!ovpn_peer_hold(tmp))
+			continue;
+
+		peer = tmp;
+		break;
+	}
+	rcu_read_unlock();
 
 	return peer;
 }
@@ -303,10 +427,28 @@ static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
 
 struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
 {
-	struct ovpn_peer *peer = NULL;
+	struct ovpn_peer *tmp, *peer = NULL;
+	struct hlist_head *head;
+	u32 index;
 
 	if (ovpn->mode == OVPN_MODE_P2P)
-		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
+		return ovpn_peer_get_by_id_p2p(ovpn, peer_id);
+
+	index = ovpn_peer_index(ovpn->peers.by_id, &peer_id, sizeof(peer_id));
+	head = &ovpn->peers.by_id[index];
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(tmp, head, hash_entry_id) {
+		if (tmp->id != peer_id)
+			continue;
+
+		if (!ovpn_peer_hold(tmp))
+			continue;
+
+		peer = tmp;
+		break;
+	}
+	rcu_read_unlock();
 
 	return peer;
 }
@@ -328,6 +470,11 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
 				       struct sk_buff *skb)
 {
 	struct ovpn_peer *tmp, *peer = NULL;
+	struct hlist_head *head;
+	sa_family_t sa_fam;
+	struct in6_addr addr6;
+	__be32 addr4;
+	u32 index;
 
 	/* in P2P mode, no matter the destination, packets are always sent to
 	 * the single peer listening on the other side
@@ -338,15 +485,123 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
 		if (likely(tmp && ovpn_peer_hold(tmp)))
 			peer = tmp;
 		rcu_read_unlock();
+		return peer;
+	}
+
+	sa_fam = skb_protocol_to_family(skb);
+
+	switch (sa_fam) {
+	case AF_INET:
+		addr4 = ovpn_nexthop_from_skb4(skb);
+		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr4,
+					sizeof(addr4));
+		head = &ovpn->peers.by_vpn_addr[index];
+
+		peer = ovpn_peer_get_by_vpn_addr4(head, &addr4);
+		break;
+	case AF_INET6:
+		addr6 = ovpn_nexthop_from_skb6(skb);
+		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr6,
+					sizeof(addr6));
+		head = &ovpn->peers.by_vpn_addr[index];
+
+		peer = ovpn_peer_get_by_vpn_addr6(head, &addr6);
+		break;
 	}
 
 	return peer;
 }
 
+/**
+ * ovpn_nexthop_from_rt4 - look up the IPv4 nexthop for the given destination
+ * @ovpn: the private data representing the current VPN session
+ * @dst: the destination to be looked up
+ *
+ * Looks up in the IPv4 system routing table the IP of the nexthop to be used
+ * to reach the destination passed as argument. If no nexthop can be found, the
+ * destination itself is returned as it probably has to be used as nexthop.
+ *
+ * Return: the IP of the next hop if found or the dst itself otherwise
+ */
+static __be32 ovpn_nexthop_from_rt4(struct ovpn_struct *ovpn, __be32 dst)
+{
+	struct rtable *rt;
+	struct flowi4 fl = {
+		.daddr = dst
+	};
+
+	rt = ip_route_output_flow(dev_net(ovpn->dev), &fl, NULL);
+	if (IS_ERR(rt)) {
+		net_dbg_ratelimited("%s: no route to host %pI4\n", __func__,
+				    &dst);
+		/* if we end up here this packet is probably going to be
+		 * thrown away later
+		 */
+		return dst;
+	}
+
+	if (!rt->rt_uses_gateway)
+		goto out;
+
+	dst = rt->rt_gw4;
+out:
+	ip_rt_put(rt);
+	return dst;
+}
+
+/**
+ * ovpn_nexthop_from_rt6 - look up the IPv6 nexthop for the given destination
+ * @ovpn: the private data representing the current VPN session
+ * @dst: the destination to be looked up
+ *
+ * Looks up in the IPv6 system routing table the IP of the nexthop to be used
+ * to reach the destination passed as argument. If no nexthop can be found, the
+ * destination itself is returned as it probably has to be used as nexthop.
+ *
+ * Return: the IP of the next hop if found or the dst itself otherwise
+ */
+static struct in6_addr ovpn_nexthop_from_rt6(struct ovpn_struct *ovpn,
+					     struct in6_addr dst)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	struct dst_entry *entry;
+	struct rt6_info *rt;
+	struct flowi6 fl = {
+		.daddr = dst,
+	};
+
+	entry = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ovpn->dev), NULL, &fl,
+						NULL);
+	if (IS_ERR(entry)) {
+		net_dbg_ratelimited("%s: no route to host %pI6c\n", __func__,
+				    &dst);
+		/* if we end up here this packet is probably going to be
+		 * thrown away later
+		 */
+		return dst;
+	}
+
+	rt = container_of(entry, struct rt6_info, dst);
+
+	if (!(rt->rt6i_flags & RTF_GATEWAY))
+		goto out;
+
+	dst = rt->rt6i_gateway;
+out:
+	dst_release((struct dst_entry *)rt);
+#endif
+	return dst;
+}
+
 struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
 				       struct sk_buff *skb)
 {
 	struct ovpn_peer *tmp, *peer = NULL;
+	struct hlist_head *head;
+	sa_family_t sa_fam;
+	struct in6_addr addr6;
+	__be32 addr4;
+	u32 index;
 
 	/* in P2P mode, no matter the destination, packets are always sent to
 	 * the single peer listening on the other side
@@ -357,6 +612,28 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
 		if (likely(tmp && ovpn_peer_hold(tmp)))
 			peer = tmp;
 		rcu_read_unlock();
+		return peer;
+	}
+
+	sa_fam = skb_protocol_to_family(skb);
+
+	switch (sa_fam) {
+	case AF_INET:
+		addr4 = ovpn_nexthop_from_rt4(ovpn, ip_hdr(skb)->saddr);
+		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr4,
+					sizeof(addr4));
+		head = &ovpn->peers.by_vpn_addr[index];
+
+		peer = ovpn_peer_get_by_vpn_addr4(head, &addr4);
+		break;
+	case AF_INET6:
+		addr6 = ovpn_nexthop_from_rt6(ovpn, ipv6_hdr(skb)->saddr);
+		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr6,
+					sizeof(addr6));
+		head = &ovpn->peers.by_vpn_addr[index];
+
+		peer = ovpn_peer_get_by_vpn_addr6(head, &addr6);
+		break;
 	}
 
 	return peer;
-- 
2.43.2



* [PATCH net-next v3 16/24] ovpn: implement keepalive mechanism
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (14 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 15/24] ovpn: implement peer lookup logic Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 17/24] ovpn: add support for updating local UDP endpoint Antonio Quartulli
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

OpenVPN supports configuring a periodic keepalive "ping"
message to allow the remote endpoint to detect link failures.

This change implements the ping sending and timer expiring logic.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
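Note on the keepalive logic described above: the two timers can be
pictured with a small userspace model. This is only a sketch under
stated assumptions, not the driver code. The names ka_model, ka_set,
ka_on_xmit, ka_on_recv, ka_ping_due and ka_expired are hypothetical,
and plain second counters stand in for jiffies and struct timer_list.
As in the patch, a value of 0 disables the corresponding timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one peer's keepalive state (times in seconds). */
struct ka_model {
	uint32_t interval;	/* ping after this much TX silence, 0 = off */
	uint32_t timeout;	/* expire after this much RX silence, 0 = off */
	uint64_t xmit_deadline;	/* when the next ping is due */
	uint64_t recv_deadline;	/* when the peer is considered dead */
};

/* Configure both values and arm both deadlines starting from 'now'. */
static void ka_set(struct ka_model *ka, uint64_t now, uint32_t interval,
		   uint32_t timeout)
{
	assert(ka);
	ka->interval = interval;
	ka->timeout = timeout;
	ka->xmit_deadline = now + interval;
	ka->recv_deadline = now + timeout;
}

/* An authenticated packet was sent: postpone the next ping. */
static void ka_on_xmit(struct ka_model *ka, uint64_t now)
{
	if (ka->interval)
		ka->xmit_deadline = now + ka->interval;
}

/* An authenticated packet was received: postpone the expiry. */
static void ka_on_recv(struct ka_model *ka, uint64_t now)
{
	if (ka->timeout)
		ka->recv_deadline = now + ka->timeout;
}

/* True when the TX side has been silent for a whole interval. */
static bool ka_ping_due(const struct ka_model *ka, uint64_t now)
{
	return ka->interval && now >= ka->xmit_deadline;
}

/* True when nothing authenticated arrived for a whole timeout. */
static bool ka_expired(const struct ka_model *ka, uint64_t now)
{
	return ka->timeout && now >= ka->recv_deadline;
}
```

In this model a sent ping is itself an authenticated packet, so the
caller would invoke ka_on_xmit() after sending it, which re-arms the
ping deadline for the next interval.
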
 drivers/net/ovpn/io.c   | 88 +++++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/io.h   |  6 +++
 drivers/net/ovpn/peer.c | 65 ++++++++++++++++++++++++++++++
 drivers/net/ovpn/peer.h | 51 ++++++++++++++++++++++++
 4 files changed, 210 insertions(+)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 8ccf2700a370..2469e30970b7 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -24,6 +24,31 @@
 #include "tcp.h"
 #include "udp.h"
 
+static const unsigned char ovpn_keepalive_message[] = {
+	0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb,
+	0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48
+};
+
+/**
+ * ovpn_is_keepalive - check if skb contains a keepalive message
+ * @skb: packet to check
+ *
+ * Assumes that the first byte of skb->data is defined.
+ *
+ * Return: true if skb contains a keepalive or false otherwise
+ */
+static bool ovpn_is_keepalive(struct sk_buff *skb)
+{
+	if (*skb->data != OVPN_KEEPALIVE_FIRST_BYTE)
+		return false;
+
+	if (!pskb_may_pull(skb, sizeof(ovpn_keepalive_message)))
+		return false;
+
+	return !memcmp(skb->data, ovpn_keepalive_message,
+		       sizeof(ovpn_keepalive_message));
+}
+
 int ovpn_struct_init(struct net_device *dev)
 {
 	struct ovpn_struct *ovpn = netdev_priv(dev);
@@ -190,6 +215,9 @@ static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 		goto drop;
 	}
 
+	/* note event of authenticated packet received for keepalive */
+	ovpn_peer_keepalive_recv_reset(peer);
+
 	/* increment RX stats */
 	ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
 
@@ -208,6 +236,18 @@ static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 			goto drop;
 		}
 
+		/* check if special OpenVPN message */
+		if (ovpn_is_keepalive(skb)) {
+			netdev_dbg(peer->ovpn->dev,
+				   "ping received from peer %u\n", peer->id);
+			/* not an error */
+			consume_skb(skb);
+			/* inform the caller that NAPI should not be scheduled
+			 * for this packet
+			 */
+			return -1;
+		}
+
 		netdev_dbg(peer->ovpn->dev,
 			   "unsupported protocol received from peer %u\n",
 			   peer->id);
@@ -352,6 +392,11 @@ void ovpn_encrypt_work(struct work_struct *work)
 					break;
 				}
 			}
+
+			/* note event of authenticated packet xmit for
+			 * keepalive
+			 */
+			ovpn_peer_keepalive_xmit_reset(peer);
 		}
 
 		/* give a chance to be rescheduled if needed */
@@ -456,3 +501,46 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	kfree_skb_list(skb);
 	return NET_XMIT_DROP;
 }
+
+/**
+ * ovpn_xmit_special - encrypt and transmit an out-of-band message to peer
+ * @peer: peer to send the message to
+ * @data: message content
+ * @len: message length
+ *
+ * Assumes that caller holds a reference to peer
+ */
+static void ovpn_xmit_special(struct ovpn_peer *peer, const void *data,
+			      const unsigned int len)
+{
+	struct ovpn_struct *ovpn;
+	struct sk_buff *skb;
+
+	ovpn = peer->ovpn;
+	if (unlikely(!ovpn))
+		return;
+
+	skb = alloc_skb(256 + len, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return;
+
+	skb_reserve(skb, 128);
+	skb->priority = TC_PRIO_BESTEFFORT;
+	memcpy(__skb_put(skb, len), data, len);
+
+	/* increase reference counter when passing peer to sending queue */
+	if (!ovpn_peer_hold(peer)) {
+		netdev_dbg(ovpn->dev, "%s: cannot hold peer reference for sending special packet\n",
+			   __func__);
+		kfree_skb(skb);
+		return;
+	}
+
+	ovpn_queue_skb(ovpn, skb, peer);
+}
+
+void ovpn_keepalive_xmit(struct ovpn_peer *peer)
+{
+	ovpn_xmit_special(peer, ovpn_keepalive_message,
+			  sizeof(ovpn_keepalive_message));
+}
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index 63d549c8c53b..e11bfa0d3e43 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -20,6 +20,12 @@ int ovpn_struct_init(struct net_device *dev);
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 int ovpn_napi_poll(struct napi_struct *napi, int budget);
 
+/**
+ * ovpn_keepalive_xmit - send keepalive message to peer
+ * @peer: the peer to send the message to
+ */
+void ovpn_keepalive_xmit(struct ovpn_peer *peer);
+
 int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
 	      struct sk_buff *skb);
 
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 31d7fb718b6b..79a6d6fb1be1 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -23,6 +23,57 @@
 #include "peer.h"
 #include "socket.h"
 
+/**
+ * ovpn_peer_ping - timer task for sending periodic keepalive
+ * @t: timer object that triggered the task
+ */
+static void ovpn_peer_ping(struct timer_list *t)
+{
+	struct ovpn_peer *peer = from_timer(peer, t, keepalive_xmit);
+
+	netdev_dbg(peer->ovpn->dev, "%s: sending ping to peer %u\n", __func__,
+		   peer->id);
+	ovpn_keepalive_xmit(peer);
+}
+
+/**
+ * ovpn_peer_expire - timer task for incoming keepalive timeout
+ * @t: the timer that triggered the task
+ */
+static void ovpn_peer_expire(struct timer_list *t)
+{
+	struct ovpn_peer *peer = from_timer(peer, t, keepalive_recv);
+
+	netdev_dbg(peer->ovpn->dev, "%s: peer %u expired\n", __func__,
+		   peer->id);
+	ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_EXPIRED);
+}
+
+void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout)
+{
+	u32 delta;
+
+	netdev_dbg(peer->ovpn->dev,
+		   "%s: scheduling keepalive for peer %u: interval=%u timeout=%u\n",
+		   __func__, peer->id, interval, timeout);
+
+	peer->keepalive_interval = interval;
+	if (interval > 0) {
+		delta = msecs_to_jiffies(interval * MSEC_PER_SEC);
+		mod_timer(&peer->keepalive_xmit, jiffies + delta);
+	} else {
+		del_timer(&peer->keepalive_xmit);
+	}
+
+	peer->keepalive_timeout = timeout;
+	if (timeout) {
+		delta = msecs_to_jiffies(timeout * MSEC_PER_SEC);
+		mod_timer(&peer->keepalive_recv, jiffies + delta);
+	} else {
+		del_timer(&peer->keepalive_recv);
+	}
+}
+
 struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 {
 	struct ovpn_peer *peer;
@@ -85,6 +136,9 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 
 	dev_hold(ovpn->dev);
 
+	timer_setup(&peer->keepalive_xmit, ovpn_peer_ping, 0);
+	timer_setup(&peer->keepalive_recv, ovpn_peer_expire, 0);
+
 	return peer;
 err_rx_ring:
 	ptr_ring_cleanup(&peer->rx_ring, NULL);
@@ -100,6 +154,16 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 #define ovpn_peer_index(_tbl, _key, _key_len)		\
 	(jhash(_key, _key_len, 0) % HASH_SIZE(_tbl))	\
 
+/**
+ * ovpn_peer_timer_delete_all - killall keepalive timers
+ * @peer: peer for which timers should be killed
+ */
+static void ovpn_peer_timer_delete_all(struct ovpn_peer *peer)
+{
+	del_timer_sync(&peer->keepalive_xmit);
+	del_timer_sync(&peer->keepalive_recv);
+}
+
 /**
  * ovpn_peer_free - release private members and free peer object
  * @peer: the peer to free
@@ -107,6 +171,7 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 static void ovpn_peer_free(struct ovpn_peer *peer)
 {
 	ovpn_bind_reset(peer, NULL);
+	ovpn_peer_timer_delete_all(peer);
 
 	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
 	ptr_ring_cleanup(&peer->tx_ring, NULL);
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 10f4153f7c8f..d5b63c07408e 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -50,6 +50,10 @@
  * @crypto: the crypto configuration (ciphers, keys, etc..)
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
+ * @keepalive_xmit: timer used to send the next keepalive
+ * @keepalive_interval: seconds after which a new keepalive should be sent
+ * @keepalive_recv: timer used to check for received keepalives
+ * @keepalive_timeout: seconds after which an inactive peer is considered dead
  * @halt: true if ovpn_peer_mark_delete was called
  * @vpn_stats: per-peer in-VPN TX/RX stats
  * @link_stats: per-peer link/transport TX/RX stats
@@ -99,6 +103,10 @@ struct ovpn_peer {
 	struct ovpn_crypto_state crypto;
 	struct dst_cache dst_cache;
 	struct ovpn_bind __rcu *bind;
+	struct timer_list keepalive_xmit;
+	unsigned long keepalive_interval;
+	struct timer_list keepalive_recv;
+	unsigned long keepalive_timeout;
 	bool halt;
 	struct ovpn_peer_stats vpn_stats;
 	struct ovpn_peer_stats link_stats;
@@ -222,4 +230,47 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
  */
 void ovpn_peers_free(struct ovpn_struct *ovpn);
 
+/**
+ * ovpn_peer_keepalive_recv_reset - reset keepalive timeout
+ * @peer: peer for which the timeout should be reset
+ *
+ * To be invoked upon reception of an authenticated packet from peer in order
+ * to report valid activity and thus reset the keepalive timeout
+ */
+static inline void ovpn_peer_keepalive_recv_reset(struct ovpn_peer *peer)
+{
+	u32 delta = msecs_to_jiffies(peer->keepalive_timeout * MSEC_PER_SEC);
+
+	if (unlikely(!delta))
+		return;
+
+	mod_timer(&peer->keepalive_recv, jiffies + delta);
+}
+
+/**
+ * ovpn_peer_keepalive_xmit_reset - reset keepalive sending timer
+ * @peer: peer for which the timer should be reset
+ *
+ * To be invoked upon sending of an authenticated packet to peer in order
+ * to report valid outgoing activity and thus reset the keepalive sending
+ * timer
+ */
+static inline void ovpn_peer_keepalive_xmit_reset(struct ovpn_peer *peer)
+{
+	u32 delta = msecs_to_jiffies(peer->keepalive_interval * MSEC_PER_SEC);
+
+	if (unlikely(!delta))
+		return;
+
+	mod_timer(&peer->keepalive_xmit, jiffies + delta);
+}
+
+/**
+ * ovpn_peer_keepalive_set - configure keepalive values for peer
+ * @peer: the peer to configure
+ * @interval: outgoing keepalive interval
+ * @timeout: incoming keepalive timeout
+ */
+void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */
-- 
2.43.2



* [PATCH net-next v3 17/24] ovpn: add support for updating local UDP endpoint
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (15 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 16/24] ovpn: implement keepalive mechanism Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 18/24] ovpn: add support for peer floating Antonio Quartulli
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

In case of UDP links, the local endpoint used to communicate with a
given peer may change without a connection restart.

Add support for learning the new address in case of change.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
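Note on the learning step described above: the idea is to compare the
cached local address with the destination address of each incoming
packet and overwrite the cache when they differ. The following is a
rough userspace sketch of that comparison only. struct local_ep and
local_ep_learn() are hypothetical names, and unlike the driver's bind
object this model also stores the address family (an assumption made
to keep the example self-contained).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the cached local endpoint of a peer. */
struct local_ep {
	sa_family_t family;
	union {
		struct in_addr ipv4;
		struct in6_addr ipv6;
	} addr;
};

/* Compare the cached local address with the destination address of an
 * incoming packet and learn the new one on mismatch.
 * Returns true if the cache was updated, false otherwise.
 */
static bool local_ep_learn(struct local_ep *ep, sa_family_t family,
			   const void *daddr)
{
	size_t len;

	assert(ep && daddr);

	switch (family) {
	case AF_INET:
		len = sizeof(struct in_addr);
		break;
	case AF_INET6:
		len = sizeof(struct in6_addr);
		break;
	default:
		/* unknown family: nothing to learn */
		return false;
	}

	/* common case: address unchanged */
	if (ep->family == family && !memcmp(&ep->addr, daddr, len))
		return false;

	memset(&ep->addr, 0, sizeof(ep->addr));
	memcpy(&ep->addr, daddr, len);
	ep->family = family;
	return true;
}
```

The unchanged-address path is expected to be by far the most common
one, which is why the patch marks the mismatch branch as unlikely().
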
 drivers/net/ovpn/io.c   |  4 ++++
 drivers/net/ovpn/peer.c | 37 +++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/peer.h |  7 +++++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 2469e30970b7..19ebc0fbe2be 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -218,6 +218,10 @@ static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 	/* note event of authenticated packet received for keepalive */
 	ovpn_peer_keepalive_recv_reset(peer);
 
+	/* update source endpoint for this peer */
+	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP)
+		ovpn_peer_update_local_endpoint(peer, skb);
+
 	/* increment RX stats */
 	ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
 
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 79a6d6fb1be1..e88c2483450d 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -518,6 +518,43 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
 	return peer;
 }
 
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
+				     struct sk_buff *skb)
+{
+	struct ovpn_bind *bind;
+
+	rcu_read_lock();
+	bind = rcu_dereference(peer->bind);
+	if (unlikely(!bind))
+		goto unlock;
+
+	switch (skb_protocol_to_family(skb)) {
+	case AF_INET:
+		if (unlikely(bind->local.ipv4.s_addr != ip_hdr(skb)->daddr)) {
+			netdev_dbg(peer->ovpn->dev,
+				   "%s: learning local IPv4 for peer %u (%pI4 -> %pI4)\n",
+				   __func__, peer->id, &bind->local.ipv4.s_addr,
+				   &ip_hdr(skb)->daddr);
+			bind->local.ipv4.s_addr = ip_hdr(skb)->daddr;
+		}
+		break;
+	case AF_INET6:
+		if (unlikely(memcmp(&bind->local.ipv6, &ipv6_hdr(skb)->daddr,
+				    sizeof(bind->local.ipv6)))) {
+			netdev_dbg(peer->ovpn->dev,
+				   "%s: learning local IPv6 for peer %u (%pI6c -> %pI6c)\n",
+				   __func__, peer->id, &bind->local.ipv6,
+				   &ipv6_hdr(skb)->daddr);
+			bind->local.ipv6 = ipv6_hdr(skb)->daddr;
+		}
+		break;
+	default:
+		break;
+	}
+unlock:
+	rcu_read_unlock();
+}
+
 /**
  * ovpn_peer_get_by_dst - Lookup peer to send skb to
  * @ovpn: the private data representing the current VPN session
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index d5b63c07408e..df2b1c93dead 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -273,4 +273,11 @@ static inline void ovpn_peer_keepalive_xmit_reset(struct ovpn_peer *peer)
  */
 void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
 
+/**
+ * ovpn_peer_update_local_endpoint - update local endpoint for peer
+ * @peer: peer to update the endpoint for
+ * @skb: incoming packet to retrieve the destination address (local) from
+ */
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
+				     struct sk_buff *skb);
 #endif /* _NET_OVPN_OVPNPEER_H_ */
-- 
2.43.2



* [PATCH net-next v3 18/24] ovpn: add support for peer floating
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (16 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 17/24] ovpn: add support for updating local UDP endpoint Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 19/24] ovpn: implement peer add/dump/delete via netlink Antonio Quartulli
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

A peer connected via UDP may change its IP address without reconnecting
(float).

Add support for detecting and updating the new peer IP/port in case of
floating.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
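Note on floating as described above: once a packet from a known peer
arrives from a different source address or port, the peer has to be
moved to the hash bucket matching its new transport address. A small
userspace sketch of that re-hashing step follows. All names in it
(struct remote, struct peer, peer_table, remote_hash, table_link,
table_unlink, table_lookup, peer_float) are hypothetical, the hash is
a placeholder rather than jhash, and the singly linked buckets carry
no RCU or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8

struct remote {
	uint32_t ip;
	uint16_t port;
};

struct peer {
	uint32_t id;
	struct remote remote;
	struct peer *next;
};

struct peer_table {
	struct peer *bucket[NBUCKETS];
};

/* Placeholder hash over the transport address (not jhash). */
static unsigned int remote_hash(const struct remote *r)
{
	return ((r->ip * 2654435761u) ^ r->port) % NBUCKETS;
}

static void table_link(struct peer_table *t, struct peer *p)
{
	struct peer **head = &t->bucket[remote_hash(&p->remote)];

	p->next = *head;
	*head = p;
}

/* Must be called while p->remote still holds the OLD address. */
static void table_unlink(struct peer_table *t, struct peer *p)
{
	struct peer **pp = &t->bucket[remote_hash(&p->remote)];

	while (*pp && *pp != p)
		pp = &(*pp)->next;
	if (*pp)
		*pp = p->next;
	p->next = NULL;
}

static struct peer *table_lookup(const struct peer_table *t,
				 const struct remote *r)
{
	struct peer *p;

	for (p = t->bucket[remote_hash(r)]; p; p = p->next)
		if (p->remote.ip == r->ip && p->remote.port == r->port)
			return p;
	return NULL;
}

/* Returns true if the peer floated and was re-hashed. */
static bool peer_float(struct peer_table *t, struct peer *p,
		       const struct remote *src)
{
	assert(t && p && src);

	/* likely case: source address and port did not change */
	if (p->remote.ip == src->ip && p->remote.port == src->port)
		return false;

	table_unlink(t, p);	/* remove old hashing */
	p->remote = *src;	/* learn new remote endpoint */
	table_link(t, p);	/* re-add with new transport address */
	return true;
}
```

Unlike hlist_del_init_rcu(), the unlink helper in this model needs the
old key to find the bucket, so the address is only overwritten after
the entry has been removed.
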
 drivers/net/ovpn/peer.c | 96 +++++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/peer.h |  8 ++++
 drivers/net/ovpn/udp.c  |  5 +++
 3 files changed, 109 insertions(+)

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index e88c2483450d..e1eee1bb1ad2 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -151,9 +151,105 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	return ERR_PTR(ret);
 }
 
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+				    const struct sockaddr_storage *ss,
+				    const u8 *local_ip)
+{
+	struct ovpn_bind *bind;
+	size_t ip_len;
+
+	/* create new ovpn_bind object */
+	bind = ovpn_bind_from_sockaddr(ss);
+	if (IS_ERR(bind))
+		return PTR_ERR(bind);
+
+	if (local_ip) {
+		if (ss->ss_family == AF_INET) {
+			ip_len = sizeof(struct in_addr);
+		} else if (ss->ss_family == AF_INET6) {
+			ip_len = sizeof(struct in6_addr);
+		} else {
+			netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
+				   __func__);
+			kfree(bind);
+			return -EINVAL;
+		}
+
+		memcpy(&bind->local, local_ip, ip_len);
+	}
+
+	/* set binding */
+	ovpn_bind_reset(peer, bind);
+
+	return 0;
+}
+
 #define ovpn_peer_index(_tbl, _key, _key_len)		\
 	(jhash(_key, _key_len, 0) % HASH_SIZE(_tbl))	\
 
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	struct sockaddr_storage ss;
+	const u8 *local_ip = NULL;
+	struct sockaddr_in6 *sa6;
+	struct sockaddr_in *sa;
+	struct ovpn_bind *bind;
+	sa_family_t family;
+	size_t salen;
+	u32 index;
+
+	rcu_read_lock();
+	bind = rcu_dereference(peer->bind);
+	if (unlikely(!bind))
+		goto unlock;
+
+	if (likely(ovpn_bind_skb_src_match(bind, skb)))
+		goto unlock;
+
+	family = skb_protocol_to_family(skb);
+
+	if (bind->sa.in4.sin_family == family)
+		local_ip = (u8 *)&bind->local;
+
+	switch (family) {
+	case AF_INET:
+		sa = (struct sockaddr_in *)&ss;
+		sa->sin_family = AF_INET;
+		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
+		sa->sin_port = udp_hdr(skb)->source;
+		salen = sizeof(*sa);
+		break;
+	case AF_INET6:
+		sa6 = (struct sockaddr_in6 *)&ss;
+		sa6->sin6_family = AF_INET6;
+		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
+		sa6->sin6_port = udp_hdr(skb)->source;
+		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
+							 skb->skb_iif);
+		salen = sizeof(*sa6);
+		break;
+	default:
+		goto unlock;
+	}
+
+	netdev_dbg(peer->ovpn->dev, "%s: peer %u floated to %pIScp\n", __func__,
+		   peer->id, &ss);
+	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
+				 local_ip);
+
+	spin_lock_bh(&peer->ovpn->peers.lock);
+	/* remove old hashing */
+	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
+	/* re-add with new transport address */
+	index = ovpn_peer_index(peer->ovpn->peers.by_transp_addr, &ss, salen);
+	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
+			   &peer->ovpn->peers.by_transp_addr[index]);
+	spin_unlock_bh(&peer->ovpn->peers.lock);
+
+unlock:
+	rcu_read_unlock();
+}
+
 /**
  * ovpn_peer_timer_delete_all - killall keepalive timers
  * @peer: peer for which timers should be killed
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index df2b1c93dead..5ea35ccc2824 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -280,4 +280,12 @@ void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
  */
 void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
 				     struct sk_buff *skb);
+
+/**
+ * ovpn_peer_float - update remote endpoint for peer
+ * @peer: peer to update the remote endpoint for
+ * @skb: incoming packet to retrieve the source address (remote) from
+ */
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index c2a88d26defd..151c27da7e6f 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -84,6 +84,11 @@ static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 					    __func__, peer_id);
 			goto drop;
 		}
+
+		/* check if this peer changed its IP address and update
+		 * state
+		 */
+		ovpn_peer_float(peer, skb);
 	}
 
 	if (!peer) {
-- 
2.43.2



* [PATCH net-next v3 19/24] ovpn: implement peer add/dump/delete via netlink
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (17 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 18/24] ovpn: add support for peer floating Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 20/24] ovpn: implement key add/del/swap " Antonio Quartulli
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

This change introduces the netlink commands needed to add, delete and
retrieve/dump known peers. Userspace is expected to use these commands
to handle known peer lifecycles.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
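Note on the remote endpoint handling in the peer add path: the remote
sockaddr is accepted only if its length matches its address family,
and a v4-mapped IPv6 address is collapsed to plain AF_INET before
further processing. The following userspace sketch illustrates those
two steps with the standard socket headers. remote_normalize() is a
hypothetical helper, not a function from this series, and it uses the
POSIX IN6_IS_ADDR_V4MAPPED() macro in place of the kernel's
ipv6_addr_v4mapped().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Validate a remote sockaddr of 'len' bytes and store a normalized
 * copy in 'out'. Returns 0 on success, -1 if length and family do
 * not form a valid pair.
 */
static int remote_normalize(const void *buf, size_t len,
			    struct sockaddr_storage *out)
{
	struct sockaddr_in *in4 = (struct sockaddr_in *)out;
	const struct sockaddr *sa = buf;
	struct sockaddr_in6 in6;

	assert(buf && out);
	memset(out, 0, sizeof(*out));

	/* plain IPv4: length and family must agree */
	if (len == sizeof(struct sockaddr_in) && sa->sa_family == AF_INET) {
		memcpy(out, buf, len);
		return 0;
	}

	if (len != sizeof(struct sockaddr_in6) || sa->sa_family != AF_INET6)
		return -1;

	memcpy(&in6, buf, sizeof(in6));
	if (!IN6_IS_ADDR_V4MAPPED(&in6.sin6_addr)) {
		/* genuine IPv6 endpoint: keep it as is */
		memcpy(out, &in6, sizeof(in6));
		return 0;
	}

	/* v4-mapped: the IPv4 address lives in the last 4 bytes */
	in4->sin_family = AF_INET;
	memcpy(&in4->sin_addr, &in6.sin6_addr.s6_addr[12],
	       sizeof(in4->sin_addr));
	in4->sin_port = in6.sin6_port;
	return 0;
}
```

For example, passing a sockaddr_in6 holding ::ffff:192.0.2.1 port 1194
is expected to yield an AF_INET sockaddr for 192.0.2.1 port 1194.
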
 drivers/net/ovpn/netlink.c | 511 ++++++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/peer.c    |   6 +-
 drivers/net/ovpn/peer.h    |  12 +
 3 files changed, 522 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 66f5c6fbe8e4..914b04631ae8 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -17,6 +17,10 @@
 #include "io.h"
 #include "netlink.h"
 #include "netlink-gen.h"
+#include "bind.h"
+#include "packet.h"
+#include "peer.h"
+#include "socket.h"
 
 MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME);
 
@@ -137,24 +141,523 @@ int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 }
 
+static u8 *ovpn_nl_attr_local_ip(struct ovpn_struct *ovpn,
+				 struct nlattr **attrs, int sock_fam)
+{
+	size_t ip_len = nla_len(attrs[OVPN_A_PEER_LOCAL_IP]);
+	u8 *local_ip = nla_data(attrs[OVPN_A_PEER_LOCAL_IP]);
+	bool is_mapped;
+
+	if (ip_len == sizeof(struct in_addr)) {
+		if (sock_fam != AF_INET) {
+			netdev_dbg(ovpn->dev,
+				   "%s: the specified local IP is IPv4, but the peer endpoint is not\n",
+				   __func__);
+			return ERR_PTR(-EINVAL);
+		}
+	} else if (ip_len == sizeof(struct in6_addr)) {
+		is_mapped = ipv6_addr_v4mapped((struct in6_addr *)local_ip);
+
+		if (sock_fam != AF_INET6 && !is_mapped) {
+			netdev_dbg(ovpn->dev,
+				   "%s: the specified local IP is IPv6, but the peer endpoint is not\n",
+				   __func__);
+			return ERR_PTR(-EINVAL);
+		}
+
+		if (is_mapped)
+			/* this is an IPv6-mapped IPv4
+			 * address, therefore extract
+			 * the actual v4 address from
+			 * the last 4 bytes
+			 */
+			local_ip += 12;
+	} else {
+		netdev_dbg(ovpn->dev, "%s: invalid length %zu for local IP\n",
+			   __func__, ip_len);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return local_ip;
+}
+
 int ovpn_nl_set_peer_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	bool keepalive_set = false, new_peer = false;
+	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+	struct sockaddr_storage *ss = NULL;
+	u32 sockfd, id, interv, timeout;
+	struct socket *sock = NULL;
+	struct sockaddr_in mapped;
+	struct sockaddr_in6 *in6;
+	struct ovpn_peer *peer;
+	u8 *local_ip = NULL;
+	size_t sa_len;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) {
+		netdev_err(ovpn->dev, "%s: missing peer object\n", __func__);
+		return -EINVAL;
+	}
+
+	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
+			       ovpn_peer_nl_policy, info->extack);
+	if (ret) {
+		netdev_err(ovpn->dev, "%s: can't parse peer object\n",
+			   __func__);
+		return ret;
+	}
+
+	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
+			      OVPN_A_PEER_ID)) {
+		netdev_err(ovpn->dev, "%s: peer ID missing\n", __func__);
+		return -EINVAL;
+	}
+
+	id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
+	/* check if the peer exists first, otherwise create a new one */
+	peer = ovpn_peer_get_by_id(ovpn, id);
+	if (!peer) {
+		peer = ovpn_peer_new(ovpn, id);
+		new_peer = true;
+		if (IS_ERR(peer)) {
+			netdev_err(ovpn->dev, "%s: cannot create new peer object for peer %u (sockaddr=%pIScp): %ld\n",
+				   __func__, id, ss, PTR_ERR(peer));
+			return PTR_ERR(peer);
+		}
+	}
+
+	if (new_peer && NL_REQ_ATTR_CHECK(info->extack,
+					  info->attrs[OVPN_A_PEER], attrs,
+					  OVPN_A_PEER_SOCKET)) {
+		netdev_err(ovpn->dev, "%s: socket missing for new peer\n",
+			   __func__);
+		ret = -EINVAL;
+		goto peer_release;
+	}
+
+	if (new_peer && ovpn->mode == OVPN_MODE_MP &&
+	    !attrs[OVPN_A_PEER_VPN_IPV4] && !attrs[OVPN_A_PEER_VPN_IPV6]) {
+		netdev_err(ovpn->dev,
+			   "%s: a VPN IP is required when adding a peer in MP mode\n",
+			   __func__);
+		ret = -EINVAL;
+		goto peer_release;
+	}
+
+	if (attrs[OVPN_A_PEER_SOCKET]) {
+		/* lookup the fd in the kernel table and extract the socket
+		 * object
+		 */
+		sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]);
+		/* sockfd_lookup() increases sock's refcounter */
+		sock = sockfd_lookup(sockfd, &ret);
+		if (!sock) {
+			netdev_err(ovpn->dev,
+				   "%s: cannot lookup peer socket (fd=%u): %d\n",
+				   __func__, sockfd, ret);
+			ret = -ENOTSOCK;
+			goto peer_release;
+		}
+
+		/* Only when using UDP as transport protocol the remote endpoint
+		 * can be configured so that ovpn knows where to send packets
+		 * to.
+		 *
+		 * In case of TCP, the socket is connected to the peer and ovpn
+		 * will just send bytes over it, without the need to specify a
+		 * destination.
+		 */
+		if (sock->sk->sk_protocol == IPPROTO_UDP &&
+		    attrs[OVPN_A_PEER_SOCKADDR_REMOTE]) {
+			ss = nla_data(attrs[OVPN_A_PEER_SOCKADDR_REMOTE]);
+			sa_len = nla_len(attrs[OVPN_A_PEER_SOCKADDR_REMOTE]);
+			switch (sa_len) {
+			case sizeof(struct sockaddr_in):
+				if (ss->ss_family == AF_INET)
+					/* valid sockaddr */
+					break;
+
+				netdev_err(ovpn->dev,
+					   "%s: remote sockaddr_in has invalid family\n",
+					   __func__);
+				ret = -EINVAL;
+				goto peer_release;
+			case sizeof(struct sockaddr_in6):
+				if (ss->ss_family == AF_INET6)
+					/* valid sockaddr */
+					break;
+
+				netdev_err(ovpn->dev,
+					   "%s: remote sockaddr_in6 has invalid family\n",
+					   __func__);
+				ret = -EINVAL;
+				goto peer_release;
+			default:
+				netdev_err(ovpn->dev,
+					   "%s: invalid size for sockaddr\n",
+					   __func__);
+				ret = -EINVAL;
+				goto peer_release;
+			}
+
+			/* if this is a v4-mapped IPv6 address, convert the
+			 * sockaddr object from AF_INET6 to AF_INET before
+			 * continuing
+			 */
+			if (ss->ss_family == AF_INET6) {
+				in6 = (struct sockaddr_in6 *)ss;
+
+				if (ipv6_addr_v4mapped(&in6->sin6_addr)) {
+					mapped.sin_family = AF_INET;
+					mapped.sin_addr.s_addr =
+						in6->sin6_addr.s6_addr32[3];
+					mapped.sin_port = in6->sin6_port;
+					ss = (struct sockaddr_storage *)&mapped;
+				}
+			}
+
+			if (attrs[OVPN_A_PEER_LOCAL_IP]) {
+				local_ip = ovpn_nl_attr_local_ip(ovpn, attrs,
+								 ss->ss_family);
+				if (IS_ERR(local_ip)) {
+					ret = PTR_ERR(local_ip);
+					netdev_err(ovpn->dev,
+						   "%s: cannot retrieve local IP: %d\n",
+						   __func__, ret);
+					goto peer_release;
+				}
+			}
+
+			/* set peer sockaddr */
+			ret = ovpn_peer_reset_sockaddr(peer, ss, local_ip);
+			if (ret < 0) {
+				netdev_err(ovpn->dev,
+					   "%s: cannot set peer sockaddr: %d\n",
+					   __func__, ret);
+				goto peer_release;
+			}
+		}
+
+		if (peer->sock)
+			ovpn_socket_put(peer->sock);
+
+		peer->sock = ovpn_socket_new(sock, peer);
+		if (IS_ERR(peer->sock)) {
+			sockfd_put(sock);
+			peer->sock = NULL;
+			ret = -ENOTSOCK;
+			netdev_err(ovpn->dev,
+				   "%s: cannot encapsulate socket: %d\n",
+				   __func__, ret);
+			goto peer_release;
+		}
+	}
+
+	/* VPN IPs cannot be updated, because they are hashed */
+	if (new_peer && attrs[OVPN_A_PEER_VPN_IPV4]) {
+		if (nla_len(attrs[OVPN_A_PEER_VPN_IPV4]) !=
+		    sizeof(struct in_addr)) {
+			netdev_err(ovpn->dev, "%s: invalid IPv4\n", __func__);
+			ret = -EINVAL;
+			goto peer_release;
+		}
+
+		peer->vpn_addrs.ipv4.s_addr =
+			nla_get_be32(attrs[OVPN_A_PEER_VPN_IPV4]);
+	}
+
+	/* VPN IPs cannot be updated, because they are hashed */
+	if (new_peer && attrs[OVPN_A_PEER_VPN_IPV6]) {
+		if (nla_len(attrs[OVPN_A_PEER_VPN_IPV6]) !=
+		    sizeof(struct in6_addr)) {
+			netdev_err(ovpn->dev, "%s: invalid IPv6\n", __func__);
+			ret = -EINVAL;
+			goto peer_release;
+		}
+
+		memcpy(&peer->vpn_addrs.ipv6,
+		       nla_data(attrs[OVPN_A_PEER_VPN_IPV6]),
+		       sizeof(struct in6_addr));
+	}
+
+	/* when setting the keepalive, both parameters have to be configured */
+	if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] &&
+	    attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) {
+		keepalive_set = true;
+		interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]);
+		timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]);
+	}
+
+	if (keepalive_set)
+		ovpn_peer_keepalive_set(peer, interv, timeout);
+
+	netdev_dbg(ovpn->dev,
+		   "%s: adding peer with endpoint=%pIScp/%s id=%u VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n",
+		   __func__, ss, sock->sk->sk_prot_creator->name, peer->id,
+		   &peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6);
+
+	ret = ovpn_peer_add(ovpn, peer);
+	if (ret < 0) {
+		netdev_err(ovpn->dev,
+			   "%s: cannot add new peer (id=%u) to hashtable: %d\n",
+			   __func__, peer->id, ret);
+		goto peer_release;
+	}
+
+	return 0;
+
+peer_release:
+	/* release right away because peer is not really used in any context */
+	ovpn_peer_release(peer);
+	return ret;
+}
+
+static int ovpn_nl_send_peer(struct sk_buff *skb, const struct ovpn_peer *peer,
+			     u32 portid, u32 seq, int flags)
+{
+	const struct ovpn_bind *bind;
+	struct nlattr *attr;
+	void *hdr;
+
+	hdr = genlmsg_put(skb, portid, seq, &ovpn_nl_family, flags,
+			  OVPN_CMD_SET_PEER);
+	if (!hdr) {
+		netdev_dbg(peer->ovpn->dev,
+			   "%s: cannot create message header\n", __func__);
+		return -EMSGSIZE;
+	}
+
+	attr = nla_nest_start(skb, OVPN_A_PEER);
+	if (!attr) {
+		netdev_dbg(peer->ovpn->dev, "%s: cannot create submessage\n",
+			   __func__);
+		goto err;
+	}
+
+	if (nla_put_u32(skb, OVPN_A_PEER_ID, peer->id))
+		goto err;
+
+	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY))
+		if (nla_put(skb, OVPN_A_PEER_VPN_IPV4,
+			    sizeof(peer->vpn_addrs.ipv4),
+			    &peer->vpn_addrs.ipv4))
+			goto err;
+
+	if (memcmp(&peer->vpn_addrs.ipv6, &in6addr_any,
+		   sizeof(peer->vpn_addrs.ipv6)))
+		if (nla_put(skb, OVPN_A_PEER_VPN_IPV6,
+			    sizeof(peer->vpn_addrs.ipv6),
+			    &peer->vpn_addrs.ipv6))
+			goto err;
+
+	if (nla_put_u32(skb, OVPN_A_PEER_KEEPALIVE_INTERVAL,
+			peer->keepalive_interval) ||
+	    nla_put_u32(skb, OVPN_A_PEER_KEEPALIVE_TIMEOUT,
+			peer->keepalive_timeout))
+		goto err;
+
+	rcu_read_lock();
+	bind = rcu_dereference(peer->bind);
+	if (bind) {
+		if (bind->sa.in4.sin_family == AF_INET) {
+			if (nla_put(skb, OVPN_A_PEER_SOCKADDR_REMOTE,
+				    sizeof(bind->sa.in4), &bind->sa.in4) ||
+			    nla_put(skb, OVPN_A_PEER_LOCAL_IP,
+				    sizeof(bind->local.ipv4),
+				    &bind->local.ipv4))
+				goto err_unlock;
+		} else if (bind->sa.in4.sin_family == AF_INET6) {
+			if (nla_put(skb, OVPN_A_PEER_SOCKADDR_REMOTE,
+				    sizeof(bind->sa.in6), &bind->sa.in6) ||
+			    nla_put(skb, OVPN_A_PEER_LOCAL_IP,
+				    sizeof(bind->local.ipv6),
+				    &bind->local.ipv6))
+				goto err_unlock;
+		}
+	}
+	rcu_read_unlock();
+
+	if (nla_put_net16(skb, OVPN_A_PEER_LOCAL_PORT,
+			  inet_sk(peer->sock->sock->sk)->inet_sport) ||
+	    /* VPN RX stats */
+	    nla_put_u64_64bit(skb, OVPN_A_PEER_VPN_RX_BYTES,
+			      atomic64_read(&peer->vpn_stats.rx.bytes),
+			      OVPN_A_PAD) ||
+	    nla_put_u32(skb, OVPN_A_PEER_VPN_RX_PACKETS,
+			atomic_read(&peer->vpn_stats.rx.packets)) ||
+	    /* VPN TX stats */
+	    nla_put_u64_64bit(skb, OVPN_A_PEER_VPN_TX_BYTES,
+			      atomic64_read(&peer->vpn_stats.tx.bytes),
+			      OVPN_A_PAD) ||
+	    nla_put_u32(skb, OVPN_A_PEER_VPN_TX_PACKETS,
+			atomic_read(&peer->vpn_stats.tx.packets)) ||
+	    /* link RX stats */
+	    nla_put_u64_64bit(skb, OVPN_A_PEER_LINK_RX_BYTES,
+			      atomic64_read(&peer->link_stats.rx.bytes),
+			      OVPN_A_PAD) ||
+	    nla_put_u32(skb, OVPN_A_PEER_LINK_RX_PACKETS,
+			atomic_read(&peer->link_stats.rx.packets)) ||
+	    /* link TX stats */
+	    nla_put_u64_64bit(skb, OVPN_A_PEER_LINK_TX_BYTES,
+			      atomic64_read(&peer->link_stats.tx.bytes),
+			      OVPN_A_PAD) ||
+	    nla_put_u32(skb, OVPN_A_PEER_LINK_TX_PACKETS,
+			atomic_read(&peer->link_stats.tx.packets)))
+		goto err;
+
+	nla_nest_end(skb, attr);
+	genlmsg_end(skb, hdr);
+
+	return 0;
+err_unlock:
+	rcu_read_unlock();
+err:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
 }
 
 int ovpn_nl_get_peer_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+	struct ovpn_peer *peer;
+	struct sk_buff *msg;
+	u32 peer_id;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
+		return -EINVAL;
+
+	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
+			       ovpn_peer_nl_policy, info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
+			      OVPN_A_PEER_ID))
+		return -EINVAL;
+
+	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
+	peer = ovpn_peer_get_by_id(ovpn, peer_id);
+	if (!peer)
+		return -ENOENT;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	ret = ovpn_nl_send_peer(msg, peer, info->snd_portid, info->snd_seq, 0);
+	if (ret < 0) {
+		nlmsg_free(msg);
+		goto err;
+	}
+
+	ret = genlmsg_reply(msg, info);
+err:
+	ovpn_peer_put(peer);
+	return ret;
 }
 
 int ovpn_nl_get_peer_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	return -ENOTSUPP;
+	const struct genl_info *info = genl_info_dump(cb);
+	struct nlattr **attrs = info->attrs;
+	struct ovpn_struct *ovpn;
+	struct ovpn_peer *peer;
+	struct net_device *dev;
+	int ret, bkt, last_idx = cb->args[1], dumped = 0;
+
+	dev = ovpn_get_dev_from_attrs(sock_net(cb->skb->sk), attrs);
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		pr_err("ovpn: cannot retrieve device in %s: %d\n", __func__,
+		       ret);
+		return ret;
+	}
+
+	ovpn = netdev_priv(dev);
+
+	if (ovpn->mode == OVPN_MODE_P2P) {
+		/* if we already dumped a peer it means we are done */
+		if (last_idx)
+			goto out;
+
+		rcu_read_lock();
+		peer = rcu_dereference(ovpn->peer);
+		if (peer) {
+			if (ovpn_nl_send_peer(skb, peer,
+					      NETLINK_CB(cb->skb).portid,
+					      cb->nlh->nlmsg_seq,
+					      NLM_F_MULTI) == 0)
+				dumped++;
+		}
+		rcu_read_unlock();
+	} else {
+		rcu_read_lock();
+		hash_for_each_rcu(ovpn->peers.by_id, bkt, peer, hash_entry_id) {
+			/* skip peers that were already dumped by
+			 * previous invocations
+			 */
+			if (last_idx > 0) {
+				last_idx--;
+				continue;
+			}
+
+			if (ovpn_nl_send_peer(skb, peer,
+					      NETLINK_CB(cb->skb).portid,
+					      cb->nlh->nlmsg_seq,
+					      NLM_F_MULTI) < 0)
+				break;
+
+			/* count peers being dumped during this invocation */
+			dumped++;
+		}
+		rcu_read_unlock();
+	}
+
+out:
+	dev_put(dev);
+
+	/* sum up peers dumped in this message, so that at the next invocation
+	 * we can continue from where we left off
+	 */
+	cb->args[1] += dumped;
+	return skb->len;
 }
 
 int ovpn_nl_del_peer_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+	struct ovpn_peer *peer;
+	u32 peer_id;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
+		return -EINVAL;
+
+	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
+			       ovpn_peer_nl_policy, info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
+			      OVPN_A_PEER_ID))
+		return -EINVAL;
+
+	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
+
+	peer = ovpn_peer_get_by_id(ovpn, peer_id);
+	if (!peer)
+		return -ENOENT;
+
+	netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id);
+	ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
+	ovpn_peer_put(peer);
+
+	return ret;
 }
 
 int ovpn_nl_set_key_doit(struct sk_buff *skb, struct genl_info *info)
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index e1eee1bb1ad2..07daa359b3a2 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -151,9 +151,9 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	return ERR_PTR(ret);
 }
 
-static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
-				    const struct sockaddr_storage *ss,
-				    const u8 *local_ip)
+int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+			     const struct sockaddr_storage *ss,
+			     const u8 *local_ip)
 {
 	struct ovpn_bind *bind;
 	size_t ip_len;
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 5ea35ccc2824..f7784615c63f 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -288,4 +288,16 @@ void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
  */
 void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb);
 
+/**
+ * ovpn_peer_reset_sockaddr - recreate binding for peer
+ * @peer: peer to recreate the binding for
+ * @ss: sockaddr to use as remote endpoint for the binding
+ * @local_ip: local IP for the binding
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+			     const struct sockaddr_storage *ss,
+			     const u8 *local_ip);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 20/24] ovpn: implement key add/del/swap via netlink
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (18 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 19/24] ovpn: implement peer add/dump/delete via netlink Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 21/24] ovpn: kill key and notify userspace in case of IV exhaustion Antonio Quartulli
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

This change introduces the netlink commands needed to add, delete and
swap keys for a specific peer.

Userspace is expected to use these commands to create, destroy and
rotate session keys for a specific peer.
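
For illustration only, the rotation flow can be pictured with a minimal
userspace sketch. It assumes two key slots per peer (primary and
secondary); the types and helpers below (key_state, key_install,
key_swap, key_delete) are hypothetical and are not the ovpn_crypto_*
functions used by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot indexes: the primary key encrypts outgoing traffic,
 * the secondary one holds the key being phased in or out.
 */
enum key_slot { SLOT_PRIMARY = 0, SLOT_SECONDARY = 1 };

struct session_key {
	uint16_t key_id;
};

struct key_state {
	const struct session_key *slots[2];
};

/* "add": install a key in the requested slot */
static void key_install(struct key_state *ks, enum key_slot slot,
			const struct session_key *key)
{
	ks->slots[slot] = key;
}

/* "swap": exchange primary and secondary, e.g. once a rekey completed */
static void key_swap(struct key_state *ks)
{
	const struct session_key *tmp = ks->slots[SLOT_PRIMARY];

	ks->slots[SLOT_PRIMARY] = ks->slots[SLOT_SECONDARY];
	ks->slots[SLOT_SECONDARY] = tmp;
}

/* "del": drop whatever key sits in the requested slot */
static void key_delete(struct key_state *ks, enum key_slot slot)
{
	ks->slots[slot] = NULL;
}
```

In this picture a rekey would install the new key in the secondary
slot, swap the slots, and later delete the old key, which by then sits
in the secondary slot.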

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/netlink.c | 193 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 190 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 914b04631ae8..df14988c1f43 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -660,19 +660,206 @@ int ovpn_nl_del_peer_doit(struct sk_buff *skb, struct genl_info *info)
 	return ret;
 }
 
+static int ovpn_nl_get_key_dir(struct genl_info *info, struct nlattr *key,
+			       enum ovpn_cipher_alg cipher,
+			       struct ovpn_key_direction *dir)
+{
+	struct nlattr *attr, *attrs[OVPN_A_KEYDIR_MAX + 1];
+	int ret;
+
+	ret = nla_parse_nested(attrs, OVPN_A_KEYDIR_MAX, key,
+			       ovpn_keydir_nl_policy, info->extack);
+	if (ret)
+		return ret;
+
+	switch (cipher) {
+	case OVPN_CIPHER_ALG_AES_GCM:
+	case OVPN_CIPHER_ALG_CHACHA20_POLY1305:
+		attr = attrs[OVPN_A_KEYDIR_CIPHER_KEY];
+		if (!attr)
+			return -EINVAL;
+
+		dir->cipher_key = nla_data(attr);
+		dir->cipher_key_size = nla_len(attr);
+
+		attr = attrs[OVPN_A_KEYDIR_NONCE_TAIL];
+		/* These algorithms require a 96-bit nonce.
+		 * Construct it by combining the 4-byte packet ID and
+		 * the 8-byte nonce tail from userspace
+		 */
+		if (!attr)
+			return -EINVAL;
+
+		dir->nonce_tail = nla_data(attr);
+		dir->nonce_tail_size = nla_len(attr);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 int ovpn_nl_set_key_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	struct nlattr *p_attrs[OVPN_A_PEER_MAX + 1];
+	struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1];
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+	struct ovpn_peer_key_reset pkr;
+	struct ovpn_peer *peer;
+	u32 peer_id;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
+		return -EINVAL;
+
+	ret = nla_parse_nested(p_attrs, OVPN_A_PEER_MAX,
+			       info->attrs[OVPN_A_PEER], ovpn_peer_nl_policy,
+			       info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], p_attrs,
+			      OVPN_A_PEER_ID) ||
+	    NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], p_attrs,
+			      OVPN_A_PEER_KEYCONF))
+		return -EINVAL;
+
+	ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX,
+			       p_attrs[OVPN_A_PEER_KEYCONF],
+			       ovpn_keyconf_nl_policy, info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, p_attrs[OVPN_A_PEER_KEYCONF], attrs,
+			      OVPN_A_KEYCONF_SLOT) ||
+	    NL_REQ_ATTR_CHECK(info->extack, p_attrs[OVPN_A_PEER_KEYCONF], attrs,
+			      OVPN_A_KEYCONF_KEY_ID) ||
+	    NL_REQ_ATTR_CHECK(info->extack, p_attrs[OVPN_A_PEER_KEYCONF], attrs,
+			      OVPN_A_KEYCONF_CIPHER_ALG) ||
+	    NL_REQ_ATTR_CHECK(info->extack, p_attrs[OVPN_A_PEER_KEYCONF], attrs,
+			      OVPN_A_KEYCONF_ENCRYPT_DIR) ||
+	    NL_REQ_ATTR_CHECK(info->extack, p_attrs[OVPN_A_PEER_KEYCONF], attrs,
+			      OVPN_A_KEYCONF_DECRYPT_DIR))
+		return -EINVAL;
+
+	peer_id = nla_get_u32(p_attrs[OVPN_A_PEER_ID]);
+	pkr.slot = nla_get_u8(attrs[OVPN_A_KEYCONF_SLOT]);
+	pkr.key.key_id = nla_get_u16(attrs[OVPN_A_KEYCONF_KEY_ID]);
+	pkr.key.cipher_alg = nla_get_u16(attrs[OVPN_A_KEYCONF_CIPHER_ALG]);
+
+	ret = ovpn_nl_get_key_dir(info, attrs[OVPN_A_KEYCONF_ENCRYPT_DIR],
+				  pkr.key.cipher_alg, &pkr.key.encrypt);
+	if (ret < 0)
+		return ret;
+
+	ret = ovpn_nl_get_key_dir(info, attrs[OVPN_A_KEYCONF_DECRYPT_DIR],
+				  pkr.key.cipher_alg, &pkr.key.decrypt);
+	if (ret < 0)
+		return ret;
+
+	peer = ovpn_peer_get_by_id(ovpn, peer_id);
+	if (!peer) {
+		netdev_dbg(ovpn->dev, "%s: no peer with id %u to set key for\n",
+			   __func__, peer_id);
+		return -ENOENT;
+	}
+
+	mutex_lock(&peer->crypto.mutex);
+	ret = ovpn_crypto_state_reset(&peer->crypto, &pkr);
+	if (ret < 0) {
+		netdev_dbg(ovpn->dev,
+			   "%s: cannot install new key for peer %u\n", __func__,
+			   peer_id);
+		goto unlock;
+	}
+
+	netdev_dbg(ovpn->dev, "%s: new key installed (id=%u) for peer %u\n",
+		   __func__, pkr.key.key_id, peer_id);
+unlock:
+	mutex_unlock(&peer->crypto.mutex);
+	ovpn_peer_put(peer);
+	return ret;
 }
 
 int ovpn_nl_swap_keys_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
+	struct ovpn_peer *peer;
+	u32 peer_id;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
+		return -EINVAL;
+
+	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
+			       ovpn_peer_nl_policy, info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
+			      OVPN_A_PEER_ID))
+		return -EINVAL;
+
+	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
+
+	peer = ovpn_peer_get_by_id(ovpn, peer_id);
+	if (!peer)
+		return -ENOENT;
+
+	ovpn_crypto_key_slots_swap(&peer->crypto);
+	ovpn_peer_put(peer);
+
+	return 0;
 }
 
 int ovpn_nl_del_key_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -ENOTSUPP;
+	struct nlattr *p_attrs[OVPN_A_PEER_MAX + 1];
+	struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1];
+	struct ovpn_struct *ovpn = info->user_ptr[0];
+	enum ovpn_key_slot slot;
+	struct ovpn_peer *peer;
+	u32 peer_id;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
+		return -EINVAL;
+
+	ret = nla_parse_nested(p_attrs, OVPN_A_PEER_MAX,
+			       info->attrs[OVPN_A_PEER], ovpn_peer_nl_policy,
+			       info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], p_attrs,
+			      OVPN_A_PEER_ID) ||
+	    NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], p_attrs,
+			      OVPN_A_PEER_KEYCONF))
+		return -EINVAL;
+
+	ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX,
+			       p_attrs[OVPN_A_PEER_KEYCONF],
+			       ovpn_keyconf_nl_policy, info->extack);
+	if (ret)
+		return ret;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, p_attrs[OVPN_A_PEER_KEYCONF], attrs,
+			      OVPN_A_KEYCONF_SLOT))
+		return -EINVAL;
+
+	peer_id = nla_get_u32(p_attrs[OVPN_A_PEER_ID]);
+	slot = nla_get_u8(attrs[OVPN_A_KEYCONF_SLOT]);
+
+	peer = ovpn_peer_get_by_id(ovpn, peer_id);
+	if (!peer)
+		return -ENOENT;
+
+	ovpn_crypto_key_slot_delete(&peer->crypto, slot);
+	ovpn_peer_put(peer);
+
+	return 0;
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 21/24] ovpn: kill key and notify userspace in case of IV exhaustion
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (19 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 20/24] ovpn: implement key add/del/swap " Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 22/24] ovpn: notify userspace when a peer is deleted Antonio Quartulli
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

IV wrap-around is cryptographically dangerous for a number of ciphers,
therefore kill the key and inform userspace (via netlink) if the
IV space is exhausted.
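
As a rough illustration (a userspace sketch, not the code in this
series): assuming the 96-bit AEAD nonce is a 4-byte packet ID followed
by an 8-byte nonce tail, exhaustion is reached once the 32-bit packet
ID counter has handed out its last value. The names pktid_state,
pktid_next and build_nonce, as well as the byte order and field order,
are assumptions made for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NONCE_SIZE	12
#define NONCE_TAIL_SIZE	8

struct pktid_state {
	uint32_t next;	/* next packet ID to hand out */
	int exhausted;	/* set once UINT32_MAX has been handed out */
};

/* Hand out the next packet ID, or -ERANGE once the 32-bit space is used
 * up. The counter never wraps, so a (key, nonce) pair is never reused.
 */
static int pktid_next(struct pktid_state *st, uint32_t *id)
{
	if (st->exhausted)
		return -ERANGE;

	*id = st->next;
	if (st->next == UINT32_MAX)
		st->exhausted = 1;
	else
		st->next++;

	return 0;
}

/* Build the nonce: big-endian packet ID first, then the nonce tail. */
static void build_nonce(uint8_t nonce[NONCE_SIZE], uint32_t id,
			const uint8_t tail[NONCE_TAIL_SIZE])
{
	nonce[0] = (uint8_t)(id >> 24);
	nonce[1] = (uint8_t)(id >> 16);
	nonce[2] = (uint8_t)(id >> 8);
	nonce[3] = (uint8_t)id;
	memcpy(nonce + 4, tail, NONCE_TAIL_SIZE);
}
```

A caller seeing -ERANGE would stop using the key and ask userspace for
a new one, which is the reaction this patch implements in the kernel.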

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c      | 16 +++++++++++++++
 drivers/net/ovpn/netlink.c | 42 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/netlink.h |  8 ++++++++
 3 files changed, 66 insertions(+)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 19ebc0fbe2be..8806479ccae5 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -336,6 +336,22 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
 	/* encrypt */
 	ret = ovpn_aead_encrypt(ks, skb, peer->id);
 	if (unlikely(ret < 0)) {
+		/* if we ran out of IVs we must kill the key as it can't be used
+		 * anymore
+		 */
+		if (ret == -ERANGE) {
+			netdev_warn(peer->ovpn->dev,
+				    "killing primary key as we ran out of IVs for peer %u\n",
+				    peer->id);
+			ovpn_crypto_kill_primary(&peer->crypto);
+			ret = ovpn_nl_notify_swap_keys(peer);
+			if (ret < 0)
+				netdev_warn(peer->ovpn->dev,
+					    "couldn't send key killing notification to userspace for peer %u\n",
+					    peer->id);
+			goto err;
+		}
+
 		net_err_ratelimited("%s: error during encryption for peer %u, key-id %u: %d\n",
 				    peer->ovpn->dev->name, peer->id, ks->key_id,
 				    ret);
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index df14988c1f43..dc80004eadbb 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -862,6 +862,48 @@ int ovpn_nl_del_key_doit(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 }
 
+int ovpn_nl_notify_swap_keys(struct ovpn_peer *peer)
+{
+	struct sk_buff *msg;
+	void *hdr;
+	int ret;
+
+	netdev_info(peer->ovpn->dev, "peer with id %u must rekey - primary key unusable.\n",
+		    peer->id);
+
+	msg = nlmsg_new(100, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0,
+			  OVPN_CMD_SWAP_KEYS);
+	if (!hdr) {
+		ret = -ENOBUFS;
+		goto err_free_msg;
+	}
+
+	if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex)) {
+		ret = -EMSGSIZE;
+		goto err_free_msg;
+	}
+
+	if (nla_put_u32(msg, OVPN_A_PEER_ID, peer->id)) {
+		ret = -EMSGSIZE;
+		goto err_free_msg;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev),
+				msg, 0, OVPN_NLGRP_PEERS, GFP_KERNEL);
+
+	return 0;
+
+err_free_msg:
+	nlmsg_free(msg);
+	return ret;
+}
+
 /**
  * ovpn_nl_init - perform any ovpn specific netlink initialization
  * @ovpn: the openvpn instance object
diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h
index d79f3ca604b0..ccc49130a150 100644
--- a/drivers/net/ovpn/netlink.h
+++ b/drivers/net/ovpn/netlink.h
@@ -27,4 +27,12 @@ int ovpn_nl_register(void);
  */
 void ovpn_nl_unregister(void);
 
+/**
+ * ovpn_nl_notify_swap_keys - notify userspace peer's key must be renewed
+ * @peer: the peer whose key needs to be renewed
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_notify_swap_keys(struct ovpn_peer *peer);
+
 #endif /* _NET_OVPN_NETLINK_H_ */
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 22/24] ovpn: notify userspace when a peer is deleted
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (20 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 21/24] ovpn: kill key and notify userspace in case of IV exhaustion Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 23/24] ovpn: add basic ethtool support Antonio Quartulli
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Whenever a peer is deleted, send a notification to userspace so that it
can react accordingly.

This is most important when a peer is deleted due to ping timeout,
because it all happens in kernelspace and thus userspace has no direct
way to learn about it.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/netlink.c | 56 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/netlink.h |  8 ++++++
 drivers/net/ovpn/peer.c    |  1 +
 3 files changed, 65 insertions(+)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index dc80004eadbb..98c4e389b4f5 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -862,6 +862,62 @@ int ovpn_nl_del_key_doit(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 }
 
+int ovpn_nl_notify_del_peer(struct ovpn_peer *peer)
+{
+	struct sk_buff *msg;
+	struct nlattr *attr;
+	void *hdr;
+	int ret;
+
+	netdev_info(peer->ovpn->dev, "deleting peer with id %u, reason %d\n",
+		    peer->id, peer->delete_reason);
+
+	msg = nlmsg_new(100, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0,
+			  OVPN_CMD_DEL_PEER);
+	if (!hdr) {
+		ret = -ENOBUFS;
+		goto err_free_msg;
+	}
+
+	if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex)) {
+		ret = -EMSGSIZE;
+		goto err_free_msg;
+	}
+
+	attr = nla_nest_start(msg, OVPN_A_PEER);
+	if (!attr) {
+		ret = -EMSGSIZE;
+		goto err_free_msg;
+	}
+
+	if (nla_put_u8(msg, OVPN_A_PEER_DEL_REASON, peer->delete_reason)) {
+		ret = -EMSGSIZE;
+		goto err_free_msg;
+	}
+
+	if (nla_put_u32(msg, OVPN_A_PEER_ID, peer->id)) {
+		ret = -EMSGSIZE;
+		goto err_free_msg;
+	}
+
+	nla_nest_end(msg, attr);
+
+	genlmsg_end(msg, hdr);
+
+	genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev),
+				msg, 0, OVPN_NLGRP_PEERS, GFP_KERNEL);
+
+	return 0;
+
+err_free_msg:
+	nlmsg_free(msg);
+	return ret;
+}
+
 int ovpn_nl_notify_swap_keys(struct ovpn_peer *peer)
 {
 	struct sk_buff *msg;
diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h
index ccc49130a150..d2720fb67257 100644
--- a/drivers/net/ovpn/netlink.h
+++ b/drivers/net/ovpn/netlink.h
@@ -27,6 +27,14 @@ int ovpn_nl_register(void);
  */
 void ovpn_nl_unregister(void);
 
+/**
+ * ovpn_nl_notify_del_peer - notify userspace about peer being deleted
+ * @peer: the peer being deleted
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_notify_del_peer(struct ovpn_peer *peer);
+
 /**
  * ovpn_nl_notify_swap_keys - notify userspace peer's key must be renewed
  * @peer: the peer whose key needs to be renewed
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 07daa359b3a2..fb94ace6c9cf 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -315,6 +315,7 @@ static void ovpn_peer_delete_work(struct work_struct *work)
 	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
 					      delete_work);
+	ovpn_nl_notify_del_peer(peer);
 	ovpn_peer_release(peer);
 }
 
 void ovpn_peer_release_kref(struct kref *kref)
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 23/24] ovpn: add basic ethtool support
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (21 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 22/24] ovpn: notify userspace when a peer is deleted Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-06  1:16 ` [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module Antonio Quartulli
  2024-05-07 23:48 ` [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Jakub Kicinski
  24 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

Implement support for basic ethtool functionality.

Note that ovpn is a virtual device driver, therefore
various ethtool APIs are just not meaningful and thus
not implemented.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/main.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index d6ba91c6571f..17ccc9a483fe 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -7,6 +7,7 @@
  *		James Yonan <james@openvpn.net>
  */
 
+#include <linux/ethtool.h>
 #include <linux/genetlink.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
@@ -88,6 +89,19 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
 	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
 }
 
+static void ovpn_get_drvinfo(struct net_device *dev,
+			     struct ethtool_drvinfo *info)
+{
+	strscpy(info->driver, OVPN_FAMILY_NAME, sizeof(info->driver));
+	strscpy(info->bus_info, "ovpn", sizeof(info->bus_info));
+}
+
+static const struct ethtool_ops ovpn_ethtool_ops = {
+	.get_drvinfo		= ovpn_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+	.get_ts_info		= ethtool_op_get_ts_info,
+};
+
 static void ovpn_setup(struct net_device *dev)
 {
 	/* compute the overhead considering AEAD encryption */
@@ -102,6 +116,7 @@ static void ovpn_setup(struct net_device *dev)
 
 	dev->needs_free_netdev = true;
 
+	dev->ethtool_ops = &ovpn_ethtool_ops;
 	dev->netdev_ops = &ovpn_netdev_ops;
 
 	dev->priv_destructor = ovpn_struct_free;
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (22 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 23/24] ovpn: add basic ethtool support Antonio Quartulli
@ 2024-05-06  1:16 ` Antonio Quartulli
  2024-05-07 23:55   ` Jakub Kicinski
  2024-05-07 23:48 ` [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Jakub Kicinski
  24 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-06  1:16 UTC (permalink / raw
  To: netdev
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, Antonio Quartulli

The ovpn-cli tool can be compiled and used as a selftest for the ovpn
kernel module.

It implements the netlink API and can thus be integrated in any
script for more automated testing.

Along with the tool, 2 scripts are added that perform basic
functionality tests by means of network namespaces.

The scripts can be run in sequence via run.sh.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 tools/testing/selftests/Makefile           |    1 +
 tools/testing/selftests/ovpn/Makefile      |   15 +
 tools/testing/selftests/ovpn/config        |    8 +
 tools/testing/selftests/ovpn/data64.key    |    5 +
 tools/testing/selftests/ovpn/float-test.sh |  113 ++
 tools/testing/selftests/ovpn/netns-test.sh |  118 ++
 tools/testing/selftests/ovpn/ovpn-cli.c    | 1640 ++++++++++++++++++++
 tools/testing/selftests/ovpn/run.sh        |   12 +
 tools/testing/selftests/ovpn/tcp_peers.txt |    1 +
 tools/testing/selftests/ovpn/udp_peers.txt |    5 +
 10 files changed, 1918 insertions(+)
 create mode 100644 tools/testing/selftests/ovpn/Makefile
 create mode 100644 tools/testing/selftests/ovpn/config
 create mode 100644 tools/testing/selftests/ovpn/data64.key
 create mode 100644 tools/testing/selftests/ovpn/float-test.sh
 create mode 100644 tools/testing/selftests/ovpn/netns-test.sh
 create mode 100644 tools/testing/selftests/ovpn/ovpn-cli.c
 create mode 100644 tools/testing/selftests/ovpn/run.sh
 create mode 100644 tools/testing/selftests/ovpn/tcp_peers.txt
 create mode 100644 tools/testing/selftests/ovpn/udp_peers.txt

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 9039f3709aff..6767dbfd0539 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -67,6 +67,7 @@ TARGETS += net/openvswitch
 TARGETS += net/tcp_ao
 TARGETS += net/netfilter
 TARGETS += nsfs
+TARGETS += ovpn
 TARGETS += perf_events
 TARGETS += pidfd
 TARGETS += pid_namespace
diff --git a/tools/testing/selftests/ovpn/Makefile b/tools/testing/selftests/ovpn/Makefile
new file mode 100644
index 000000000000..d172e9d64a8c
--- /dev/null
+++ b/tools/testing/selftests/ovpn/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0+ OR MIT
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+CFLAGS = -Wall -idirafter ../../../../include/uapi
+CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0)
+
+LDFLAGS = -lmbedtls -lmbedcrypto
+LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0)
+
+ovpn-cli: ovpn-cli.c
+
+TEST_PROGS = run.sh
+TEST_GEN_PROGS_EXTENDED = ovpn-cli
+
+include ../lib.mk
diff --git a/tools/testing/selftests/ovpn/config b/tools/testing/selftests/ovpn/config
new file mode 100644
index 000000000000..5ff47de23c12
--- /dev/null
+++ b/tools/testing/selftests/ovpn/config
@@ -0,0 +1,8 @@
+CONFIG_NET=y
+CONFIG_INET=y
+CONFIG_NET_UDP_TUNNEL=y
+CONFIG_DST_CACHE=y
+CONFIG_CRYPTO_AES=y
+CONFIG_CRYPTO_GCM=y
+CONFIG_CRYPTO_CHACHA20POLY1305=y
+CONFIG_OVPN=y
diff --git a/tools/testing/selftests/ovpn/data64.key b/tools/testing/selftests/ovpn/data64.key
new file mode 100644
index 000000000000..a99e88c4e290
--- /dev/null
+++ b/tools/testing/selftests/ovpn/data64.key
@@ -0,0 +1,5 @@
+jRqMACN7d7/aFQNT8S7jkrBD8uwrgHbG5OQZP2eu4R1Y7tfpS2bf5RHv06Vi163CGoaIiTX99R3B
+ia9ycAH8Wz1+9PWv51dnBLur9jbShlgZ2QHLtUc4a/gfT7zZwULXuuxdLnvR21DDeMBaTbkgbai9
+uvAa7ne1liIgGFzbv+Bas4HDVrygxIxuAnP5Qgc3648IJkZ0QEXPF+O9f0n5+QIvGCxkAUVx+5K6
+KIs+SoeWXnAopELmoGSjUpFtJbagXK82HfdqpuUxT2Tnuef0/14SzVE/vNleBNu2ZbyrSAaah8tE
+BofkPJUBFY+YQcfZNM5Dgrw3i+Bpmpq/gpdg5w==
diff --git a/tools/testing/selftests/ovpn/float-test.sh b/tools/testing/selftests/ovpn/float-test.sh
new file mode 100644
index 000000000000..66e0e44a7ec9
--- /dev/null
+++ b/tools/testing/selftests/ovpn/float-test.sh
@@ -0,0 +1,113 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+ OR MIT
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+#  Author:	Antonio Quartulli <antonio@openvpn.net>
+
+#set -x
+set -e
+
+UDP_PEERS_FILE=${UDP_PEERS_FILE:-udp_peers.txt}
+TCP_PEERS_FILE=${TCP_PEERS_FILE:-tcp_peers.txt}
+OVPN_CLI=${OVPN_CLI:-./ovpn-cli}
+ALG=${ALG:-aes}
+
+function create_ns() {
+	ip netns add peer$1
+}
+
+function setup_ns() {
+	MODE="P2P"
+
+	if [ $1 -eq 0 ]; then
+		MODE="MP"
+		for p in $(seq 1 $NUM_PEERS); do
+			ip link add veth${p} netns peer0 type veth peer name veth${p} netns peer${p}
+
+			ip -n peer0 addr add 10.10.${p}.1/24 dev veth${p}
+			ip -n peer0 link set veth${p} up
+
+			ip -n peer${p} addr add 10.10.${p}.2/24 dev veth${p}
+			ip -n peer${p} link set veth${p} up
+		done
+	fi
+
+	ip netns exec peer$1 ${OVPN_CLI} new_iface tun$1 $MODE
+	ip -n peer$1 addr add $2 dev tun$1
+	ip -n peer$1 link set tun$1 up
+}
+
+function add_peer() {
+	if [ $tcp -eq 0 ]; then
+		if [ $1 -eq 0 ]; then
+			ip netns exec peer0 $OVPN_CLI new_multi_peer tun0 1 $UDP_PEERS_FILE
+
+			for p in $(seq 1 $NUM_PEERS); do
+			#	ip netns exec peer0 $OVPN_CLI new_peer tun0 ${p} ${p} 10.10.${p}.2 1 5.5.5.$((${p} + 1))
+				ip netns exec peer0 $OVPN_CLI new_key tun0 ${p} $ALG 0 data64.key
+			done
+		else
+			ip netns exec peer${1} $OVPN_CLI new_peer tun${1} 1 ${1} 10.10.${1}.1 1 5.5.5.1
+			ip netns exec peer${1} $OVPN_CLI new_key tun${1} ${1} $ALG 1 data64.key
+		fi
+	else
+		if [ $1 -eq 0 ]; then
+			(ip netns exec peer$1 $OVPN_CLI listen tun0 1 $TCP_PEERS_FILE && {
+				for p in $(seq 1 $NUM_PEERS); do
+					ip netns exec peer0 $OVPN_CLI new_key tun0 ${p} $ALG 0 data64.key
+				done
+			}) &
+			sleep 5
+		else
+			ip netns exec peer${1} $OVPN_CLI connect tun${1} ${1} 10.10.${1}.1 1 5.5.5.1
+			ip netns exec peer${1} $OVPN_CLI new_key tun${1} ${1} $ALG 1 data64.key
+		fi
+	fi
+}
+
+function cleanup() {
+	for p in $(seq 1 10); do
+		ip -n peer0 link del veth${p} 2>/dev/null || true
+	done
+	for p in $(seq 0 10); do
+		ip netns exec peer${p} ${OVPN_CLI} del_iface tun${p} 2>/dev/null || true
+		ip netns del peer${p} 2>/dev/null || true
+	done
+}
+
+tcp=0
+if [ "$1" == "-t" ]; then
+	shift
+	tcp=1
+	NUM_PEERS=${NUM_PEERS:-$(wc -l $TCP_PEERS_FILE | awk '{print $1}')}
+else
+	NUM_PEERS=${NUM_PEERS:-$(wc -l $UDP_PEERS_FILE | awk '{print $1}')}
+fi
+
+cleanup
+
+for p in $(seq 0 $NUM_PEERS); do
+	create_ns ${p}
+done
+
+for p in $(seq 0 $NUM_PEERS); do
+	setup_ns ${p} 5.5.5.$((${p} + 1))/24
+done
+
+for p in $(seq 0 $NUM_PEERS); do
+	add_peer ${p}
+done
+
+for p in $(seq 1 $NUM_PEERS); do
+	ip netns exec peer0 ping -qfc 2000 -w 5 5.5.5.$((${p} + 1))
+done
+# make clients float..
+for p in $(seq 1 $NUM_PEERS); do
+	ip -n peer${p} addr del 10.10.${p}.2/24 dev veth${p}
+	ip -n peer${p} addr add 10.10.${p}.3/24 dev veth${p}
+done
+for p in $(seq 1 $NUM_PEERS); do
+	ip netns exec peer${p} ping -qfc 2000 -w 5 5.5.5.1
+done
+
+cleanup
diff --git a/tools/testing/selftests/ovpn/netns-test.sh b/tools/testing/selftests/ovpn/netns-test.sh
new file mode 100644
index 000000000000..69ba06bb67c0
--- /dev/null
+++ b/tools/testing/selftests/ovpn/netns-test.sh
@@ -0,0 +1,118 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+ OR MIT
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+#  Author:	Antonio Quartulli <antonio@openvpn.net>
+
+#set -x
+set -e
+
+UDP_PEERS_FILE=${UDP_PEERS_FILE:-udp_peers.txt}
+TCP_PEERS_FILE=${TCP_PEERS_FILE:-tcp_peers.txt}
+OVPN_CLI=${OVPN_CLI:-./ovpn-cli}
+ALG=${ALG:-aes}
+
+function create_ns() {
+	ip netns add peer$1
+}
+
+function setup_ns() {
+	MODE="P2P"
+
+	if [ $1 -eq 0 ]; then
+		MODE="MP"
+		for p in $(seq 1 $NUM_PEERS); do
+			ip link add veth${p} netns peer0 type veth peer name veth${p} netns peer${p}
+
+			ip -n peer0 addr add 10.10.${p}.1/24 dev veth${p}
+			ip -n peer0 link set veth${p} up
+
+			ip -n peer${p} addr add 10.10.${p}.2/24 dev veth${p}
+			ip -n peer${p} link set veth${p} up
+		done
+	fi
+
+	ip netns exec peer$1 $OVPN_CLI new_iface tun${1} $MODE
+	ip -n peer$1 addr add $2 dev tun${1}
+	ip -n peer$1 link set tun${1} up
+}
+
+function add_peer() {
+	if [ $tcp -eq 0 ]; then
+		if [ $1 -eq 0 ]; then
+			ip netns exec peer0 $OVPN_CLI new_multi_peer tun0 1 $UDP_PEERS_FILE
+
+			for p in $(seq 1 $NUM_PEERS); do
+			#	ip netns exec peer0 $OVPN_CLI new_peer tun0 ${p} ${p} 10.10.${p}.2 1 5.5.5.$((${p} + 1))
+				ip netns exec peer0 $OVPN_CLI new_key tun0 ${p} $ALG 0 data64.key
+			done
+		else
+			ip netns exec peer${1} $OVPN_CLI new_peer tun${1} 1 ${1} 10.10.${1}.1 1 5.5.5.1
+			ip netns exec peer${1} $OVPN_CLI new_key tun${1} ${1} $ALG 1 data64.key
+		fi
+	else
+		if [ $1 -eq 0 ]; then
+			(ip netns exec peer0 $OVPN_CLI listen tun0 1 $TCP_PEERS_FILE && {
+				for p in $(seq 1 $NUM_PEERS); do
+					ip netns exec peer0 $OVPN_CLI new_key tun0 ${p} $ALG 0 data64.key
+				done
+			}) &
+			sleep 5
+		else
+			ip netns exec peer${1} $OVPN_CLI connect tun${1} ${1} 10.10.${1}.1 1 5.5.5.1
+			ip netns exec peer${1} $OVPN_CLI new_key tun${1} ${1} $ALG 1 data64.key
+		fi
+	fi
+}
+
+cleanup() {
+	for p in $(seq 1 10); do
+		ip -n peer0 link del veth${p} 2>/dev/null || true
+	done
+	for p in $(seq 0 10); do
+		ip netns exec peer${p} $OVPN_CLI del_iface tun${p} 2>/dev/null || true
+		ip netns del peer${p} 2>/dev/null || true
+	done
+}
+
+tcp=0
+if [ "$1" == "-t" ]; then
+	shift
+	tcp=1
+	NUM_PEERS=${NUM_PEERS:-$(wc -l $TCP_PEERS_FILE | awk '{print $1}')}
+else
+	NUM_PEERS=${NUM_PEERS:-$(wc -l $UDP_PEERS_FILE | awk '{print $1}')}
+fi
+
+cleanup
+
+for p in $(seq 0 $NUM_PEERS); do
+	create_ns ${p}
+done
+
+for p in $(seq 0 $NUM_PEERS); do
+	setup_ns ${p} 5.5.5.$((${p} + 1))/24
+done
+
+for p in $(seq 0 $NUM_PEERS); do
+	add_peer ${p}
+done
+
+for p in $(seq 1 $NUM_PEERS); do
+	ip netns exec peer0 ping -qfc 10 -w 5 5.5.5.$((${p} + 1))
+done
+
+sleep 1
+echo "Querying all peers:"
+ip netns exec peer0 $OVPN_CLI get_peer tun0
+ip netns exec peer1 $OVPN_CLI get_peer tun1
+
+echo "Querying peer 1:"
+ip netns exec peer0 $OVPN_CLI get_peer tun0 1
+
+echo "Querying non-existent peer 10:"
+ip netns exec peer0 $OVPN_CLI get_peer tun0 10 || true
+
+ip netns exec peer0 $OVPN_CLI del_peer tun0 1
+
+cleanup
diff --git a/tools/testing/selftests/ovpn/ovpn-cli.c b/tools/testing/selftests/ovpn/ovpn-cli.c
new file mode 100644
index 000000000000..d1dd8d731bb5
--- /dev/null
+++ b/tools/testing/selftests/ovpn/ovpn-cli.c
@@ -0,0 +1,1640 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel accelerator
+ *
+ *  Copyright (C) 2020-2023 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <netinet/in.h>
+
+#include <linux/ovpn.h>
+#include <linux/types.h>
+#include <linux/netlink.h>
+
+#include <netlink/socket.h>
+#include <netlink/netlink.h>
+#include <netlink/genl/genl.h>
+#include <netlink/genl/family.h>
+#include <netlink/genl/ctrl.h>
+
+#include <mbedtls/base64.h>
+#include <mbedtls/error.h>
+
+/* libnl < 3.5.0 does not set the NLA_F_NESTED on its own, therefore we
+ * have to explicitly do it to prevent the kernel from failing upon
+ * parsing of the message
+ */
+#define nla_nest_start(_msg, _type) \
+	nla_nest_start(_msg, (_type) | NLA_F_NESTED)
+
+typedef int (*ovpn_nl_cb)(struct nl_msg *msg, void *arg);
+
+enum ovpn_key_direction {
+	KEY_DIR_IN = 0,
+	KEY_DIR_OUT,
+};
+
+#define KEY_LEN (256 / 8)
+#define NONCE_LEN 8
+
+#define PEER_ID_UNDEF 0x00FFFFFF
+
+struct nl_ctx {
+	struct nl_sock *nl_sock;
+	struct nl_msg *nl_msg;
+	struct nl_cb *nl_cb;
+
+	int ovpn_dco_id;
+};
+
+struct ovpn_ctx {
+	__u8 key_enc[KEY_LEN];
+	__u8 key_dec[KEY_LEN];
+	__u8 nonce[NONCE_LEN];
+
+	enum ovpn_cipher_alg cipher;
+
+	sa_family_t sa_family;
+
+	__u32 peer_id;
+	__u16 lport;
+
+	union {
+		struct sockaddr_in in4;
+		struct sockaddr_in6 in6;
+	} remote;
+
+	union {
+		struct sockaddr_in in4;
+		struct sockaddr_in6 in6;
+	} peer_ip;
+
+	unsigned int ifindex;
+	char ifname[IFNAMSIZ];
+	enum ovpn_mode mode;
+	bool mode_set;
+
+	int socket;
+
+	__u32 keepalive_interval;
+	__u32 keepalive_timeout;
+
+	enum ovpn_key_direction key_dir;
+};
+
+static int ovpn_nl_recvmsgs(struct nl_ctx *ctx)
+{
+	int ret;
+
+	ret = nl_recvmsgs(ctx->nl_sock, ctx->nl_cb);
+
+	switch (ret) {
+	case -NLE_INTR:
+		fprintf(stderr,
+			"netlink received interrupt due to signal - ignoring\n");
+		break;
+	case -NLE_NOMEM:
+		fprintf(stderr, "netlink out of memory error\n");
+		break;
+	case -NLE_AGAIN:
+		fprintf(stderr,
+			"netlink reports blocking read - aborting wait\n");
+		break;
+	default:
+		if (ret)
+			fprintf(stderr, "netlink reports error (%d): %s\n",
+				ret, nl_geterror(-ret));
+		break;
+	}
+
+	return ret;
+}
+
+static struct nl_ctx *nl_ctx_alloc_flags(struct ovpn_ctx *ovpn, int cmd,
+					 int flags)
+{
+	struct nl_ctx *ctx;
+	int ret;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx)
+		return NULL;
+
+	ctx->nl_sock = nl_socket_alloc();
+	if (!ctx->nl_sock) {
+		fprintf(stderr, "cannot allocate netlink socket\n");
+		goto err_free;
+	}
+
+	nl_socket_set_buffer_size(ctx->nl_sock, 8192, 8192);
+
+	ret = genl_connect(ctx->nl_sock);
+	if (ret) {
+		fprintf(stderr, "cannot connect to generic netlink: %s\n",
+			nl_geterror(ret));
+		goto err_sock;
+	}
+
+	ctx->ovpn_dco_id = genl_ctrl_resolve(ctx->nl_sock, OVPN_FAMILY_NAME);
+	if (ctx->ovpn_dco_id < 0) {
+		fprintf(stderr, "cannot find ovpn_dco netlink component: %d\n",
+			ctx->ovpn_dco_id);
+		goto err_sock;
+	}
+
+	ctx->nl_msg = nlmsg_alloc();
+	if (!ctx->nl_msg) {
+		fprintf(stderr, "cannot allocate netlink message\n");
+		goto err_sock;
+	}
+
+	ctx->nl_cb = nl_cb_alloc(NL_CB_DEFAULT);
+	if (!ctx->nl_cb) {
+		fprintf(stderr, "failed to allocate netlink callback\n");
+		goto err_msg;
+	}
+
+	nl_socket_set_cb(ctx->nl_sock, ctx->nl_cb);
+
+	genlmsg_put(ctx->nl_msg, 0, 0, ctx->ovpn_dco_id, 0, flags, cmd, 0);
+
+	if (ovpn->ifindex > 0)
+		NLA_PUT_U32(ctx->nl_msg, OVPN_A_IFINDEX, ovpn->ifindex);
+
+	return ctx;
+nla_put_failure:
+err_msg:
+	nlmsg_free(ctx->nl_msg);
+err_sock:
+	nl_socket_free(ctx->nl_sock);
+err_free:
+	free(ctx);
+	return NULL;
+}
+
+static struct nl_ctx *nl_ctx_alloc(struct ovpn_ctx *ovpn, int cmd)
+{
+	return nl_ctx_alloc_flags(ovpn, cmd, 0);
+}
+
+static void nl_ctx_free(struct nl_ctx *ctx)
+{
+	if (!ctx)
+		return;
+
+	nl_socket_free(ctx->nl_sock);
+	nlmsg_free(ctx->nl_msg);
+	nl_cb_put(ctx->nl_cb);
+	free(ctx);
+}
+
+static int ovpn_nl_cb_error(struct sockaddr_nl (*nla)__attribute__((unused)),
+			    struct nlmsgerr *err, void *arg)
+{
+	struct nlmsghdr *nlh = (struct nlmsghdr *)err - 1;
+	struct nlattr *tb_msg[NLMSGERR_ATTR_MAX + 1];
+	int len = nlh->nlmsg_len;
+	struct nlattr *attrs;
+	int *ret = arg;
+	int ack_len = sizeof(*nlh) + sizeof(int) + sizeof(*nlh);
+
+	*ret = err->error;
+
+	if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
+		return NL_STOP;
+
+	if (!(nlh->nlmsg_flags & NLM_F_CAPPED))
+		ack_len += err->msg.nlmsg_len - sizeof(*nlh);
+
+	if (len <= ack_len)
+		return NL_STOP;
+
+	attrs = (void *)((unsigned char *)nlh + ack_len);
+	len -= ack_len;
+
+	nla_parse(tb_msg, NLMSGERR_ATTR_MAX, attrs, len, NULL);
+	if (tb_msg[NLMSGERR_ATTR_MSG]) {
+		len = strnlen((char *)nla_data(tb_msg[NLMSGERR_ATTR_MSG]),
+			      nla_len(tb_msg[NLMSGERR_ATTR_MSG]));
+		fprintf(stderr, "kernel error: %.*s\n", len,
+			(char *)nla_data(tb_msg[NLMSGERR_ATTR_MSG]));
+	}
+
+	return NL_STOP;
+}
+
+static int ovpn_nl_cb_finish(struct nl_msg (*msg)__attribute__((unused)),
+			     void *arg)
+{
+	int *status = arg;
+
+	*status = 0;
+	return NL_SKIP;
+}
+
+static int ovpn_nl_msg_send(struct nl_ctx *ctx, ovpn_nl_cb cb)
+{
+	int status = 1;
+
+	nl_cb_err(ctx->nl_cb, NL_CB_CUSTOM, ovpn_nl_cb_error, &status);
+	nl_cb_set(ctx->nl_cb, NL_CB_FINISH, NL_CB_CUSTOM, ovpn_nl_cb_finish,
+		  &status);
+	nl_cb_set(ctx->nl_cb, NL_CB_ACK, NL_CB_CUSTOM, ovpn_nl_cb_finish,
+		  &status);
+
+	if (cb)
+		nl_cb_set(ctx->nl_cb, NL_CB_VALID, NL_CB_CUSTOM, cb, ctx);
+
+	nl_send_auto_complete(ctx->nl_sock, ctx->nl_msg);
+
+	while (status == 1)
+		ovpn_nl_recvmsgs(ctx);
+
+	if (status < 0)
+		fprintf(stderr, "failed to send netlink message: %s (%d)\n",
+			strerror(-status), status);
+
+	return status;
+}
+
+static int ovpn_read_key(const char *file, struct ovpn_ctx *ctx)
+{
+	int idx_enc, idx_dec, ret = -1;
+	unsigned char *ckey = NULL;
+	__u8 *bkey = NULL;
+	size_t olen = 0;
+	long ckey_len;
+	FILE *fp;
+
+	fp = fopen(file, "r");
+	if (!fp) {
+		fprintf(stderr, "cannot open: %s\n", file);
+		return -1;
+	}
+
+	/* get file size */
+	fseek(fp, 0L, SEEK_END);
+	ckey_len = ftell(fp);
+	rewind(fp);
+
+	/* if the file is longer, let's just read a portion */
+	if (ckey_len > 256)
+		ckey_len = 256;
+
+	ckey = malloc(ckey_len);
+	if (!ckey)
+		goto err;
+
+	ret = fread(ckey, 1, ckey_len, fp);
+	if (ret != ckey_len) {
+		fprintf(stderr,
+			"couldn't read enough data from key file: %d bytes read\n",
+			ret);
+		goto err;
+	}
+
+	olen = 0;
+	ret = mbedtls_base64_decode(NULL, 0, &olen, ckey, ckey_len);
+	if (ret != MBEDTLS_ERR_BASE64_BUFFER_TOO_SMALL) {
+		char buf[256];
+
+		mbedtls_strerror(ret, buf, sizeof(buf));
+		fprintf(stderr, "unexpected base64 error1: %s (%d)\n", buf,
+			ret);
+
+		goto err;
+	}
+
+	bkey = malloc(olen);
+	if (!bkey) {
+		fprintf(stderr, "cannot allocate binary key buffer\n");
+		goto err;
+	}
+
+	ret = mbedtls_base64_decode(bkey, olen, &olen, ckey, ckey_len);
+	if (ret) {
+		char buf[256];
+
+		mbedtls_strerror(ret, buf, sizeof(buf));
+		fprintf(stderr, "unexpected base64 error2: %s (%d)\n", buf,
+			ret);
+
+		goto err;
+	}
+
+	if (olen < 2 * KEY_LEN + NONCE_LEN) {
+		fprintf(stderr,
+			"not enough data in key file, found %zuB but needs %dB\n",
+			olen, 2 * KEY_LEN + NONCE_LEN);
+		goto err;
+	}
+
+	switch (ctx->key_dir) {
+	case KEY_DIR_IN:
+		idx_enc = 0;
+		idx_dec = 1;
+		break;
+	case KEY_DIR_OUT:
+		idx_enc = 1;
+		idx_dec = 0;
+		break;
+	}
+
+	memcpy(ctx->key_enc, bkey + KEY_LEN * idx_enc, KEY_LEN);
+	memcpy(ctx->key_dec, bkey + KEY_LEN * idx_dec, KEY_LEN);
+	memcpy(ctx->nonce, bkey + 2 * KEY_LEN, NONCE_LEN);
+
+	ret = 0;
+
+err:
+	fclose(fp);
+	free(bkey);
+	free(ckey);
+
+	return ret;
+}
+
+static int ovpn_read_cipher(const char *cipher, struct ovpn_ctx *ctx)
+{
+	if (strcmp(cipher, "aes") == 0)
+		ctx->cipher = OVPN_CIPHER_ALG_AES_GCM;
+	else if (strcmp(cipher, "chachapoly") == 0)
+		ctx->cipher = OVPN_CIPHER_ALG_CHACHA20_POLY1305;
+	else if (strcmp(cipher, "none") == 0)
+		ctx->cipher = OVPN_CIPHER_ALG_NONE;
+	else
+		return -ENOTSUP;
+
+	return 0;
+}
+
+static int ovpn_read_key_direction(const char *dir, struct ovpn_ctx *ctx)
+{
+	int in_dir;
+
+	in_dir = strtoll(dir, NULL, 10);
+	switch (in_dir) {
+	case KEY_DIR_IN:
+	case KEY_DIR_OUT:
+		ctx->key_dir = in_dir;
+		break;
+	default:
+		fprintf(stderr,
+			"invalid key direction provided. Can be 0 or 1 only\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int ovpn_socket(struct ovpn_ctx *ctx, sa_family_t family, int proto)
+{
+	struct sockaddr_storage local_sock;
+	struct sockaddr_in6 *in6;
+	struct sockaddr_in *in;
+	int ret, s, sock_type;
+	size_t sock_len;
+
+	if (proto == IPPROTO_UDP)
+		sock_type = SOCK_DGRAM;
+	else if (proto == IPPROTO_TCP)
+		sock_type = SOCK_STREAM;
+	else
+		return -EINVAL;
+
+	s = socket(family, sock_type, 0);
+	if (s < 0) {
+		perror("cannot create socket");
+		return -1;
+	}
+
+	memset((char *)&local_sock, 0, sizeof(local_sock));
+
+	switch (family) {
+	case AF_INET:
+		in = (struct sockaddr_in *)&local_sock;
+		in->sin_family = family;
+		in->sin_port = htons(ctx->lport);
+		in->sin_addr.s_addr = htonl(INADDR_ANY);
+		sock_len = sizeof(*in);
+		break;
+	case AF_INET6:
+		in6 = (struct sockaddr_in6 *)&local_sock;
+		in6->sin6_family = family;
+		in6->sin6_port = htons(ctx->lport);
+		in6->sin6_addr = in6addr_any;
+		sock_len = sizeof(*in6);
+		break;
+	default:
+		goto err_socket;
+	}
+
+	int opt = 1;
+	ret = setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
+	if (ret < 0) {
+		perror("setsockopt for SO_REUSEADDR");
+		goto err_socket;
+	}
+
+	ret = setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));
+	if (ret < 0) {
+		perror("setsockopt for SO_REUSEPORT");
+		goto err_socket;
+	}
+
+	if (family == AF_INET6) {
+		opt = 0;
+		if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &opt, sizeof(opt))) {
+			perror("failed to set IPV6_V6ONLY");
+			goto err_socket;
+		}
+	}
+
+	ret = bind(s, (struct sockaddr *)&local_sock, sock_len);
+	if (ret < 0) {
+		perror("cannot bind socket");
+		goto err_socket;
+	}
+
+	ctx->socket = s;
+	ctx->sa_family = family;
+	return 0;
+
+err_socket:
+	close(s);
+	return -1;
+}
+
+static int ovpn_udp_socket(struct ovpn_ctx *ctx, sa_family_t family)
+{
+	return ovpn_socket(ctx, family, IPPROTO_UDP);
+}
+
+static int ovpn_listen(struct ovpn_ctx *ctx, sa_family_t family)
+{
+	int ret;
+
+	ret = ovpn_socket(ctx, family, IPPROTO_TCP);
+	if (ret < 0)
+		return ret;
+
+	ret = listen(ctx->socket, 10);
+	if (ret < 0) {
+		perror("listen");
+		close(ctx->socket);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int ovpn_accept(struct ovpn_ctx *ctx)
+{
+	socklen_t socklen;
+	int ret;
+
+	socklen = sizeof(ctx->remote);
+	ret = accept(ctx->socket, (struct sockaddr *)&ctx->remote, &socklen);
+	if (ret < 0) {
+		perror("accept");
+		goto err;
+	}
+
+	fprintf(stderr, "Connection received!\n");
+
+	switch (socklen) {
+	case sizeof(struct sockaddr_in):
+	case sizeof(struct sockaddr_in6):
+		break;
+	default:
+		fprintf(stderr, "error: expecting IPv4 or IPv6 connection\n");
+		close(ret);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	return ret;
+err:
+	close(ctx->socket);
+	return ret;
+}
+
+static int ovpn_connect(struct ovpn_ctx *ovpn)
+{
+	socklen_t socklen;
+	int s, ret;
+
+	s = socket(ovpn->remote.in4.sin_family, SOCK_STREAM, 0);
+	if (s < 0) {
+		perror("cannot create socket");
+		return -1;
+	}
+
+	switch (ovpn->remote.in4.sin_family) {
+	case AF_INET:
+		socklen = sizeof(struct sockaddr_in);
+		break;
+	case AF_INET6:
+		socklen = sizeof(struct sockaddr_in6);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	ret = connect(s, (struct sockaddr *)&ovpn->remote, socklen);
+	if (ret < 0) {
+		perror("connect");
+		goto err;
+	}
+
+	fprintf(stderr, "connected\n");
+
+	ovpn->socket = s;
+
+	return 0;
+err:
+	close(s);
+	return ret;
+}
+
+static int ovpn_new_peer(struct ovpn_ctx *ovpn, bool is_tcp)
+{
+	struct nlattr *attr;
+	struct nl_ctx *ctx;
+	size_t alen;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SET_PEER);
+	if (!ctx)
+		return -ENOMEM;
+
+	attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_SOCKET, ovpn->socket);
+
+	if (!is_tcp) {
+		switch (ovpn->remote.in4.sin_family) {
+		case AF_INET:
+			alen = sizeof(struct sockaddr_in);
+			break;
+		case AF_INET6:
+			alen = sizeof(struct sockaddr_in6);
+			break;
+		default:
+			fprintf(stderr, "Invalid family for remote socket address\n");
+			goto nla_put_failure;
+		}
+		NLA_PUT(ctx->nl_msg, OVPN_A_PEER_SOCKADDR_REMOTE, alen, &ovpn->remote);
+	}
+
+
+	switch (ovpn->peer_ip.in4.sin_family) {
+	case AF_INET:
+		NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_VPN_IPV4,
+			    ovpn->peer_ip.in4.sin_addr.s_addr);
+		break;
+	case AF_INET6:
+		NLA_PUT(ctx->nl_msg, OVPN_A_PEER_VPN_IPV6, sizeof(struct in6_addr),
+			&ovpn->peer_ip.in6.sin6_addr);
+		break;
+	default:
+		fprintf(stderr, "Invalid family for peer address\n");
+		goto nla_put_failure;
+	}
+
+	nla_nest_end(ctx->nl_msg, attr);
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_set_peer(struct ovpn_ctx *ovpn)
+{
+	struct nlattr *attr;
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SET_PEER);
+	if (!ctx)
+		return -ENOMEM;
+
+	attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_KEEPALIVE_INTERVAL,
+		    ovpn->keepalive_interval);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_KEEPALIVE_TIMEOUT,
+		    ovpn->keepalive_timeout);
+	nla_nest_end(ctx->nl_msg, attr);
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_del_peer(struct ovpn_ctx *ovpn)
+{
+	struct nlattr *attr;
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_DEL_PEER);
+	if (!ctx)
+		return -ENOMEM;
+
+	attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+	nla_nest_end(ctx->nl_msg, attr);
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_handle_peer(struct nl_msg *msg, void *arg)
+{
+	struct nlattr *attrs_peer[OVPN_A_PEER_MAX + 1];
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct nlattr *attrs[OVPN_A_MAX + 1];
+	__u16 port = 0;
+
+	nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0),
+		  genlmsg_attrlen(gnlh, 0), NULL);
+
+	if (!attrs[OVPN_A_PEER]) {
+		fprintf(stderr, "no packet content in netlink message\n");
+		return NL_SKIP;
+	}
+
+	nla_parse(attrs_peer, OVPN_A_PEER_MAX, nla_data(attrs[OVPN_A_PEER]),
+		  nla_len(attrs[OVPN_A_PEER]), NULL);
+
+	if (attrs_peer[OVPN_A_PEER_ID])
+		fprintf(stderr, "* Peer %u\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_ID]));
+
+	if (attrs_peer[OVPN_A_PEER_VPN_IPV4]) {
+		char buf[INET_ADDRSTRLEN];
+		inet_ntop(AF_INET, nla_data(attrs_peer[OVPN_A_PEER_VPN_IPV4]), buf,
+			  sizeof(buf));
+		fprintf(stderr, "\tVPN IPv4: %s\n", buf);
+	}
+
+	if (attrs_peer[OVPN_A_PEER_VPN_IPV6]) {
+		char buf[INET6_ADDRSTRLEN];
+		inet_ntop(AF_INET6, nla_data(attrs_peer[OVPN_A_PEER_VPN_IPV6]), buf,
+			  sizeof(buf));
+		fprintf(stderr, "\tVPN IPv6: %s\n", buf);
+	}
+
+	if (attrs_peer[OVPN_A_PEER_LOCAL_PORT])
+		port = ntohs(nla_get_u16(attrs_peer[OVPN_A_PEER_LOCAL_PORT]));
+
+	if (attrs_peer[OVPN_A_PEER_SOCKADDR_REMOTE]) {
+		struct sockaddr_storage ss;
+		struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)&ss;
+		struct sockaddr_in *in = (struct sockaddr_in *)&ss;
+
+		memcpy(&ss, nla_data(attrs_peer[OVPN_A_PEER_SOCKADDR_REMOTE]),
+		       nla_len(attrs_peer[OVPN_A_PEER_SOCKADDR_REMOTE]));
+
+		if (in->sin_family == AF_INET) {
+			char buf[INET_ADDRSTRLEN];
+
+			if (attrs_peer[OVPN_A_PEER_LOCAL_IP]) {
+				inet_ntop(AF_INET,
+					  nla_data(attrs_peer[OVPN_A_PEER_LOCAL_IP]),
+					  buf, sizeof(buf));
+				fprintf(stderr, "\tLocal: %s:%hu\n", buf, port);
+			}
+
+			inet_ntop(AF_INET, &in->sin_addr, buf, sizeof(buf));
+			fprintf(stderr, "\tRemote: %s:%u\n", buf, ntohs(in->sin_port));
+		} else if (in->sin_family == AF_INET6) {
+			char buf[INET6_ADDRSTRLEN];
+
+			if (attrs_peer[OVPN_A_PEER_LOCAL_IP]) {
+				inet_ntop(AF_INET6,
+					  nla_data(attrs_peer[OVPN_A_PEER_LOCAL_IP]),
+					  buf, sizeof(buf));
+				fprintf(stderr, "\tLocal: %s\n", buf);
+			}
+
+			inet_ntop(AF_INET6, &in6->sin6_addr, buf, sizeof(buf));
+			fprintf(stderr, "\tRemote: %s:%u (scope-id: %u)\n", buf,
+				ntohs(in6->sin6_port), ntohl(in6->sin6_scope_id));
+		}
+	}
+
+	if (attrs_peer[OVPN_A_PEER_KEEPALIVE_INTERVAL])
+		fprintf(stderr, "\tKeepalive interval: %u sec\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_KEEPALIVE_INTERVAL]));
+
+	if (attrs_peer[OVPN_A_PEER_KEEPALIVE_TIMEOUT])
+		fprintf(stderr, "\tKeepalive timeout: %u sec\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_KEEPALIVE_TIMEOUT]));
+
+	if (attrs_peer[OVPN_A_PEER_VPN_RX_BYTES])
+		fprintf(stderr, "\tVPN RX bytes: %" PRIu64 "\n",
+			nla_get_u64(attrs_peer[OVPN_A_PEER_VPN_RX_BYTES]));
+
+	if (attrs_peer[OVPN_A_PEER_VPN_TX_BYTES])
+		fprintf(stderr, "\tVPN TX bytes: %" PRIu64 "\n",
+			nla_get_u64(attrs_peer[OVPN_A_PEER_VPN_TX_BYTES]));
+
+	if (attrs_peer[OVPN_A_PEER_VPN_RX_PACKETS])
+		fprintf(stderr, "\tVPN RX packets: %u\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_VPN_RX_PACKETS]));
+
+	if (attrs_peer[OVPN_A_PEER_VPN_TX_PACKETS])
+		fprintf(stderr, "\tVPN TX packets: %u\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_VPN_TX_PACKETS]));
+
+	if (attrs_peer[OVPN_A_PEER_LINK_RX_BYTES])
+		fprintf(stderr, "\tLINK RX bytes: %" PRIu64 "\n",
+			nla_get_u64(attrs_peer[OVPN_A_PEER_LINK_RX_BYTES]));
+
+	if (attrs_peer[OVPN_A_PEER_LINK_TX_BYTES])
+		fprintf(stderr, "\tLINK TX bytes: %" PRIu64 "\n",
+			nla_get_u64(attrs_peer[OVPN_A_PEER_LINK_TX_BYTES]));
+
+	if (attrs_peer[OVPN_A_PEER_LINK_RX_PACKETS])
+		fprintf(stderr, "\tLINK RX packets: %u\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_LINK_RX_PACKETS]));
+
+	if (attrs_peer[OVPN_A_PEER_LINK_TX_PACKETS])
+		fprintf(stderr, "\tLINK TX packets: %u\n",
+			nla_get_u32(attrs_peer[OVPN_A_PEER_LINK_TX_PACKETS]));
+
+	return NL_SKIP;
+}
+
+static int ovpn_get_peer(struct ovpn_ctx *ovpn)
+{
+	int flags = 0, ret = -1;
+	struct nlattr *attr;
+	struct nl_ctx *ctx;
+
+	if (ovpn->peer_id == PEER_ID_UNDEF)
+		flags = NLM_F_DUMP;
+
+	ctx = nl_ctx_alloc_flags(ovpn, OVPN_CMD_GET_PEER, flags);
+	if (!ctx)
+		return -ENOMEM;
+
+	if (ovpn->peer_id != PEER_ID_UNDEF) {
+		attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+		NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+		nla_nest_end(ctx->nl_msg, attr);
+	}
+
+	ret = ovpn_nl_msg_send(ctx, ovpn_handle_peer);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_new_key(struct ovpn_ctx *ovpn)
+{
+	struct nlattr *peer, *keyconf, *key_dir;
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SET_KEY);
+	if (!ctx)
+		return -ENOMEM;
+
+	peer = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+
+	keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_PEER_KEYCONF);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_SLOT, OVPN_KEY_SLOT_PRIMARY);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_KEY_ID, 0);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_CIPHER_ALG, ovpn->cipher);
+
+	key_dir = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF_ENCRYPT_DIR);
+	NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_CIPHER_KEY, KEY_LEN, ovpn->key_enc);
+	NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_NONCE_TAIL, NONCE_LEN, ovpn->nonce);
+	nla_nest_end(ctx->nl_msg, key_dir);
+
+	key_dir = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF_DECRYPT_DIR);
+	NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_CIPHER_KEY, KEY_LEN, ovpn->key_dec);
+	NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_NONCE_TAIL, NONCE_LEN, ovpn->nonce);
+	nla_nest_end(ctx->nl_msg, key_dir);
+
+	nla_nest_end(ctx->nl_msg, keyconf);
+
+	nla_nest_end(ctx->nl_msg, peer);
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_del_key(struct ovpn_ctx *ovpn)
+{
+	struct nlattr *peer, *keyconf;
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_DEL_KEY);
+	if (!ctx)
+		return -ENOMEM;
+
+	peer = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+
+	keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_PEER_KEYCONF);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_SLOT, OVPN_KEY_SLOT_PRIMARY);
+	nla_nest_end(ctx->nl_msg, keyconf);
+
+	nla_nest_end(ctx->nl_msg, peer);
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_swap_keys(struct ovpn_ctx *ovpn)
+{
+	struct nlattr *peer;
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_SWAP_KEYS);
+	if (!ctx)
+		return -ENOMEM;
+
+	peer = nla_nest_start(ctx->nl_msg, OVPN_A_PEER);
+	NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id);
+	nla_nest_end(ctx->nl_msg, peer);
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_handle_iface(struct nl_msg *msg, void *arg)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct nlattr *attrs[OVPN_A_MAX + 1];
+
+	nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0),
+		  genlmsg_attrlen(gnlh, 0), NULL);
+
+	if (!attrs[OVPN_A_IFNAME]) {
+		fprintf(stderr, "no ifname in netlink message\n");
+		return NL_SKIP;
+	}
+
+	fprintf(stderr, "Created ifname: %s\n",
+		(char *)nla_data(attrs[OVPN_A_IFNAME]));
+
+	return NL_SKIP;
+}
+
+static int ovpn_new_iface(struct ovpn_ctx *ovpn)
+{
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_NEW_IFACE);
+	if (!ctx)
+		return -ENOMEM;
+
+	NLA_PUT(ctx->nl_msg, OVPN_A_IFNAME, strlen(ovpn->ifname) + 1,
+		ovpn->ifname);
+
+	if (ovpn->mode_set)
+		NLA_PUT_U32(ctx->nl_msg, OVPN_A_MODE, ovpn->mode);
+
+	fprintf(stdout, "Creating interface %s with mode %u\n", ovpn->ifname,
+		ovpn->mode);
+
+	ret = ovpn_nl_msg_send(ctx, ovpn_handle_iface);
+nla_put_failure:
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int ovpn_del_iface(struct ovpn_ctx *ovpn)
+{
+	struct nl_ctx *ctx;
+	int ret = -1;
+
+	ctx = nl_ctx_alloc(ovpn, OVPN_CMD_DEL_IFACE);
+	if (!ctx)
+		return -ENOMEM;
+
+	ret = ovpn_nl_msg_send(ctx, NULL);
+	nl_ctx_free(ctx);
+	return ret;
+}
+
+static int nl_seq_check(struct nl_msg *msg, void *arg)
+{
+	return NL_OK;
+}
+
+struct mcast_handler_args {
+	const char *group;
+	int id;
+};
+
+static int mcast_family_handler(struct nl_msg *msg, void *arg)
+{
+	struct mcast_handler_args *grp = arg;
+	struct nlattr *tb[CTRL_ATTR_MAX + 1];
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct nlattr *mcgrp;
+	int rem_mcgrp;
+
+	nla_parse(tb, CTRL_ATTR_MAX, genlmsg_attrdata(gnlh, 0),
+		  genlmsg_attrlen(gnlh, 0), NULL);
+
+	if (!tb[CTRL_ATTR_MCAST_GROUPS])
+		return NL_SKIP;
+
+	nla_for_each_nested(mcgrp, tb[CTRL_ATTR_MCAST_GROUPS], rem_mcgrp) {
+		struct nlattr *tb_mcgrp[CTRL_ATTR_MCAST_GRP_MAX + 1];
+
+		nla_parse(tb_mcgrp, CTRL_ATTR_MCAST_GRP_MAX,
+			  nla_data(mcgrp), nla_len(mcgrp), NULL);
+
+		if (!tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME] ||
+		    !tb_mcgrp[CTRL_ATTR_MCAST_GRP_ID])
+			continue;
+		if (strncmp(nla_data(tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME]),
+			    grp->group, nla_len(tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME])))
+			continue;
+		grp->id = nla_get_u32(tb_mcgrp[CTRL_ATTR_MCAST_GRP_ID]);
+		break;
+	}
+
+	return NL_SKIP;
+}
+
+static int mcast_error_handler(struct sockaddr_nl *nla, struct nlmsgerr *err,
+			       void *arg)
+{
+	int *ret = arg;
+
+	*ret = err->error;
+	return NL_STOP;
+}
+
+static int mcast_ack_handler(struct nl_msg *msg, void *arg)
+{
+	int *ret = arg;
+
+	*ret = 0;
+	return NL_STOP;
+}
+
+static int ovpn_handle_msg(struct nl_msg *msg, void *arg)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct nlattr *attrs[OVPN_A_MAX + 1];
+	struct nlmsghdr *nlh = nlmsg_hdr(msg);
+	//enum ovpn_del_peer_reason reason;
+	char ifname[IF_NAMESIZE];
+	__u32 ifindex;
+
+	fprintf(stderr, "received message from ovpn-dco\n");
+
+	if (!genlmsg_valid_hdr(nlh, 0)) {
+		fprintf(stderr, "invalid header\n");
+		return NL_STOP;
+	}
+
+	if (nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0),
+		      genlmsg_attrlen(gnlh, 0), NULL)) {
+		fprintf(stderr, "received bogus data from ovpn-dco\n");
+		return NL_STOP;
+	}
+
+	if (!attrs[OVPN_A_IFINDEX]) {
+		fprintf(stderr, "no ifindex in this message\n");
+		return NL_STOP;
+	}
+
+	ifindex = nla_get_u32(attrs[OVPN_A_IFINDEX]);
+	if (!if_indextoname(ifindex, ifname)) {
+		fprintf(stderr, "cannot resolve ifname for ifindex: %u\n",
+			ifindex);
+		return NL_STOP;
+	}
+
+	switch (gnlh->cmd) {
+	case OVPN_CMD_DEL_PEER:
+		/*if (!attrs[OVPN_A_DEL_PEER_REASON]) {
+			fprintf(stderr, "no reason in DEL_PEER message\n");
+			return NL_STOP;
+		}
+		reason = nla_get_u8(attrs[OVPN_A_DEL_PEER_REASON]);
+		fprintf(stderr,
+			"received CMD_DEL_PEER, ifname: %s reason: %d\n",
+			ifname, reason);
+		*/
+		fprintf(stdout, "received CMD_DEL_PEER\n");
+		break;
+	default:
+		fprintf(stderr, "received unknown command: %d\n", gnlh->cmd);
+		return NL_STOP;
+	}
+
+	return NL_OK;
+}
+
+static int ovpn_get_mcast_id(struct nl_sock *sock, const char *family,
+			     const char *group)
+{
+	struct nl_msg *msg;
+	struct nl_cb *cb;
+	int ret, ctrlid;
+	struct mcast_handler_args grp = {
+		.group = group,
+		.id = -ENOENT,
+	};
+
+	msg = nlmsg_alloc();
+	if (!msg)
+		return -ENOMEM;
+
+	cb = nl_cb_alloc(NL_CB_DEFAULT);
+	if (!cb) {
+		ret = -ENOMEM;
+		goto out_fail_cb;
+	}
+
+	ctrlid = genl_ctrl_resolve(sock, "nlctrl");
+
+	genlmsg_put(msg, 0, 0, ctrlid, 0, 0, CTRL_CMD_GETFAMILY, 0);
+
+	ret = -ENOBUFS;
+	NLA_PUT_STRING(msg, CTRL_ATTR_FAMILY_NAME, family);
+
+	ret = nl_send_auto_complete(sock, msg);
+	if (ret < 0)
+		goto nla_put_failure;
+
+	ret = 1;
+
+	nl_cb_err(cb, NL_CB_CUSTOM, mcast_error_handler, &ret);
+	nl_cb_set(cb, NL_CB_ACK, NL_CB_CUSTOM, mcast_ack_handler, &ret);
+	nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, mcast_family_handler, &grp);
+
+	while (ret > 0)
+		nl_recvmsgs(sock, cb);
+
+	if (ret == 0)
+		ret = grp.id;
+nla_put_failure:
+	nl_cb_put(cb);
+out_fail_cb:
+	nlmsg_free(msg);
+	return ret;
+}
+
+static void ovpn_listen_mcast(void)
+{
+	struct nl_sock *sock;
+	struct nl_cb *cb;
+	int mcid, ret;
+
+	sock = nl_socket_alloc();
+	if (!sock) {
+		fprintf(stderr, "cannot allocate netlink socket\n");
+		goto err_free;
+	}
+
+	nl_socket_set_buffer_size(sock, 8192, 8192);
+
+	ret = genl_connect(sock);
+	if (ret < 0) {
+		fprintf(stderr, "cannot connect to generic netlink: %s\n",
+			nl_geterror(ret));
+		goto err_free;
+	}
+
+	mcid = ovpn_get_mcast_id(sock, OVPN_FAMILY_NAME, OVPN_MCGRP_PEERS);
+	if (mcid < 0) {
+		fprintf(stderr, "cannot get mcast group: %s\n",
+			nl_geterror(mcid));
+		goto err_free;
+	}
+
+	ret = nl_socket_add_membership(sock, mcid);
+	if (ret) {
+		fprintf(stderr, "failed to join mcast group: %d\n", ret);
+		goto err_free;
+	}
+
+	ret = 0;
+	cb = nl_cb_alloc(NL_CB_DEFAULT);
+	nl_cb_set(cb, NL_CB_SEQ_CHECK, NL_CB_CUSTOM, nl_seq_check, NULL);
+	nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, ovpn_handle_msg, &ret);
+	nl_cb_err(cb, NL_CB_CUSTOM, ovpn_nl_cb_error, &ret);
+
+	while (ret != -EINTR)
+		ret = nl_recvmsgs(sock, cb);
+
+	nl_cb_put(cb);
+err_free:
+	nl_socket_free(sock);
+}
+
+static void usage(const char *cmd)
+{
+	fprintf(stderr, "Error: invalid arguments.\n\n");
+	fprintf(stderr,
+		"Usage %s <new_iface|del_iface|connect|listen|new_peer|new_multi_peer|set_peer|del_peer|get_peer|new_key|del_key|swap_keys|listen_mcast> <iface> [arguments..]\n",
+		cmd);
+	fprintf(stderr, "\tiface: tun interface name\n\n");
+
+	fprintf(stderr, "* connect <peer_id> <raddr> <rport> <vpnaddr>: start connecting peer of TCP-based VPN session\n");
+	fprintf(stderr, "\tpeer-id: peer ID of the connecting peer\n");
+	fprintf(stderr, "\tremote-addr: peer IP address\n");
+	fprintf(stderr, "\tremote-port: peer TCP port\n");
+	fprintf(stderr, "\tvpn-ip: peer VPN IP\n\n");
+
+	fprintf(stderr, "* listen <lport> <peers_file>: listen for incoming peer TCP connections\n");
+	fprintf(stderr, "\tlport: src TCP port\n");
+	fprintf(stderr, "\tpeers_file: file containing one peer per line. Line format:\n");
+	fprintf(stderr, "\t\t<peer_id> <vpnaddr>\n\n");
+
+	fprintf(stderr, "* new_peer <lport> <peer-id> <raddr> <rport> <vpnaddr>: add new peer\n");
+	fprintf(stderr, "\tpeer-id: peer ID to be used in data packets to/from this peer\n");
+	fprintf(stderr, "\tlocal-port: local UDP port\n");
+	fprintf(stderr, "\tremote-addr: peer IP address\n");
+	fprintf(stderr, "\tremote-port: peer UDP port\n");
+	fprintf(stderr, "\tvpnaddr: peer VPN IP\n\n");
+
+	fprintf(stderr, "* new_multi_peer <lport> <file>: add multiple peers as listed in the file\n");
+	fprintf(stderr, "\tlport: local UDP port to bind to\n");
+	fprintf(stderr, "\tfile: text file containing one peer per line. Line format:\n");
+	fprintf(stderr, "\t\t<peer-id> <raddr> <rport> <vpnaddr>\n\n");
+
+	fprintf(stderr,
+		"* set_peer <peer-id> <keepalive_interval> <keepalive_timeout>: set peer attributes\n");
+	fprintf(stderr, "\tpeer-id: peer ID of the peer to modify\n");
+	fprintf(stderr,
+		"\tkeepalive_interval: interval for sending ping messages\n");
+	fprintf(stderr,
+		"\tkeepalive_timeout: time after which a peer is timed out\n\n");
+
+	fprintf(stderr, "* del_peer <peer-id>: delete peer\n");
+	fprintf(stderr, "\tpeer-id: peer ID of the peer to delete\n\n");
+
+	fprintf(stderr,
+		"* new_key <peer-id> <cipher> <key_dir> <key_file>: set data channel key\n");
+	fprintf(stderr, "\tpeer-id: peer ID of the peer to configure the key for\n");
+	fprintf(stderr,
+		"\tcipher: cipher to use, supported: aes (AES-GCM), chachapoly (CHACHA20POLY1305), none\n");
+	fprintf(stderr,
+		"\tkey_dir: key direction, must be 0 on one host and 1 on the other\n");
+	fprintf(stderr, "\tkey_file: file containing the pre-shared key\n\n");
+
+	fprintf(stderr, "* del_key <peer-id>: erase existing data channel key\n");
+	fprintf(stderr, "\tpeer-id: peer ID of the peer to modify\n\n");
+
+	fprintf(stderr, "* swap_keys <peer-id>: swap primary and secondary key slots\n");
+	fprintf(stderr, "\tpeer-id: peer ID of the peer to modify\n\n");
+
+	fprintf(stderr, "* listen_mcast: listen to ovpn-dco netlink multicast messages\n");
+}
+
+static int ovpn_parse_remote(struct ovpn_ctx *ovpn, const char *host, const char *service,
+			     const char *vpn_addr)
+{
+	int ret;
+	struct addrinfo *result;
+	struct addrinfo hints = {
+		.ai_family = ovpn->sa_family,
+		.ai_socktype = SOCK_DGRAM,
+		.ai_protocol = IPPROTO_UDP
+	};
+
+	if (host) {
+		ret = getaddrinfo(host, service, &hints, &result);
+		if (ret) /* result is only valid when getaddrinfo() succeeds */
+			return -1;
+
+		if (!(result->ai_family == AF_INET && result->ai_addrlen == sizeof(struct sockaddr_in)) &&
+		    !(result->ai_family == AF_INET6 && result->ai_addrlen == sizeof(struct sockaddr_in6))) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		memcpy(&ovpn->remote, result->ai_addr, result->ai_addrlen);
+		/* release this lookup before result is reused below */
+		freeaddrinfo(result);
+	}
+
+	ret = getaddrinfo(vpn_addr, NULL, &hints, &result);
+	if (ret) /* result is only valid when getaddrinfo() succeeds */
+		return -1;
+
+	if (!(result->ai_family == AF_INET && result->ai_addrlen == sizeof(struct sockaddr_in)) &&
+	    !(result->ai_family == AF_INET6 && result->ai_addrlen == sizeof(struct sockaddr_in6))) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	memcpy(&ovpn->peer_ip, result->ai_addr, result->ai_addrlen);
+	ovpn->sa_family = result->ai_family;
+
+	ret = 0;
+out:
+	freeaddrinfo(result);
+	return ret;
+}
+
+static int ovpn_parse_new_peer(struct ovpn_ctx *ovpn, const char *peer_id, const char *raddr,
+			       const char *rport, const char *vpnip)
+{
+	ovpn->peer_id = strtoul(peer_id, NULL, 10);
+	if (errno == ERANGE) {
+		fprintf(stderr, "peer ID value out of range\n");
+		return -1;
+	}
+
+	return ovpn_parse_remote(ovpn, raddr, rport, vpnip);
+}
+
+static int ovpn_parse_set_peer(struct ovpn_ctx *ovpn, int argc, char *argv[])
+{
+	if (argc < 5) {
+		usage(argv[0]);
+		return -1;
+	}
+
+	ovpn->keepalive_interval = strtoul(argv[3], NULL, 10);
+	if (errno == ERANGE) {
+		fprintf(stderr, "keepalive interval value out of range\n");
+		return -1;
+	}
+
+	ovpn->keepalive_timeout = strtoul(argv[4], NULL, 10);
+	if (errno == ERANGE) {
+		fprintf(stderr, "keepalive timeout value out of range\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct ovpn_ctx ovpn;
+//	struct nl_ctx *ctx;
+	int ret;
+
+	if (argc < 2) {
+		usage(argv[0]);
+		return -1;
+	}
+
+	memset(&ovpn, 0, sizeof(ovpn));
+	ovpn.sa_family = AF_INET;
+
+	if (argc > 2) {
+		strncpy(ovpn.ifname, argv[2], IFNAMSIZ - 1);
+		ovpn.ifname[IFNAMSIZ - 1] = '\0';
+	}
+
+	/* all commands except new_iface expect a valid ifindex */
+	if (strcmp(argv[1], "new_iface")) {
+		/* in this case an ifname MUST be defined */
+		if (argc < 3) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.ifindex = if_nametoindex(ovpn.ifname);
+		if (!ovpn.ifindex) {
+			fprintf(stderr, "cannot find interface: %s\n",
+				strerror(errno));
+			return -1;
+		}
+	}
+
+	if (!strcmp(argv[1], "new_iface")) {
+		if (argc > 3) {
+			if (!strcmp(argv[3], "P2P")) {
+				ovpn.mode = OVPN_MODE_P2P;
+			} else if (!strcmp(argv[3], "MP")) {
+				ovpn.mode = OVPN_MODE_MP;
+			} else {
+				fprintf(stderr, "Cannot parse iface mode: %s\n",
+					argv[3]);
+				return -1;
+			}
+			ovpn.mode_set = true;
+		}
+
+		ret = ovpn_new_iface(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "Cannot create interface %s: %d\n",
+				ovpn.ifname, ret);
+			return -1;
+		}
+	} else if (!strcmp(argv[1], "del_iface")) {
+		ret = ovpn_del_iface(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "Cannot delete interface %s: %d\n",
+				ovpn.ifname, ret);
+			return -1;
+		}
+	} else if (!strcmp(argv[1], "listen")) {
+		char peer_id[10], vpnip[100];
+		int n;
+		FILE *fp;
+
+		if (argc < 4) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.lport = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE || ovpn.lport > 65535) {
+			fprintf(stderr, "lport value out of range\n");
+			return -1;
+		}
+
+		if (argc > 4 && !strcmp(argv[4], "ipv6"))
+			ovpn.sa_family = AF_INET6;
+
+		ret = ovpn_listen(&ovpn, ovpn.sa_family);
+		if (ret < 0) {
+			fprintf(stderr, "cannot listen on TCP socket\n");
+			return ret;
+		}
+
+		fp = fopen(argv[4], "r");
+		if (!fp) {
+			fprintf(stderr, "cannot open file: %s\n", argv[4]);
+			return -1;
+		}
+
+		while ((n = fscanf(fp, "%s %s\n", peer_id, vpnip)) == 2) {
+			struct ovpn_ctx peer_ctx = { 0 };
+
+			peer_ctx.ifindex = ovpn.ifindex;
+			peer_ctx.sa_family = ovpn.sa_family;
+
+			peer_ctx.socket = ovpn_accept(&ovpn);
+			if (peer_ctx.socket < 0) {
+				fprintf(stderr, "cannot accept connection!\n");
+				return -1;
+			}
+
+			ret = ovpn_parse_new_peer(&peer_ctx, peer_id, NULL, NULL, vpnip);
+			if (ret < 0) {
+				fprintf(stderr, "error while parsing line\n");
+				return -1;
+			}
+
+			ret = ovpn_new_peer(&peer_ctx, true);
+			if (ret < 0) {
+				fprintf(stderr, "cannot add peer to VPN: %s %s\n", peer_id, vpnip);
+				return ret;
+			}
+		}
+	} else if (!strcmp(argv[1], "connect")) {
+		if (argc < 7) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.sa_family = AF_INET;
+
+		ret = ovpn_parse_new_peer(&ovpn, argv[3], argv[4], argv[5], argv[6]);
+		if (ret < 0) {
+			fprintf(stderr, "Cannot parse remote peer data\n");
+			return ret;
+		}
+
+		ret = ovpn_connect(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot connect TCP socket\n");
+			return ret;
+		}
+
+		ret = ovpn_new_peer(&ovpn, true);
+		if (ret < 0) {
+			fprintf(stderr, "cannot add peer to VPN\n");
+			close(ovpn.socket);
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "new_peer")) {
+		if (argc < 8) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.lport = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE || ovpn.lport > 65535) {
+			fprintf(stderr, "lport value out of range\n");
+			return -1;
+		}
+
+		ret = ovpn_parse_new_peer(&ovpn, argv[4], argv[5], argv[6], argv[7]);
+		if (ret < 0)
+			return ret;
+
+		ret = ovpn_udp_socket(&ovpn, AF_INET6);//ovpn.sa_family);
+		if (ret < 0)
+			return ret;
+
+		ret = ovpn_new_peer(&ovpn, false);
+		if (ret < 0) {
+			fprintf(stderr, "cannot add peer to VPN\n");
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "new_multi_peer")) {
+		char peer_id[10], raddr[128], rport[10], vpnip[100];
+		FILE *fp;
+		int n;
+
+		if (argc < 5) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.lport = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE || ovpn.lport > 65535) {
+			fprintf(stderr, "lport value out of range\n");
+			return -1;
+		}
+
+		fp = fopen(argv[4], "r");
+		if (!fp) {
+			fprintf(stderr, "cannot open file: %s\n", argv[4]);
+			return -1;
+		}
+
+		ret = ovpn_udp_socket(&ovpn, AF_INET6);
+		if (ret < 0)
+			return ret;
+
+		while ((n = fscanf(fp, "%s %s %s %s\n", peer_id, raddr, rport, vpnip)) == 4) {
+			struct ovpn_ctx peer_ctx = { 0 };
+
+			peer_ctx.ifindex = ovpn.ifindex;
+			peer_ctx.socket = ovpn.socket;
+			peer_ctx.sa_family = AF_UNSPEC;
+
+			ret = ovpn_parse_new_peer(&peer_ctx, peer_id, raddr, rport, vpnip);
+			if (ret < 0) {
+				fprintf(stderr, "error while parsing line\n");
+				return -1;
+			}
+
+			ret = ovpn_new_peer(&peer_ctx, false);
+			if (ret < 0) {
+				fprintf(stderr, "cannot add peer to VPN: %s %s %s %s\n", peer_id,
+					raddr, rport, vpnip);
+				return ret;
+			}
+		}
+	} else if (!strcmp(argv[1], "set_peer")) {
+		if (argc < 6) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.peer_id = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE) {
+			fprintf(stderr, "peer ID value out of range\n");
+			return -1;
+		}
+
+		argv++;
+		argc--;
+
+		ret = ovpn_parse_set_peer(&ovpn, argc, argv);
+		if (ret < 0)
+			return ret;
+
+		ret = ovpn_set_peer(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot set peer to VPN\n");
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "del_peer")) {
+		if (argc < 4) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.peer_id = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE) {
+			fprintf(stderr, "peer ID value out of range\n");
+			return -1;
+		}
+
+		ret = ovpn_del_peer(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot delete peer from VPN\n");
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "get_peer")) {
+		ovpn.peer_id = PEER_ID_UNDEF;
+		if (argc > 3)
+			ovpn.peer_id = strtoul(argv[3], NULL, 10);
+
+		fprintf(stderr, "List of peers connected to: %s\n",
+			ovpn.ifname);
+
+		ret = ovpn_get_peer(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot get peer(s): %d\n", ret);
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "new_key")) {
+		if (argc < 7) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.peer_id = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE) {
+			fprintf(stderr, "peer ID value out of range\n");
+			return -1;
+		}
+
+		ret = ovpn_read_cipher(argv[4], &ovpn);
+		if (ret < 0)
+			return ret;
+
+		ret = ovpn_read_key_direction(argv[5], &ovpn);
+		if (ret < 0)
+			return ret;
+
+		ret = ovpn_read_key(argv[6], &ovpn);
+		if (ret)
+			return ret;
+
+		ret = ovpn_new_key(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot set key\n");
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "del_key")) {
+		if (argc < 4) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.peer_id = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE) {
+			fprintf(stderr, "peer ID value out of range\n");
+			return -1;
+		}
+
+		argv++;
+		argc--;
+
+		ret = ovpn_del_key(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot delete key\n");
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "swap_keys")) {
+		if (argc < 4) {
+			usage(argv[0]);
+			return -1;
+		}
+
+		ovpn.peer_id = strtoul(argv[3], NULL, 10);
+		if (errno == ERANGE) {
+			fprintf(stderr, "peer ID value out of range\n");
+			return -1;
+		}
+
+		argv++;
+		argc--;
+
+		ret = ovpn_swap_keys(&ovpn);
+		if (ret < 0) {
+			fprintf(stderr, "cannot swap keys\n");
+			return ret;
+		}
+	} else if (!strcmp(argv[1], "listen_mcast")) {
+		ovpn_listen_mcast();
+	} else {
+		usage(argv[0]);
+		return -1;
+	}
+
+	return ret;
+}
diff --git a/tools/testing/selftests/ovpn/run.sh b/tools/testing/selftests/ovpn/run.sh
new file mode 100644
index 000000000000..065d3dea34bf
--- /dev/null
+++ b/tools/testing/selftests/ovpn/run.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+ OR MIT
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+#  Author:	Antonio Quartulli <antonio@openvpn.net>
+
+set -e
+
+./netns-test.sh
+./netns-test.sh -t
+./float-test.sh
+
diff --git a/tools/testing/selftests/ovpn/tcp_peers.txt b/tools/testing/selftests/ovpn/tcp_peers.txt
new file mode 100644
index 000000000000..3b7f68bb7f64
--- /dev/null
+++ b/tools/testing/selftests/ovpn/tcp_peers.txt
@@ -0,0 +1 @@
+1 5.5.5.2
diff --git a/tools/testing/selftests/ovpn/udp_peers.txt b/tools/testing/selftests/ovpn/udp_peers.txt
new file mode 100644
index 000000000000..32f14bd9347a
--- /dev/null
+++ b/tools/testing/selftests/ovpn/udp_peers.txt
@@ -0,0 +1,5 @@
+1 10.10.1.2 1 5.5.5.2
+2 10.10.2.2 1 5.5.5.3
+3 10.10.3.2 1 5.5.5.4
+4 10.10.4.2 1 5.5.5.5
+5 10.10.5.2 1 5.5.5.6
-- 
2.43.2
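
A note on the numeric argument parsing in ovpn-cli.c above: the tool
tests errno after strtoul() without clearing it first and never looks
at the end pointer, so a stale ERANGE or trailing garbage such as
"12abc" can go unnoticed. The following is only a minimal sketch of a
stricter helper. The name parse_u32() and its signature are
hypothetical and not part of the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: parse a base-10 unsigned
 * value that must not exceed max.
 * Returns 0 on success and -1 on any parse or range error.
 */
static int parse_u32(const char *str, uint32_t max, uint32_t *out)
{
	unsigned long val;
	char *end;

	/* reject NULL, empty strings, signs and leading whitespace */
	if (!str || !isdigit((unsigned char)*str))
		return -1;

	errno = 0;	/* strtoul() sets errno on error but never clears it */
	val = strtoul(str, &end, 10);
	if (errno == ERANGE || *end != '\0' || val > max)
		return -1;

	*out = (uint32_t)val;
	return 0;
}
```

With such a helper the port checks could be written along the lines of
parse_u32(argv[3], 65535, &port), where port is a local uint32_t.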



* Re: [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload
  2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
                   ` (23 preceding siblings ...)
  2024-05-06  1:16 ` [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module Antonio Quartulli
@ 2024-05-07 23:48 ` Jakub Kicinski
  2024-05-08  9:56   ` Antonio Quartulli
  24 siblings, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-07 23:48 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Mon,  6 May 2024 03:16:13 +0200 Antonio Quartulli wrote:
> I am finally back with version 3 of the ovpn patchset.
> It took a while to address all comments I have received on v2, but I
> am happy to say that I addressed 99% of the feedback I collected.

Nice, one more check / warning that pops up is missing kdoc.
W=1 build only catches kdoc problems in C sources, for headers
try running something like:

./scripts/kernel-doc -none -Wall $new_files
-- 
pw-bot: cr


* Re: [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module
  2024-05-06  1:16 ` [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module Antonio Quartulli
@ 2024-05-07 23:55   ` Jakub Kicinski
  2024-05-08  9:51     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-07 23:55 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Mon,  6 May 2024 03:16:37 +0200 Antonio Quartulli wrote:
> +CFLAGS = -Wall -idirafter ../../../../include/uapi

This may end badly once the headers you're after also exist in system
paths. The guards in uapi/ are modified when header is installed.
It's better to -I../../../../usr/include/ and do "make headers"
before building tests.

> +CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0)
> +
> +LDFLAGS = -lmbedtls -lmbedcrypto
> +LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0)
> +
> +ovpn-cli: ovpn-cli.c
> +
> +TEST_PROGS = run.sh
> +TEST_GEN_PROGS_EXTENDED = ovpn-cli

TEST_GEN_FILES - it's not a test at all, AFAICT.

> +./netns-test.sh
> +./netns-test.sh -t
> +./float-test.sh
> +

nit: extra new line at the end


* Re: [PATCH net-next v3 03/24] ovpn: add basic netlink support
  2024-05-06  1:16 ` [PATCH net-next v3 03/24] ovpn: add basic netlink support Antonio Quartulli
@ 2024-05-08  0:10   ` Jakub Kicinski
  2024-05-08  7:42     ` Antonio Quartulli
  2024-05-08 14:42   ` Sabrina Dubroca
  1 sibling, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-08  0:10 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Mon,  6 May 2024 03:16:16 +0200 Antonio Quartulli wrote:
> +    name: nonce_tail_size

nit: typically we hyphenate the names in YAML and C codegen replaces
the hyphens with underscores (and converts to uppercase)
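
To make the naming rule concrete, here is a rough sketch in C of the
mapping described above. The real generator is the script under
tools/net/ynl, so this function only mirrors the rule as stated here.
The function name is made up, and the "OVPN_" prefix is an assumption
inferred from the OVPN_NONCE_TAIL_SIZE define quoted below.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustration only: build a C define name from a hyphenated YAML name
 * by upper-casing it and replacing '-' with '_', behind a prefix.
 * Returns 0 on success, -1 if dst is too small.
 */
static int yaml_name_to_define(const char *prefix, const char *name,
			       char *dst, size_t dst_len)
{
	size_t plen = strlen(prefix);
	size_t nlen = strlen(name);
	size_t i;

	if (plen + nlen + 1 > dst_len)
		return -1;

	memcpy(dst, prefix, plen);
	for (i = 0; i < nlen; i++) {
		unsigned char c = (unsigned char)name[i];

		dst[plen + i] = (c == '-') ? '_' : (char)toupper(c);
	}
	dst[plen + nlen] = '\0';

	return 0;
}
```

Under that assumption, "nonce-tail-size" with the prefix "OVPN_" maps
to OVPN_NONCE_TAIL_SIZE.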

> +         exact-len: OVPN_NONCE_TAIL_SIZE

speaking of which - is the codegen buggy or this can be nonce_tail_size?
(or rather nonce-tail-size)

> +      -
> +        name: pad
> +        type: pad

You shouldn't need this, now that we have uint.
replace nla_put_u64_64bit() with nla_put_uint().
Unfortunately libnl hasn't caught up so you may need to open code 
the getter a little in user space CLI.
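
As a rough idea of what open coding the getter could look like: a uint
attribute carries either a 4-byte or an 8-byte payload depending on the
value, so the reader has to branch on the payload length. This is a
sketch under that assumption, with a made-up helper name, and it is not
taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode the payload of a variable-width "uint"
 * netlink attribute. Assumes the payload is 4 bytes when the value
 * fits in 32 bits and 8 bytes otherwise, in host byte order.
 * Returns 0 on success, -1 if the length is neither 4 nor 8.
 */
static int ovpn_cli_get_uint(const void *payload, size_t len, uint64_t *out)
{
	uint32_t v32;
	uint64_t v64;

	switch (len) {
	case sizeof(v32):
		memcpy(&v32, payload, sizeof(v32));
		*out = v32;
		return 0;
	case sizeof(v64):
		memcpy(&v64, payload, sizeof(v64));
		*out = v64;
		return 0;
	default:
		return -1;
	}
}
```

In the CLI this would be called with libnl's accessors, for example
ovpn_cli_get_uint(nla_data(attr), nla_len(attr), &val).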

BTW I'd also bump the packet counters to uint.
Doesn't cost much if they don't grow > 32b and you never know..

> +        request:
> +          attributes:
> +            - ifname
> +            - mode
> +        reply:
> +          attributes:
> +            - ifname

The attribute lists 

> +	struct net_device *dev;
> +	int ifindex;
> +
> +	if (!attrs[OVPN_A_IFINDEX])

GENL_REQ_ATTR_CHECK()

> +		return ERR_PTR(-EINVAL);
> +
> +	ifindex = nla_get_u32(attrs[OVPN_A_IFINDEX]);
> +
> +	dev = dev_get_by_index(net, ifindex);
> +	if (!dev)
> +		return ERR_PTR(-ENODEV);
> +
> +	if (!ovpn_dev_is_valid(dev))
> +		goto err_put_dev;
> +
> +	return dev;
> +
> +err_put_dev:
> +	dev_put(dev);

NL_SET_BAD_ATTR(info->extack, ...[OVPN_A_IFINDEX])

> +	return ERR_PTR(-EINVAL);


* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-06  1:16 ` [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines Antonio Quartulli
@ 2024-05-08  0:18   ` Jakub Kicinski
  2024-05-08  7:53     ` Antonio Quartulli
  2024-05-08 14:52   ` Sabrina Dubroca
  1 sibling, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-08  0:18 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Mon,  6 May 2024 03:16:17 +0200 Antonio Quartulli wrote:

> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index ad3813419c33..338e99dfe886 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -11,6 +11,26 @@
>  #include <linux/skbuff.h>
>  
>  #include "io.h"
> +#include "ovpnstruct.h"
> +#include "netlink.h"
> +
> +int ovpn_struct_init(struct net_device *dev)
> +{
> +	struct ovpn_struct *ovpn = netdev_priv(dev);
> +	int err;
> +
> +	ovpn->dev = dev;
> +
> +	err = ovpn_nl_init(ovpn);
> +	if (err < 0)
> +		return err;
> +
> +	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);

Set pcpu_stat_type, core will allocate for you

> +	if (!dev->tstats)
> +		return -ENOMEM;
> +
> +	return 0;
> +}

> +/**
> + * ovpn_struct_init - Initialize the netdevice private area
> + * @dev: the device to initialize
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +int ovpn_struct_init(struct net_device *dev);

Weak preference for kdoc to go with the implementation, not declaration.

> +static const struct net_device_ops ovpn_netdev_ops = {
> +	.ndo_open		= ovpn_net_open,
> +	.ndo_stop		= ovpn_net_stop,
> +	.ndo_start_xmit		= ovpn_net_xmit,
> +	.ndo_get_stats64        = dev_get_tstats64,

Core should count pcpu stats automatically

> +};
> +
>  bool ovpn_dev_is_valid(const struct net_device *dev)
>  {
>  	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
>  }

> +	list_add(&ovpn->dev_list, &dev_list);
> +	rtnl_unlock();
> +
> +	/* turn carrier explicitly off after registration, this way state is
> +	 * clearly defined
> +	 */
> +	netif_carrier_off(dev);

carrier off inside the locked section, user can call open
immediately after unlock


* Re: [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink
  2024-05-06  1:16 ` [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink Antonio Quartulli
@ 2024-05-08  0:21   ` Jakub Kicinski
  2024-05-08  9:49     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-08  0:21 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Mon,  6 May 2024 03:16:18 +0200 Antonio Quartulli wrote:
>  int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info)
>  {
> -	return -ENOTSUPP;
> +	const char *ifname = OVPN_DEFAULT_IFNAME;
> +	enum ovpn_mode mode = OVPN_MODE_P2P;
> +	struct net_device *dev;
> +	struct sk_buff *msg;
> +	void *hdr;
> +
> +	if (info->attrs[OVPN_A_IFNAME])
> +		ifname = nla_data(info->attrs[OVPN_A_IFNAME]);
> +
> +	if (info->attrs[OVPN_A_MODE]) {
> +		mode = nla_get_u32(info->attrs[OVPN_A_MODE]);
> +		pr_debug("ovpn: setting device (%s) mode: %u\n", ifname, mode);
> +	}
> +
> +	dev = ovpn_iface_create(ifname, mode, genl_info_net(info));
> +	if (IS_ERR(dev)) {
> +		pr_err("ovpn: error while creating interface %s: %ld\n", ifname,
> +		       PTR_ERR(dev));

Better to send the error to the caller with NL_SET_ERR_MSG_MOD()

> +		return PTR_ERR(dev);
> +	}
> +
> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &ovpn_nl_family,
> +			  0, OVPN_CMD_NEW_IFACE);

genlmsg_iput() will save you a lot of typing

> +	if (!hdr) {
> +		netdev_err(dev, "%s: cannot create message header\n", __func__);
> +		return -EMSGSIZE;
> +	}
> +
> +	if (nla_put(msg, OVPN_A_IFNAME, strlen(dev->name) + 1, dev->name)) {

nla_put_string() ?

> +		netdev_err(dev, "%s: cannot add ifname to reply\n", __func__);

Probably not worth it, can't happen given the message size

> +		genlmsg_cancel(msg, hdr);
> +		nlmsg_free(msg);
> +		return -EMSGSIZE;
> +	}
> +
> +	genlmsg_end(msg, hdr);
> +
> +	return genlmsg_reply(msg, info);
>  }
>  
>  int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
>  {
> -	return -ENOTSUPP;
> +	struct ovpn_struct *ovpn = info->user_ptr[0];
> +
> +	rtnl_lock();
> +	ovpn_iface_destruct(ovpn);
> +	dev_put(ovpn->dev);
> +	rtnl_unlock();
> +
> +	synchronize_net();

Why? 🤔️


* Re: [PATCH net-next v3 03/24] ovpn: add basic netlink support
  2024-05-08  0:10   ` Jakub Kicinski
@ 2024-05-08  7:42     ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08  7:42 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 08/05/2024 02:10, Jakub Kicinski wrote:
> On Mon,  6 May 2024 03:16:16 +0200 Antonio Quartulli wrote:
>> +    name: nonce_tail_size
> 
> nit: typically we hyphenate the names in YAML and C codegen replaces
> the hyphens with underscores (and converts to uppercase)

ACK will go with hyphens.

> 
>> +         exact-len: OVPN_NONCE_TAIL_SIZE
> 
> speaking of which - is the codegen buggy or this can be nonce_tail_size?
> (or rather nonce-tail-size)

yeah, something was wrong.
I used 'nonce_tail_size' at first, but it did not get converted, 
therefore I hardcoded the final define name.

From what you say this seems to be unexpected.
Will check what the script does.

> 
>> +      -
>> +        name: pad
>> +        type: pad
> 
> You shouldn't need this, now that we have uint.
> replace nla_put_u64_64bit() with nla_put_uint().

ACK

> Unfortunately libnl hasn't caught up so you may need to open code
> the getter a little in user space CLI.

ok, no big deal.

> 
> BTW I'd also bump the packet counters to uint.
> Doesn't cost much if they don't grow > 32b and you never know..

Ok, will do

> 
>> +        request:
>> +          attributes:
>> +            - ifname
>> +            - mode
>> +        reply:
>> +          attributes:
>> +            - ifname
> 
> The attribute lists
> 
>> +	struct net_device *dev;
>> +	int ifindex;
>> +
>> +	if (!attrs[OVPN_A_IFINDEX])
> 
> GENL_REQ_ATTR_CHECK()

ACK, I must have missed this one.

> 
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	ifindex = nla_get_u32(attrs[OVPN_A_IFINDEX]);
>> +
>> +	dev = dev_get_by_index(net, ifindex);
>> +	if (!dev)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	if (!ovpn_dev_is_valid(dev))
>> +		goto err_put_dev;
>> +
>> +	return dev;
>> +
>> +err_put_dev:
>> +	dev_put(dev);
> 
> NL_SET_BAD_ATTR(info->extack, ...[OVPN_A_IFINDEX])

Oh, thanks for pointing this out.

> 
>> +	return ERR_PTR(-EINVAL);

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-08  0:18   ` Jakub Kicinski
@ 2024-05-08  7:53     ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08  7:53 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 08/05/2024 02:18, Jakub Kicinski wrote:
> On Mon,  6 May 2024 03:16:17 +0200 Antonio Quartulli wrote:
> 
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index ad3813419c33..338e99dfe886 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> @@ -11,6 +11,26 @@
>>   #include <linux/skbuff.h>
>>   
>>   #include "io.h"
>> +#include "ovpnstruct.h"
>> +#include "netlink.h"
>> +
>> +int ovpn_struct_init(struct net_device *dev)
>> +{
>> +	struct ovpn_struct *ovpn = netdev_priv(dev);
>> +	int err;
>> +
>> +	ovpn->dev = dev;
>> +
>> +	err = ovpn_nl_init(ovpn);
>> +	if (err < 0)
>> +		return err;
>> +
>> +	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
> 
> Set pcpu_stat_type, core will allocate for you

ok

> 
>> +	if (!dev->tstats)
>> +		return -ENOMEM;
>> +
>> +	return 0;
>> +}
> 
>> +/**
>> + * ovpn_struct_init - Initialize the netdevice private area
>> + * @dev: the device to initialize
>> + *
>> + * Return: 0 on success or a negative error code otherwise
>> + */
>> +int ovpn_struct_init(struct net_device *dev);
> 
> Weak preference for kdoc to go with the implementation, not declaration.

oh ok - this wasn't clear.
Will move the kdoc next to the implementation.

> 
>> +static const struct net_device_ops ovpn_netdev_ops = {
>> +	.ndo_open		= ovpn_net_open,
>> +	.ndo_stop		= ovpn_net_stop,
>> +	.ndo_start_xmit		= ovpn_net_xmit,
>> +	.ndo_get_stats64        = dev_get_tstats64,
> 
> Core should count pcpu stats automatically

Thanks for pointing this out.
I see dev_get_stats() takes care of all this for us.

> 
>> +};
>> +
>>   bool ovpn_dev_is_valid(const struct net_device *dev)
>>   {
>>   	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
>>   }
> 
>> +	list_add(&ovpn->dev_list, &dev_list);
>> +	rtnl_unlock();
>> +
>> +	/* turn carrier explicitly off after registration, this way state is
>> +	 * clearly defined
>> +	 */
>> +	netif_carrier_off(dev);
> 
> carrier off inside the locked section, user can call open
> immediately after unlock

ok, will move it up.


-- 
Antonio Quartulli
OpenVPN Inc.

* Re: [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink
  2024-05-08  0:21   ` Jakub Kicinski
@ 2024-05-08  9:49     ` Antonio Quartulli
  2024-05-09  1:09       ` Jakub Kicinski
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08  9:49 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 08/05/2024 02:21, Jakub Kicinski wrote:
> On Mon,  6 May 2024 03:16:18 +0200 Antonio Quartulli wrote:
>>   int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info)
>>   {
>> -	return -ENOTSUPP;
>> +	const char *ifname = OVPN_DEFAULT_IFNAME;
>> +	enum ovpn_mode mode = OVPN_MODE_P2P;
>> +	struct net_device *dev;
>> +	struct sk_buff *msg;
>> +	void *hdr;
>> +
>> +	if (info->attrs[OVPN_A_IFNAME])
>> +		ifname = nla_data(info->attrs[OVPN_A_IFNAME]);
>> +
>> +	if (info->attrs[OVPN_A_MODE]) {
>> +		mode = nla_get_u32(info->attrs[OVPN_A_MODE]);
>> +		pr_debug("ovpn: setting device (%s) mode: %u\n", ifname, mode);
>> +	}
>> +
>> +	dev = ovpn_iface_create(ifname, mode, genl_info_net(info));
>> +	if (IS_ERR(dev)) {
>> +		pr_err("ovpn: error while creating interface %s: %ld\n", ifname,
>> +		       PTR_ERR(dev));
> 
> Better to send the error to the caller with NL_SET_ERR_MSG_MOD()

yeah, makes sense. I guess I can do the same for every other error 
generated in any netlink handler.

> 
>> +		return PTR_ERR(dev);
>> +	}
>> +
>> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>> +	if (!msg)
>> +		return -ENOMEM;
>> +
>> +	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &ovpn_nl_family,
>> +			  0, OVPN_CMD_NEW_IFACE);
> 
> genlmsg_iput() will save you a lot of typing

oh, wow, nice one :) will switch to iput()

> 
>> +	if (!hdr) {
>> +		netdev_err(dev, "%s: cannot create message header\n", __func__);
>> +		return -EMSGSIZE;
>> +	}
>> +
>> +	if (nla_put(msg, OVPN_A_IFNAME, strlen(dev->name) + 1, dev->name)) {
> 
> nla_put_string() ?
> 

right.

>> +		netdev_err(dev, "%s: cannot add ifname to reply\n", __func__);
> 
> Probably not worth it, can't happen given the message size

Personally I still prefer to check the return value of functions that 
may fail, because somebody may break the assumption (i.e. message large 
enough by design) without realizing that this call was relying on that.

If you want, I could still add a comment saying that we don't expect 
this to happen.
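
To make the trade-off concrete, here is a small userspace sketch (not code
from the patch, and deliberately simpler than a real netlink attribute).
The names toy_msg, toy_put() and toy_put_string() are hypothetical. It
shows a string helper that derives the length itself, and a put that still
reports -EMSGSIZE when the "cannot happen" assumption is broken:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy message buffer with a fixed capacity; all names are hypothetical. */
struct toy_msg {
	unsigned char buf[32];
	size_t len;
};

/* Append @len bytes, failing with -EMSGSIZE when they do not fit. */
int toy_put(struct toy_msg *msg, const void *data, size_t len)
{
	/* invariant: the fill level never exceeds the capacity */
	assert(msg->len <= sizeof(msg->buf));

	if (len > sizeof(msg->buf) - msg->len)
		return -EMSGSIZE;

	memcpy(msg->buf + msg->len, data, len);
	msg->len += len;
	return 0;
}

/* String variant: the length (including the trailing NUL) is derived from
 * the string itself, so the caller cannot pass a mismatching size.
 */
int toy_put_string(struct toy_msg *msg, const char *str)
{
	return toy_put(msg, str, strlen(str) + 1);
}
```

A caller that checks the return value keeps working as intended even if
somebody later shrinks the buffer or adds more attributes in front.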

> 
>> +		genlmsg_cancel(msg, hdr);
>> +		nlmsg_free(msg);
>> +		return -EMSGSIZE;
>> +	}
>> +
>> +	genlmsg_end(msg, hdr);
>> +
>> +	return genlmsg_reply(msg, info);
>>   }
>>   
>>   int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
>>   {
>> -	return -ENOTSUPP;
>> +	struct ovpn_struct *ovpn = info->user_ptr[0];
>> +
>> +	rtnl_lock();
>> +	ovpn_iface_destruct(ovpn);
>> +	dev_put(ovpn->dev);
>> +	rtnl_unlock();
>> +
>> +	synchronize_net();
> 
> Why? 🤔️


hmm I was under the impression that we should always call this function 
when destroying an interface to make sure that packets that already 
entered the network stack can be properly processed before the interface 
is gone for good.

Maybe this is not the right place? Any hint?

-- 
Antonio Quartulli
OpenVPN Inc.

* Re: [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module
  2024-05-07 23:55   ` Jakub Kicinski
@ 2024-05-08  9:51     ` Antonio Quartulli
  2024-05-09  0:50       ` Jakub Kicinski
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08  9:51 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 08/05/2024 01:55, Jakub Kicinski wrote:
> On Mon,  6 May 2024 03:16:37 +0200 Antonio Quartulli wrote:
>> +CFLAGS = -Wall -idirafter ../../../../include/uapi
> 
> This may end badly once the headers you're after also exist in system
> paths. The guards in uapi/ are modified when header is installed.
> It's better to -I../../../../usr/include/ and do "make headers"
> before building tests.

ok!

> 
>> +CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0)
>> +
>> +LDFLAGS = -lmbedtls -lmbedcrypto
>> +LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0)
>> +
>> +ovpn-cli: ovpn-cli.c
>> +
>> +TEST_PROGS = run.sh
>> +TEST_GEN_PROGS_EXTENDED = ovpn-cli
> 
> TEST_GEN_FILES - it's not a test at all, AFAICT.

This binary is just a helper and it is used by the scripts below.

I only need it to be built before executing the run.sh script.

Isn't this the right VARIABLE to use for the purpose?

> 
>> +./netns-test.sh
>> +./netns-test.sh -t
>> +./float-test.sh
>> +
> 
> nit: extra new line at the end

ACK


-- 
Antonio Quartulli
OpenVPN Inc.

* Re: [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload
  2024-05-07 23:48 ` [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Jakub Kicinski
@ 2024-05-08  9:56   ` Antonio Quartulli
  2024-05-09  0:53     ` Jakub Kicinski
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08  9:56 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal



On 08/05/2024 01:48, Jakub Kicinski wrote:
> On Mon,  6 May 2024 03:16:13 +0200 Antonio Quartulli wrote:
>> I am finally back with version 3 of the ovpn patchset.
>> It took a while to address all comments I have received on v2, but I
>> am happy to say that I addressed 99% of the feedback I collected.
> 
> Nice, one more check / warning that pops up is missing kdoc.
> W=1 build only catches kdoc problems in C sources, for headers
> try running something like:
> 
> ./scripts/kernel-doc -none -Wall $new_files

I see there is one warning to fix due to a typo (eventS_wq vs event_wq),
but I also get more warnings like this:

drivers/net/ovpn/peer.h:119: warning: Function parameter or struct 
member 'vpn_addrs' not described in 'ovpn_peer'

However vpn_addrs is an anonymous struct within struct ovpn_peer.
I have already documented all its members using the form:

@vpn_addrs.ipv4
@vpn_addrs.ipv6

Am I expected to document the vpn_addrs as well?
Or is this a false positive?

Regards,



-- 
Antonio Quartulli
OpenVPN Inc.

* Re: [PATCH net-next v3 03/24] ovpn: add basic netlink support
  2024-05-06  1:16 ` [PATCH net-next v3 03/24] ovpn: add basic netlink support Antonio Quartulli
  2024-05-08  0:10   ` Jakub Kicinski
@ 2024-05-08 14:42   ` Sabrina Dubroca
  2024-05-08 14:51     ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-08 14:42 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:16 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
> new file mode 100644
> index 000000000000..c0a9f58e0e87
> --- /dev/null
> +++ b/drivers/net/ovpn/netlink.c
> +int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info)
> +{
> +	return -ENOTSUPP;

nit: All these should probably be EOPNOTSUPP if those return values
can be passed back to userspace, but since you're removing all of them
as you implement the functions, it doesn't really matter.
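
For context, a userspace sketch of the difference (not code from the
patch): ENOTSUPP is a kernel-internal value (524, from
include/linux/errno.h) that the userspace <errno.h> has no name for, while
EOPNOTSUPP is part of the uAPI. KERNEL_ENOTSUPP and to_user_errno() are
hypothetical names used only for this illustration:

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal ENOTSUPP value, redefined here for illustration only
 * because userspace <errno.h> does not provide it.
 */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: map the kernel-internal code to the one userspace
 * can decode and pass every other error code through unchanged.
 * Expects kernel-style values, i.e. 0 or a negative errno.
 */
int to_user_errno(int err)
{
	assert(err <= 0);

	if (err == -KERNEL_ENOTSUPP)
		return -EOPNOTSUPP;

	return err;
}
```

Returning -EOPNOTSUPP directly from the handlers makes such a translation
unnecessary.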

[...]
> +/**
> + * ovpn_nl_init - perform any ovpn specific netlink initialization
> + * @ovpn: the openvpn instance object
> + */
> +int ovpn_nl_init(struct ovpn_struct *ovpn)
> +{
> +	return 0;
> +}

Is this also part of the auto-generated code? Or maybe a leftover from
previous iterations? This function doesn't do anything even after all
other patches are applied.

-- 
Sabrina


* Re: [PATCH net-next v3 03/24] ovpn: add basic netlink support
  2024-05-08 14:42   ` Sabrina Dubroca
@ 2024-05-08 14:51     ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08 14:51 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 08/05/2024 16:42, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:16 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
>> new file mode 100644
>> index 000000000000..c0a9f58e0e87
>> --- /dev/null
>> +++ b/drivers/net/ovpn/netlink.c
>> +int ovpn_nl_new_iface_doit(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +	return -ENOTSUPP;
> 
> nit: All thhese should probably be EOPNOTSUPP if those return values
> can be passed back to userspace, but since you're removing all of them
> as you implement the functions, it doesn't really matter.

Yeah, I just saw the warnings about these errors.
I'll still change them in v4.

> 
> [...]
>> +/**
>> + * ovpn_nl_init - perform any ovpn specific netlink initialization
>> + * @ovpn: the openvpn instance object
>> + */
>> +int ovpn_nl_init(struct ovpn_struct *ovpn)
>> +{
>> +	return 0;
>> +}
> 
> Is this also part of the auto-generated code? Or maybe a leftover from
> previous iterations? This function doesn't do anything even after all
> other patches are applied.

Ouch, I missed this. Definitely a leftover.
Will remove it.

Thanks a lot

> 

-- 
Antonio Quartulli
OpenVPN Inc.

* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-06  1:16 ` [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines Antonio Quartulli
  2024-05-08  0:18   ` Jakub Kicinski
@ 2024-05-08 14:52   ` Sabrina Dubroca
  2024-05-09  1:06     ` Jakub Kicinski
  2024-05-09  8:25     ` Antonio Quartulli
  1 sibling, 2 replies; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-08 14:52 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index ad3813419c33..338e99dfe886 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -11,6 +11,26 @@
>  #include <linux/skbuff.h>
>  
>  #include "io.h"
> +#include "ovpnstruct.h"
> +#include "netlink.h"
> +
> +int ovpn_struct_init(struct net_device *dev)

nit: Should this be in main.c? It's only used there, and I think it
would make more sense to drop it next to ovpn_struct_free.

> +{
> +	struct ovpn_struct *ovpn = netdev_priv(dev);
> +	int err;
> +

[...]
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index 33c0b004ce16..584cd7286aff 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
[...]
> +static void ovpn_struct_free(struct net_device *net)
> +{
> +	struct ovpn_struct *ovpn = netdev_priv(net);
> +
> +	rtnl_lock();

 ->priv_destructor can run from register_netdevice (already under
RTNL), this doesn't look right.

> +	list_del(&ovpn->dev_list);

And if this gets called from register_netdevice, the list_add from
ovpn_iface_create hasn't run yet, so this will probably do strange
things?

> +	rtnl_unlock();
> +
> +	free_percpu(net->tstats);
> +}
> +
> +static int ovpn_net_open(struct net_device *dev)
> +{
> +	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
> +
> +	if (dev_v4) {
> +		/* disable redirects as Linux gets confused by ovpn handling
> +		 * same-LAN routing
> +		 */
> +		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
> +		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;

Jakub, are you ok with that? This feels a bit weird to have in the
middle of a driver.

> +	}
> +
> +	netif_tx_start_all_queues(dev);
> +	return 0;
> +}

[...]
> +void ovpn_iface_destruct(struct ovpn_struct *ovpn)
> +{
> +	ASSERT_RTNL();
> +
> +	netif_carrier_off(ovpn->dev);
> +
> +	ovpn->registered = false;
> +
> +	unregister_netdevice(ovpn->dev);
> +	synchronize_net();

If this gets called from the loop in ovpn_netns_pre_exit, one
synchronize_net per ovpn device would seem quite expensive.

> +}
> +
>  static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>  				     unsigned long state, void *ptr)
>  {
>  	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct ovpn_struct *ovpn;
>  
>  	if (!ovpn_dev_is_valid(dev))
>  		return NOTIFY_DONE;
>  
> +	ovpn = netdev_priv(dev);
> +
>  	switch (state) {
>  	case NETDEV_REGISTER:
> -		/* add device to internal list for later destruction upon
> -		 * unregistration
> -		 */
> +		ovpn->registered = true;
>  		break;
>  	case NETDEV_UNREGISTER:
> +		/* twiddle thumbs on netns device moves */
> +		if (dev->reg_state != NETREG_UNREGISTERING)
> +			break;
> +
>  		/* can be delivered multiple times, so check registered flag,
>  		 * then destroy the interface
>  		 */
> +		if (!ovpn->registered)
> +			return NOTIFY_DONE;
> +
> +		ovpn_iface_destruct(ovpn);

Maybe I'm misunderstanding this code. Why do you want to manually
destroy a device that is already going away?

>  		break;
>  	case NETDEV_POST_INIT:
>  	case NETDEV_GOING_DOWN:
>  	case NETDEV_DOWN:
>  	case NETDEV_UP:
>  	case NETDEV_PRE_UP:
> +		break;
>  	default:
>  		return NOTIFY_DONE;
>  	}
> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>  	.notifier_call = ovpn_netdev_notifier_call,
>  };
>  
> +static void ovpn_netns_pre_exit(struct net *net)
> +{
> +	struct ovpn_struct *ovpn;
> +
> +	rtnl_lock();
> +	list_for_each_entry(ovpn, &dev_list, dev_list) {
> +		if (dev_net(ovpn->dev) != net)
> +			continue;
> +
> +		ovpn_iface_destruct(ovpn);

Is this needed? On netns destruction all devices within the ns will be
destroyed by the networking core.

> +	}
> +	rtnl_unlock();
> +}

-- 
Sabrina


* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-06  1:16 ` [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object Antonio Quartulli
@ 2024-05-08 16:06   ` Sabrina Dubroca
  2024-05-08 20:31     ` Antonio Quartulli
  2024-05-13 10:09   ` Simon Horman
  1 sibling, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-08 16:06 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:20 +0200, Antonio Quartulli wrote:
> An ovpn_peer object holds the whole status of a remote peer
> (regardless whether it is a server or a client).
> 
> This includes status for crypto, tx/rx buffers, napi, etc.
> 
> Only support for one peer is introduced (P2P mode).
> Multi peer support is introduced with a later patch.
> 
> Along with the ovpn_peer, also the ovpn_bind object is introcued
                                                         ^
typo: "introduced"

> as the two are strictly related.
> An ovpn_bind object wraps a sockaddr representing the local
> coordinates being used to talk to a specific peer.

> diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
> new file mode 100644
> index 000000000000..c1f842c06e32
> --- /dev/null
> +++ b/drivers/net/ovpn/bind.c
> +static void ovpn_bind_release_rcu(struct rcu_head *head)
> +{
> +	struct ovpn_bind *bind = container_of(head, struct ovpn_bind, rcu);
> +
> +	kfree(bind);
> +}
> +
> +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
> +{
> +	struct ovpn_bind *old;
> +
> +	spin_lock_bh(&peer->lock);
> +	old = rcu_replace_pointer(peer->bind, new, true);
> +	spin_unlock_bh(&peer->lock);
> +
> +	if (old)
> +		call_rcu(&old->rcu, ovpn_bind_release_rcu);

Isn't that just kfree_rcu? (note kfree_rcu doesn't need the NULL check)

> +}


> diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h
> new file mode 100644
> index 000000000000..61433550a961
> --- /dev/null
> +++ b/drivers/net/ovpn/bind.h
[...]
> +static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind,
> +					   struct sk_buff *skb)

nit: I think skb can also be const here


> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index 338e99dfe886..a420bb45f25f 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -13,6 +13,7 @@
>  #include "io.h"
>  #include "ovpnstruct.h"
>  #include "netlink.h"
> +#include "peer.h"
>  
>  int ovpn_struct_init(struct net_device *dev)
>  {
> @@ -25,6 +26,13 @@ int ovpn_struct_init(struct net_device *dev)
>  	if (err < 0)
>  		return err;
>  
> +	spin_lock_init(&ovpn->lock);
> +
> +	ovpn->events_wq = alloc_workqueue("ovpn-events-wq-%s", WQ_MEM_RECLAIM,
> +					  0, dev->name);

I'm not convinced this will get freed consistently if
register_netdevice fails early (before ndo_init).  After talking to
Paolo, it seems this should be moved into a new ->ndo_init instead.

> +	if (!ovpn->events_wq)
> +		return -ENOMEM;
> +
>  	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
>  	if (!dev->tstats)
>  		return -ENOMEM;
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index cc8a97a1a189..dba35ecb236b 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -37,6 +39,9 @@ static void ovpn_struct_free(struct net_device *net)
>  	rtnl_unlock();
>  
>  	free_percpu(net->tstats);
> +	flush_workqueue(ovpn->events_wq);
> +	destroy_workqueue(ovpn->events_wq);

Is the flush needed? I'm not an expert on workqueues, but from a quick
look at destroy_workqueue it calls drain_workqueue, which would take
care of flushing the queue?

> +	rcu_barrier();
>  }
>  

[...]
> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> index ee05b8a2c61d..b79d4f0474b0 100644
> --- a/drivers/net/ovpn/ovpnstruct.h
> +++ b/drivers/net/ovpn/ovpnstruct.h
> @@ -17,12 +17,19 @@
>   * @dev: the actual netdev representing the tunnel
>   * @registered: whether dev is still registered with netdev or not
>   * @mode: device operation mode (i.e. p2p, mp, ..)
> + * @lock: protect this object
> + * @event_wq: used to schedule generic events that may sleep and that need to be
> + *            performed outside of softirq context
> + * @peer: in P2P mode, this is the only remote peer
>   * @dev_list: entry for the module wide device list
>   */
>  struct ovpn_struct {
>  	struct net_device *dev;
>  	bool registered;
>  	enum ovpn_mode mode;
> +	spinlock_t lock; /* protect writing to the ovpn_struct object */

nit: the comment isn't really needed since you have kdoc saying the same thing

> +	struct workqueue_struct *events_wq;
> +	struct ovpn_peer __rcu *peer;
>  	struct list_head dev_list;
>  };
>  
> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
> new file mode 100644
> index 000000000000..2948b7320d47
> --- /dev/null
> +++ b/drivers/net/ovpn/peer.c
[...]
> +/**
> + * ovpn_peer_free - release private members and free peer object
> + * @peer: the peer to free
> + */
> +static void ovpn_peer_free(struct ovpn_peer *peer)
> +{
> +	ovpn_bind_reset(peer, NULL);
> +
> +	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));

Could you pass a destructor to ptr_ring_cleanup instead of all these WARNs?
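
A rough userspace model of that idea (this is not the kernel ptr_ring API;
toy_ring and its functions are hypothetical names): cleanup hands every
leftover item to a destructor so nothing leaks. Unlike the kernel
function, this toy version also returns how many leftovers it found, so a
caller could still warn about them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for a pointer ring: a fixed array of slots where NULL
 * marks an empty slot. All names here are hypothetical.
 */
struct toy_ring {
	void **slots;
	size_t size;
};

int toy_ring_init(struct toy_ring *r, size_t size)
{
	assert(r);

	r->slots = calloc(size, sizeof(*r->slots));
	r->size = r->slots ? size : 0;
	return r->slots ? 0 : -1;
}

/* Store @item in the first free slot; return -1 if the ring is full. */
int toy_ring_produce(struct toy_ring *r, void *item)
{
	for (size_t i = 0; i < r->size; i++) {
		if (!r->slots[i]) {
			r->slots[i] = item;
			return 0;
		}
	}
	return -1;
}

/* Release the ring. Any item still queued is passed to @destroy (if
 * given), so leftovers are freed rather than only reported.
 * Returns the number of leftover items that were found.
 */
size_t toy_ring_cleanup(struct toy_ring *r, void (*destroy)(void *))
{
	size_t leftovers = 0;

	for (size_t i = 0; i < r->size; i++) {
		if (!r->slots[i])
			continue;
		leftovers++;
		if (destroy)
			destroy(r->slots[i]);
	}

	free(r->slots);
	r->slots = NULL;
	r->size = 0;
	return leftovers;
}
```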

> +	ptr_ring_cleanup(&peer->tx_ring, NULL);
> +	WARN_ON(!__ptr_ring_empty(&peer->rx_ring));
> +	ptr_ring_cleanup(&peer->rx_ring, NULL);
> +	WARN_ON(!__ptr_ring_empty(&peer->netif_rx_ring));
> +	ptr_ring_cleanup(&peer->netif_rx_ring, NULL);
> +
> +	dst_cache_destroy(&peer->dst_cache);
> +
> +	dev_put(peer->ovpn->dev);
> +
> +	kfree(peer);
> +}

[...]
> +void ovpn_peer_release(struct ovpn_peer *peer)
> +{
> +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
> +}
> +
> +/**
> + * ovpn_peer_delete_work - work scheduled to release peer in process context
> + * @work: the work object
> + */
> +static void ovpn_peer_delete_work(struct work_struct *work)
> +{
> +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
> +					      delete_work);
> +	ovpn_peer_release(peer);

Does call_rcu really need to run in process context?

> +}

[...]
> +/**
> + * ovpn_peer_transp_match - check if sockaddr and peer binding match
> + * @peer: the peer to get the binding from
> + * @ss: the sockaddr to match
> + *
> + * Return: true if sockaddr and binding match or false otherwise
> + */
> +static bool ovpn_peer_transp_match(struct ovpn_peer *peer,
> +				   struct sockaddr_storage *ss)
> +{
[...]
> +	case AF_INET6:
> +		sa6 = (struct sockaddr_in6 *)ss;
> +		if (memcmp(&sa6->sin6_addr, &bind->sa.in6.sin6_addr,
> +			   sizeof(struct in6_addr)))

ipv6_addr_equal?
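
As a userspace analogue of what a named comparison helper buys (a sketch,
not the kernel code): IN6_ARE_ADDR_EQUAL() is the RFC 3542 macro from
<netinet/in.h>, and sockaddr_in6_match() is a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Compare address and port of two IPv6 socket addresses with a named
 * helper instead of an open-coded memcmp() over struct in6_addr.
 * Ports are compared as stored (network byte order on both sides).
 */
bool sockaddr_in6_match(const struct sockaddr_in6 *a,
			const struct sockaddr_in6 *b)
{
	assert(a && b);

	if (!IN6_ARE_ADDR_EQUAL(&a->sin6_addr, &b->sin6_addr))
		return false;

	return a->sin6_port == b->sin6_port;
}
```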

> +			return false;
> +		if (sa6->sin6_port != bind->sa.in6.sin6_port)
> +			return false;
> +		break;

[...]
> +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
> +{
> +	struct ovpn_peer *peer = NULL;
> +
> +	if (ovpn->mode == OVPN_MODE_P2P)
> +		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
> +
> +	return peer;
> +}
> +
> +/**
> + * ovpn_peer_add_p2p - add per to related tables in a P2P instance
                              ^
typo: peer?


[...]
> +/**
> + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
> + * @peer: the peer to delete
> + * @reason: reason why the peer was deleted (sent to userspace)
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> +			     enum ovpn_del_peer_reason reason)
> +{
> +	struct ovpn_peer *tmp;
> +	int ret = -ENOENT;
> +
> +	spin_lock_bh(&peer->ovpn->lock);
> +	tmp = rcu_dereference(peer->ovpn->peer);
> +	if (tmp != peer)
> +		goto unlock;

How do we recover if all those objects got out of sync? Are we stuck
with a broken peer?

And if this happens during interface deletion, aren't we leaking the
peer memory here?

> +	ovpn_peer_put(tmp);
> +	tmp->delete_reason = reason;
> +	RCU_INIT_POINTER(peer->ovpn->peer, NULL);
> +	ret = 0;
> +
> +unlock:
> +	spin_unlock_bh(&peer->ovpn->lock);
> +
> +	return ret;
> +}

[...]
> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> new file mode 100644
> index 000000000000..659df320525c
> --- /dev/null
> +++ b/drivers/net/ovpn/peer.h
[...]
> +/**
> + * struct ovpn_peer - the main remote peer object
> + * @ovpn: main openvpn instance this peer belongs to
> + * @id: unique identifier
> + * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
> + * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
> + * @tx_ring: queue of outgoing poackets to this peer
> + * @rx_ring: queue of incoming packets from this peer
> + * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
> + * @dst_cache: cache for dst_entry used to send to peer
> + * @bind: remote peer binding
> + * @halt: true if ovpn_peer_mark_delete was called
> + * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
> + * @lock: protects binding to peer (bind)
> + * @refcount: reference counter
> + * @rcu: used to free peer in an RCU safe way
> + * @delete_work: deferred cleanup work, used to notify userspace
> + */
> +struct ovpn_peer {
> +	struct ovpn_struct *ovpn;
> +	u32 id;
> +	struct {
> +		struct in_addr ipv4;
> +		struct in6_addr ipv6;
> +	} vpn_addrs;
> +	struct ptr_ring tx_ring;
> +	struct ptr_ring rx_ring;
> +	struct ptr_ring netif_rx_ring;
> +	struct dst_cache dst_cache;
> +	struct ovpn_bind __rcu *bind;
> +	bool halt;
> +	enum ovpn_del_peer_reason delete_reason;
> +	spinlock_t lock; /* protects bind */

nit: the comment isn't really needed, it's redundant with kdoc.

> +	struct kref refcount;
> +	struct rcu_head rcu;
> +	struct work_struct delete_work;
> +};

-- 
Sabrina


* Re: [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object
  2024-05-06  1:16 ` [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object Antonio Quartulli
@ 2024-05-08 17:10   ` Sabrina Dubroca
  2024-05-08 20:38     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-08 17:10 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:21 +0200, Antonio Quartulli wrote:
> This specific structure is used in the ovpn kernel module
> to wrap and carry around a standard kernel socket.
> 
> ovpn takes ownership of passed sockets and therefore an ovpn
> specific objects is attathced to them for status tracking

typos:      object    attached


> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> new file mode 100644
> index 000000000000..a4a4d69162f0
> --- /dev/null
> +++ b/drivers/net/ovpn/socket.c
[...]
> +
> +/* Finalize release of socket, called after RCU grace period */

kref_put seems to call ovpn_socket_release_kref without waiting, and
then that calls ovpn_socket_detach immediately as well. Am I missing
something?

> +static void ovpn_socket_detach(struct socket *sock)
> +{
> +	if (!sock)
> +		return;
> +
> +	sockfd_put(sock);
> +}

[...]
> +
> +/* Finalize release of socket, called after RCU grace period */

Did that comment get misplaced? It doesn't match the code.

> +static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	if (!sock || !peer)
> +		return -EINVAL;
> +
> +	if (sock->sk->sk_protocol == IPPROTO_UDP)
> +		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
> +
> +	return ret;
> +}

> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
> new file mode 100644
> index 000000000000..4b7d96a13df0
> --- /dev/null
> +++ b/drivers/net/ovpn/udp.c
[...]
> +
> +int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
> +{
> +	struct ovpn_socket *old_data;
> +
> +	/* sanity check */
> +	if (sock->sk->sk_protocol != IPPROTO_UDP) {
> +		netdev_err(ovpn->dev, "%s: expected UDP socket\n", __func__);

Maybe use DEBUG_NET_WARN_ON_ONCE here since it's never expected to
actually happen? That would help track down (in test/debug setups) how
we ended up here.

> +		return -EINVAL;
> +	}
> +
> +	/* make sure no pre-existing encapsulation handler exists */
> +	rcu_read_lock();
> +	old_data = rcu_dereference_sk_user_data(sock->sk);
> +	rcu_read_unlock();
> +	if (old_data) {
> +		if (old_data->ovpn == ovpn) {

You should stay under rcu_read_lock if you access old_data's fields.
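
One possible shape, modelled in userspace (none of this is the kernel RCU
API; the toy_* names are hypothetical and the "lock" only tracks nesting):
decide the outcome while still inside the read-side section and leave it
with a plain integer, which is then safe to use for logging afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the read-side markers: they only count nesting so
 * that the accessor below can check it runs inside the section.
 */
static int toy_read_depth;

void toy_read_lock(void)
{
	toy_read_depth++;
}

void toy_read_unlock(void)
{
	toy_read_depth--;
}

struct toy_sock_data {
	const void *owner;	/* interface that attached to the socket */
};

/* Field access is only legal while the read-side section is held. */
const void *toy_owner(const struct toy_sock_data *data)
{
	assert(toy_read_depth > 0);
	return data->owner;
}

/* Returns 0 if the socket is free, -EALREADY if @owner already holds it
 * and -EBUSY if somebody else does. The protected object is not touched
 * after toy_read_unlock().
 */
int toy_check_attach(const struct toy_sock_data *old, const void *owner)
{
	int ret = 0;

	toy_read_lock();
	if (old)
		ret = toy_owner(old) == owner ? -EALREADY : -EBUSY;
	toy_read_unlock();

	return ret;
}
```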

> +			netdev_dbg(ovpn->dev,
> +				   "%s: provided socket already owned by this interface\n",
> +				   __func__);
> +			return -EALREADY;
> +		}
> +
> +		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
> +			   __func__);
> +		return -EBUSY;
> +	}
> +
> +	return 0;
> +}

-- 
Sabrina


* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-08 16:06   ` Sabrina Dubroca
@ 2024-05-08 20:31     ` Antonio Quartulli
  2024-05-09 13:04       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08 20:31 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 08/05/2024 18:06, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:20 +0200, Antonio Quartulli wrote:
>> An ovpn_peer object holds the whole status of a remote peer
>> (regardless whether it is a server or a client).
>>
>> This includes status for crypto, tx/rx buffers, napi, etc.
>>
>> Only support for one peer is introduced (P2P mode).
>> Multi peer support is introduced with a later patch.
>>
>> Along with the ovpn_peer, also the ovpn_bind object is introcued
>                                                           ^
> typo: "introduced"

thanks

> 
>> as the two are strictly related.
>> An ovpn_bind object wraps a sockaddr representing the local
>> coordinates being used to talk to a specific peer.
> 
>> diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
>> new file mode 100644
>> index 000000000000..c1f842c06e32
>> --- /dev/null
>> +++ b/drivers/net/ovpn/bind.c
>> +static void ovpn_bind_release_rcu(struct rcu_head *head)
>> +{
>> +	struct ovpn_bind *bind = container_of(head, struct ovpn_bind, rcu);
>> +
>> +	kfree(bind);
>> +}
>> +
>> +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
>> +{
>> +	struct ovpn_bind *old;
>> +
>> +	spin_lock_bh(&peer->lock);
>> +	old = rcu_replace_pointer(peer->bind, new, true);
>> +	spin_unlock_bh(&peer->lock);
>> +
>> +	if (old)
>> +		call_rcu(&old->rcu, ovpn_bind_release_rcu);
> 
> Isn't that just kfree_rcu? (note kfree_rcu doesn't need the NULL check)

yeah, you're right. I think ovpn_bind_release_rcu() was more complex in 
the past, but got reduced step by step...will directly use kfree_rcu().

> 
>> +}
> 
> 
>> diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h
>> new file mode 100644
>> index 000000000000..61433550a961
>> --- /dev/null
>> +++ b/drivers/net/ovpn/bind.h
> [...]
>> +static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind,
>> +					   struct sk_buff *skb)
> 
> nit: I think skb can also be const here

right

> 
> 
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index 338e99dfe886..a420bb45f25f 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> @@ -13,6 +13,7 @@
>>   #include "io.h"
>>   #include "ovpnstruct.h"
>>   #include "netlink.h"
>> +#include "peer.h"
>>   
>>   int ovpn_struct_init(struct net_device *dev)
>>   {
>> @@ -25,6 +26,13 @@ int ovpn_struct_init(struct net_device *dev)
>>   	if (err < 0)
>>   		return err;
>>   
>> +	spin_lock_init(&ovpn->lock);
>> +
>> +	ovpn->events_wq = alloc_workqueue("ovpn-events-wq-%s", WQ_MEM_RECLAIM,
>> +					  0, dev->name);
> 
> I'm not convinced this will get freed consistently if
> register_netdevice fails early (before ndo_init).  After talking to
> Paolo, it seems this should be moved into a new ->ndo_init instead.

oh good point. I didn't consider that register_netdevice could fail that 
early.

> 
>> +	if (!ovpn->events_wq)
>> +		return -ENOMEM;
>> +
>>   	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
>>   	if (!dev->tstats)
>>   		return -ENOMEM;
>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>> index cc8a97a1a189..dba35ecb236b 100644
>> --- a/drivers/net/ovpn/main.c
>> +++ b/drivers/net/ovpn/main.c
>> @@ -37,6 +39,9 @@ static void ovpn_struct_free(struct net_device *net)
>>   	rtnl_unlock();
>>   
>>   	free_percpu(net->tstats);
>> +	flush_workqueue(ovpn->events_wq);
>> +	destroy_workqueue(ovpn->events_wq);
> 
> Is the flush needed? I'm not an expert on workqueues, but from a quick
> look at destroy_workqueue it calls drain_workqueue, which would take
> care of flushing the queue?

you're right. drain_workqueue calls __flush_workqueue as often as needed 
to empty the queue.
Therefore I can get rid of my flush_workqueue invocation.

> 
>> +	rcu_barrier();
>>   }
>>   
> 
> [...]
>> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
>> index ee05b8a2c61d..b79d4f0474b0 100644
>> --- a/drivers/net/ovpn/ovpnstruct.h
>> +++ b/drivers/net/ovpn/ovpnstruct.h
>> @@ -17,12 +17,19 @@
>>    * @dev: the actual netdev representing the tunnel
>>    * @registered: whether dev is still registered with netdev or not
>>    * @mode: device operation mode (i.e. p2p, mp, ..)
>> + * @lock: protect this object
>> + * @event_wq: used to schedule generic events that may sleep and that need to be
>> + *            performed outside of softirq context
>> + * @peer: in P2P mode, this is the only remote peer
>>    * @dev_list: entry for the module wide device list
>>    */
>>   struct ovpn_struct {
>>   	struct net_device *dev;
>>   	bool registered;
>>   	enum ovpn_mode mode;
>> +	spinlock_t lock; /* protect writing to the ovpn_struct object */
> 
> nit: the comment isn't really needed since you have kdoc saying the same thing

True, but checkpatch.pl (or some other script?) was still throwing a 
warning, therefore I added this comment to silence it.

> 
>> +	struct workqueue_struct *events_wq;
>> +	struct ovpn_peer __rcu *peer;
>>   	struct list_head dev_list;
>>   };
>>   
>> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
>> new file mode 100644
>> index 000000000000..2948b7320d47
>> --- /dev/null
>> +++ b/drivers/net/ovpn/peer.c
> [...]
>> +/**
>> + * ovpn_peer_free - release private members and free peer object
>> + * @peer: the peer to free
>> + */
>> +static void ovpn_peer_free(struct ovpn_peer *peer)
>> +{
>> +	ovpn_bind_reset(peer, NULL);
>> +
>> +	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
> 
> Could you pass a destructor to ptr_ring_cleanup instead of all these WARNs?

hmm but if we remove the WARNs then we lose the possibility to catch 
potential bugs, no? rings should definitely be empty at this point.

Or you think I should just not care and free any potentially remaining item?

> 
>> +	ptr_ring_cleanup(&peer->tx_ring, NULL);
>> +	WARN_ON(!__ptr_ring_empty(&peer->rx_ring));
>> +	ptr_ring_cleanup(&peer->rx_ring, NULL);
>> +	WARN_ON(!__ptr_ring_empty(&peer->netif_rx_ring));
>> +	ptr_ring_cleanup(&peer->netif_rx_ring, NULL);
>> +
>> +	dst_cache_destroy(&peer->dst_cache);
>> +
>> +	dev_put(peer->ovpn->dev);
>> +
>> +	kfree(peer);
>> +}
> 
> [...]
>> +void ovpn_peer_release(struct ovpn_peer *peer)
>> +{
>> +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
>> +}
>> +
>> +/**
>> + * ovpn_peer_delete_work - work scheduled to release peer in process context
>> + * @work: the work object
>> + */
>> +static void ovpn_peer_delete_work(struct work_struct *work)
>> +{
>> +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
>> +					      delete_work);
>> +	ovpn_peer_release(peer);
> 
> Does call_rcu really need to run in process context?

Reason for switching to process context is that we have to invoke 
ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and 
the latter requires a reference to the peer.

For this reason I thought it would be safe to have 
ovpn_nl_notify_del_peer and call_rcu invoked by the same context.

If I invoke call_rcu in ovpn_peer_release_kref, how can I be sure that 
the peer hasn't been free'd already when ovpn_nl_notify_del_peer is 
executed?


> 
>> +}
> 
> [...]
>> +/**
>> + * ovpn_peer_transp_match - check if sockaddr and peer binding match
>> + * @peer: the peer to get the binding from
>> + * @ss: the sockaddr to match
>> + *
>> + * Return: true if sockaddr and binding match or false otherwise
>> + */
>> +static bool ovpn_peer_transp_match(struct ovpn_peer *peer,
>> +				   struct sockaddr_storage *ss)
>> +{
> [...]
>> +	case AF_INET6:
>> +		sa6 = (struct sockaddr_in6 *)ss;
>> +		if (memcmp(&sa6->sin6_addr, &bind->sa.in6.sin6_addr,
>> +			   sizeof(struct in6_addr)))
> 
> ipv6_addr_equal?
> 

definitely. thanks
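
For reference, a minimal userspace sketch of what the AF_INET6 branch
reduces to once the open-coded memcmp is hidden behind a helper.
addr6_equal() and sockaddr6_match() are hypothetical names used only for
illustration; in the kernel the helper would be ipv6_addr_equal(), and
the real code compares against the peer binding rather than a second
sockaddr.

```c
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Userspace stand-in for the kernel's ipv6_addr_equal(): true when the
 * two 128-bit addresses are identical.
 */
static bool addr6_equal(const struct in6_addr *a, const struct in6_addr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* Mirrors the AF_INET6 branch of the transport match: both the address
 * and the port (kept in network byte order) must be equal.
 */
static bool sockaddr6_match(const struct sockaddr_in6 *sa,
			    const struct sockaddr_in6 *bound)
{
	if (!addr6_equal(&sa->sin6_addr, &bound->sin6_addr))
		return false;

	return sa->sin6_port == bound->sin6_port;
}
```

The behaviour is the same as the memcmp version; the helper only makes
the intent obvious at the call site.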

>> +			return false;
>> +		if (sa6->sin6_port != bind->sa.in6.sin6_port)
>> +			return false;
>> +		break;
> 
> [...]
>> +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
>> +{
>> +	struct ovpn_peer *peer = NULL;
>> +
>> +	if (ovpn->mode == OVPN_MODE_P2P)
>> +		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
>> +
>> +	return peer;
>> +}
>> +
>> +/**
>> + * ovpn_peer_add_p2p - add per to related tables in a P2P instance
>                                ^
> typo: peer?

yeah

> 
> 
> [...]
>> +/**
>> + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
>> + * @peer: the peer to delete
>> + * @reason: reason why the peer was deleted (sent to userspace)
>> + *
>> + * Return: 0 on success or a negative error code otherwise
>> + */
>> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
>> +			     enum ovpn_del_peer_reason reason)
>> +{
>> +	struct ovpn_peer *tmp;
>> +	int ret = -ENOENT;
>> +
>> +	spin_lock_bh(&peer->ovpn->lock);
>> +	tmp = rcu_dereference(peer->ovpn->peer);
>> +	if (tmp != peer)
>> +		goto unlock;
> 
> How do we recover if all those objects got out of sync? Are we stuck
> with a broken peer?

mhhh I don't fully get the scenario you are depicting.

In P2P mode there is only one peer stored (its reference is saved in ovpn->peer).

When we want to get rid of it, we invoke ovpn_peer_del_p2p().
The check we are performing here is just about being sure that we are 
removing the exact peer we requested to remove (and not some other peer 
that was still floating around for some reason).

> 
> And if this happens during interface deletion, aren't we leaking the
> peer memory here?

at interface deletion we call

ovpn_iface_destruct -> ovpn_peer_release_p2p -> 
ovpn_peer_del_p2p(ovpn->peer)

so at the last step we just ask to remove the very same peer that is 
currently stored, which should just never fail.

makes sense?

> 
>> +	ovpn_peer_put(tmp);
>> +	tmp->delete_reason = reason;
>> +	RCU_INIT_POINTER(peer->ovpn->peer, NULL);
>> +	ret = 0;
>> +
>> +unlock:
>> +	spin_unlock_bh(&peer->ovpn->lock);
>> +
>> +	return ret;
>> +}
> 
> [...]
>> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
>> new file mode 100644
>> index 000000000000..659df320525c
>> --- /dev/null
>> +++ b/drivers/net/ovpn/peer.h
> [...]
>> +/**
>> + * struct ovpn_peer - the main remote peer object
>> + * @ovpn: main openvpn instance this peer belongs to
>> + * @id: unique identifier
>> + * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
>> + * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
>> + * @tx_ring: queue of outgoing packets to this peer
>> + * @rx_ring: queue of incoming packets from this peer
>> + * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
>> + * @dst_cache: cache for dst_entry used to send to peer
>> + * @bind: remote peer binding
>> + * @halt: true if ovpn_peer_mark_delete was called
>> + * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
>> + * @lock: protects binding to peer (bind)
>> + * @refcount: reference counter
>> + * @rcu: used to free peer in an RCU safe way
>> + * @delete_work: deferred cleanup work, used to notify userspace
>> + */
>> +struct ovpn_peer {
>> +	struct ovpn_struct *ovpn;
>> +	u32 id;
>> +	struct {
>> +		struct in_addr ipv4;
>> +		struct in6_addr ipv6;
>> +	} vpn_addrs;
>> +	struct ptr_ring tx_ring;
>> +	struct ptr_ring rx_ring;
>> +	struct ptr_ring netif_rx_ring;
>> +	struct dst_cache dst_cache;
>> +	struct ovpn_bind __rcu *bind;
>> +	bool halt;
>> +	enum ovpn_del_peer_reason delete_reason;
>> +	spinlock_t lock; /* protects bind */
> 
> nit: the comment isn't really needed, it's redundant with kdoc.

like before, also here I had a warning which I wanted to silence.

> 
>> +	struct kref refcount;
>> +	struct rcu_head rcu;
>> +	struct work_struct delete_work;
>> +};
> 

Thanks a lot!


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object
  2024-05-08 17:10   ` Sabrina Dubroca
@ 2024-05-08 20:38     ` Antonio Quartulli
  2024-05-09 13:32       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-08 20:38 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 08/05/2024 19:10, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:21 +0200, Antonio Quartulli wrote:
>> This specific structure is used in the ovpn kernel module
>> to wrap and carry around a standard kernel socket.
>>
>> ovpn takes ownership of passed sockets and therefore an ovpn
>> specific objects is attathced to them for status tracking
> 
> typos:      object    attached

thanks

> 
> 
>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
>> new file mode 100644
>> index 000000000000..a4a4d69162f0
>> --- /dev/null
>> +++ b/drivers/net/ovpn/socket.c
> [...]
>> +
>> +/* Finalize release of socket, called after RCU grace period */
> 
> kref_put seems to call ovpn_socket_release_kref without waiting, and
> then that calls ovpn_socket_detach immediately as well. Am I missing
> something?

hmm what do we need to wait for exactly? (Maybe I am missing something)
The ovpn_socket will survive a bit longer thanks to kfree_rcu.

> 
>> +static void ovpn_socket_detach(struct socket *sock)
>> +{
>> +	if (!sock)
>> +		return;
>> +
>> +	sockfd_put(sock);
>> +}
> 
> [...]
>> +
>> +/* Finalize release of socket, called after RCU grace period */
> 
> Did that comment get misplaced? It doesn't match the code.

yeah it did. wiping it.

> 
>> +static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>> +{
>> +	int ret = -EOPNOTSUPP;
>> +
>> +	if (!sock || !peer)
>> +		return -EINVAL;
>> +
>> +	if (sock->sk->sk_protocol == IPPROTO_UDP)
>> +		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
>> +
>> +	return ret;
>> +}
> 
>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
>> new file mode 100644
>> index 000000000000..4b7d96a13df0
>> --- /dev/null
>> +++ b/drivers/net/ovpn/udp.c
> [...]
>> +
>> +int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>> +{
>> +	struct ovpn_socket *old_data;
>> +
>> +	/* sanity check */
>> +	if (sock->sk->sk_protocol != IPPROTO_UDP) {
>> +		netdev_err(ovpn->dev, "%s: expected UDP socket\n", __func__);
> 
> Maybe use DEBUG_NET_WARN_ON_ONCE here since it's never expected to
> actually happen? That would help track down (in test/debug setups) how
> we ended up here.

will do, thanks for the suggestion

> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* make sure no pre-existing encapsulation handler exists */
>> +	rcu_read_lock();
>> +	old_data = rcu_dereference_sk_user_data(sock->sk);
>> +	rcu_read_unlock();
>> +	if (old_data) {
>> +		if (old_data->ovpn == ovpn) {
> 
> You should stay under rcu_read_lock if you access old_data's fields.

My assumption was: if we have an ovpn object in the user data, it means 
that its reference counter was increased to account for this usage.

But I presume we have no guarantee that it won't be decreased while 
outside of the rcu read lock area.

Will move the check inside.

> 
>> +			netdev_dbg(ovpn->dev,
>> +				   "%s: provided socket already owned by this interface\n",
>> +				   __func__);
>> +			return -EALREADY;
>> +		}
>> +
>> +		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
>> +			   __func__);
>> +		return -EBUSY;
>> +	}
>> +
>> +	return 0;
>> +}
> 

Thanks!

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module
  2024-05-08  9:51     ` Antonio Quartulli
@ 2024-05-09  0:50       ` Jakub Kicinski
  2024-05-09  8:40         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-09  0:50 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Wed, 8 May 2024 11:51:46 +0200 Antonio Quartulli wrote:
> >> +TEST_GEN_PROGS_EXTENDED = ovpn-cli  
> > 
> > TEST_GEN_FILES - it's not a test at all, AFAICT.  
> 
> This binary is just a helper and it is used by the scripts below.
> 
> I only need it to be built before executing the run.sh script.
> 
> Isn't this the right VARIABLE to use for the purpose?

I don't think so, but the variables are pretty confusing, so I could be
wrong. My understanding is that TEST_GEN_PROGS_EXTENDED is for tests.
But tests which you don't want to run as unit tests. Like performance
tests, or some slow tests I guess. TEST_GEN_FILES is for building
dependencies and tools which are themselves not tests.


* Re: [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload
  2024-05-08  9:56   ` Antonio Quartulli
@ 2024-05-09  0:53     ` Jakub Kicinski
  2024-05-09  8:41       ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-09  0:53 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Wed, 8 May 2024 11:56:45 +0200 Antonio Quartulli wrote:
> I see there is one warning to fix due to a typ0 (eventS_wq vs event_wq), 
> but I also get more warnings like this:
> 
> drivers/net/ovpn/peer.h:119: warning: Function parameter or struct 
> member 'vpn_addrs' not described in 'ovpn_peer'
> 
> However vpn_addrs is an anonymous struct within struct ovpn_peer.
> I have already documented all its members using the form:
> 
> @vpn_addrs.ipv4
> @vpn_addrs.ipv6
> 
> Am I expected to document the vpn_addrs as well?
> Or is this a false positive?

I think we need to trust the script on what's expected. 
The expectations around documenting anonymous structs may have 
changed recently, I remember fixing this in my code, too.

BTW make sure you use -Wall, people started sending trivial
patches to fix those :S Would be best not to add new ones.


* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-08 14:52   ` Sabrina Dubroca
@ 2024-05-09  1:06     ` Jakub Kicinski
  2024-05-09  8:25     ` Antonio Quartulli
  1 sibling, 0 replies; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-09  1:06 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: Antonio Quartulli, netdev, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On Wed, 8 May 2024 16:52:27 +0200 Sabrina Dubroca wrote:
> > +static int ovpn_net_open(struct net_device *dev)
> > +{
> > +	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
> > +
> > +	if (dev_v4) {
> > +		/* disable redirects as Linux gets confused by ovpn handling
> > +		 * same-LAN routing
> > +		 */
> > +		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
> > +		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;  
> 
> Jakub, are you ok with that? This feels a bit weird to have in the
> middle of a driver.

Herm, I only looked at the netlink bits so far.
Would be good to get more details on the problem and see if we can fix
it more directly.


* Re: [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink
  2024-05-08  9:49     ` Antonio Quartulli
@ 2024-05-09  1:09       ` Jakub Kicinski
  2024-05-09  8:30         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-09  1:09 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On Wed, 8 May 2024 11:49:07 +0200 Antonio Quartulli wrote:
> >> +		netdev_err(dev, "%s: cannot add ifname to reply\n", __func__);  
> > 
> > Probably not worth it, can't happen given the message size  
> 
> Personally I still prefer to check the return value of functions that 
> may fail, because somebody may break the assumption (i.e. message large 
> enough by design) without realizing that this call was relying on that.
> 
> If you want, I could still add a comment saying that we don't expect 
> this to happen.

In a few other places we put a WARN_ON_ONCE() on message size errors.
That way syzbot usually catches the miscalculation rather quickly.
But no strong objections if you prefer the print.
  
> >> +		genlmsg_cancel(msg, hdr);
> >> +		nlmsg_free(msg);
> >> +		return -EMSGSIZE;
> >> +	}
> >> +
> >> +	genlmsg_end(msg, hdr);
> >> +
> >> +	return genlmsg_reply(msg, info);
> >>   }
> >>   
> >>   int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
> >>   {
> >> -	return -ENOTSUPP;
> >> +	struct ovpn_struct *ovpn = info->user_ptr[0];
> >> +
> >> +	rtnl_lock();
> >> +	ovpn_iface_destruct(ovpn);
> >> +	dev_put(ovpn->dev);
> >> +	rtnl_unlock();
> >> +
> >> +	synchronize_net();  
> > 
> > Why? 🤔️  
> 
> 
> hmm I was under the impression that we should always call this function 
> when destroying an interface to make sure that packets that already 
> entered the network stack can be properly processed before the interface 
> is gone for good.
> 
> Maybe this is not the right place? Any hint?

The unregistration of the netdevice should take care of syncing packets
in flight, AFAIU.


* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-08 14:52   ` Sabrina Dubroca
  2024-05-09  1:06     ` Jakub Kicinski
@ 2024-05-09  8:25     ` Antonio Quartulli
  2024-05-09 10:09       ` Sabrina Dubroca
  1 sibling, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09  8:25 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 08/05/2024 16:52, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index ad3813419c33..338e99dfe886 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> @@ -11,6 +11,26 @@
>>   #include <linux/skbuff.h>
>>   
>>   #include "io.h"
>> +#include "ovpnstruct.h"
>> +#include "netlink.h"
>> +
>> +int ovpn_struct_init(struct net_device *dev)
> 
> nit: Should this be in main.c? It's only used there, and I think it
> would make more sense to drop it next to ovpn_struct_free.

yeah, it makes sense. will move it.

> 
>> +{
>> +	struct ovpn_struct *ovpn = netdev_priv(dev);
>> +	int err;
>> +
> 
> [...]
>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>> index 33c0b004ce16..584cd7286aff 100644
>> --- a/drivers/net/ovpn/main.c
>> +++ b/drivers/net/ovpn/main.c
> [...]
>> +static void ovpn_struct_free(struct net_device *net)
>> +{
>> +	struct ovpn_struct *ovpn = netdev_priv(net);
>> +
>> +	rtnl_lock();
> 
>   ->priv_destructor can run from register_netdevice (already under
> RTNL), this doesn't look right.
> 
>> +	list_del(&ovpn->dev_list);
> 
> And if this gets called from register_netdevice, the list_add from
> ovpn_iface_create hasn't run yet, so this will probably do strange
> things?

Argh, again I haven't considered a failure in register_netdevice and you 
are indeed right.

Maybe it is better to call list_del() in the netdev notifier, upon 
NETDEV_UNREGISTER event?


> 
>> +	rtnl_unlock();
>> +
>> +	free_percpu(net->tstats);
>> +}
>> +
>> +static int ovpn_net_open(struct net_device *dev)
>> +{
>> +	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
>> +
>> +	if (dev_v4) {
>> +		/* disable redirects as Linux gets confused by ovpn handling
>> +		 * same-LAN routing
>> +		 */
>> +		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
>> +		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
> 
> Jakub, are you ok with that? This feels a bit weird to have in the
> middle of a driver.

Let me share what the problem is (copied from the email I sent to Andrew 
Lunn as he was also curious about this):

The reason for requiring this setting lies in the OpenVPN server acting 
as relay point (star topology) for hosts in the same subnet.

Example: given the a.b.c.0/24 IP network, you have .2 that in order to 
talk to .3 must have its traffic relayed by .1 (the server).

When the kernel (at .1) sees this traffic it will send the ICMP 
redirects, because it believes that .2 should directly talk to .3 
without passing through .1.

Of course it makes sense in a normal network with a classic broadcast 
domain, but this is not the case in a VPN implemented as a star topology.

Does it make sense?

The only way I see to fix this globally is to have an extra flag in the 
netdevice signaling this peculiarity and thus disabling ICMP redirects 
automatically.

Note: wireguard has those lines too, as it probably needs to address the 
same scenario.
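
To make the scenario concrete, here is a simplified userspace model of
the decision described above. It is only a sketch: struct fwd_info and
would_send_redirect() are made-up names, addresses are in host byte
order, and the real kernel check involves more conditions than these
three.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one forwarded IPv4 packet */
struct fwd_info {
	int in_ifindex;		/* interface the packet arrived on */
	int out_ifindex;	/* interface it will be sent out of */
	uint32_t src;		/* source address */
	uint32_t dst;		/* destination address */
	uint32_t netmask;	/* mask of the subnet on that interface */
};

/* Simplified rule: a redirect is generated when redirects are enabled,
 * the packet leaves through the interface it came in on, and source and
 * destination sit in the same subnet, i.e. the router assumes the two
 * hosts could reach each other directly.
 */
static bool would_send_redirect(const struct fwd_info *f, bool send_redirects)
{
	if (!send_redirects)
		return false;

	if (f->in_ifindex != f->out_ifindex)
		return false;

	return (f->src & f->netmask) == (f->dst & f->netmask);
}
```

With .2 talking to .3 through the server's single tunnel interface,
every condition is met, although the two clients have no direct path to
each other. Clearing SEND_REDIRECTS is what makes the first check fail.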


> 
>> +	}
>> +
>> +	netif_tx_start_all_queues(dev);
>> +	return 0;
>> +}
> 
> [...]
>> +void ovpn_iface_destruct(struct ovpn_struct *ovpn)
>> +{
>> +	ASSERT_RTNL();
>> +
>> +	netif_carrier_off(ovpn->dev);
>> +
>> +	ovpn->registered = false;
>> +
>> +	unregister_netdevice(ovpn->dev);
>> +	synchronize_net();
> 
> If this gets called from the loop in ovpn_netns_pre_exit, one
> synchronize_net per ovpn device would seem quite expensive.

As per your other comment, maybe I should just remove the 
synchronize_net() entirely since it'll be the core to take care of 
inflight packets?

> 
>> +}
>> +
>>   static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>>   				     unsigned long state, void *ptr)
>>   {
>>   	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +	struct ovpn_struct *ovpn;
>>   
>>   	if (!ovpn_dev_is_valid(dev))
>>   		return NOTIFY_DONE;
>>   
>> +	ovpn = netdev_priv(dev);
>> +
>>   	switch (state) {
>>   	case NETDEV_REGISTER:
>> -		/* add device to internal list for later destruction upon
>> -		 * unregistration
>> -		 */
>> +		ovpn->registered = true;
>>   		break;
>>   	case NETDEV_UNREGISTER:
>> +		/* twiddle thumbs on netns device moves */
>> +		if (dev->reg_state != NETREG_UNREGISTERING)
>> +			break;
>> +
>>   		/* can be delivered multiple times, so check registered flag,
>>   		 * then destroy the interface
>>   		 */
>> +		if (!ovpn->registered)
>> +			return NOTIFY_DONE;
>> +
>> +		ovpn_iface_destruct(ovpn);
> 
> Maybe I'm misunderstanding this code. Why do you want to manually
> destroy a device that is already going away?

We need to perform some internal cleanup (i.e. release all peers).
I don't see how this can happen automatically, no?

> 
>>   		break;
>>   	case NETDEV_POST_INIT:
>>   	case NETDEV_GOING_DOWN:
>>   	case NETDEV_DOWN:
>>   	case NETDEV_UP:
>>   	case NETDEV_PRE_UP:
>> +		break;
>>   	default:
>>   		return NOTIFY_DONE;
>>   	}
>> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>>   	.notifier_call = ovpn_netdev_notifier_call,
>>   };
>>   
>> +static void ovpn_netns_pre_exit(struct net *net)
>> +{
>> +	struct ovpn_struct *ovpn;
>> +
>> +	rtnl_lock();
>> +	list_for_each_entry(ovpn, &dev_list, dev_list) {
>> +		if (dev_net(ovpn->dev) != net)
>> +			continue;
>> +
>> +		ovpn_iface_destruct(ovpn);
> 
> Is this needed? On netns destruction all devices within the ns will be
> destroyed by the networking core.

Before implementing ovpn_netns_pre_exit() this way, upon namespace 
deletion the ovpn interface was being moved to the global namespace.

Hence I decided to manually take care of its destruction.

Isn't this expected?

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink
  2024-05-09  1:09       ` Jakub Kicinski
@ 2024-05-09  8:30         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09  8:30 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 09/05/2024 03:09, Jakub Kicinski wrote:
> On Wed, 8 May 2024 11:49:07 +0200 Antonio Quartulli wrote:
>>>> +		netdev_err(dev, "%s: cannot add ifname to reply\n", __func__);
>>>
>>> Probably not worth it, can't happen given the message size
>>
>> Personally I still prefer to check the return value of functions that
>> may fail, because somebody may break the assumption (i.e. message large
>> enough by design) without realizing that this call was relying on that.
>>
>> If you want, I could still add a comment saying that we don't expect
>> this to happen.
> 
> In a few other places we put a WARN_ON_ONCE() on message size errors.
> That way syzbot usually catches the miscalculation rather quickly.
> But no strong objections if you prefer the print.

I am fine as long as we have some check.
If WARN_ON_ONCE() helps syzbot, then I'll go with it.
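
Just to illustrate the pattern I have in mind, here is a userspace
sketch of "check the error, but complain only once". warn_once() and
put_ifname() are hypothetical helpers, not kernel API: in the driver the
condition would be wrapped in WARN_ON_ONCE() and the error returned
would be -EMSGSIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts emitted warnings so the behaviour can be observed */
static int warn_count;

/* Returns cond unchanged, but prints the warning only the first time
 * the condition is true for the given call site flag.
 */
static bool warn_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}

	return cond;
}

/* Hypothetical stand-in for adding the ifname attribute to the reply:
 * fails with -1 when the attribute does not fit in the remaining room.
 */
static int put_ifname(size_t room, size_t needed)
{
	static bool warned;	/* one flag per call site */

	if (warn_once(needed > room, &warned, "reply too small for ifname"))
		return -1;

	return 0;
}
```

The error path is still handled on every call; only the noise is
limited to the first occurrence.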

>    
>>>> +		genlmsg_cancel(msg, hdr);
>>>> +		nlmsg_free(msg);
>>>> +		return -EMSGSIZE;
>>>> +	}
>>>> +
>>>> +	genlmsg_end(msg, hdr);
>>>> +
>>>> +	return genlmsg_reply(msg, info);
>>>>    }
>>>>    
>>>>    int ovpn_nl_del_iface_doit(struct sk_buff *skb, struct genl_info *info)
>>>>    {
>>>> -	return -ENOTSUPP;
>>>> +	struct ovpn_struct *ovpn = info->user_ptr[0];
>>>> +
>>>> +	rtnl_lock();
>>>> +	ovpn_iface_destruct(ovpn);
>>>> +	dev_put(ovpn->dev);
>>>> +	rtnl_unlock();
>>>> +
>>>> +	synchronize_net();
>>>
>>> Why? 🤔️
>>
>>
>> hmm I was under the impression that we should always call this function
>> when destroying an interface to make sure that packets that already
>> entered the network stack can be properly processed before the interface
>> is gone for good.
>>
>> Maybe this is not the right place? Any hint?
> 
> The unregistration of the netdevice should take care of syncing packets
> in flight, AFAIU.

I have another call to synchronize_net() in ovpn_iface_destruct() after 
calling unregister_netdevice().

Sabrina was actually questioning that call too.

First of all I now realize that we are calling it twice, but from what I 
am understanding, I think we can just ditch any invocation and let core 
do the right thing.

I'll remove it and do some tests.


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module
  2024-05-09  0:50       ` Jakub Kicinski
@ 2024-05-09  8:40         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09  8:40 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 09/05/2024 02:50, Jakub Kicinski wrote:
> On Wed, 8 May 2024 11:51:46 +0200 Antonio Quartulli wrote:
>>>> +TEST_GEN_PROGS_EXTENDED = ovpn-cli
>>>
>>> TEST_GEN_FILES - it's not a test at all, AFAICT.
>>
>> This binary is just a helper and it is used by the scripts below.
>>
>> I only need it to be built before executing the run.sh script.
>>
>> Isn't this the right VARIABLE to use for the purpose?
> 
> I don't think so, but the variables are pretty confusing, so I could be
> wrong. My understanding is that TEST_GEN_PROGS_EXTENDED is for tests.
> But tests which you don't want to run as unit tests. Like performance
> tests, or some slow tests I guess. TEST_GEN_FILES is for building
> dependencies and tools which are themselves not tests.

I just re-tested and you are indeed right.
Will switch to TEST_GEN_FILES.

Thanks a lot!


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload
  2024-05-09  0:53     ` Jakub Kicinski
@ 2024-05-09  8:41       ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09  8:41 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, Sergey Ryazanov, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Esben Haabendal

On 09/05/2024 02:53, Jakub Kicinski wrote:
> On Wed, 8 May 2024 11:56:45 +0200 Antonio Quartulli wrote:
>> I see there is one warning to fix due to a typ0 (eventS_wq vs event_wq),
>> but I also get more warnings like this:
>>
>> drivers/net/ovpn/peer.h:119: warning: Function parameter or struct
>> member 'vpn_addrs' not described in 'ovpn_peer'
>>
>> However vpn_addrs is an anonymous struct within struct ovpn_peer.
>> I have already documented all its members using the form:
>>
>> @vpn_addrs.ipv4
>> @vpn_addrs.ipv6
>>
>> Am I expected to document the vpn_addrs as well?
>> Or is this a false positive?
> 
> I think we need to trust the script on what's expected.
> The expectations around documenting anonymous structs may have
> changed recently, I remember fixing this in my code, too.

Alright, I will document those structs too then.

> 
> BTW make sure you use -Wall, people started sending trivial
> patches to fix those :S Would be best not to add new ones.

eheh, rebase --exec is my friend :-)
No warning shall pass!

Thanks a lot,

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-09  8:25     ` Antonio Quartulli
@ 2024-05-09 10:09       ` Sabrina Dubroca
  2024-05-09 10:35         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-09 10:09 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
> On 08/05/2024 16:52, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> > > index 33c0b004ce16..584cd7286aff 100644
> > > --- a/drivers/net/ovpn/main.c
> > > +++ b/drivers/net/ovpn/main.c
> > [...]
> > > +static void ovpn_struct_free(struct net_device *net)
> > > +{
> > > +	struct ovpn_struct *ovpn = netdev_priv(net);
> > > +
> > > +	rtnl_lock();
> > 
> >   ->priv_destructor can run from register_netdevice (already under
> > RTNL), this doesn't look right.
> > 
> > > +	list_del(&ovpn->dev_list);
> > 
> > And if this gets called from register_netdevice, the list_add from
> > ovpn_iface_create hasn't run yet, so this will probably do strange
> > things?
> 
> Argh, again I haven't considered a failure in register_netdevice and you are
> indeed right.
> 
> Maybe it is better to call list_del() in the netdev notifier, upon
> NETDEV_UNREGISTER event?

I'd like to avoid splitting the cleanup code over so many different
functions, called through different means. Keep it simple.

AFAICT the only reason you need this list is to delete your devices on
netns exit, so if we can get rid of that the list can go away.


> > > +static int ovpn_net_open(struct net_device *dev)
> > > +{
> > > +	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
> > > +
> > > +	if (dev_v4) {
> > > +		/* disable redirects as Linux gets confused by ovpn handling
> > > +		 * same-LAN routing
> > > +		 */
> > > +		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
> > > +		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
> > 
> > Jakub, are you ok with that? This feels a bit weird to have in the
> > middle of a driver.
> 
> Let me share what the problem is (copied from the email I sent to Andrew
> Lunn as he was also curious about this):
> 
> The reason for requiring this setting lies in the OpenVPN server acting as
> relay point (star topology) for hosts in the same subnet.
> 
> Example: given the a.b.c.0/24 IP network, you have .2 that in order to talk
> to .3 must have its traffic relayed by .1 (the server).
> 
> When the kernel (at .1) sees this traffic it will send the ICMP redirects,
> because it believes that .2 should directly talk to .3 without passing
> through .1.

So only the server would need to stop sending them, not the client?
(or the client would need to ignore them)
But the kernel has no way of knowing if an ovpn device is on a client
or a server?

> Of course it makes sense in a normal network with a classic broadcast
> domain, but this is not the case in a VPN implemented as a star topology.
> 
> Does it make sense?

It looks like the problem is that ovpn links are point-to-point
(instead of a broadcast LAN kind of link where redirects would make
sense), and the kernel doesn't handle it that way.

> The only way I see to fix this globally is to have an extra flag in the
> netdevice signaling this peculiarity and thus disabling ICMP redirects
> automatically.
> 
> Note: wireguard has those lines too, as it probably needs to address the
> same scenario.

I've noticed a lot of similarities in some bits I've looked at, and I
hate that this is turning into another pile of duplicate code like
vxlan/geneve, bond/team, etc :(


> > [...]
> > > +void ovpn_iface_destruct(struct ovpn_struct *ovpn)
> > > +{
> > > +	ASSERT_RTNL();
> > > +
> > > +	netif_carrier_off(ovpn->dev);
> > > +
> > > +	ovpn->registered = false;
> > > +
> > > +	unregister_netdevice(ovpn->dev);
> > > +	synchronize_net();
> > 
> > If this gets called from the loop in ovpn_netns_pre_exit, one
> > synchronize_net per ovpn device would seem quite expensive.
> 
> As per your other comment, maybe I should just remove the synchronize_net()
> entirely since it'll be the core to take care of inflight packets?

There's a synchronize_net in unregister_netdevice_many_notify, so I'd
say you can get rid of it here.


> > >   static int ovpn_netdev_notifier_call(struct notifier_block *nb,
> > >   				     unsigned long state, void *ptr)
> > >   {
> > >   	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> > > +	struct ovpn_struct *ovpn;
> > >   	if (!ovpn_dev_is_valid(dev))
> > >   		return NOTIFY_DONE;
> > > +	ovpn = netdev_priv(dev);
> > > +
> > >   	switch (state) {
> > >   	case NETDEV_REGISTER:
> > > -		/* add device to internal list for later destruction upon
> > > -		 * unregistration
> > > -		 */
> > > +		ovpn->registered = true;
> > >   		break;
> > >   	case NETDEV_UNREGISTER:
> > > +		/* twiddle thumbs on netns device moves */
> > > +		if (dev->reg_state != NETREG_UNREGISTERING)
> > > +			break;
> > > +
> > >   		/* can be delivered multiple times, so check registered flag,
> > >   		 * then destroy the interface
> > >   		 */
> > > +		if (!ovpn->registered)
> > > +			return NOTIFY_DONE;
> > > +
> > > +		ovpn_iface_destruct(ovpn);
> > 
> > Maybe I'm misunderstanding this code. Why do you want to manually
> > destroy a device that is already going away?
> 
> We need to perform some internal cleanup (i.e. release all peers).
> I don't see how this can happen automatically, no?

That's what ->priv_destructor does, and it will be called ultimately
by the unregister_netdevice call you have in ovpn_iface_destruct (in
netdev_run_todo). Anyway, this UNREGISTER event is probably generated
by unregister_netdevice_many_notify (basically a previous
unregister_netdevice() call), so I don't know why you want to call
unregister_netdevice again on the same device.


> > > @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
> > >   	.notifier_call = ovpn_netdev_notifier_call,
> > >   };
> > > +static void ovpn_netns_pre_exit(struct net *net)
> > > +{
> > > +	struct ovpn_struct *ovpn;
> > > +
> > > +	rtnl_lock();
> > > +	list_for_each_entry(ovpn, &dev_list, dev_list) {
> > > +		if (dev_net(ovpn->dev) != net)
> > > +			continue;
> > > +
> > > +		ovpn_iface_destruct(ovpn);
> > 
> > Is this needed? On netns destruction all devices within the ns will be
> > destroyed by the networking core.
> 
> Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
> the ovpn interface was being moved to the global namespace.

Crap it's only the devices with ->rtnl_link_ops that get killed by the
core. Because you create your devices via genl (which I'm not a fan
of, even if it's a bit nicer for userspace having a single netlink api
to deal with), default_device_exit_batch/default_device_exit_net think
ovpn devices are real NICs and move them back to init_net instead of
destroying them.

Maybe we can extend the condition in default_device_exit_net with a
new flag so that ovpn devices get destroyed by the core, even without
rtnl_link_ops?

-- 
Sabrina



* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-09 10:09       ` Sabrina Dubroca
@ 2024-05-09 10:35         ` Antonio Quartulli
  2024-05-09 12:16           ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09 10:35 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 09/05/2024 12:09, Sabrina Dubroca wrote:
> 2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
>> On 08/05/2024 16:52, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
>>>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>>>> index 33c0b004ce16..584cd7286aff 100644
>>>> --- a/drivers/net/ovpn/main.c
>>>> +++ b/drivers/net/ovpn/main.c
>>> [...]
>>>> +static void ovpn_struct_free(struct net_device *net)
>>>> +{
>>>> +	struct ovpn_struct *ovpn = netdev_priv(net);
>>>> +
>>>> +	rtnl_lock();
>>>
>>>    ->priv_destructor can run from register_netdevice (already under
>>> RTNL), this doesn't look right.
>>>
>>>> +	list_del(&ovpn->dev_list);
>>>
>>> And if this gets called from register_netdevice, the list_add from
>>> ovpn_iface_create hasn't run yet, so this will probably do strange
>>> things?
>>
>> Argh, again I haven't considered a failure in register_netdevice and you are
>> indeed right.
>>
>> Maybe it is better to call list_del() in the netdev notifier, upon
>> NETDEV_UNREGISTER event?
> 
> I'd like to avoid splitting the cleanup code over so many different
> functions called through different means. Keep it simple.
> 
> AFAICT the only reason you need this list is to delete your devices on
> netns exit, so if we can get rid of that the list can go away.

right.

> 
> 
>>>> +static int ovpn_net_open(struct net_device *dev)
>>>> +{
>>>> +	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
>>>> +
>>>> +	if (dev_v4) {
>>>> +		/* disable redirects as Linux gets confused by ovpn handling
>>>> +		 * same-LAN routing
>>>> +		 */
>>>> +		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
>>>> +		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
>>>
>>> Jakub, are you ok with that? This feels a bit weird to have in the
>>> middle of a driver.
>>
>> Let me share what the problem is (copied from the email I sent to Andrew
>> Lunn as he was also curious about this):
>>
>> The reason for requiring this setting lies in the OpenVPN server acting as
>> relay point (star topology) for hosts in the same subnet.
>>
>> Example: given the a.b.c.0/24 IP network, you have .2 that in order to talk
>> to .3 must have its traffic relayed by .1 (the server).
>>
>> When the kernel (at .1) sees this traffic it will send the ICMP redirects,
>> because it believes that .2 should directly talk to .3 without passing
>> through .1.
> 
> So only the server would need to stop sending them, not the client?

correct

> (or the client would need to ignore them)
> But the kernel has no way of knowing if an ovpn device is on a client
> or a server?

the server knows if the interface is configured in P2P or MP (MultiPeer) 
mode. The latter is what requires redirects to be off, so we could at 
least add a check and switch them off only for MP ifaces.

> 
>> Of course it makes sense in a normal network with a classic broadcast
>> domain, but this is not the case in a VPN implemented as a star topology.
>>
>> Does it make sense?
> 
> It looks like the problem is that ovpn links are point-to-point
> (instead of a broadcast LAN kind of link where redirects would make
> sense), and the kernel doesn't handle it that way.

exactly

> 
>> The only way I see to fix this globally is to have an extra flag in the
>> netdevice signaling this peculiarity and thus disabling ICMP redirects
>> automatically.
>>
>> Note: wireguard has those lines too, as it probably needs to address the
>> same scenario.
> 
> I've noticed a lot of similarities in some bits I've looked at, and I
> hate that this is turning into another pile of duplicate code like
> vxlan/geneve, bond/team, etc :(

For starters, we could at least move these few lines into a helper
function and call it from both modules.

On the other hand, we could, like I suggested above, convert this into a 
netdev flag and let core handle the behaviour when the flag is set.

> 
> 
>>> [...]
>>>> +void ovpn_iface_destruct(struct ovpn_struct *ovpn)
>>>> +{
>>>> +	ASSERT_RTNL();
>>>> +
>>>> +	netif_carrier_off(ovpn->dev);
>>>> +
>>>> +	ovpn->registered = false;
>>>> +
>>>> +	unregister_netdevice(ovpn->dev);
>>>> +	synchronize_net();
>>>
>>> If this gets called from the loop in ovpn_netns_pre_exit, one
>>> synchronize_net per ovpn device would seem quite expensive.
>>
>> As per your other comment, maybe I should just remove the synchronize_net()
>> entirely since it'll be the core to take care of inflight packets?
> 
> There's a synchronize_net in unregister_netdevice_many_notify, so I'd
> say you can get rid of it here.

ok! Jakub was indeed suggesting that core should already take care of this.

Will remove it for good.

> 
> 
>>>>    static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>>>>    				     unsigned long state, void *ptr)
>>>>    {
>>>>    	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>>> +	struct ovpn_struct *ovpn;
>>>>    	if (!ovpn_dev_is_valid(dev))
>>>>    		return NOTIFY_DONE;
>>>> +	ovpn = netdev_priv(dev);
>>>> +
>>>>    	switch (state) {
>>>>    	case NETDEV_REGISTER:
>>>> -		/* add device to internal list for later destruction upon
>>>> -		 * unregistration
>>>> -		 */
>>>> +		ovpn->registered = true;
>>>>    		break;
>>>>    	case NETDEV_UNREGISTER:
>>>> +		/* twiddle thumbs on netns device moves */
>>>> +		if (dev->reg_state != NETREG_UNREGISTERING)
>>>> +			break;
>>>> +
>>>>    		/* can be delivered multiple times, so check registered flag,
>>>>    		 * then destroy the interface
>>>>    		 */
>>>> +		if (!ovpn->registered)
>>>> +			return NOTIFY_DONE;
>>>> +
>>>> +		ovpn_iface_destruct(ovpn);
>>>
>>> Maybe I'm misunderstanding this code. Why do you want to manually
>>> destroy a device that is already going away?
>>
>> We need to perform some internal cleanup (i.e. release all peers).
>> I don't see how this can happen automatically, no?
> 
> That's what ->priv_destructor does, 

Not really.

Every peer object increases the netdev refcounter to the netdev, 
therefore we must first delete all peers in order to have 
netdevice->refcnt reach 0 (and then invoke priv_destructor).

So the idea is: upon UNREGISTER event we drop all resources and 
eventually (via RCU) all references to the netdev are also released, 
which in turn triggers the destructor.

makes sense?


> and it will be called ultimately
> by the unregister_netdevice call you have in ovpn_iface_destruct (in
> netdev_run_todo). Anyway, this UNREGISTER event is probably generated
> by unregister_netdevice_many_notify (basically a previous
> unregister_netdevice() call), so I don't know why you want to call
> unregister_netdevice again on the same device.

I believe I have seen this notification being triggered upon netns exit, 
but in that case the netdevice was not being removed from core.

Hence I decided to fully trigger the unregistration.

Expected?

I can repeat the test to be sure.

> 
> 
>>>> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>>>>    	.notifier_call = ovpn_netdev_notifier_call,
>>>>    };
>>>> +static void ovpn_netns_pre_exit(struct net *net)
>>>> +{
>>>> +	struct ovpn_struct *ovpn;
>>>> +
>>>> +	rtnl_lock();
>>>> +	list_for_each_entry(ovpn, &dev_list, dev_list) {
>>>> +		if (dev_net(ovpn->dev) != net)
>>>> +			continue;
>>>> +
>>>> +		ovpn_iface_destruct(ovpn);
>>>
>>> Is this needed? On netns destruction all devices within the ns will be
>>> destroyed by the networking core.
>>
>> Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
>> the ovpn interface was being moved to the global namespace.
> 
> Crap it's only the devices with ->rtnl_link_ops that get killed by the
> core. 

exactly! this goes hand in hand with my comment above: event delivered
but interface not destroyed.

> Because you create your devices via genl (which I'm not a fan
> of, even if it's a bit nicer for userspace having a single netlink api
> to deal with),

Originally I had implemented the rtnl_link_ops, but the (meaningful) 
objection was that a user is never supposed to create an ovpn iface by 
himself, but there should always be an openvpn process running in 
userspace. Hence the restriction to genl only.

> default_device_exit_batch/default_device_exit_net think
> ovpn devices are real NICs and move them back to init_net instead of
> destroying them.
> 
> Maybe we can extend the condition in default_device_exit_net with a
> new flag so that ovpn devices get destroyed by the core, even without
> rtnl_link_ops?

Thanks for pointing out the function responsible for this decision.
How would you extend the check though?

Alternatively, what if ovpn simply registers an empty rtnl_link_ops with
netns_refund set to false? That should make the condition happy, while
keeping ovpn genl-only


Thanks a lot


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-09 10:35         ` Antonio Quartulli
@ 2024-05-09 12:16           ` Sabrina Dubroca
  2024-05-09 13:25             ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-09 12:16 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-09, 12:35:32 +0200, Antonio Quartulli wrote:
> On 09/05/2024 12:09, Sabrina Dubroca wrote:
> > 2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
> > > On 08/05/2024 16:52, Sabrina Dubroca wrote:
> > > > 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
> > > > >    static int ovpn_netdev_notifier_call(struct notifier_block *nb,
> > > > >    				     unsigned long state, void *ptr)
> > > > >    {
> > > > >    	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> > > > > +	struct ovpn_struct *ovpn;
> > > > >    	if (!ovpn_dev_is_valid(dev))
> > > > >    		return NOTIFY_DONE;
> > > > > +	ovpn = netdev_priv(dev);
> > > > > +
> > > > >    	switch (state) {
> > > > >    	case NETDEV_REGISTER:
> > > > > -		/* add device to internal list for later destruction upon
> > > > > -		 * unregistration
> > > > > -		 */
> > > > > +		ovpn->registered = true;
> > > > >    		break;
> > > > >    	case NETDEV_UNREGISTER:
> > > > > +		/* twiddle thumbs on netns device moves */
> > > > > +		if (dev->reg_state != NETREG_UNREGISTERING)
> > > > > +			break;
> > > > > +
> > > > >    		/* can be delivered multiple times, so check registered flag,
> > > > >    		 * then destroy the interface
> > > > >    		 */
> > > > > +		if (!ovpn->registered)
> > > > > +			return NOTIFY_DONE;
> > > > > +
> > > > > +		ovpn_iface_destruct(ovpn);
> > > > 
> > > > Maybe I'm misunderstanding this code. Why do you want to manually
> > > > destroy a device that is already going away?
> > > 
> > > We need to perform some internal cleanup (i.e. release all peers).
> > > I don't see how this can happen automatically, no?
> > 
> > That's what ->priv_destructor does,
> 
> Not really.
> 
> Every peer object increases the netdev refcounter to the netdev, therefore
> we must first delete all peers in order to have netdevice->refcnt reach 0
> (and then invoke priv_destructor).

Oh, I see. I'm still trying to wrap my head around all the objects and
components of your driver.

> So the idea is: upon UNREGISTER event we drop all resources and eventually
> (via RCU) all references to the netdev are also released, which in turn
> triggers the destructor.
> 
> makes sense?

That part, yes, thanks for explaining. Do you really need the peers to
hold a reference on the netdevice? With my limited understanding, it
seems the peers are sub-objects of the netdevice.

> > and it will be called ultimately
> > by the unregister_netdevice call you have in ovpn_iface_destruct (in
> > netdev_run_todo). Anyway, this UNREGISTER event is probably generated
> > by unregister_netdevice_many_notify (basically a previous
> > unregister_netdevice() call), so I don't know why you want to call
> > unregister_netdevice again on the same device.
> 
> I believe I have seen this notification being triggered upon netns exit, but
> in that case the netdevice was not being removed from core.

Sure, but you have a comment about that and you're filtering that
event, so I'm ignoring this case.

> Hence I decided to fully trigger the unregistration.

That's the bit that doesn't make sense to me: the device is going
away, so you trigger a manual unregister. Cleaning up some additional
resources (peers etc), that makes sense. But calling
unregister_netdevice (when you're most likely getting called from
unregister_netdevice already, because I don't see other spots setting
dev->reg_state = NETREG_UNREGISTERING) is what I don't get. And I
wonder why you're not hitting the BUG_ON in
unregister_netdevice_many_notify:

    BUG_ON(dev->reg_state != NETREG_REGISTERED);
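
To make the re-entrancy concern concrete, here is a minimal userspace
sketch (not kernel code). All model_* names and the bug_hits counter are
hypothetical stand-ins; the sketch only models the case where the core,
not the driver, starts the unregistration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's netdev registration states. */
enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING };

struct model_dev {
	enum model_reg_state reg_state;
	bool registered;	/* mirrors ovpn->registered */
	int bug_hits;		/* counts what would be a BUG_ON() in the kernel */
	int peers_released;	/* counts driver-private cleanup runs */
};

static void model_notifier(struct model_dev *dev, bool reenter);

/* Models unregister_netdevice(): only legal on a REGISTERED device. */
static void model_unregister(struct model_dev *dev, bool reenter)
{
	if (dev->reg_state != MODEL_REGISTERED) {
		dev->bug_hits++;	/* the kernel would BUG_ON() here */
		return;
	}
	dev->reg_state = MODEL_UNREGISTERING;
	model_notifier(dev, reenter);	/* NETDEV_UNREGISTER event */
}

/* Models the NETDEV_UNREGISTER branch of the driver's notifier. */
static void model_notifier(struct model_dev *dev, bool reenter)
{
	if (dev->reg_state != MODEL_UNREGISTERING || !dev->registered)
		return;
	dev->registered = false;
	dev->peers_released++;		/* private cleanup (peers etc) */
	if (reenter)
		model_unregister(dev, reenter);	/* the questioned call */
}
```

In this model, cleaning up private resources from the notifier is
harmless, while calling back into the unregister path from it trips the
state check.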


> > > > > @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
> > > > >    	.notifier_call = ovpn_netdev_notifier_call,
> > > > >    };
> > > > > +static void ovpn_netns_pre_exit(struct net *net)

BTW, in case you end up keeping this function, it should have
__net_exit annotation (see for example ipv4_frags_exit_net).

> > > > > +{
> > > > > +	struct ovpn_struct *ovpn;
> > > > > +
> > > > > +	rtnl_lock();
> > > > > +	list_for_each_entry(ovpn, &dev_list, dev_list) {
> > > > > +		if (dev_net(ovpn->dev) != net)
> > > > > +			continue;
> > > > > +
> > > > > +		ovpn_iface_destruct(ovpn);
> > > > 
> > > > Is this needed? On netns destruction all devices within the ns will be
> > > > destroyed by the networking core.
> > > 
> > > Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
> > > the ovpn interface was being moved to the global namespace.
> > 
> > Crap it's only the devices with ->rtnl_link_ops that get killed by the
> > core.
> 
> exactly! this goes hand in hand with my comment above: event delivered but
> interface not destroyed.

There's no event sent to ovpn_netdev_notifier_call in that case (well,
only the fake "unregister" out of the current netns that you're
ignoring). Otherwise, you wouldn't need ovpn_netns_pre_exit.

> > Because you create your devices via genl (which I'm not a fan
> > of, even if it's a bit nicer for userspace having a single netlink api
> > to deal with),
> 
> Originally I had implemented the rtnl_link_ops, but the (meaningful)
> objection was that a user is never supposed to create an ovpn iface by
> himself, but there should always be an openvpn process running in userspace.
> Hence the restriction to genl only.

Sorry, but how does genl prevent a user from creating the ovpn
interface manually? Whatever API you define, anyone who manages to
come up with the right netlink message will be able to create an
interface. You can't stop people from using your API without your
official client.

> > default_device_exit_batch/default_device_exit_net think
> > ovpn devices are real NICs and move them back to init_net instead of
> > destroying them.
> > 
> > Maybe we can extend the condition in default_device_exit_net with a
> > new flag so that ovpn devices get destroyed by the core, even without
> > rtnl_link_ops?
> 
> Thanks for pointing out the function responsible for this decision.
> How would you extend the check though?
>
> Alternatively, what if ovpn simply registers an empty rtnl_link_ops with
> netns_refund set to false? That should make the condition happy, while keeping
> ovpn genl-only

Yes. I was thinking about adding a flag to the device, because I
wasn't sure an almost empty rtnl_link_ops could be handled safely, but
it seems ok. ovs does it, see commit 5b9e7e160795 ("openvswitch:
introduce rtnl ops stub"). And, as that commit message says, "ip -d
link show" would also show that the device is of type openvpn (or
ovpn, whatever you put in ops->kind), which would be nice.
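
As a rough illustration of why a stub is enough, here is a userspace
sketch of the per-device decision taken at netns exit. It assumes the
check in default_device_exit_net boils down to "has rtnl_link_ops and
netns_refund is false"; the model_* types and the exit_action enum are
hypothetical and not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down mirrors of the kernel structures. */
struct model_link_ops {
	const char *kind;
	bool netns_refund;
};

struct model_netdev {
	bool netns_local;			/* e.g. loopback */
	const struct model_link_ops *link_ops;	/* NULL without rtnl_link_ops */
};

enum exit_action { EXIT_SKIP, EXIT_DESTROY_BY_CORE, EXIT_MOVE_TO_INIT_NET };

/* Models the per-device decision taken when a netns goes away. */
static enum exit_action model_exit_action(const struct model_netdev *dev)
{
	if (dev->netns_local)
		return EXIT_SKIP;
	if (dev->link_ops && !dev->link_ops->netns_refund)
		return EXIT_DESTROY_BY_CORE;	/* left for generic cleanup */
	return EXIT_MOVE_TO_INIT_NET;		/* treated like a real NIC */
}

/* Stub ops: nothing but a kind, netns_refund left at false. */
static const struct model_link_ops model_ovpn_ops = { .kind = "ovpn" };
```

A device without link ops falls through to the "move to init_net"
branch, which matches the behaviour reported earlier in the thread; with
the stub attached it takes the "destroy" branch instead.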

-- 
Sabrina



* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-08 20:31     ` Antonio Quartulli
@ 2024-05-09 13:04       ` Sabrina Dubroca
  2024-05-09 13:24         ` Andrew Lunn
  2024-05-09 13:44         ` Antonio Quartulli
  0 siblings, 2 replies; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-09 13:04 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-08, 22:31:51 +0200, Antonio Quartulli wrote:
> On 08/05/2024 18:06, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:20 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> > > index ee05b8a2c61d..b79d4f0474b0 100644
> > > --- a/drivers/net/ovpn/ovpnstruct.h
> > > +++ b/drivers/net/ovpn/ovpnstruct.h
> > > @@ -17,12 +17,19 @@
> > >    * @dev: the actual netdev representing the tunnel
> > >    * @registered: whether dev is still registered with netdev or not
> > >    * @mode: device operation mode (i.e. p2p, mp, ..)
> > > + * @lock: protect this object
> > > + * @event_wq: used to schedule generic events that may sleep and that need to be
> > > + *            performed outside of softirq context
> > > + * @peer: in P2P mode, this is the only remote peer
> > >    * @dev_list: entry for the module wide device list
> > >    */
> > >   struct ovpn_struct {
> > >   	struct net_device *dev;
> > >   	bool registered;
> > >   	enum ovpn_mode mode;
> > > +	spinlock_t lock; /* protect writing to the ovpn_struct object */
> > 
> > nit: the comment isn't really needed since you have kdoc saying the same thing
> 
> True, but checkpatch.pl (or some other script?) was still throwing a
> warning, therefore I added this comment to silence it.

Ok, then I guess the comment (and the other one below) can stay. That
sounds like a checkpatch.pl bug.

> > > +	struct workqueue_struct *events_wq;
> > > +	struct ovpn_peer __rcu *peer;
> > >   	struct list_head dev_list;
> > >   };
> > > diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
> > > new file mode 100644
> > > index 000000000000..2948b7320d47
> > > --- /dev/null
> > > +++ b/drivers/net/ovpn/peer.c
> > [...]
> > > +/**
> > > + * ovpn_peer_free - release private members and free peer object
> > > + * @peer: the peer to free
> > > + */
> > > +static void ovpn_peer_free(struct ovpn_peer *peer)
> > > +{
> > > +	ovpn_bind_reset(peer, NULL);
> > > +
> > > +	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
> > 
> > Could you pass a destructor to ptr_ring_cleanup instead of all these WARNs?
> 
> hmm but if we remove the WARNs then we lose the possibility to catch
> potential bugs, no? rings should definitely be empty at this point.

Ok, I haven't looked deep enough into how all the parts interact to
understand that. The refcount bump around the tx_ring loop in
ovpn_encrypt_work() takes care of that? Maybe worth a comment "$RING
should be empty at this point because of XYZ" (for each of the rings).

> Or you think I should just not care and free any potentially remaining item?

Whether you WARN or not, any remaining item is going to be leaked. I'd
go with WARN (or maybe DEBUG_NET_WARN_ON_ONCE) and free remaining
items. It should never happen but seems easy to deal with, so why not
handle it?
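
The "warn and free" idea can be sketched in userspace as follows. The
model_ring type is a hypothetical stand-in for struct ptr_ring, and
model_ring_cleanup() only imitates the shape of ptr_ring_cleanup() with a
destructor; returning the leftover count is an addition of this sketch so
the caller can decide whether to warn.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_RING_SIZE 8

/* Hypothetical fixed-size pointer ring standing in for struct ptr_ring. */
struct model_ring {
	void *slots[MODEL_RING_SIZE];
	size_t head, tail;	/* consume at head, produce at tail */
};

/* Queues one item; returns -1 when the ring is full. */
static int model_ring_produce(struct model_ring *r, void *item)
{
	size_t next = (r->tail + 1) % MODEL_RING_SIZE;

	if (next == r->head)
		return -1;
	r->slots[r->tail] = item;
	r->tail = next;
	return 0;
}

/* Example destructor: items are assumed to be heap allocations. */
static void model_item_free(void *item)
{
	free(item);
}

/* Drains the ring, handing each leftover item to destroy(). Returns how
 * many items were found, so the caller can warn when it is non-zero.
 */
static size_t model_ring_cleanup(struct model_ring *r, void (*destroy)(void *))
{
	size_t leftover = 0;

	while (r->head != r->tail) {
		destroy(r->slots[r->head]);
		r->head = (r->head + 1) % MODEL_RING_SIZE;
		leftover++;
	}
	return leftover;
}
```

With this shape the "should never happen" case still gets reported, but
nothing is leaked when it does happen.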

> > > +void ovpn_peer_release(struct ovpn_peer *peer)
> > > +{
> > > +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
> > > +}
> > > +
> > > +/**
> > > + * ovpn_peer_delete_work - work scheduled to release peer in process context
> > > + * @work: the work object
> > > + */
> > > +static void ovpn_peer_delete_work(struct work_struct *work)
> > > +{
> > > +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
> > > +					      delete_work);
> > > +	ovpn_peer_release(peer);
> > 
> > Does call_rcu really need to run in process context?
> 
> Reason for switching to process context is that we have to invoke
> ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and the
> latter requires a reference to the peer.

I'm confused. When you say "requires a reference to the peer", do you
mean accessing fields of the peer object? I don't see why this
requires ovpn_nl_notify_del_peer to run from process context.

> For this reason I thought it would be safe to have ovpn_nl_notify_del_peer
> and call_rcu invoked by the same context.
> 
> If I invoke call_rcu in ovpn_peer_release_kref, how can I be sure that the
> peer hasn't been free'd already when ovpn_nl_notify_del_peer is executed?

Put the ovpn_nl_notify_del_peer call before the call_rcu, it will
access the peer and then once that's done call_rcu will do its job?
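
A small userspace sketch of that ordering, assuming the notification only
needs to read fields of the peer. The model_* functions are hypothetical
stand-ins: model_call_rcu() just queues the free, and
model_grace_period() plays the role of the grace period ending.

```c
#include <stddef.h>
#include <stdlib.h>

struct model_peer {
	unsigned int id;
	struct model_peer *next_deferred;	/* stands in for struct rcu_head */
};

static struct model_peer *deferred_list;	/* frees waiting for a grace period */
static unsigned int last_notified_id;		/* what "userspace" was told */

/* Stands in for ovpn_nl_notify_del_peer(): reads fields of the peer. */
static void model_notify_del_peer(const struct model_peer *peer)
{
	last_notified_id = peer->id;
}

/* Stands in for call_rcu(): only queues the free, never frees inline. */
static void model_call_rcu(struct model_peer *peer)
{
	peer->next_deferred = deferred_list;
	deferred_list = peer;
}

/* Stands in for the end of a grace period: runs the queued frees. */
static size_t model_grace_period(void)
{
	size_t freed = 0;

	while (deferred_list) {
		struct model_peer *peer = deferred_list;

		deferred_list = peer->next_deferred;
		free(peer);
		freed++;
	}
	return freed;
}

/* Release path: notify first, then queue the free. */
static void model_peer_release(struct model_peer *peer)
{
	model_notify_del_peer(peer);
	model_call_rcu(peer);
}
```

Because the free is only queued after the notification has finished
reading the peer, the notification cannot observe freed memory in this
model, whatever context the release runs in.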


> > > +/**
> > > + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
> > > + * @peer: the peer to delete
> > > + * @reason: reason why the peer was deleted (sent to userspace)
> > > + *
> > > + * Return: 0 on success or a negative error code otherwise
> > > + */
> > > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> > > +			     enum ovpn_del_peer_reason reason)
> > > +{
> > > +	struct ovpn_peer *tmp;
> > > +	int ret = -ENOENT;
> > > +
> > > +	spin_lock_bh(&peer->ovpn->lock);
> > > +	tmp = rcu_dereference(peer->ovpn->peer);
> > > +	if (tmp != peer)
> > > +		goto unlock;
> > 
> > How do we recover if all those objects got out of sync? Are we stuck
> > with a broken peer?
> 
> mhhh I don't fully get the scenario you are depicting.
> 
> In P2P mode there is only one peer stored (reference is saved in ovpn->peer)
> 
> When we want to get rid of it, we invoke ovpn_peer_del_p2p().
> The check we are performing here is just about being sure that we are
> removing the exact peer we requested to remove (and not some other peer that
> was still floating around for some reason).

But it's the right peer because it's the one the caller decided to get
rid of.  How about DEBUG_NET_WARN_ON_ONCE(tmp != peer) and always
releasing the peer?

> > And if this happens during interface deletion, aren't we leaking the
> > peer memory here?
> 
> at interface deletion we call
> 
> ovpn_iface_destruct -> ovpn_peer_release_p2p ->
> ovpn_peer_del_p2p(ovpn->peer)
> 
> so at the last step we just ask to remove the very same peer that is
> currently stored, which should just never fail.

But that's not what the test checks for. If ovpn->peer->ovpn != ovpn,
the test in ovpn_peer_del_p2p will fail. That's "objects getting out
of sync" in my previous email. The peer has a bogus back reference to
its ovpn parent, but it's ovpn->peer nevertheless.
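
The out-of-sync case can be reproduced in a few lines of userspace C.
This is only a model of the quoted check, with hypothetical model_* types
and no locking or RCU; it is not the driver's code.

```c
#include <errno.h>
#include <stddef.h>

struct model_ovpn;

struct model_p2p_peer {
	struct model_ovpn *ovpn;	/* back reference to the parent */
};

struct model_ovpn {
	struct model_p2p_peer *peer;	/* the only peer in P2P mode */
};

/* Mirrors the quoted check: the peer is looked up through its own back
 * reference, and the deletion is refused on mismatch.
 */
static int model_del_p2p_strict(struct model_p2p_peer *peer)
{
	if (peer->ovpn->peer != peer)
		return -ENOENT;
	peer->ovpn->peer = NULL;
	return 0;
}
```

If the peer's back reference points at a different parent, the function
returns -ENOENT and the parent that really stores the peer keeps its
pointer, which is the situation described above.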

-- 
Sabrina



* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 13:04       ` Sabrina Dubroca
@ 2024-05-09 13:24         ` Andrew Lunn
  2024-05-10 18:57           ` Antonio Quartulli
  2024-05-09 13:44         ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Andrew Lunn @ 2024-05-09 13:24 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: Antonio Quartulli, netdev, Jakub Kicinski, Sergey Ryazanov,
	Paolo Abeni, Eric Dumazet, Esben Haabendal

On Thu, May 09, 2024 at 03:04:36PM +0200, Sabrina Dubroca wrote:
> 2024-05-08, 22:31:51 +0200, Antonio Quartulli wrote:
> > On 08/05/2024 18:06, Sabrina Dubroca wrote:
> > > 2024-05-06, 03:16:20 +0200, Antonio Quartulli wrote:
> > > > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> > > > index ee05b8a2c61d..b79d4f0474b0 100644
> > > > --- a/drivers/net/ovpn/ovpnstruct.h
> > > > +++ b/drivers/net/ovpn/ovpnstruct.h
> > > > @@ -17,12 +17,19 @@
> > > >    * @dev: the actual netdev representing the tunnel
> > > >    * @registered: whether dev is still registered with netdev or not
> > > >    * @mode: device operation mode (i.e. p2p, mp, ..)
> > > > + * @lock: protect this object
> > > > + * @event_wq: used to schedule generic events that may sleep and that need to be
> > > > + *            performed outside of softirq context
> > > > + * @peer: in P2P mode, this is the only remote peer
> > > >    * @dev_list: entry for the module wide device list
> > > >    */
> > > >   struct ovpn_struct {
> > > >   	struct net_device *dev;
> > > >   	bool registered;
> > > >   	enum ovpn_mode mode;
> > > > +	spinlock_t lock; /* protect writing to the ovpn_struct object */
> > > 
> > > nit: the comment isn't really needed since you have kdoc saying the same thing
> > 
> > True, but checkpatch.pl (or some other script?) was still throwing a
> > warning, therefore I added this comment to silence it.
> 
> Ok, then I guess the comment (and the other one below) can stay. That
> sounds like a checkpatch.pl bug.

I suspect it is more complex than that. checkpatch does not understand
kdoc. It just knows the rule that there should be a comment next to a
lock, hopefully indicating what the lock protects. In order to fix
this, checkpatch would need to somehow invoke the kdoc parser, and ask
it if the lock has kdoc documentation.

I suspect we are just going to have to live with this.

  Andrew



* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-09 12:16           ` Sabrina Dubroca
@ 2024-05-09 13:25             ` Antonio Quartulli
  2024-05-09 13:52               ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09 13:25 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

By the way, thank you very much for taking the time to have this 
constructive discussion. I really appreciate it!

On 09/05/2024 14:16, Sabrina Dubroca wrote:
> 2024-05-09, 12:35:32 +0200, Antonio Quartulli wrote:
>> On 09/05/2024 12:09, Sabrina Dubroca wrote:
>>> 2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
>>>> On 08/05/2024 16:52, Sabrina Dubroca wrote:
>>>>> 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
>>>>>>     static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>>>>>>     				     unsigned long state, void *ptr)
>>>>>>     {
>>>>>>     	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>>>>> +	struct ovpn_struct *ovpn;
>>>>>>     	if (!ovpn_dev_is_valid(dev))
>>>>>>     		return NOTIFY_DONE;
>>>>>> +	ovpn = netdev_priv(dev);
>>>>>> +
>>>>>>     	switch (state) {
>>>>>>     	case NETDEV_REGISTER:
>>>>>> -		/* add device to internal list for later destruction upon
>>>>>> -		 * unregistration
>>>>>> -		 */
>>>>>> +		ovpn->registered = true;
>>>>>>     		break;
>>>>>>     	case NETDEV_UNREGISTER:
>>>>>> +		/* twiddle thumbs on netns device moves */
>>>>>> +		if (dev->reg_state != NETREG_UNREGISTERING)
>>>>>> +			break;
>>>>>> +
>>>>>>     		/* can be delivered multiple times, so check registered flag,
>>>>>>     		 * then destroy the interface
>>>>>>     		 */
>>>>>> +		if (!ovpn->registered)
>>>>>> +			return NOTIFY_DONE;
>>>>>> +
>>>>>> +		ovpn_iface_destruct(ovpn);
>>>>>
>>>>> Maybe I'm misunderstanding this code. Why do you want to manually
>>>>> destroy a device that is already going away?
>>>>
>>>> We need to perform some internal cleanup (i.e. release all peers).
>>>> I don't see how this can happen automatically, no?
>>>
>>> That's what ->priv_destructor does,
>>
>> Not really.
>>
>> Every peer object increases the netdev refcounter to the netdev, therefore
>> we must first delete all peers in order to have netdevice->refcnt reach 0
>> (and then invoke priv_destructor).
> 
> Oh, I see. I'm still trying to wrap my head around all the objects and
> components of your driver.
> 
>> So the idea is: upon UNREGISTER event we drop all resources and eventually
>> (via RCU) all references to the netdev are also released, which in turn
>> triggers the destructor.
>>
>> makes sense?
> 
> That part, yes, thanks for explaining. Do you really need the peers to
> hold a reference on the netdevice? With my limited understanding, it
> seems the peers are sub-objects of the netdevice.
> 
>>> and it will be called ultimately
>>> by the unregister_netdevice call you have in ovpn_iface_destruct (in
>>> netdev_run_todo). Anyway, this UNREGISTER event is probably generated
>>> by unregister_netdevice_many_notify (basically a previous
>>> unregister_netdevice() call), so I don't know why you want to call
>>> unregister_netdevice again on the same device.
>>
>> I believe I have seen this notification being triggered upon netns exit, but
>> in that case the netdevice was not being removed from core.
> 
> Sure, but you have a comment about that and you're filtering that
> event, so I'm ignoring this case.

You're right..now I wonder if my observation was made before I 
introduced that check...

> 
>> Hence I decided to fully trigger the unregistration.
> 
> That's the bit that doesn't make sense to me: the device is going
> away, so you trigger a manual unregister. Cleaning up some additional
> resources (peers etc), that makes sense. But calling
> unregister_netdevice (when you're most likely getting called from
> unregister_netdevice already, because I don't see other spots setting
> dev->reg_state = NETREG_UNREGISTERING) is what I don't get. And I
> wonder why you're not hitting the BUG_ON in
> unregister_netdevice_many_notify:
> 
>      BUG_ON(dev->reg_state != NETREG_REGISTERED);

I think because we have our ovpn->registered check.
It ensures that we don't call ovpn_iface_destruct more than once.

But now that I have implemented the rtnl_link_ops, I can confirm I am
hitting the BUG_ON. And now it makes sense.

I presume that now I can simply remove the call to
unregister_netdevice() from ovpn_iface_destruct() and move it to
ovpn_nl_del_iface_doit().

This way, upon netns exit, the real UNREGISTER handler (triggered thanks 
to rtnl_link_ops) will still perform the destruct, but won't try to 
schedule an UNREGISTER event again.
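
A rough userspace model of that flow, just to make the split explicit.
This is a sketch, not the driver code: every toy_* name is made up for
illustration, toy_unregister() stands in for unregister_netdevice() and
returns false where the core would hit the BUG_ON, and toy_del_iface()
stands in for ovpn_nl_del_iface_doit().

```c
#include <assert.h>
#include <stdbool.h>

enum toy_reg_state { TOY_REGISTERED, TOY_UNREGISTERING, TOY_UNREGISTERED };

struct toy_dev {
	enum toy_reg_state reg_state;
	bool registered;      /* mirrors ovpn->registered */
	int peers;            /* driver-private resources */
	int unregister_calls; /* times the core was asked to unregister */
};

/* Driver-private cleanup only: no unregister call in here anymore. */
static void toy_iface_destruct(struct toy_dev *dev)
{
	assert(dev->registered); /* must run at most once per device */
	dev->peers = 0;
	dev->registered = false;
}

/* Models the NETDEV_UNREGISTER branch of the notifier. */
static void toy_notifier_unregister(struct toy_dev *dev)
{
	if (dev->reg_state != TOY_UNREGISTERING)
		return; /* netns move, nothing to do */
	if (!dev->registered)
		return; /* the event can be delivered more than once */
	toy_iface_destruct(dev);
}

/* Models the core: refuses a device that is not in REGISTERED state. */
static bool toy_unregister(struct toy_dev *dev)
{
	if (dev->reg_state != TOY_REGISTERED)
		return false; /* the kernel would BUG_ON here */
	dev->unregister_calls++;
	dev->reg_state = TOY_UNREGISTERING;
	toy_notifier_unregister(dev);
	dev->reg_state = TOY_UNREGISTERED;
	return true;
}

/* Models the genl DEL_IFACE handler: the only caller of unregister. */
static bool toy_del_iface(struct toy_dev *dev)
{
	return toy_unregister(dev);
}
```

With the unregister call living only in the DEL_IFACE path, the notifier
never re-enters the core: a device starting out REGISTERED with three
peers ends up with zero peers after exactly one unregister call.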

> 
> 
>>>>>> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>>>>>>     	.notifier_call = ovpn_netdev_notifier_call,
>>>>>>     };
>>>>>> +static void ovpn_netns_pre_exit(struct net *net)
> 
> BTW, in case you end up keeping this function, it should have
> __net_exit annotation (see for example ipv4_frags_exit_net).

ACK, but thanks to the rtnl_link_ops trick we are definitely ditching it.
> 
>>>>>> +{
>>>>>> +	struct ovpn_struct *ovpn;
>>>>>> +
>>>>>> +	rtnl_lock();
>>>>>> +	list_for_each_entry(ovpn, &dev_list, dev_list) {
>>>>>> +		if (dev_net(ovpn->dev) != net)
>>>>>> +			continue;
>>>>>> +
>>>>>> +		ovpn_iface_destruct(ovpn);
>>>>>
>>>>> Is this needed? On netns destruction all devices within the ns will be
>>>>> destroyed by the networking core.
>>>>
>>>> Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
>>>> the ovpn interface was being moved to the global namespace.
>>>
>>> Crap it's only the devices with ->rtnl_link_ops that get killed by the
>>> core.
>>
>> exactly! this goes hand in hand with my comment above: event delivered but
>> interface not destroyed.
> 
> There's no event sent to ovpn_netdev_notifier_call in that case (well,
> only the fake "unregister" out of the current netns that you're
> ignoring). Otherwise, you wouldn't need ovpn_netns_pre_exit.

Yeah you're right. I think I wanted to conclude the same thing but my 
brain was unable to produce a meaningful sentence.

> 
>>> Because you create your devices via genl (which I'm not a fan
>>> of, even if it's a bit nicer for userspace having a single netlink api
>>> to deal with),
>>
>> Originally I had implemented the rtnl_link_ops, but the (meaningful)
>> objection was that a user is never supposed to create an ovpn iface by
>> himself, but there should always be an openvpn process running in userspace.
>> Hence the restriction to genl only.
> 
> Sorry, but how does genl prevent a user from creating the ovpn
> interface manually? Whatever API you define, anyone who manages to
> come up with the right netlink message will be able to create an
> interface. You can't stop people from using your API without your
> official client.

I don't want to prevent people from creating ovpn ifaces the way they like.
I just don't see how the rtnl_link API can be useful, other than 
allowing users to execute 'ip link add/del..'.
And by design that is not a usecase we want to support, because once the 
iface is created, nothing will happen if there is no userspace software 
driving it (no matter if it is openvpn or anything else).

When explaining this decision, I like to make a comparison to virtual 
802.11/wifi ifaces.
They also lack rtnl_link (AFAIR) as they also require some userspace 
software to handle them in order to be useful.

All this said, having everything in one place looks cleaner too :)

> 
>>> default_device_exit_batch/default_device_exit_net think
>>> ovpn devices are real NICs and move them back to init_net instead of
>>> destroying them.
>>>
>>> Maybe we can extend the condition in default_device_exit_net with a
>>> new flag so that ovpn devices get destroyed by the core, even without
>>> rtnl_link_ops?
>>
>> Thanks for pointing out the function responsible for this decision.
>> How would you extend the check though?
>>
>> Alternatively, what if ovpn simply registers an empty rtnl_link_ops with
>> netns_refund set to false? That should make the condition happy, while keeping
>> ovpn genl-only
> 
> Yes. I was thinking about adding a flag to the device, because I
> wasn't sure an almost empty rtnl_link_ops could be handled safely, but
> it seems ok. ovs does it, see commit 5b9e7e160795 ("openvswitch:
> introduce rtnl ops stub"). And, as that commit message says, "ip -d
> link show" would also show that the device is of type openvpn (or
> ovpn, whatever you put in ops->kind), which would be nice.

I just coded something along those lines.

It seems pretty clean and we don't need to touch core (+ the bonus of 
having the name in "ip -d link")....and the iface does get destroyed 
upon netns exit! :-)
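
For the record, a small userspace model of the netns-exit decision as
discussed above. It is only a sketch of the condition described in this
thread, not the code in net/core/dev.c: the toy_* types and
toy_netns_exit_action() are invented for illustration, and the real
check may cover more cases (e.g. unmovable devices).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two rtnl_link_ops fields that matter. */
struct toy_link_ops {
	const char *kind;  /* shown as the device type by "ip -d link" */
	bool netns_refund; /* true: give the device back to init_net */
};

struct toy_netdev {
	const struct toy_link_ops *rtnl_link_ops; /* NULL if none registered */
};

enum toy_exit_action { TOY_MOVE_TO_INIT_NET, TOY_DESTROY };

/* What happens to a device when its namespace goes away. */
static enum toy_exit_action toy_netns_exit_action(const struct toy_netdev *dev)
{
	assert(dev != NULL);

	/* Virtual devices with link ops and no refund are destroyed. */
	if (dev->rtnl_link_ops && !dev->rtnl_link_ops->netns_refund)
		return TOY_DESTROY;

	/* Everything else is treated like a real NIC and moved back. */
	return TOY_MOVE_TO_INIT_NET;
}

/* An (almost) empty stub is enough to flip the decision. */
static const struct toy_link_ops toy_ovpn_link_ops = {
	.kind = "ovpn",
	.netns_refund = false,
};
```

A device without link ops gets TOY_MOVE_TO_INIT_NET, which matches what
I was seeing before; attaching the stub ops yields TOY_DESTROY.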

I am grasping much better how all these APIs work together now.

Thanks!

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object
  2024-05-08 20:38     ` Antonio Quartulli
@ 2024-05-09 13:32       ` Sabrina Dubroca
  2024-05-09 13:46         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-09 13:32 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-08, 22:38:58 +0200, Antonio Quartulli wrote:
> On 08/05/2024 19:10, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:21 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> > > new file mode 100644
> > > index 000000000000..a4a4d69162f0
> > > --- /dev/null
> > > +++ b/drivers/net/ovpn/socket.c
> > [...]
> > > +
> > > +/* Finalize release of socket, called after RCU grace period */
> > 
> > kref_put seems to call ovpn_socket_release_kref without waiting, and
> > then that calls ovpn_socket_detach immediately as well. Am I missing
> > something?
> 
> hmm what do we need to wait for exactly? (Maybe I am missing something)
> The ovpn_socket will survive a bit longer thanks to kfree_rcu.

The way I read this comment, it says that ovpn_socket_detach will be
called after one RCU grace period, but I don't see where that grace
period would come from.

    ovpn_socket_put -> kref_put(release=ovpn_socket_release_kref) ->
      ovpn_socket_release_kref -> ovpn_socket_detach

No grace period here.

Or am I misinterpreting the comment? There will be a grace period
caused by kfree_rcu before the ovpn_socket is actually freed, is that
what the comment means?

> > > +static void ovpn_socket_detach(struct socket *sock)
> > > +{
> > > +	if (!sock)
> > > +		return;
> > > +
> > > +	sockfd_put(sock);
> > > +}
> > 

-- 
Sabrina


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 13:04       ` Sabrina Dubroca
  2024-05-09 13:24         ` Andrew Lunn
@ 2024-05-09 13:44         ` Antonio Quartulli
  2024-05-09 13:55           ` Andrew Lunn
  2024-05-09 14:17           ` Sabrina Dubroca
  1 sibling, 2 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09 13:44 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 09/05/2024 15:04, Sabrina Dubroca wrote:
[..]
>>>> +	struct workqueue_struct *events_wq;
>>>> +	struct ovpn_peer __rcu *peer;
>>>>    	struct list_head dev_list;
>>>>    };
>>>> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
>>>> new file mode 100644
>>>> index 000000000000..2948b7320d47
>>>> --- /dev/null
>>>> +++ b/drivers/net/ovpn/peer.c
>>> [...]
>>>> +/**
>>>> + * ovpn_peer_free - release private members and free peer object
>>>> + * @peer: the peer to free
>>>> + */
>>>> +static void ovpn_peer_free(struct ovpn_peer *peer)
>>>> +{
>>>> +	ovpn_bind_reset(peer, NULL);
>>>> +
>>>> +	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
>>>
>>> Could you pass a destructor to ptr_ring_cleanup instead of all these WARNs?
>>
>> hmm but if we remove the WARNs then we lose the possibility to catch
>> potential bugs, no? rings should definitely be empty at this point.
> 
> Ok, I haven't looked deep enough into how all the parts interact to
> understand that. The refcount bump around the tx_ring loop in
> ovpn_encrypt_work() takes care of that? Maybe worth a comment "$RING
> should be empty at this point because of XYZ" (for each of the rings).

Yeah, all piped skbs will be processed before exiting.
Ok, will add a comment.

> 
>> Or you think I should just not care and free any potentially remaining item?
> 
> Whether you WARN or not, any remaining item is going to be leaked. I'd
> go with WARN (or maybe DEBUG_NET_WARN_ON_ONCE) and free remaining
> items. It should never happen but seems easy to deal with, so why not
> handle it?

Sure, passing consume_skb as destructor to ptr_ring_cleanup should be 
enough.
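
To illustrate the "warn and free anyway" idea, here is a userspace toy
ring with a cleanup helper taking a destructor. This is a sketch under
my own assumptions, not the kernel ptr_ring API: all toy_* names are
made up, and returning the number of leftovers is an addition of mine
so that the caller can decide whether to warn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_RING_SIZE 8

struct toy_ring {
	void *slots[TOY_RING_SIZE];
	size_t head;  /* next slot to consume */
	size_t count; /* number of queued items */
};

/* Queue one item; returns -1 if the ring is full. */
static int toy_ring_produce(struct toy_ring *r, void *item)
{
	if (r->count == TOY_RING_SIZE)
		return -1;
	r->slots[(r->head + r->count) % TOY_RING_SIZE] = item;
	r->count++;
	return 0;
}

/* Dequeue the oldest item, or NULL if the ring is empty. */
static void *toy_ring_consume(struct toy_ring *r)
{
	void *item;

	if (r->count == 0)
		return NULL;
	item = r->slots[r->head];
	r->head = (r->head + 1) % TOY_RING_SIZE;
	r->count--;
	return item;
}

/* Drain the ring, handing every leftover item to destroy().
 * Returns how many leftovers were found.
 */
static size_t toy_ring_cleanup(struct toy_ring *r, void (*destroy)(void *))
{
	size_t leftovers = 0;
	void *item;

	while ((item = toy_ring_consume(r))) {
		if (destroy)
			destroy(item);
		leftovers++;
	}
	assert(r->count == 0); /* nothing may survive the cleanup */
	return leftovers;
}
```

In the driver the leftover count would be expected to be zero; a
non-zero value is the spot where a WARN would fire, but nothing is
leaked either way.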

> 
>>>> +void ovpn_peer_release(struct ovpn_peer *peer)
>>>> +{
>>>> +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
>>>> +}
>>>> +
>>>> +/**
>>>> + * ovpn_peer_delete_work - work scheduled to release peer in process context
>>>> + * @work: the work object
>>>> + */
>>>> +static void ovpn_peer_delete_work(struct work_struct *work)
>>>> +{
>>>> +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
>>>> +					      delete_work);
>>>> +	ovpn_peer_release(peer);
>>>
>>> Does call_rcu really need to run in process context?
>>
>> Reason for switching to process context is that we have to invoke
>> ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and the
>> latter requires a reference to the peer.
> 
> I'm confused. When you say "requires a reference to the peer", do you
> mean accessing fields of the peer object? I don't see why this
> requires ovpn_nl_notify_del_peer to run from process context.

ovpn_nl_notify_del_peer sends a netlink message to userspace and I was 
under the impression that it may block/sleep, no?
For this reason I assumed it must be executed in process context.

> 
>> For this reason I thought it would be safe to have ovpn_nl_notify_del_peer
>> and call_rcu invoked by the same context.
>>
>> If I invoke call_rcu in ovpn_peer_release_kref, how can I be sure that the
>> peer hasn't been free'd already when ovpn_nl_notify_del_peer is executed?
> 
> Put the ovpn_nl_notify_del_peer call before the call_rcu, it will
> access the peer and then once that's done call_rcu will do its job?

If ovpn_nl_notify_del_peer is allowed to run out of process context, 
then I totally agree.

Will test again.

> 
> 
>>>> +/**
>>>> + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
>>>> + * @peer: the peer to delete
>>>> + * @reason: reason why the peer was deleted (sent to userspace)
>>>> + *
>>>> + * Return: 0 on success or a negative error code otherwise
>>>> + */
>>>> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
>>>> +			     enum ovpn_del_peer_reason reason)
>>>> +{
>>>> +	struct ovpn_peer *tmp;
>>>> +	int ret = -ENOENT;
>>>> +
>>>> +	spin_lock_bh(&peer->ovpn->lock);
>>>> +	tmp = rcu_dereference(peer->ovpn->peer);
>>>> +	if (tmp != peer)
>>>> +		goto unlock;
>>>
>>> How do we recover if all those objects got out of sync? Are we stuck
>>> with a broken peer?
>>
>> mhhh I don't fully get the scenario you are depicting.
>>
>> In P2P mode there is only one peer stored (reference is saved in ovpn->peer)
>>
>> When we want to get rid of it, we invoke ovpn_peer_del_p2p().
>> The check we are performing here is just about being sure that we are
>> removing the exact peer we requested to remove (and not some other peer that
>> was still floating around for some reason).
> 
> But it's the right peer because it's the one the caller decided to get
> rid of.  How about DEBUG_NET_WARN_ON_ONCE(tmp != peer) and always
> releasing the peer?

sounds good. I should force myself to use more WARN_ON for conditions 
that are truly unexpected.
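
Something along these lines, modelled in plain userspace C. It is only
a sketch of the "warn but always release" behaviour suggested above,
not the real ovpn_peer_del_p2p(): the toy_* names are invented, and
toy_warn_on() is a stand-in for WARN_ON() that simply counts and
prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_ovpn;

struct toy_peer {
	struct toy_ovpn *ovpn; /* back reference, set once at creation */
	bool released;
};

struct toy_ovpn {
	struct toy_peer *peer; /* the single peer of a P2P instance */
	unsigned int warnings;
};

/* Stand-in for WARN_ON(): report the condition and return it. */
static bool toy_warn_on(struct toy_ovpn *ovpn, bool cond, const char *what)
{
	if (cond) {
		ovpn->warnings++;
		fprintf(stderr, "WARN: %s\n", what);
	}
	return cond;
}

/* Set the back reference first, then publish the peer in its parent. */
static void toy_peer_add_p2p(struct toy_ovpn *ovpn, struct toy_peer *peer)
{
	peer->ovpn = ovpn;
	peer->released = false;
	ovpn->peer = peer;
}

/* Complain if the stored peer is not the one we were asked to remove,
 * but release the requested peer in any case.
 */
static void toy_peer_del_p2p(struct toy_peer *peer)
{
	struct toy_ovpn *ovpn = peer->ovpn;

	assert(ovpn != NULL);
	if (!toy_warn_on(ovpn, ovpn->peer != peer, "stale P2P peer"))
		ovpn->peer = NULL;
	peer->released = true;
}
```

The unexpected case no longer turns into an early return with -ENOENT:
the mismatch is reported once and the caller's peer is still released.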

This said, I have a question regarding DEBUG_NET_WARN_ON_ONCE: it prints 
something only if CONFIG_DEBUG_NET is enabled.
Is this the case on standard desktop/server distribution? Otherwise how 
are we going to get reports from users?

> 
>>> And if this happens during interface deletion, aren't we leaking the
>>> peer memory here?
>>
>> at interface deletion we call
>>
>> ovpn_iface_destruct -> ovpn_peer_release_p2p ->
>> ovpn_peer_del_p2p(ovpn->peer)
>>
>> so at the last step we just ask to remove the very same peer that is
>> currently stored, which should just never fail.
> 
> But that's not what the test checks for. If ovpn->peer->ovpn != ovpn,
> the test in ovpn_peer_del_p2p will fail. That's "objects getting out
> of sync" in my previous email. The peer has a bogus back reference to
> its ovpn parent, but it's ovpn->peer nevertheless.
> 

Oh thanks for explaining that.

Ok, my assumption is that "ovpn->peer->ovpn != ovpn" can never be true.

Peers are created within the context of one ovpn object and are never 
exposed to other ovpns.

I hope it makes sense.

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object
  2024-05-09 13:32       ` Sabrina Dubroca
@ 2024-05-09 13:46         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09 13:46 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 09/05/2024 15:32, Sabrina Dubroca wrote:
> 2024-05-08, 22:38:58 +0200, Antonio Quartulli wrote:
>> On 08/05/2024 19:10, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:21 +0200, Antonio Quartulli wrote:
>>>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
>>>> new file mode 100644
>>>> index 000000000000..a4a4d69162f0
>>>> --- /dev/null
>>>> +++ b/drivers/net/ovpn/socket.c
>>> [...]
>>>> +
>>>> +/* Finalize release of socket, called after RCU grace period */
>>>
>>> kref_put seems to call ovpn_socket_release_kref without waiting, and
>>> then that calls ovpn_socket_detach immediately as well. Am I missing
>>> something?
>>
>> hmm what do we need to wait for exactly? (Maybe I am missing something)
>> The ovpn_socket will survive a bit longer thanks to kfree_rcu.
> 
> The way I read this comment, it says that ovpn_socket_detach will be
> called after one RCU grace period, but I don't see where that grace
> period would come from.
> 
>      ovpn_socket_put -> kref_put(release=ovpn_socket_release_kref) ->
>        ovpn_socket_release_kref -> ovpn_socket_detach
> 
> No grace period here.
> 
> Or am I misinterpreting the comment? There will be a grace period
> caused by kfree_rcu before the ovpn_socket is actually freed, is that
> what the comment means?

Forgive me - only now I realized that you were referring to what the 
comment says.

That comment is just totally busted. I think it dates back to when the
code was doing something totally different and has been carried over
by mistake ever since.

Sorry

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines
  2024-05-09 13:25             ` Antonio Quartulli
@ 2024-05-09 13:52               ` Sabrina Dubroca
  0 siblings, 0 replies; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-09 13:52 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-09, 15:25:21 +0200, Antonio Quartulli wrote:
> By the way, thank you very much for taking the time to have this
> constructive discussion. I really appreciate it!

Cheers :)


> On 09/05/2024 14:16, Sabrina Dubroca wrote:
> > 2024-05-09, 12:35:32 +0200, Antonio Quartulli wrote:
> > > On 09/05/2024 12:09, Sabrina Dubroca wrote:
> > > Hence I decided to fully trigger the unregistration.
> > 
> > That's the bit that doesn't make sense to me: the device is going
> > away, so you trigger a manual unregister. Cleaning up some additional
> > resources (peers etc), that makes sense. But calling
> > unregister_netdevice (when you're most likely getting called from
> > unregister_netdevice already, because I don't see other spots setting
> > dev->reg_state = NETREG_UNREGISTERING) is what I don't get. And I
> > wonder why you're not hitting the BUG_ON in
> > unregister_netdevice_many_notify:
> > 
> >      BUG_ON(dev->reg_state != NETREG_REGISTERED);
> 
> I think because we have our ovpn->registered check.
>
> It ensures that we don't call ovpn_iface_destruct more than once.

Ah, probably, yes.

> But now that I have implemented the rtnl_link_ops, I can confirm I am
> hitting the BUG_ON. And now it makes sense.
> 
> I presume that now I can simply remove the call to unregister_netdevice()
> from ovpn_iface_destruct() and move it to ovpn_nl_del_iface_doit().

Sounds good.


> > > > Because you create your devices via genl (which I'm not a fan
> > > > of, even if it's a bit nicer for userspace having a single netlink api
> > > > to deal with),
> > > 
> > > Originally I had implemented the rtnl_link_ops, but the (meaningful)
> > > objection was that a user is never supposed to create an ovpn iface by
> > > himself, but there should always be an openvpn process running in userspace.
> > > Hence the restriction to genl only.
> > 
> > Sorry, but how does genl prevent a user from creating the ovpn
> > interface manually? Whatever API you define, anyone who manages to
> > come up with the right netlink message will be able to create an
> > interface. You can't stop people from using your API without your
> > official client.
> 
> I don't want to prevent people from creating ovpn ifaces the way they like.
> I just don't see how the rtnl_link API can be useful, other than allowing
> users to execute 'ip link add/del..'.
>
> And by design that is not a usecase we want to support, because once the
> iface is created, nothing will happen if there is no userspace software
> driving it (no matter if it is openvpn or anything else).
> 
> When explaining this decision, I like to make a comparison to virtual
> 802.11/wifi ifaces.
> They also lack rtnl_link (AFAIR) as they also require some userspace
> software to handle them in order to be useful.
> 
> All this said, having everything in one place looks cleaner too :)

From an API point of view, maybe. But for the kernel implementation,
using rtnl_link_ops->newlink is easier.

> > > > default_device_exit_batch/default_device_exit_net think
> > > > ovpn devices are real NICs and move them back to init_net instead of
> > > > destroying them.
> > > > 
> > > > Maybe we can extend the condition in default_device_exit_net with a
> > > > new flag so that ovpn devices get destroyed by the core, even without
> > > > rtnl_link_ops?
> > > 
> > > Thanks for pointing out the function responsible for this decision.
> > > How would you extend the check though?
> > > 
> > > Alternatively, what if ovpn simply registers an empty rtnl_link_ops with
> > > netns_refund set to false? That should make the condition happy, while keeping
> > > ovpn genl-only
> > 
> > Yes. I was thinking about adding a flag to the device, because I
> > wasn't sure an almost empty rtnl_link_ops could be handled safely, but
> > it seems ok. ovs does it, see commit 5b9e7e160795 ("openvswitch:
> > introduce rtnl ops stub"). And, as that commit message says, "ip -d
> > link show" would also show that the device is of type openvpn (or
> > ovpn, whatever you put in ops->kind), which would be nice.
> 
> I just coded something along those lines.

Great, thanks.

> It seems pretty clean and we don't need to touch core (+ the bonus of having
> the name in "ip -d link")....and the iface does get destroyed upon netns
> exit! :-)
> 
> I am grasping much better how all these APIs work together now.

Nice :)

-- 
Sabrina


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 13:44         ` Antonio Quartulli
@ 2024-05-09 13:55           ` Andrew Lunn
  2024-05-09 14:17           ` Sabrina Dubroca
  1 sibling, 0 replies; 111+ messages in thread
From: Andrew Lunn @ 2024-05-09 13:55 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: Sabrina Dubroca, netdev, Jakub Kicinski, Sergey Ryazanov,
	Paolo Abeni, Eric Dumazet, Esben Haabendal

> > Whether you WARN or not, any remaining item is going to be leaked. I'd
> > go with WARN (or maybe DEBUG_NET_WARN_ON_ONCE) and free remaining
> > items. It should never happen but seems easy to deal with, so why not
> > handle it?

> This said, I have a question regarding DEBUG_NET_WARN_ON_ONCE: it prints
> something only if CONFIG_DEBUG_NET is enabled.
> Is this the case on standard desktop/server distribution? Otherwise how are
> we going to get reports from users?

A bit tangential, but:

https://lwn.net/Articles/969923/

	Andrew

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 13:44         ` Antonio Quartulli
  2024-05-09 13:55           ` Andrew Lunn
@ 2024-05-09 14:17           ` Sabrina Dubroca
  2024-05-09 14:36             ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-09 14:17 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-09, 15:44:26 +0200, Antonio Quartulli wrote:
> On 09/05/2024 15:04, Sabrina Dubroca wrote:
> > > > > +void ovpn_peer_release(struct ovpn_peer *peer)
> > > > > +{
> > > > > +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * ovpn_peer_delete_work - work scheduled to release peer in process context
> > > > > + * @work: the work object
> > > > > + */
> > > > > +static void ovpn_peer_delete_work(struct work_struct *work)
> > > > > +{
> > > > > +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
> > > > > +					      delete_work);
> > > > > +	ovpn_peer_release(peer);
> > > > 
> > > > Does call_rcu really need to run in process context?
> > > 
> > > Reason for switching to process context is that we have to invoke
> > > ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and the
> > > latter requires a reference to the peer.
> > 
> > I'm confused. When you say "requires a reference to the peer", do you
> > mean accessing fields of the peer object? I don't see why this
> > requires ovpn_nl_notify_del_peer to run from process context.
> 
> ovpn_nl_notify_del_peer sends a netlink message to userspace and I was under
> the impression that it may block/sleep, no?
> For this reason I assumed it must be executed in process context.

With s/GFP_KERNEL/GFP_ATOMIC/, it should be ok to run from whatever
context. Firing up a workqueue just to send a 100B netlink message
seems a bit overkill.



> This said, I have a question regarding DEBUG_NET_WARN_ON_ONCE: it prints
> something only if CONFIG_DEBUG_NET is enabled.
> Is this the case on standard desktop/server distribution? Otherwise how are
> we going to get reports from users?

That's pretty much why I'm suggesting to use it. For those things that
should really never happen, I think letting developers find them
during testing (or syzbot when it gets to your driver) is enough. I'm
not convinced getting a stack trace from a user without any ability to
reproduce is that useful.

But if you or someone else really want some WARN_ONs, I can live with
that.

> > > > And if this happens during interface deletion, aren't we leaking the
> > > > peer memory here?
> > > 
> > > at interface deletion we call
> > > 
> > > ovpn_iface_destruct -> ovpn_peer_release_p2p ->
> > > ovpn_peer_del_p2p(ovpn->peer)
> > > 
> > > so at the last step we just ask to remove the very same peer that is
> > > currently stored, which should just never fail.
> > 
> > But that's not what the test checks for. If ovpn->peer->ovpn != ovpn,
> > the test in ovpn_peer_del_p2p will fail. That's "objects getting out
> > of sync" in my previous email. The peer has a bogus back reference to
> > its ovpn parent, but it's ovpn->peer nevertheless.
> > 
> 
> Oh thanks for explaining that.
> 
> Ok, my assumption is that "ovpn->peer->ovpn != ovpn" can never be true.
> 
> Peers are created within the context of one ovpn object and are never
> exposed to other ovpns.
> 
> I hope it makes sense.

Ok, so this would indicate that something has gone badly wrong. Is it
really worth checking for that (or maybe just during development)?

-- 
Sabrina


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 14:17           ` Sabrina Dubroca
@ 2024-05-09 14:36             ` Antonio Quartulli
  2024-05-09 14:53               ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09 14:36 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 09/05/2024 16:17, Sabrina Dubroca wrote:
> 2024-05-09, 15:44:26 +0200, Antonio Quartulli wrote:
>> On 09/05/2024 15:04, Sabrina Dubroca wrote:
>>>>>> +void ovpn_peer_release(struct ovpn_peer *peer)
>>>>>> +{
>>>>>> +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * ovpn_peer_delete_work - work scheduled to release peer in process context
>>>>>> + * @work: the work object
>>>>>> + */
>>>>>> +static void ovpn_peer_delete_work(struct work_struct *work)
>>>>>> +{
>>>>>> +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
>>>>>> +					      delete_work);
>>>>>> +	ovpn_peer_release(peer);
>>>>>
>>>>> Does call_rcu really need to run in process context?
>>>>
>>>> Reason for switching to process context is that we have to invoke
>>>> ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and the
>>>> latter requires a reference to the peer.
>>>
>>> I'm confused. When you say "requires a reference to the peer", do you
>>> mean accessing fields of the peer object? I don't see why this
>>> requires ovpn_nl_notify_del_peer to run from process context.
>>
>> ovpn_nl_notify_del_peer sends a netlink message to userspace and I was under
>> the impression that it may block/sleep, no?
>> For this reason I assumed it must be executed in process context.
> 
> With s/GFP_KERNEL/GFP_ATOMIC/, it should be ok to run from whatever
> context. Firing up a workqueue just to send a 100B netlink message
> seems a bit overkill.

Oh ok, I thought the send could be a problem too.

Will test with GFP_ATOMIC then. Thanks for the hint.

> 
> 
> 
>> This said, I have a question regarding DEBUG_NET_WARN_ON_ONCE: it prints
>> something only if CONFIG_DEBUG_NET is enabled.
>> Is this the case on standard desktop/server distribution? Otherwise how are
>> we going to get reports from users?
> 
> That's pretty much why I'm suggesting to use it. For those things that
> should really never happen, I think letting developers find them
> during testing (or syzbot when it gets to your driver) is enough. I'm
> not convinced getting a stack trace from a user without any ability to
> reproduce is that useful.
> 
> But if you or someone else really want some WARN_ONs, I can live with
> that.

I would personally prefer to keep the WARN_ON.

Since these bogus conditions may have consequences, users will open a
report in any case.
Having some extra text that they can post to help us contextualize the
issue may be useful.

> 
>>>>> And if this happens during interface deletion, aren't we leaking the
>>>>> peer memory here?
>>>>
>>>> at interface deletion we call
>>>>
>>>> ovpn_iface_destruct -> ovpn_peer_release_p2p ->
>>>> ovpn_peer_del_p2p(ovpn->peer)
>>>>
>>>> so at the last step we just ask to remove the very same peer that is
>>>> currently stored, which should just never fail.
>>>
>>> But that's not what the test checks for. If ovpn->peer->ovpn != ovpn,
>>> the test in ovpn_peer_del_p2p will fail. That's "objects getting out
>>> of sync" in my previous email. The peer has a bogus back reference to
>>> its ovpn parent, but it's ovpn->peer nevertheless.
>>>
>>
>> Oh thanks for explaining that.
>>
>> Ok, my assumption is that "ovpn->peer->ovpn != ovpn" can never be true.
>>
>> Peers are created within the context of one ovpn object and are never
>> exposed to other ovpns.
>>
>> I hope it makes sense.
> 
> Ok, so this would indicate that something has gone badly wrong. Is it
> really worth checking for that (or maybe just during development)?
> 

A peer is created in ovpn_nl_set_peer_doit(), where the ovpn object is 
used to first assign peer->ovpn and then to store the peer in its own 
members. This all happens in one call and the value of ovpn can't be 
switched.

Anyway, bugs hide where we are most confident that things cannot go 
wrong :) So I'll still add a WARN_ON, just in case.

Thanks

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 14:36             ` Antonio Quartulli
@ 2024-05-09 14:53               ` Antonio Quartulli
  2024-05-10 10:30                 ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-09 14:53 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal



On 09/05/2024 16:36, Antonio Quartulli wrote:
> On 09/05/2024 16:17, Sabrina Dubroca wrote:
>> 2024-05-09, 15:44:26 +0200, Antonio Quartulli wrote:
>>> On 09/05/2024 15:04, Sabrina Dubroca wrote:
>>>>>>> +void ovpn_peer_release(struct ovpn_peer *peer)
>>>>>>> +{
>>>>>>> +    call_rcu(&peer->rcu, ovpn_peer_release_rcu);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * ovpn_peer_delete_work - work scheduled to release peer in 
>>>>>>> process context
>>>>>>> + * @work: the work object
>>>>>>> + */
>>>>>>> +static void ovpn_peer_delete_work(struct work_struct *work)
>>>>>>> +{
>>>>>>> +    struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
>>>>>>> +                          delete_work);
>>>>>>> +    ovpn_peer_release(peer);
>>>>>>
>>>>>> Does call_rcu really need to run in process context?
>>>>>
>>>>> Reason for switching to process context is that we have to invoke
>>>>> ovpn_nl_notify_del_peer (that sends a netlink event to userspace) 
>>>>> and the
>>>>> latter requires a reference to the peer.
>>>>
>>>> I'm confused. When you say "requires a reference to the peer", do you
>>>> mean accessing fields of the peer object? I don't see why this
>>>> requires ovpn_nl_notify_del_peer to run from process context.
>>>
>>> ovpn_nl_notify_del_peer sends a netlink message to userspace and I 
>>> was under
>>> the impression that it may block/sleep, no?
>>> For this reason I assumed it must be executed in process context.
>>
>> With s/GFP_KERNEL/GFP_ATOMIC/, it should be ok to run from whatever
>> context. Firing up a workqueue just to send a 100B netlink message
>> seems a bit overkill.
> 
> Oh ok, I thought the send could be a problem too.
> 
> Will test with GFP_ATOMIC then. Thanks for the hint.

I am back and unfortunately we also have (added by a later patch):

	napi_disable(&peer->napi);
	netif_napi_del(&peer->napi);

that need to be executed in process context.
So it seems I must fire up the worker anyway..


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 14:53               ` Antonio Quartulli
@ 2024-05-10 10:30                 ` Sabrina Dubroca
  2024-05-10 12:34                   ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-10 10:30 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-09, 16:53:42 +0200, Antonio Quartulli wrote:
> 
> 
> On 09/05/2024 16:36, Antonio Quartulli wrote:
> > On 09/05/2024 16:17, Sabrina Dubroca wrote:
> > > 2024-05-09, 15:44:26 +0200, Antonio Quartulli wrote:
> > > > On 09/05/2024 15:04, Sabrina Dubroca wrote:
> > > > > > > > +void ovpn_peer_release(struct ovpn_peer *peer)
> > > > > > > > +{
> > > > > > > > +    call_rcu(&peer->rcu, ovpn_peer_release_rcu);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * ovpn_peer_delete_work - work scheduled to
> > > > > > > > release peer in process context
> > > > > > > > + * @work: the work object
> > > > > > > > + */
> > > > > > > > +static void ovpn_peer_delete_work(struct work_struct *work)
> > > > > > > > +{
> > > > > > > > +    struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
> > > > > > > > +                          delete_work);
> > > > > > > > +    ovpn_peer_release(peer);
> > > > > > > 
> > > > > > > Does call_rcu really need to run in process context?
> > > > > > 
> > > > > > Reason for switching to process context is that we have to invoke
> > > > > > ovpn_nl_notify_del_peer (that sends a netlink event to
> > > > > > userspace) and the
> > > > > > latter requires a reference to the peer.
> > > > > 
> > > > > I'm confused. When you say "requires a reference to the peer", do you
> > > > > mean accessing fields of the peer object? I don't see why this
> > > > > > requires ovpn_nl_notify_del_peer to run from process context.
> > > > 
> > > > ovpn_nl_notify_del_peer sends a netlink message to userspace and
> > > > I was under
> > > > the impression that it may block/sleep, no?
> > > > For this reason I assumed it must be executed in process context.
> > > 
> > > With s/GFP_KERNEL/GFP_ATOMIC/, it should be ok to run from whatever
> > > context. Firing up a workqueue just to send a 100B netlink message
> > > seems a bit overkill.
> > 
> > Oh ok, I thought the send could be a problem too.
> > 
> > Will test with GFP_ATOMIC then. Thanks for the hint.
> 
> I am back and unfortunately we also have (added by a later patch):
> 
> 	napi_disable(&peer->napi);
> 	netif_napi_del(&peer->napi);

Do you need the napi instance to be per peer, or can it be per
netdevice? If it's per netdevice you can clean it up in
->priv_destructor.

> that need to be executed in process context.
> So it seems I must fire up the worker anyway..

I hope we can simplify all that logic. There's some complexity
that's unavoidable in this kind of driver, but maybe not as much as
you've got here.

-- 
Sabrina



* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-10 10:30                 ` Sabrina Dubroca
@ 2024-05-10 12:34                   ` Antonio Quartulli
  2024-05-10 14:11                     ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-10 12:34 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 10/05/2024 12:30, Sabrina Dubroca wrote:
> 2024-05-09, 16:53:42 +0200, Antonio Quartulli wrote:
>>
>>
>> On 09/05/2024 16:36, Antonio Quartulli wrote:
>>> On 09/05/2024 16:17, Sabrina Dubroca wrote:
>>>> 2024-05-09, 15:44:26 +0200, Antonio Quartulli wrote:
>>>>> On 09/05/2024 15:04, Sabrina Dubroca wrote:
>>>>>>>>> +void ovpn_peer_release(struct ovpn_peer *peer)
>>>>>>>>> +{
>>>>>>>>> +    call_rcu(&peer->rcu, ovpn_peer_release_rcu);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * ovpn_peer_delete_work - work scheduled to
>>>>>>>>> release peer in process context
>>>>>>>>> + * @work: the work object
>>>>>>>>> + */
>>>>>>>>> +static void ovpn_peer_delete_work(struct work_struct *work)
>>>>>>>>> +{
>>>>>>>>> +    struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
>>>>>>>>> +                          delete_work);
>>>>>>>>> +    ovpn_peer_release(peer);
>>>>>>>>
>>>>>>>> Does call_rcu really need to run in process context?
>>>>>>>
>>>>>>> Reason for switching to process context is that we have to invoke
>>>>>>> ovpn_nl_notify_del_peer (that sends a netlink event to
>>>>>>> userspace) and the
>>>>>>> latter requires a reference to the peer.
>>>>>>
>>>>>> I'm confused. When you say "requires a reference to the peer", do you
>>>>>> mean accessing fields of the peer object? I don't see why this
>>>>>> requires ovpn_nl_notify_del_peer to run from process context.
>>>>>
>>>>> ovpn_nl_notify_del_peer sends a netlink message to userspace and
>>>>> I was under
>>>>> the impression that it may block/sleep, no?
>>>>> For this reason I assumed it must be executed in process context.
>>>>
>>>> With s/GFP_KERNEL/GFP_ATOMIC/, it should be ok to run from whatever
>>>> context. Firing up a workqueue just to send a 100B netlink message
>>>> seems a bit overkill.
>>>
>>> Oh ok, I thought the send could be a problem too.
>>>
>>> Will test with GFP_ATOMIC then. Thanks for the hint.
>>
>> I am back and unfortunately we also have (added by a later patch):
>>
>> 	napi_disable(&peer->napi);
>> 	netif_napi_del(&peer->napi);
> 
> Do you need the napi instance to be per peer, or can it be per
> netdevice? If it's per netdevice you can clean it up in
> ->priv_destructor.

In an ideal world, at some point I could leverage multiple CPUs
handling traffic from multiple peers, therefore every queue in the
driver is per peer, NAPI included.

Does it make sense?

Now, whether this is truly feasible from the core perspective is 
something I don't know yet.

For sure, for the time being I could shrink everything to one queue.
There is only one workqueue encrypting/decrypting packets right now,
therefore multiple NAPI queues are not truly useful at this time.


> 
>> that need to be executed in process context.
>> So it seems I must fire up the worker anyway..
> 
> I hope we can simplify all that logic. There's some complexity
> that's unavoidable in this kind of driver, but maybe not as much as
> you've got here.
> 

I am all for simplification, and rest assured that the current version 
is already much simpler than what it originally was :-)


-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-06  1:16 ` [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP) Antonio Quartulli
@ 2024-05-10 13:01   ` Sabrina Dubroca
  2024-05-10 13:39     ` Antonio Quartulli
  2024-05-12 21:35   ` Sabrina Dubroca
  1 sibling, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-10 13:01 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index a420bb45f25f..36cfb95edbf4 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -28,6 +30,12 @@ int ovpn_struct_init(struct net_device *dev)
>  
>  	spin_lock_init(&ovpn->lock);
>  
> +	ovpn->crypto_wq = alloc_workqueue("ovpn-crypto-wq-%s",
> +					  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0,
> +					  dev->name);
> +	if (!ovpn->crypto_wq)
> +		return -ENOMEM;
> +
>  	ovpn->events_wq = alloc_workqueue("ovpn-events-wq-%s", WQ_MEM_RECLAIM,
>  					  0, dev->name);
>  	if (!ovpn->events_wq)
>  		return -ENOMEM;

This will leak crypto_wq on failure. You need to roll back all
previous changes when something fails (also if you move all this stuff
into ndo_init).
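
Roughly what I mean, as a userspace sketch only. Everything here is a
hypothetical stand-in (struct fake_wq, struct fake_ovpn, fake_alloc_wq(),
fake_destroy_wq() play the role of the workqueue API and of struct
ovpn_struct); it only shows the shape of the unwinding, not the driver code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue; not the kernel type. */
struct fake_wq {
	const char *name;
};

/* Hypothetical stand-in for struct ovpn_struct. */
struct fake_ovpn {
	struct fake_wq *crypto_wq;
	struct fake_wq *events_wq;
};

int live_wqs;          /* allocated and not yet destroyed */
int alloc_calls;       /* number of allocation attempts so far */
int fail_at_call = -1; /* 0-based allocation attempt to fail, -1 = none */

struct fake_wq *fake_alloc_wq(const char *name)
{
	struct fake_wq *wq;

	if (alloc_calls++ == fail_at_call)
		return NULL;

	wq = malloc(sizeof(*wq));
	if (!wq)
		return NULL;

	wq->name = name;
	live_wqs++;
	return wq;
}

void fake_destroy_wq(struct fake_wq *wq)
{
	free(wq);
	live_wqs--;
}

/* Each failure point jumps to a label that undoes only what was
 * already set up, in reverse order of allocation.
 */
int fake_ovpn_init(struct fake_ovpn *ovpn)
{
	ovpn->crypto_wq = fake_alloc_wq("crypto");
	if (!ovpn->crypto_wq)
		goto err;

	ovpn->events_wq = fake_alloc_wq("events");
	if (!ovpn->events_wq)
		goto err_free_crypto;

	return 0;

err_free_crypto:
	fake_destroy_wq(ovpn->crypto_wq);
	ovpn->crypto_wq = NULL;
err:
	return -ENOMEM;
}

/* Test helper: run init with the Nth allocation forced to fail. */
int try_init(int fail_at, struct fake_ovpn *ovpn)
{
	alloc_calls = 0;
	fail_at_call = fail_at;
	return fake_ovpn_init(ovpn);
}
```

With the second allocation forced to fail, try_init(1, &ovpn) returns
-ENOMEM and leaves no live workqueue behind, which is the property the
current code is missing.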

> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> index 659df320525c..f915afa260c3 100644
> --- a/drivers/net/ovpn/peer.h
> +++ b/drivers/net/ovpn/peer.h
> @@ -22,9 +23,12 @@
>   * @id: unique identifier
>   * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
>   * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
> + * @encrypt_work: work used to process outgoing packets
> + * @decrypt_work: work used to process incoming packets

nit: Only encrypt_work is used in this patch, decrypt_work is for RX


> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
> index 4b7d96a13df0..f434da76dc0a 100644
> --- a/drivers/net/ovpn/udp.c
> +++ b/drivers/net/ovpn/udp.c
> +/**
> + * ovpn_udp4_output - send IPv4 packet over udp socket
> + * @ovpn: the openvpn instance
> + * @bind: the binding related to the destination peer
> + * @cache: dst cache
> + * @sk: the socket to send the packet over
> + * @skb: the packet to send
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
> +			    struct dst_cache *cache, struct sock *sk,
> +			    struct sk_buff *skb)
> +{
> +	struct rtable *rt;
> +	struct flowi4 fl = {
> +		.saddr = bind->local.ipv4.s_addr,
> +		.daddr = bind->sa.in4.sin_addr.s_addr,
> +		.fl4_sport = inet_sk(sk)->inet_sport,
> +		.fl4_dport = bind->sa.in4.sin_port,
> +		.flowi4_proto = sk->sk_protocol,
> +		.flowi4_mark = sk->sk_mark,
> +	};
> +	int ret;
> +
> +	local_bh_disable();
> +	rt = dst_cache_get_ip4(cache, &fl.saddr);
> +	if (rt)
> +		goto transmit;
> +
> +	if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
> +					RT_SCOPE_HOST))) {
> +		/* we may end up here when the cached address is not usable
> +		 * anymore. In this case we reset address/cache and perform a
> +		 * new look up

What exactly are you trying to guard against here? The ipv4 address
used for the last packet being removed from the device/host? I don't
see other tunnels using dst_cache doing this kind of thing (except
wireguard).

> +		 */
> +		fl.saddr = 0;
> +		bind->local.ipv4.s_addr = 0;
> +		dst_cache_reset(cache);
> +	}
> +
> +	rt = ip_route_output_flow(sock_net(sk), &fl, sk);
> +	if (IS_ERR(rt) && PTR_ERR(rt) == -EINVAL) {
> +		fl.saddr = 0;
> +		bind->local.ipv4.s_addr = 0;
> +		dst_cache_reset(cache);
> +
> +		rt = ip_route_output_flow(sock_net(sk), &fl, sk);

Why do you need to repeat the lookup? And why only for ipv4, but not
for ipv6?

> +	}
> +
> +	if (IS_ERR(rt)) {
> +		ret = PTR_ERR(rt);
> +		net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
> +				    ovpn->dev->name, &bind->sa.in4, ret);
> +		goto err;
> +	}
> +	dst_cache_set_ip4(cache, &rt->dst, fl.saddr);

Overall this looks a whole lot like udp_tunnel_dst_lookup, except for:
 - 2nd lookup
 - inet_confirm_addr/dst_cache_reset

(and there's udp_tunnel6_dst_lookup for ipv6)

-- 
Sabrina



* Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-10 13:01   ` Sabrina Dubroca
@ 2024-05-10 13:39     ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-10 13:39 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 10/05/2024 15:01, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index a420bb45f25f..36cfb95edbf4 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> @@ -28,6 +30,12 @@ int ovpn_struct_init(struct net_device *dev)
>>   
>>   	spin_lock_init(&ovpn->lock);
>>   
>> +	ovpn->crypto_wq = alloc_workqueue("ovpn-crypto-wq-%s",
>> +					  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0,
>> +					  dev->name);
>> +	if (!ovpn->crypto_wq)
>> +		return -ENOMEM;
>> +
>>   	ovpn->events_wq = alloc_workqueue("ovpn-events-wq-%s", WQ_MEM_RECLAIM,
>>   					  0, dev->name);
>>   	if (!ovpn->events_wq)
>>   		return -ENOMEM;
> 
> This will leak crypto_wq on failure. You need to roll back all
> previous changes when something fails (also if you move all this stuff
> into ndo_init).

ouch, good catch! thanks.

> 
>> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
>> index 659df320525c..f915afa260c3 100644
>> --- a/drivers/net/ovpn/peer.h
>> +++ b/drivers/net/ovpn/peer.h
>> @@ -22,9 +23,12 @@
>>    * @id: unique identifier
>>    * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
>>    * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
>> + * @encrypt_work: work used to process outgoing packets
>> + * @decrypt_work: work used to process incoming packets
> 
> nit: Only encrypt_work is used in this patch, decrypt_work is for RX

Right, same for tx_ring actually. I will move both to the next patch.

> 
> 
>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
>> index 4b7d96a13df0..f434da76dc0a 100644
>> --- a/drivers/net/ovpn/udp.c
>> +++ b/drivers/net/ovpn/udp.c
>> +/**
>> + * ovpn_udp4_output - send IPv4 packet over udp socket
>> + * @ovpn: the openvpn instance
>> + * @bind: the binding related to the destination peer
>> + * @cache: dst cache
>> + * @sk: the socket to send the packet over
>> + * @skb: the packet to send
>> + *
>> + * Return: 0 on success or a negative error code otherwise
>> + */
>> +static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
>> +			    struct dst_cache *cache, struct sock *sk,
>> +			    struct sk_buff *skb)
>> +{
>> +	struct rtable *rt;
>> +	struct flowi4 fl = {
>> +		.saddr = bind->local.ipv4.s_addr,
>> +		.daddr = bind->sa.in4.sin_addr.s_addr,
>> +		.fl4_sport = inet_sk(sk)->inet_sport,
>> +		.fl4_dport = bind->sa.in4.sin_port,
>> +		.flowi4_proto = sk->sk_protocol,
>> +		.flowi4_mark = sk->sk_mark,
>> +	};
>> +	int ret;
>> +
>> +	local_bh_disable();
>> +	rt = dst_cache_get_ip4(cache, &fl.saddr);
>> +	if (rt)
>> +		goto transmit;
>> +
>> +	if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
>> +					RT_SCOPE_HOST))) {
>> +		/* we may end up here when the cached address is not usable
>> +		 * anymore. In this case we reset address/cache and perform a
>> +		 * new look up
> 
> What exactly are you trying to guard against here? The ipv4 address
> used for the last packet being removed from the device/host? I don't
> see other tunnels using dst_cache doing this kind of thing (except
> wireguard).

yes, that's the scenario being checked (which hopefully is what the 
comment conveys).

> 
>> +		 */
>> +		fl.saddr = 0;
>> +		bind->local.ipv4.s_addr = 0;
>> +		dst_cache_reset(cache);
>> +	}
>> +
>> +	rt = ip_route_output_flow(sock_net(sk), &fl, sk);
>> +	if (IS_ERR(rt) && PTR_ERR(rt) == -EINVAL) {
>> +		fl.saddr = 0;
>> +		bind->local.ipv4.s_addr = 0;
>> +		dst_cache_reset(cache);
>> +
>> +		rt = ip_route_output_flow(sock_net(sk), &fl, sk);
> 
> Why do you need to repeat the lookup? And why only for ipv4, but not
> for ipv6?

We are repeating the lookup without the saddr.

The first lookup may have failed because the destination is not 
reachable anymore from that specific source address, but it may be 
reachable from another one (i.e. routing table change in a multi-homed 
setup).
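
To make the intent explicit, here is a userspace sketch of that fallback.
All names (struct fake_flow, fake_route_lookup(), usable_saddr) are
hypothetical stand-ins; fake_route_lookup() only mimics the behaviour I am
assuming from ip_route_output_flow(), i.e. -EINVAL when the pinned source
address cannot be used:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flow key: only the fields needed for the illustration. */
struct fake_flow {
	uint32_t saddr; /* 0 means "let the routing code pick a source" */
	uint32_t daddr;
};

/* Hypothetical routing state: the one local address currently usable. */
uint32_t usable_saddr = 0x0a000002;

/* Stand-in for the route lookup: rejects a pinned source that is no
 * longer usable and fills in the source when none was requested.
 */
int fake_route_lookup(struct fake_flow *fl)
{
	if (fl->saddr && fl->saddr != usable_saddr)
		return -EINVAL;

	fl->saddr = usable_saddr;
	return 0;
}

/* Try the cached source first; if it is rejected with -EINVAL, forget
 * it and retry once with the source left unset.
 */
int fake_lookup_with_fallback(uint32_t *cached_saddr, uint32_t daddr)
{
	struct fake_flow fl = { .saddr = *cached_saddr, .daddr = daddr };
	int ret;

	ret = fake_route_lookup(&fl);
	if (ret == -EINVAL) {
		fl.saddr = 0;
		*cached_saddr = 0;
		ret = fake_route_lookup(&fl);
	}
	if (ret)
		return ret;

	/* remember whichever source the lookup settled on */
	*cached_saddr = fl.saddr;
	return 0;
}
```

The second lookup only happens on -EINVAL, so any other error is reported
as is and the cached source is left untouched.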

Why not for v6? That's a good question. I wonder if I should just do the
same and repeat the lookup with in6addr_any as source. I think it would
make sense, as we could end up in the same scenario as described for IPv4.

What do you think?

> 
>> +	}
>> +
>> +	if (IS_ERR(rt)) {
>> +		ret = PTR_ERR(rt);
>> +		net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
>> +				    ovpn->dev->name, &bind->sa.in4, ret);
>> +		goto err;
>> +	}
>> +	dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
> 
> Overall this looks a whole lot like udp_tunnel_dst_lookup, except for:
>   - 2nd lookup
>   - inet_confirm_addr/dst_cache_reset

But why doesn't udp_tunnel_dst_lookup account for cases where the source
address is not usable anymore? I think those checks are reasonable, no?

Maybe I could still use udp_tunnel_dst_lookup, but call it a second time 
without saddr in case of failure?

> 
> (and there's udp_tunnel6_dst_lookup for ipv6)
> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 10/24] ovpn: implement basic RX path (UDP)
  2024-05-06  1:16 ` [PATCH net-next v3 10/24] ovpn: implement basic RX " Antonio Quartulli
@ 2024-05-10 13:45   ` Sabrina Dubroca
  2024-05-10 14:41     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-10 13:45 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index 36cfb95edbf4..9935a863bffe 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> +/* Called after decrypt to write the IP packet to the device.
> + * This method is expected to manage/free the skb.
> + */
> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	/* packet integrity was verified on the VPN layer - no need to perform
> +	 * any additional check along the stack

But it could have been corrupted before it got into the VPN?

> +	 */
> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
> +	skb->csum_level = ~0;
> +

[...]
> +int ovpn_napi_poll(struct napi_struct *napi, int budget)
> +{
> +	struct ovpn_peer *peer = container_of(napi, struct ovpn_peer, napi);
> +	struct sk_buff *skb;
> +	int work_done = 0;
> +
> +	if (unlikely(budget <= 0))
> +		return 0;
> +	/* this function should schedule at most 'budget' number of
> +	 * packets for delivery to the interface.
> +	 * If in the queue we have more packets than what allowed by the
> +	 * budget, the next polling will take care of those
> +	 */
> +	while ((work_done < budget) &&
> +	       (skb = ptr_ring_consume_bh(&peer->netif_rx_ring))) {
> +		ovpn_netdev_write(peer, skb);
> +		work_done++;
> +	}
> +
> +	if (work_done < budget)
> +		napi_complete_done(napi, work_done);
> +
> +	return work_done;
> +}

Why not use gro_cells? It would avoid all that napi polling and
netif_rx_ring code (and it's per-cpu, going back to our other
discussion around napi).
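
For comparison, the driver-side polling that gro_cells would take over
boils down to a budget-bounded drain of a queue. A userspace sketch of
that pattern, where struct fake_ring and the fake_* functions are
hypothetical and only stand in for the ptr_ring and NAPI calls:

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_RING_SIZE 8

/* Hypothetical fixed-size ring of "packets" (plain ints here). */
struct fake_ring {
	int slots[FAKE_RING_SIZE];
	size_t head; /* next slot to consume */
	size_t tail; /* next slot to fill */
};

/* Returns 0 on success, -1 if the ring is full. */
int fake_ring_produce(struct fake_ring *r, int pkt)
{
	if (r->tail - r->head == FAKE_RING_SIZE)
		return -1;

	r->slots[r->tail++ % FAKE_RING_SIZE] = pkt;
	return 0;
}

/* Returns 1 and stores the packet in *pkt, or 0 if the ring is empty. */
int fake_ring_consume(struct fake_ring *r, int *pkt)
{
	if (r->head == r->tail)
		return 0;

	*pkt = r->slots[r->head++ % FAKE_RING_SIZE];
	return 1;
}

/* Deliver at most 'budget' packets. Polling is marked as completed only
 * if the ring ran dry before the budget was used up; otherwise the
 * caller is expected to poll again.
 */
int fake_poll(struct fake_ring *r, int budget, int *completed)
{
	int work_done = 0;
	int pkt;

	*completed = 0;
	if (budget <= 0)
		return 0;

	while (work_done < budget && fake_ring_consume(r, &pkt))
		work_done++; /* a driver would hand pkt to the stack here */

	if (work_done < budget)
		*completed = 1;

	return work_done;
}
```

With five queued packets and a budget of three, the first poll delivers
three without completing and the second delivers the remaining two and
completes.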


> diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h
> new file mode 100644
> index 000000000000..0a51104ed931
> --- /dev/null
> +++ b/drivers/net/ovpn/proto.h
[...]
> +/**
> + * ovpn_key_id_from_skb - extract key ID from the skb head
> + * @skb: the packet to extract the key ID code from
> + *
> + * Note: this function assumes that the skb head was pulled enough
> + * to access the first byte.
> + *
> + * Return: the key ID
> + */
> +static inline u8 ovpn_key_id_from_skb(const struct sk_buff *skb)

> +static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)

(tiny nit: those aren't used yet in this patch. probably not worth
moving them into the right patch.)


> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
> index f434da76dc0a..07182703e598 100644
> --- a/drivers/net/ovpn/udp.c
> +++ b/drivers/net/ovpn/udp.c
> @@ -20,9 +20,117 @@
>  #include "bind.h"
>  #include "io.h"
>  #include "peer.h"
> +#include "proto.h"
>  #include "socket.h"
>  #include "udp.h"
>  
> +/**
> + * ovpn_udp_encap_recv - Start processing a received UDP packet.
> + * @sk: socket over which the packet was received
> + * @skb: the received packet
> + *
> + * If the first byte of the payload is DATA_V2, the packet is further processed,
> + * otherwise it is forwarded to the UDP stack for delivery to user space.
> + *
> + * Return:
> + *  0 if skb was consumed or dropped
> + * >0 if skb should be passed up to userspace as UDP (packet not consumed)
> + * <0 if skb should be resubmitted as proto -N (packet not consumed)
> + */
> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct ovpn_peer *peer = NULL;
> +	struct ovpn_struct *ovpn;
> +	u32 peer_id;
> +	u8 opcode;
> +	int ret;
> +
> +	ovpn = ovpn_from_udp_sock(sk);
> +	if (unlikely(!ovpn)) {
> +		net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
> +				    __func__);
> +		goto drop;
> +	}
> +
> +	/* Make sure the first 4 bytes of the skb data buffer after the UDP
> +	 * header are accessible.
> +	 * They are required to fetch the OP code, the key ID and the peer ID.
> +	 */
> +	if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + 4))) {

Is this OVPN_OP_SIZE_V2?
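
For readers following along, a userspace sketch of what those 4 bytes
carry. The layout (5-bit opcode, 3-bit key ID, 24-bit peer ID, network
byte order) and all SKETCH_* names and values are my assumptions for
illustration, not copied from proto.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants, named SKETCH_* to make clear they are not the
 * driver's own definitions.
 */
#define SKETCH_OP_SIZE_V2   4
#define SKETCH_OPCODE_SHIFT 3
#define SKETCH_KEY_ID_MASK  0x07
#define SKETCH_PEER_ID_MASK 0x00FFFFFFu
#define SKETCH_DATA_V2      9

/* Build the 32-bit header value in host byte order. */
uint32_t sketch_opcode_compose(uint8_t opcode, uint8_t key_id,
			       uint32_t peer_id)
{
	uint8_t op = (uint8_t)((opcode << SKETCH_OPCODE_SHIFT) |
			       (key_id & SKETCH_KEY_ID_MASK));

	return ((uint32_t)op << 24) | (peer_id & SKETCH_PEER_ID_MASK);
}

/* Write the header to the wire buffer in network byte order. */
void sketch_put_header(uint8_t *buf, uint32_t hdr)
{
	buf[0] = (uint8_t)(hdr >> 24);
	buf[1] = (uint8_t)(hdr >> 16);
	buf[2] = (uint8_t)(hdr >> 8);
	buf[3] = (uint8_t)hdr;
}

/* Parse the header; the length check plays the role of pskb_may_pull().
 * Returns 0 on success, -1 if fewer than 4 bytes are available.
 */
int sketch_parse_header(const uint8_t *buf, size_t len, uint8_t *opcode,
			uint8_t *key_id, uint32_t *peer_id)
{
	if (len < SKETCH_OP_SIZE_V2)
		return -1;

	*opcode = buf[0] >> SKETCH_OPCODE_SHIFT;
	*key_id = buf[0] & SKETCH_KEY_ID_MASK;
	*peer_id = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) |
		   buf[3];
	return 0;
}
```

Whatever the real constant is called, using it in the length check instead
of a bare 4 keeps the check and the parsing helpers tied together.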

> +		net_dbg_ratelimited("%s: packet too small\n", __func__);
> +		goto drop;
> +	}
> +
> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
> +	if (unlikely(opcode != OVPN_DATA_V2)) {
> +		/* DATA_V1 is not supported */
> +		if (opcode == OVPN_DATA_V1)
> +			goto drop;
> +
> +		/* unknown or control packet: let it bubble up to userspace */
> +		return 1;
> +	}
> +
> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
> +	/* some OpenVPN server implementations send data packets with the
> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
> +	 * and we try with the transport address
> +	 */
> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
> +		if (!peer) {
> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
> +					    __func__, peer_id);
> +			goto drop;
> +		}
> +	}
> +
> +	if (!peer) {
> +		/* data packet with undef peer-id */
> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
> +		if (unlikely(!peer)) {
> +			netdev_dbg(ovpn->dev,
> +				   "%s: received data with undef peer-id from unknown source\n",
> +				   __func__);

_ratelimited?

> +			goto drop;
> +		}
> +	}
> +
> +	/* At this point we know the packet is from a configured peer.
> +	 * DATA_V2 packets are handled in kernel space, the rest goes to user
> +	 * space.
> +	 *
> +	 * Return 1 to instruct the stack to let the packet bubble up to
> +	 * userspace
> +	 */
> +	if (unlikely(opcode != OVPN_DATA_V2)) {

You already handled those earlier, before getting the peer.


[...]
> @@ -255,10 +368,20 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>  			return -EALREADY;
>  		}
>  
> -		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
> +		netdev_err(ovpn->dev,
> +			   "%s: provided socket already taken by other user\n",

I guess you meant to break that line in the patch that introduced it,
rather than here? :)


> +void ovpn_udp_socket_detach(struct socket *sock)
> +{
> +	struct udp_tunnel_sock_cfg cfg = { };
> +
> +	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);

I can't find anything in the kernel currently using
setup_udp_tunnel_sock the way you're using it here.

Does this provide any benefit compared to just letting the kernel
disable encap when the socket goes away? Are you planning to detach
and then re-attach the same socket?

-- 
Sabrina



* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-10 12:34                   ` Antonio Quartulli
@ 2024-05-10 14:11                     ` Sabrina Dubroca
  0 siblings, 0 replies; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-10 14:11 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-10, 14:34:16 +0200, Antonio Quartulli wrote:
> On 10/05/2024 12:30, Sabrina Dubroca wrote:
> > 2024-05-09, 16:53:42 +0200, Antonio Quartulli wrote:
> > > 
> > > 
> > > On 09/05/2024 16:36, Antonio Quartulli wrote:
> > > > On 09/05/2024 16:17, Sabrina Dubroca wrote:
> > > > > 2024-05-09, 15:44:26 +0200, Antonio Quartulli wrote:
> > > > > > On 09/05/2024 15:04, Sabrina Dubroca wrote:
> > > > > > > > > > +void ovpn_peer_release(struct ovpn_peer *peer)
> > > > > > > > > > +{
> > > > > > > > > > +    call_rcu(&peer->rcu, ovpn_peer_release_rcu);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +/**
> > > > > > > > > > + * ovpn_peer_delete_work - work scheduled to
> > > > > > > > > > release peer in process context
> > > > > > > > > > + * @work: the work object
> > > > > > > > > > + */
> > > > > > > > > > +static void ovpn_peer_delete_work(struct work_struct *work)
> > > > > > > > > > +{
> > > > > > > > > > +    struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
> > > > > > > > > > +                          delete_work);
> > > > > > > > > > +    ovpn_peer_release(peer);
> > > > > > > > > 
> > > > > > > > > Does call_rcu really need to run in process context?
> > > > > > > > 
> > > > > > > > Reason for switching to process context is that we have to invoke
> > > > > > > > ovpn_nl_notify_del_peer (that sends a netlink event to
> > > > > > > > userspace) and the
> > > > > > > > latter requires a reference to the peer.
> > > > > > > 
> > > > > > > I'm confused. When you say "requires a reference to the peer", do you
> > > > > > > mean accessing fields of the peer object? I don't see why this
> > > > > > > > requires ovpn_nl_notify_del_peer to run from process context.
> > > > > > 
> > > > > > ovpn_nl_notify_del_peer sends a netlink message to userspace and
> > > > > > I was under
> > > > > > the impression that it may block/sleep, no?
> > > > > > For this reason I assumed it must be executed in process context.
> > > > > 
> > > > > With s/GFP_KERNEL/GFP_ATOMIC/, it should be ok to run from whatever
> > > > > context. Firing up a workqueue just to send a 100B netlink message
> > > > > seems a bit overkill.
> > > > 
> > > > Oh ok, I thought the send could be a problem too.
> > > > 
> > > > Will test with GFP_ATOMIC then. Thanks for the hint.
> > > 
> > > I am back and unfortunately we also have (added by a later patch):
> > > 
> > > 	napi_disable(&peer->napi);
> > > 	netif_napi_del(&peer->napi);
> > 
> > Do you need the napi instance to be per peer, or can it be per
> > netdevice? If it's per netdevice you can clean it up in
> > ->priv_destructor.
> 
> In an ideal world, at some point I could leverage multiple CPUs handling
> traffic from multiple peers, therefore every queue in the driver is per
> peer, NAPI included.

But for that they could also be per-CPU at the device level, instead
of being created at the peer level.

> Does it make sense?
> 
> Now, whether this is truly feasible from the core perspective is something I
> don't know yet.

If those packets arrive on different CPUs/queues, probably.

> For sure, for the time being I could shrink everything to one queue.
> There is only one workqueue encrypting/decrypting packets right now,
> therefore multiple NAPI queues are not truly useful at this time.
> 
> > 
> > > that need to be executed in process context.
> > > So it seems I must fire up the worker anyway..
> > 
> > I hope we can simplify all that logic. There's some complexity
> > that's unavoidable in this kind of driver, but maybe not as much as
> > you've got here.
> > 
> 
> I am all for simplification, and rest assured that the current version is
> already much simpler than what it originally was :-)

Ouch :)

-- 
Sabrina



* Re: [PATCH net-next v3 10/24] ovpn: implement basic RX path (UDP)
  2024-05-10 13:45   ` Sabrina Dubroca
@ 2024-05-10 14:41     ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-10 14:41 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 10/05/2024 15:45, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index 36cfb95edbf4..9935a863bffe 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> +/* Called after decrypt to write the IP packet to the device.
>> + * This method is expected to manage/free the skb.
>> + */
>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
>> +{
>> +	/* packet integrity was verified on the VPN layer - no need to perform
>> +	 * any additional check along the stack
> 
> But it could have been corrupted before it got into the VPN?

It could, but I believe a VPN should only take care of integrity along 
its tunnel (and this is guaranteed by the OpenVPN protocol).
If something corrupted enters the tunnel, we will just deliver it as is 
to the other end. Upper layers (where the corruption actually happened) 
have to deal with that.

> 
>> +	 */
>> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +	skb->csum_level = ~0;
>> +
> 
> [...]
>> +int ovpn_napi_poll(struct napi_struct *napi, int budget)
>> +{
>> +	struct ovpn_peer *peer = container_of(napi, struct ovpn_peer, napi);
>> +	struct sk_buff *skb;
>> +	int work_done = 0;
>> +
>> +	if (unlikely(budget <= 0))
>> +		return 0;
>> +	/* this function should schedule at most 'budget' number of
>> +	 * packets for delivery to the interface.
>> +	 * If in the queue we have more packets than what allowed by the
>> +	 * budget, the next polling will take care of those
>> +	 */
>> +	while ((work_done < budget) &&
>> +	       (skb = ptr_ring_consume_bh(&peer->netif_rx_ring))) {
>> +		ovpn_netdev_write(peer, skb);
>> +		work_done++;
>> +	}
>> +
>> +	if (work_done < budget)
>> +		napi_complete_done(napi, work_done);
>> +
>> +	return work_done;
>> +}
> 
> Why not use gro_cells?

First because I did not know they existed :-)

> It would avoid all that napi polling and
> netif_rx_ring code (and it's per-cpu, going back to our other
> discussion around napi).

This sounds truly appealing. And if we can make this per-cpu by design, 
I believe we can definitely drop the per-peer NAPI logic.

> 
> 
>> diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h
>> new file mode 100644
>> index 000000000000..0a51104ed931
>> --- /dev/null
>> +++ b/drivers/net/ovpn/proto.h
> [...]
>> +/**
>> + * ovpn_key_id_from_skb - extract key ID from the skb head
>> + * @skb: the packet to extract the key ID code from
>> + *
>> + * Note: this function assumes that the skb head was pulled enough
>> + * to access the first byte.
>> + *
>> + * Return: the key ID
>> + */
>> +static inline u8 ovpn_key_id_from_skb(const struct sk_buff *skb)
> 
>> +static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
> 
> (tiny nit: those aren't used yet in this patch. probably not worth
> moving them into the right patch.)

ouch. I am already going at a speed of 20-25rph (Rebases Per Hour).
It shouldn't be a problem to clean this up too.

> 
> 
>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
>> index f434da76dc0a..07182703e598 100644
>> --- a/drivers/net/ovpn/udp.c
>> +++ b/drivers/net/ovpn/udp.c
>> @@ -20,9 +20,117 @@
>>   #include "bind.h"
>>   #include "io.h"
>>   #include "peer.h"
>> +#include "proto.h"
>>   #include "socket.h"
>>   #include "udp.h"
>>   
>> +/**
>> + * ovpn_udp_encap_recv - Start processing a received UDP packet.
>> + * @sk: socket over which the packet was received
>> + * @skb: the received packet
>> + *
>> + * If the first byte of the payload is DATA_V2, the packet is further processed,
>> + * otherwise it is forwarded to the UDP stack for delivery to user space.
>> + *
>> + * Return:
>> + *  0 if skb was consumed or dropped
>> + * >0 if skb should be passed up to userspace as UDP (packet not consumed)
>> + * <0 if skb should be resubmitted as proto -N (packet not consumed)
>> + */
>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>> +{
>> +	struct ovpn_peer *peer = NULL;
>> +	struct ovpn_struct *ovpn;
>> +	u32 peer_id;
>> +	u8 opcode;
>> +	int ret;
>> +
>> +	ovpn = ovpn_from_udp_sock(sk);
>> +	if (unlikely(!ovpn)) {
>> +		net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
>> +				    __func__);
>> +		goto drop;
>> +	}
>> +
>> +	/* Make sure the first 4 bytes of the skb data buffer after the UDP
>> +	 * header are accessible.
>> +	 * They are required to fetch the OP code, the key ID and the peer ID.
>> +	 */
>> +	if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + 4))) {
> 
> Is this OVPN_OP_SIZE_V2?

It is! I will use that define. thanks
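
For reference, a minimal userspace sketch of how those 4 bytes could be decoded. It assumes the usual DATA_V2 layout (5-bit opcode, 3-bit key ID, 24-bit peer ID, network byte order); all sketch_* names and constants are made up for illustration and are not the defines used in proto.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; the names are assumptions, not the driver's. */
#define SKETCH_OP_SIZE_V2	4
#define SKETCH_OPCODE_SHIFT	3
#define SKETCH_KEY_ID_MASK	0x07
#define SKETCH_PEER_ID_MASK	0x00FFFFFFu

struct sketch_op_v2 {
	uint8_t opcode;
	uint8_t key_id;
	uint32_t peer_id;
};

/* Decode opcode, key ID and peer ID from the first 4 payload bytes.
 * Returns 0 on success, -1 if fewer than 4 bytes are available.
 */
static int sketch_parse_op_v2(const uint8_t *buf, size_t len,
			      struct sketch_op_v2 *out)
{
	uint32_t word;

	assert(buf && out);

	if (len < SKETCH_OP_SIZE_V2)
		return -1;

	/* assemble the big-endian 32-bit word byte by byte */
	word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	out->opcode = (uint8_t)(word >> 24) >> SKETCH_OPCODE_SHIFT;
	out->key_id = (uint8_t)(word >> 24) & SKETCH_KEY_ID_MASK;
	out->peer_id = word & SKETCH_PEER_ID_MASK;
	return 0;
}
```

The length check plays the role of the pskb_may_pull() call above: nothing is read before the 4 bytes are known to be there.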

> 
>> +		net_dbg_ratelimited("%s: packet too small\n", __func__);
>> +		goto drop;
>> +	}
>> +
>> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
>> +	if (unlikely(opcode != OVPN_DATA_V2)) {
>> +		/* DATA_V1 is not supported */
>> +		if (opcode == OVPN_DATA_V1)
>> +			goto drop;
>> +
>> +		/* unknown or control packet: let it bubble up to userspace */
>> +		return 1;
>> +	}
>> +
>> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
>> +	/* some OpenVPN server implementations send data packets with the
>> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
>> +	 * and we try with the transport address
>> +	 */
>> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
>> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
>> +		if (!peer) {
>> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
>> +					    __func__, peer_id);
>> +			goto drop;
>> +		}
>> +	}
>> +
>> +	if (!peer) {
>> +		/* data packet with undef peer-id */
>> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
>> +		if (unlikely(!peer)) {
>> +			netdev_dbg(ovpn->dev,
>> +				   "%s: received data with undef peer-id from unknown source\n",
>> +				   __func__);
> 
> _ratelimited?

makes sense. will use net_dbg_ratelimited

> 
>> +			goto drop;
>> +		}
>> +	}
>> +
>> +	/* At this point we know the packet is from a configured peer.
>> +	 * DATA_V2 packets are handled in kernel space, the rest goes to user
>> +	 * space.
>> +	 *
>> +	 * Return 1 to instruct the stack to let the packet bubble up to
>> +	 * userspace
>> +	 */
>> +	if (unlikely(opcode != OVPN_DATA_V2)) {
> 
> You already handled those earlier, before getting the peer.

ouch..you're right. This can just go.

> 
> 
> [...]
>> @@ -255,10 +368,20 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>>   			return -EALREADY;
>>   		}
>>   
>> -		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
>> +		netdev_err(ovpn->dev,
>> +			   "%s: provided socket already taken by other user\n",
> 
> I guess you meant to break that line in the patch that introduced it,
> rather than here? :)

indeed.

> 
> 
>> +void ovpn_udp_socket_detach(struct socket *sock)
>> +{
>> +	struct udp_tunnel_sock_cfg cfg = { };
>> +
>> +	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
> 
> I can't find anything in the kernel currently using
> setup_udp_tunnel_sock the way you're using it here.
> 
> Does this provide any benefit compared to just letting the kernel
> disable encap when the socket goes away? Are you planning to detach
> and then re-attach the same socket?

Technically, we don't know what happens to this socket after we detach.
We have no guarantee that it will be closed.

Right now we detach when the instance is closed, so it's likely that the 
socket will go, but I don't want to make hard assumptions about what 
userspace may decide to do with this socket in the future.

If it doesn't hurt, why not do this easy cleanup?


Thanks!

> 

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-09 13:24         ` Andrew Lunn
@ 2024-05-10 18:57           ` Antonio Quartulli
  2024-05-11  0:28             ` Jakub Kicinski
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-10 18:57 UTC (permalink / raw
  To: Andrew Lunn, Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Esben Haabendal

On 09/05/2024 15:24, Andrew Lunn wrote:
> On Thu, May 09, 2024 at 03:04:36PM +0200, Sabrina Dubroca wrote:
>> 2024-05-08, 22:31:51 +0200, Antonio Quartulli wrote:
>>> On 08/05/2024 18:06, Sabrina Dubroca wrote:
>>>> 2024-05-06, 03:16:20 +0200, Antonio Quartulli wrote:
>>>>> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
>>>>> index ee05b8a2c61d..b79d4f0474b0 100644
>>>>> --- a/drivers/net/ovpn/ovpnstruct.h
>>>>> +++ b/drivers/net/ovpn/ovpnstruct.h
>>>>> @@ -17,12 +17,19 @@
>>>>>     * @dev: the actual netdev representing the tunnel
>>>>>     * @registered: whether dev is still registered with netdev or not
>>>>>     * @mode: device operation mode (i.e. p2p, mp, ..)
>>>>> + * @lock: protect this object
>>>>> + * @event_wq: used to schedule generic events that may sleep and that need to be
>>>>> + *            performed outside of softirq context
>>>>> + * @peer: in P2P mode, this is the only remote peer
>>>>>     * @dev_list: entry for the module wide device list
>>>>>     */
>>>>>    struct ovpn_struct {
>>>>>    	struct net_device *dev;
>>>>>    	bool registered;
>>>>>    	enum ovpn_mode mode;
>>>>> +	spinlock_t lock; /* protect writing to the ovpn_struct object */
>>>>
>>>> nit: the comment isn't really needed since you have kdoc saying the same thing
>>>
>>> True, but checkpatch.pl (or some other script?) was still throwing a
>>> warning, therefore I added this comment to silence it.
>>
>> Ok, then I guess the comment (and the other one below) can stay. That
>> sounds like a checkpatch.pl bug.
> 
> I suspect it is more complex than that. checkpatch does not understand
> kdoc. It just knows the rule that there should be a comment next to a
> lock, hopefully indicating what the lock protects. In order to fix
> this, checkpatch would need to somehow invoke the kdoc parser, and ask
> it if the lock has kdoc documentation.
> 
> I suspect we are just going to have to live with this.

since we are now requiring new code to always have kdoc, can't we just 
drop the checkpatch warning?

Regards,


-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-10 18:57           ` Antonio Quartulli
@ 2024-05-11  0:28             ` Jakub Kicinski
  0 siblings, 0 replies; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-11  0:28 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: Andrew Lunn, Sabrina Dubroca, netdev, Sergey Ryazanov,
	Paolo Abeni, Eric Dumazet, Esben Haabendal

On Fri, 10 May 2024 20:57:33 +0200 Antonio Quartulli wrote:
> > I suspect it is more complex than that. checkpatch does not understand
> > kdoc. It just knows the rule that there should be a comment next to a
> > lock, hopefully indicating what the lock protects. In order to fix
> > this, checkpatch would need to somehow invoke the kdoc parser, and ask
> > it if the lock has kdoc documentation.
> > 
> > I suspect we are just going to have to live with this.  
> 
> since we are now requiring new code to always have kdoc, can't we just 
> drop the checkpatch warning?

I don't think we require kdoc, but I agree that the warning is rather
ineffective.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-06  1:16 ` [PATCH net-next v3 11/24] ovpn: implement packet processing Antonio Quartulli
@ 2024-05-12  8:46   ` Sabrina Dubroca
  2024-05-13  7:14     ` Antonio Quartulli
  2024-05-22 14:08     ` Antonio Quartulli
  0 siblings, 2 replies; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-12  8:46 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:24 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
> index c1f842c06e32..7240d1036fb7 100644
> --- a/drivers/net/ovpn/bind.c
> +++ b/drivers/net/ovpn/bind.c
> @@ -13,6 +13,7 @@
>  #include "ovpnstruct.h"
>  #include "io.h"
>  #include "bind.h"
> +#include "packet.h"
>  #include "peer.h"

You have a few hunks like that in this patch, adding an include to a
file that is otherwise not being modified. That's odd.

> diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
> new file mode 100644
> index 000000000000..98ef1ceb75e0
> --- /dev/null
> +++ b/drivers/net/ovpn/crypto.c
> @@ -0,0 +1,162 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	James Yonan <james@openvpn.net>
> + *		Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#include <linux/types.h>
> +#include <linux/net.h>
> +#include <linux/netdevice.h>
> +//#include <linux/skbuff.h>

That's also odd :)


[...]
> +/* Reset the ovpn_crypto_state object in a way that is atomic
> + * to RCU readers.
> + */
> +int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs,
> +			    const struct ovpn_peer_key_reset *pkr)
> +	__must_hold(cs->mutex)
> +{
> +	struct ovpn_crypto_key_slot *old = NULL;
> +	struct ovpn_crypto_key_slot *new;
> +
> +	lockdep_assert_held(&cs->mutex);
> +
> +	new = ovpn_aead_crypto_key_slot_new(&pkr->key);

This doesn't need the lock to be held, you could move the lock to a
smaller section (only around the pointer swap).

> +	if (IS_ERR(new))
> +		return PTR_ERR(new);
> +
> +	switch (pkr->slot) {
> +	case OVPN_KEY_SLOT_PRIMARY:
> +		old = rcu_replace_pointer(cs->primary, new,
> +					  lockdep_is_held(&cs->mutex));
> +		break;
> +	case OVPN_KEY_SLOT_SECONDARY:
> +		old = rcu_replace_pointer(cs->secondary, new,
> +					  lockdep_is_held(&cs->mutex));
> +		break;
> +	default:
> +		goto free_key;

And validating pkr->slot before alloc could avoid a pointless
alloc/free (and simplify the code: once _new() has succeeded, no
failure can occur anymore).
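
Roughly what I have in mind, as a plain userspace sketch rather than a patch. The sketch_* types are hypothetical stand-ins for the ovpn ones, and malloc/free replace the key slot allocation and the refcounted put:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ovpn types. */
enum sketch_key_slot { SKETCH_SLOT_PRIMARY, SKETCH_SLOT_SECONDARY };

struct sketch_ks {
	int key_id;
};

struct sketch_crypto_state {
	struct sketch_ks *primary;
	struct sketch_ks *secondary;
};

static int sketch_state_reset(struct sketch_crypto_state *cs,
			      enum sketch_key_slot slot, int key_id)
{
	struct sketch_ks **target, *new, *old;

	assert(cs);

	/* validate first: nothing is allocated yet, so nothing to undo */
	switch (slot) {
	case SKETCH_SLOT_PRIMARY:
		target = &cs->primary;
		break;
	case SKETCH_SLOT_SECONDARY:
		target = &cs->secondary;
		break;
	default:
		return -EINVAL;
	}

	new = malloc(sizeof(*new));
	if (!new)
		return -ENOMEM;
	new->key_id = key_id;

	/* in the driver, only this swap would need cs->mutex */
	old = *target;
	*target = new;

	free(old);
	return 0;
}
```

With this ordering the free_key label disappears, and the only error paths are the ones taken before the swap.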

> +	}
> +
> +	if (old)
> +		ovpn_crypto_key_slot_put(old);
> +
> +	return 0;
> +free_key:
> +	ovpn_crypto_key_slot_put(new);
> +	return -EINVAL;
> +}
> +
> +void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs,
> +				 enum ovpn_key_slot slot)
> +{
> +	struct ovpn_crypto_key_slot *ks = NULL;
> +
> +	mutex_lock(&cs->mutex);
> +	switch (slot) {
> +	case OVPN_KEY_SLOT_PRIMARY:
> +		ks = rcu_replace_pointer(cs->primary, NULL,
> +					 lockdep_is_held(&cs->mutex));
> +		break;
> +	case OVPN_KEY_SLOT_SECONDARY:
> +		ks = rcu_replace_pointer(cs->secondary, NULL,
> +					 lockdep_is_held(&cs->mutex));
> +		break;
> +	default:
> +		pr_warn("Invalid slot to release: %u\n", slot);
> +		break;
> +	}
> +	mutex_unlock(&cs->mutex);
> +
> +	if (!ks) {
> +		pr_debug("Key slot already released: %u\n", slot);

This will also be printed in case of an invalid argument, which would
be mildly confusing.

> +		return;
> +	}
> +	pr_debug("deleting key slot %u, key_id=%u\n", slot, ks->key_id);
> +
> +	ovpn_crypto_key_slot_put(ks);
> +}


> +static struct ovpn_crypto_key_slot *
> +ovpn_aead_crypto_key_slot_init(enum ovpn_cipher_alg alg,
> +			       const unsigned char *encrypt_key,
> +			       unsigned int encrypt_keylen,
> +			       const unsigned char *decrypt_key,
> +			       unsigned int decrypt_keylen,
> +			       const unsigned char *encrypt_nonce_tail,
> +			       unsigned int encrypt_nonce_tail_len,
> +			       const unsigned char *decrypt_nonce_tail,
> +			       unsigned int decrypt_nonce_tail_len,
> +			       u16 key_id)
> +{
[...]
> +
> +	if (sizeof(struct ovpn_nonce_tail) != encrypt_nonce_tail_len ||
> +	    sizeof(struct ovpn_nonce_tail) != decrypt_nonce_tail_len) {
> +		ret = -EINVAL;
> +		goto destroy_ks;
> +	}

Those checks could be done earlier, before bothering with any
allocations.

> +
> +	memcpy(ks->nonce_tail_xmit.u8, encrypt_nonce_tail,
> +	       sizeof(struct ovpn_nonce_tail));
> +	memcpy(ks->nonce_tail_recv.u8, decrypt_nonce_tail,
> +	       sizeof(struct ovpn_nonce_tail));
> +
> +	/* init packet ID generation/validation */
> +	ovpn_pktid_xmit_init(&ks->pid_xmit);
> +	ovpn_pktid_recv_init(&ks->pid_recv);
> +
> +	return ks;
> +
> +destroy_ks:
> +	ovpn_aead_crypto_key_slot_destroy(ks);
> +	return ERR_PTR(ret);
> +}
> +
> +struct ovpn_crypto_key_slot *
> +ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc)
> +{
> +	return ovpn_aead_crypto_key_slot_init(kc->cipher_alg,
> +					      kc->encrypt.cipher_key,
> +					      kc->encrypt.cipher_key_size,
> +					      kc->decrypt.cipher_key,
> +					      kc->decrypt.cipher_key_size,
> +					      kc->encrypt.nonce_tail,
> +					      kc->encrypt.nonce_tail_size,
> +					      kc->decrypt.nonce_tail,
> +					      kc->decrypt.nonce_tail_size,
> +					      kc->key_id);
> +}

Why the wrapper? You could just call ovpn_aead_crypto_key_slot_init
directly.

> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index 9935a863bffe..66a4c551c191 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -110,6 +114,27 @@ int ovpn_napi_poll(struct napi_struct *napi, int budget)
>  	return work_done;
>  }
>  
> +/* Return IP protocol version from skb header.
> + * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
> + */
> +static __be16 ovpn_ip_check_protocol(struct sk_buff *skb)

nit: if you put this function higher up in the patch that introduced
it, you wouldn't have to move it now

> +{
> +	__be16 proto = 0;
> +
> +	/* skb could be non-linear, make sure IP header is in non-fragmented
> +	 * part
> +	 */
> +	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
> +		return 0;
> +
> +	if (ip_hdr(skb)->version == 4)
> +		proto = htons(ETH_P_IP);
> +	else if (ip_hdr(skb)->version == 6)
> +		proto = htons(ETH_P_IPV6);
> +
> +	return proto;
> +}
> +
>  /* Entry point for processing an incoming packet (in skb form)
>   *
>   * Enqueue the packet and schedule RX consumer.
> @@ -132,7 +157,81 @@ int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
>  
>  static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
>  {
> -	return true;

I missed that in the RX patch, true isn't an int :)
Were you intending this function to be bool like ovpn_encrypt_one?
Since you're not actually using the returned value in the caller, it
would be reasonable, but you'd have to convert all the <0 error values
to bool.

> +	struct ovpn_peer *allowed_peer = NULL;
> +	struct ovpn_crypto_key_slot *ks;
> +	__be16 proto;
> +	int ret = -1;
> +	u8 key_id;
> +
> +	/* get the key slot matching the key Id in the received packet */
> +	key_id = ovpn_key_id_from_skb(skb);
> +	ks = ovpn_crypto_key_id_to_slot(&peer->crypto, key_id);
> +	if (unlikely(!ks)) {
> +		net_info_ratelimited("%s: no available key for peer %u, key-id: %u\n",
> +				     peer->ovpn->dev->name, peer->id, key_id);
> +		goto drop;
> +	}
> +
> +	/* decrypt */
> +	ret = ovpn_aead_decrypt(ks, skb);
> +
> +	ovpn_crypto_key_slot_put(ks);
> +
> +	if (unlikely(ret < 0)) {
> +		net_err_ratelimited("%s: error during decryption for peer %u, key-id %u: %d\n",
> +				    peer->ovpn->dev->name, peer->id, key_id,
> +				    ret);
> +		goto drop;
> +	}
> +
> +	/* check if this is a valid datapacket that has to be delivered to the
> +	 * tun interface

s/tun/ovpn/ ?

> +	 */
> +	skb_reset_network_header(skb);
> +	proto = ovpn_ip_check_protocol(skb);
> +	if (unlikely(!proto)) {
> +		/* check if null packet */
> +		if (unlikely(!pskb_may_pull(skb, 1))) {
> +			netdev_dbg(peer->ovpn->dev,
> +				   "NULL packet received from peer %u\n",
> +				   peer->id);
> +			ret = -EINVAL;
> +			goto drop;
> +		}
> +
> +		netdev_dbg(peer->ovpn->dev,
> +			   "unsupported protocol received from peer %u\n",
> +			   peer->id);
> +
> +		ret = -EPROTONOSUPPORT;
> +		goto drop;
> +	}
> +	skb->protocol = proto;
> +
> +	/* perform Reverse Path Filtering (RPF) */
> +	allowed_peer = ovpn_peer_get_by_src(peer->ovpn, skb);
> +	if (unlikely(allowed_peer != peer)) {
> +		if (skb_protocol_to_family(skb) == AF_INET6)
> +			net_warn_ratelimited("%s: RPF dropped packet from peer %u, src: %pI6c\n",
> +					     peer->ovpn->dev->name, peer->id,
> +					     &ipv6_hdr(skb)->saddr);
> +		else
> +			net_warn_ratelimited("%s: RPF dropped packet from peer %u, src: %pI4\n",
> +					     peer->ovpn->dev->name, peer->id,
> +					     &ip_hdr(skb)->saddr);
> +		ret = -EPERM;
> +		goto drop;
> +	}

Have you considered holding rcu_read_lock around this whole RPF check?
It would avoid taking a reference on the peer just to release it 3
lines later. And the same could likely be done for some of the other
ovpn_peer_get_* lookups too.


> +	ret = ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
> +drop:
> +	if (likely(allowed_peer))
> +		ovpn_peer_put(allowed_peer);
> +
> +	if (unlikely(ret < 0))
> +		kfree_skb(skb);
> +
> +	return ret;

Mixing the drop/success returns looks kind of strange. This would be a
bit simpler:

ovpn_peer_put(allowed_peer);
return ptr_ring_produce_bh(&peer->netif_rx_ring, skb);

drop:
if (allowed_peer)
    ovpn_peer_put(allowed_peer);
kfree_skb(skb);
return ret;


> diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h
> index 7ed146f5932a..e14c9bf464f7 100644
> --- a/drivers/net/ovpn/packet.h
> +++ b/drivers/net/ovpn/packet.h
> @@ -10,7 +10,7 @@
>  #ifndef _NET_OVPN_PACKET_H_
>  #define _NET_OVPN_PACKET_H_
>  
> -/* When the OpenVPN protocol is ran in AEAD mode, use
> +/* When the OpenVPN protocol is run in AEAD mode, use

nit: that typo came in earlier

-- 
Sabrina


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics
  2024-05-06  1:16 ` [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics Antonio Quartulli
@ 2024-05-12  8:47   ` Sabrina Dubroca
  2024-05-13  7:25     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-12  8:47 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:25 +0200, Antonio Quartulli wrote:
> Byte/packet counters for in-tunnel and transport streams
> are now initialized and updated as needed.
> 
> To be exported via netlink.
> 
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> ---
>  drivers/net/ovpn/Makefile |  1 +
>  drivers/net/ovpn/io.c     | 10 ++++++++
>  drivers/net/ovpn/peer.c   |  3 +++
>  drivers/net/ovpn/peer.h   | 13 +++++++---
>  drivers/net/ovpn/stats.c  | 21 ++++++++++++++++
>  drivers/net/ovpn/stats.h  | 52 +++++++++++++++++++++++++++++++++++++++

What I'm seeing in this patch are "success" counters. I don't see any
stats for dropped packets that would help the user figure out why
their VPN isn't working, or why their CPU is burning up decrypting
packets that don't show up on the host, etc. You can guess there are
issues by subtracting the link and vpn stats, but that's very limited.

For example:
 - counter for packets dropped during the udp encap/decap
 - counter for failed encrypt/decrypt (especially failed decrypt)
 - counter for replay protection failures
 - counter for malformed packets

Maybe not a separate counter for each of the prints you added in the
rx/tx code, but at least enough of them to start figuring out what's
going on without enabling all the prints and parsing dmesg.
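
To make the idea concrete, a small userspace sketch of per-reason drop counters. The reason names simply follow the list above and every identifier is invented for illustration; a real implementation would more likely use per-cpu counters exported via netlink.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical drop reasons, one per bullet in the list above. */
enum sketch_drop_reason {
	SKETCH_DROP_ENCAP,
	SKETCH_DROP_CRYPTO,
	SKETCH_DROP_REPLAY,
	SKETCH_DROP_MALFORMED,
	SKETCH_DROP_MAX,
};

struct sketch_drop_stats {
	uint64_t count[SKETCH_DROP_MAX];
};

/* Account one dropped packet; out-of-range reasons are ignored. */
static void sketch_drop_inc(struct sketch_drop_stats *s,
			    enum sketch_drop_reason r)
{
	assert(s);

	if ((unsigned int)r < SKETCH_DROP_MAX)
		s->count[r]++;
}

/* Sum of all drops, i.e. what "link minus vpn" can only approximate. */
static uint64_t sketch_drop_total(const struct sketch_drop_stats *s)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < SKETCH_DROP_MAX; i++)
		total += s->count[i];

	return total;
}
```

Each "goto drop" in the rx/tx code would then bump exactly one of these, next to (or instead of) the ratelimited print.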


> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> index da41d711745c..b5ff59a4b40f 100644
> --- a/drivers/net/ovpn/peer.h
> +++ b/drivers/net/ovpn/peer.h
> @@ -10,14 +10,15 @@
>  #ifndef _NET_OVPN_OVPNPEER_H_
>  #define _NET_OVPN_OVPNPEER_H_
>  
> +#include <linux/ptr_ring.h>
> +#include <net/dst_cache.h>
> +#include <uapi/linux/ovpn.h>
> +
>  #include "bind.h"
>  #include "pktid.h"
>  #include "crypto.h"
>  #include "socket.h"
> -
> -#include <linux/ptr_ring.h>
> -#include <net/dst_cache.h>
> -#include <uapi/linux/ovpn.h>
> +#include "stats.h"

Header reshuffling got squashed into the wrong patch?


> diff --git a/drivers/net/ovpn/stats.h b/drivers/net/ovpn/stats.h
> new file mode 100644
> index 000000000000..5134e49c0458
> --- /dev/null
> +++ b/drivers/net/ovpn/stats.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	James Yonan <james@openvpn.net>
> + *		Antonio Quartulli <antonio@openvpn.net>
> + *		Lev Stipakov <lev@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_OVPNSTATS_H_
> +#define _NET_OVPN_OVPNSTATS_H_
> +
> +//#include <linux/atomic.h>
> +//#include <linux/jiffies.h>

Forgot a clean up before posting? :)

-- 
Sabrina


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-06  1:16 ` [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP) Antonio Quartulli
  2024-05-10 13:01   ` Sabrina Dubroca
@ 2024-05-12 21:35   ` Sabrina Dubroca
  2024-05-13  7:37     ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-12 21:35 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
> +/* send skb to connected peer, if any */
> +static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
> +			   struct ovpn_peer *peer)
> +{
> +	int ret;
> +
> +	if (likely(!peer))
> +		/* retrieve peer serving the destination IP of this packet */
> +		peer = ovpn_peer_get_by_dst(ovpn, skb);
> +	if (unlikely(!peer)) {
> +		net_dbg_ratelimited("%s: no peer to send data to\n",
> +				    ovpn->dev->name);
> +		goto drop;
> +	}
> +
> +	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
> +	if (unlikely(ret < 0)) {
> +		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
> +				    peer->ovpn->dev->name);
> +		goto drop;
> +	}
> +
> +	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
> +		ovpn_peer_put(peer);

I wanted to come back to this after going through the crypto patch,
because this felt like a strange construct when I first looked at this
patch.

Why are you using a workqueue here? Based on the kdoc for crypto_wq
("used to schedule crypto work that may sleep during TX/RX"), it's to
deal with async crypto.

If so, why not use the more standard way of dealing with async crypto
in contexts that cannot sleep, ie letting the crypto core call the
"done" callback asynchronously? You need to do all the proper refcount
handling, but IMO it's cleaner and simpler than this workqueue and
ptr_ring. You can see an example of that in macsec (macsec_encrypt_*
in drivers/net/macsec.c).
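
The shape of that pattern, reduced to a userspace sketch. Nothing here is the crypto API: sketch_crypto_submit() and sketch_crypto_complete() are invented stand-ins for the submit call and for the crypto core invoking the callback later, and the refcount is a plain int.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct sketch_peer {
	int refcnt;
	int sent;
};

struct sketch_req {
	void (*done)(void *data, int err);
	void *data;
};

static void sketch_peer_hold(struct sketch_peer *p)
{
	p->refcnt++;
}

static void sketch_peer_put(struct sketch_peer *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/* Completion handler: runs inline or later, and drops the reference. */
static void sketch_encrypt_done(void *data, int err)
{
	struct sketch_peer *p = data;

	if (!err)
		p->sent++;
	sketch_peer_put(p);
}

/* Stand-in for the submit call: 0 if done, -EINPROGRESS if deferred. */
static int sketch_crypto_submit(struct sketch_req *req, bool async)
{
	(void)req;
	return async ? -EINPROGRESS : 0;
}

/* Stand-in for the backend finishing a deferred request. */
static void sketch_crypto_complete(struct sketch_req *req, int err)
{
	req->done(req->data, err);
}

static int sketch_xmit(struct sketch_peer *p, struct sketch_req *req,
		       bool async)
{
	int ret;

	/* the reference taken here belongs to the request until "done" */
	sketch_peer_hold(p);
	req->done = sketch_encrypt_done;
	req->data = p;

	ret = sketch_crypto_submit(req, async);
	if (ret == -EINPROGRESS)
		return 0;	/* callback will finish the job */

	sketch_encrypt_done(p, ret);
	return ret;
}
```

The sync and async cases share the same completion function, so there is no queue to drain and no worker to schedule.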

-- 
Sabrina

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-12  8:46   ` Sabrina Dubroca
@ 2024-05-13  7:14     ` Antonio Quartulli
  2024-05-13  9:24       ` Sabrina Dubroca
  2024-05-22 14:08     ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13  7:14 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 12/05/2024 10:46, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:24 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
>> index c1f842c06e32..7240d1036fb7 100644
>> --- a/drivers/net/ovpn/bind.c
>> +++ b/drivers/net/ovpn/bind.c
>> @@ -13,6 +13,7 @@
>>   #include "ovpnstruct.h"
>>   #include "io.h"
>>   #include "bind.h"
>> +#include "packet.h"
>>   #include "peer.h"
> 
> You have a few hunks like that in this patch, adding an include to a
> file that is otherwise not being modified. That's odd.

Argh. The whole ovpn was originally a single patch, which I then went and
divided into smaller changes for easier review.

As you may imagine, this process is prone to mistakes like this,
especially when the number of patches is quite high...

I will go through all the patches and clean them up from issues like 
this and like the one below..

Sorry about that.

> 
>> diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
>> new file mode 100644
>> index 000000000000..98ef1ceb75e0
>> --- /dev/null
>> +++ b/drivers/net/ovpn/crypto.c
>> @@ -0,0 +1,162 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*  OpenVPN data channel offload
>> + *
>> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
>> + *
>> + *  Author:	James Yonan <james@openvpn.net>
>> + *		Antonio Quartulli <antonio@openvpn.net>
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/net.h>
>> +#include <linux/netdevice.h>
>> +//#include <linux/skbuff.h>
> 
> That's also odd :)
> 
> 
> [...]
>> +/* Reset the ovpn_crypto_state object in a way that is atomic
>> + * to RCU readers.
>> + */
>> +int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs,
>> +			    const struct ovpn_peer_key_reset *pkr)
>> +	__must_hold(cs->mutex)
>> +{
>> +	struct ovpn_crypto_key_slot *old = NULL;
>> +	struct ovpn_crypto_key_slot *new;
>> +
>> +	lockdep_assert_held(&cs->mutex);
>> +
>> +	new = ovpn_aead_crypto_key_slot_new(&pkr->key);
> 
> This doesn't need the lock to be held, you could move the lock to a
> smaller section (only around the pointer swap).

I think you're right. I also like the idea of shrinking the lock active 
area.. Will fix this!


> 
>> +	if (IS_ERR(new))
>> +		return PTR_ERR(new);
>> +
>> +	switch (pkr->slot) {
>> +	case OVPN_KEY_SLOT_PRIMARY:
>> +		old = rcu_replace_pointer(cs->primary, new,
>> +					  lockdep_is_held(&cs->mutex));
>> +		break;
>> +	case OVPN_KEY_SLOT_SECONDARY:
>> +		old = rcu_replace_pointer(cs->secondary, new,
>> +					  lockdep_is_held(&cs->mutex));
>> +		break;
>> +	default:
>> +		goto free_key;
> 
> And validating pkr->slot before alloc could avoid a pointless
> alloc/free (and simplify the code: once _new() has succeeded, no
> failure can occur anymore).

right! will fix

> 
>> +	}
>> +
>> +	if (old)
>> +		ovpn_crypto_key_slot_put(old);
>> +
>> +	return 0;
>> +free_key:
>> +	ovpn_crypto_key_slot_put(new);
>> +	return -EINVAL;
>> +}
>> +
>> +void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs,
>> +				 enum ovpn_key_slot slot)
>> +{
>> +	struct ovpn_crypto_key_slot *ks = NULL;
>> +
>> +	mutex_lock(&cs->mutex);
>> +	switch (slot) {
>> +	case OVPN_KEY_SLOT_PRIMARY:
>> +		ks = rcu_replace_pointer(cs->primary, NULL,
>> +					 lockdep_is_held(&cs->mutex));
>> +		break;
>> +	case OVPN_KEY_SLOT_SECONDARY:
>> +		ks = rcu_replace_pointer(cs->secondary, NULL,
>> +					 lockdep_is_held(&cs->mutex));
>> +		break;
>> +	default:
>> +		pr_warn("Invalid slot to release: %u\n", slot);
>> +		break;
>> +	}
>> +	mutex_unlock(&cs->mutex);
>> +
>> +	if (!ks) {
>> +		pr_debug("Key slot already released: %u\n", slot);
> 
> This will also be printed in case of an invalid argument, which would
> be mildly confusing.

although we will have the pr_warn printed as well in that case.
But I agree this is not nice. will fix

> 
>> +		return;
>> +	}
>> +	pr_debug("deleting key slot %u, key_id=%u\n", slot, ks->key_id);
>> +
>> +	ovpn_crypto_key_slot_put(ks);
>> +}
> 
> 
>> +static struct ovpn_crypto_key_slot *
>> +ovpn_aead_crypto_key_slot_init(enum ovpn_cipher_alg alg,
>> +			       const unsigned char *encrypt_key,
>> +			       unsigned int encrypt_keylen,
>> +			       const unsigned char *decrypt_key,
>> +			       unsigned int decrypt_keylen,
>> +			       const unsigned char *encrypt_nonce_tail,
>> +			       unsigned int encrypt_nonce_tail_len,
>> +			       const unsigned char *decrypt_nonce_tail,
>> +			       unsigned int decrypt_nonce_tail_len,
>> +			       u16 key_id)
>> +{
> [...]
>> +
>> +	if (sizeof(struct ovpn_nonce_tail) != encrypt_nonce_tail_len ||
>> +	    sizeof(struct ovpn_nonce_tail) != decrypt_nonce_tail_len) {
>> +		ret = -EINVAL;
>> +		goto destroy_ks;
>> +	}
> 
> Those checks could be done earlier, before bothering with any
> allocations.

ACK

> 
>> +
>> +	memcpy(ks->nonce_tail_xmit.u8, encrypt_nonce_tail,
>> +	       sizeof(struct ovpn_nonce_tail));
>> +	memcpy(ks->nonce_tail_recv.u8, decrypt_nonce_tail,
>> +	       sizeof(struct ovpn_nonce_tail));
>> +
>> +	/* init packet ID generation/validation */
>> +	ovpn_pktid_xmit_init(&ks->pid_xmit);
>> +	ovpn_pktid_recv_init(&ks->pid_recv);
>> +
>> +	return ks;
>> +
>> +destroy_ks:
>> +	ovpn_aead_crypto_key_slot_destroy(ks);
>> +	return ERR_PTR(ret);
>> +}
>> +
>> +struct ovpn_crypto_key_slot *
>> +ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc)
>> +{
>> +	return ovpn_aead_crypto_key_slot_init(kc->cipher_alg,
>> +					      kc->encrypt.cipher_key,
>> +					      kc->encrypt.cipher_key_size,
>> +					      kc->decrypt.cipher_key,
>> +					      kc->decrypt.cipher_key_size,
>> +					      kc->encrypt.nonce_tail,
>> +					      kc->encrypt.nonce_tail_size,
>> +					      kc->decrypt.nonce_tail,
>> +					      kc->decrypt.nonce_tail_size,
>> +					      kc->key_id);
>> +}
> 
> Why the wrapper? You could just call ovpn_aead_crypto_key_slot_init
> directly.

Mostly for aesthetic reasons, the call being very large.
On top of that this is a little leftover from a previous version where 
this call happened more than once as part of an internal abstraction, 
hence the decision to create a wrapper.

But I think it's ok to remove it now.

> 
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index 9935a863bffe..66a4c551c191 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> @@ -110,6 +114,27 @@ int ovpn_napi_poll(struct napi_struct *napi, int budget)
>>   	return work_done;
>>   }
>>   
>> +/* Return IP protocol version from skb header.
>> + * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
>> + */
>> +static __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
> 
> nit: if you put this function higher up in the patch that introduced
> it, you wouldn't have to move it now

ACK

> 
>> +{
>> +	__be16 proto = 0;
>> +
>> +	/* skb could be non-linear, make sure IP header is in non-fragmented
>> +	 * part
>> +	 */
>> +	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
>> +		return 0;
>> +
>> +	if (ip_hdr(skb)->version == 4)
>> +		proto = htons(ETH_P_IP);
>> +	else if (ip_hdr(skb)->version == 6)
>> +		proto = htons(ETH_P_IPV6);
>> +
>> +	return proto;
>> +}
>> +
>>   /* Entry point for processing an incoming packet (in skb form)
>>    *
>>    * Enqueue the packet and schedule RX consumer.
>> @@ -132,7 +157,81 @@ int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
>>   
>>   static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
>>   {
>> -	return true;
> 
> I missed that in the RX patch, true isn't an int :)
> Were you intending this function to be bool like ovpn_encrypt_one?
> Since you're not actually using the returned value in the caller, it
> would be reasonable, but you'd have to convert all the <0 error values
> to bool.

Mhh, let me think about what's best and I will make this uniform.

> 
>> +	struct ovpn_peer *allowed_peer = NULL;
>> +	struct ovpn_crypto_key_slot *ks;
>> +	__be16 proto;
>> +	int ret = -1;
>> +	u8 key_id;
>> +
>> +	/* get the key slot matching the key Id in the received packet */
>> +	key_id = ovpn_key_id_from_skb(skb);
>> +	ks = ovpn_crypto_key_id_to_slot(&peer->crypto, key_id);
>> +	if (unlikely(!ks)) {
>> +		net_info_ratelimited("%s: no available key for peer %u, key-id: %u\n",
>> +				     peer->ovpn->dev->name, peer->id, key_id);
>> +		goto drop;
>> +	}
>> +
>> +	/* decrypt */
>> +	ret = ovpn_aead_decrypt(ks, skb);
>> +
>> +	ovpn_crypto_key_slot_put(ks);
>> +
>> +	if (unlikely(ret < 0)) {
>> +		net_err_ratelimited("%s: error during decryption for peer %u, key-id %u: %d\n",
>> +				    peer->ovpn->dev->name, peer->id, key_id,
>> +				    ret);
>> +		goto drop;
>> +	}
>> +
>> +	/* check if this is a valid datapacket that has to be delivered to the
>> +	 * tun interface
> 
> s/tun/ovpn/ ?

yap. we used to call "tun" any interface used by openvpn...just legacy 
that fills our brains :-) will fix

> 
>> +	 */
>> +	skb_reset_network_header(skb);
>> +	proto = ovpn_ip_check_protocol(skb);
>> +	if (unlikely(!proto)) {
>> +		/* check if null packet */
>> +		if (unlikely(!pskb_may_pull(skb, 1))) {
>> +			netdev_dbg(peer->ovpn->dev,
>> +				   "NULL packet received from peer %u\n",
>> +				   peer->id);
>> +			ret = -EINVAL;
>> +			goto drop;
>> +		}
>> +
>> +		netdev_dbg(peer->ovpn->dev,
>> +			   "unsupported protocol received from peer %u\n",
>> +			   peer->id);
>> +
>> +		ret = -EPROTONOSUPPORT;
>> +		goto drop;
>> +	}
>> +	skb->protocol = proto;
>> +
>> +	/* perform Reverse Path Filtering (RPF) */
>> +	allowed_peer = ovpn_peer_get_by_src(peer->ovpn, skb);
>> +	if (unlikely(allowed_peer != peer)) {
>> +		if (skb_protocol_to_family(skb) == AF_INET6)
>> +			net_warn_ratelimited("%s: RPF dropped packet from peer %u, src: %pI6c\n",
>> +					     peer->ovpn->dev->name, peer->id,
>> +					     &ipv6_hdr(skb)->saddr);
>> +		else
>> +			net_warn_ratelimited("%s: RPF dropped packet from peer %u, src: %pI4\n",
>> +					     peer->ovpn->dev->name, peer->id,
>> +					     &ip_hdr(skb)->saddr);
>> +		ret = -EPERM;
>> +		goto drop;
>> +	}
> 
> Have you considered holding rcu_read_lock around this whole RPF check?
> It would avoid taking a reference on the peer just to release it 3
> lines later. And the same could likely be done for some of the other
> ovpn_peer_get_* lookups too.

Thinking about this... you're right: the peer object never leaves
this context, therefore it is not strictly needed to hold the
reference and do the full dance.

Sometimes I fear that I may enclose too many instructions within the
rcu_read_lock section and thus I go with the smallest area needed (a
bit like a classic lock). But I agree that for these lookups this is
not truly the case.

Will review the other lookups and change them accordingly.
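
For reference, a minimal userspace sketch of the lookup-under-read-lock idea. Everything in it is an assumption made for illustration: rcu_read_lock()/rcu_read_unlock() are stubs that only track nesting (not the kernel primitives), and struct peer, peers[] and peer_lookup_by_src_rcu() are hypothetical stand-ins, not the ovpn code. It only shows that the RPF comparison can be done on a borrowed pointer, with no get/put pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stubs standing in for the kernel primitives: they only track nesting. */
static int rcu_depth;

static void rcu_read_lock(void)
{
	rcu_depth++;
}

static void rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Hypothetical, simplified peer object. */
struct peer {
	uint32_t id;
	uint32_t vpn_ip;	/* address assigned inside the tunnel */
	int refcount;		/* plain int, enough for this sketch */
};

#define NPEERS 2
static struct peer peers[NPEERS] = {
	{ .id = 1, .vpn_ip = 0x0a000001, .refcount = 1 },
	{ .id = 2, .vpn_ip = 0x0a000002, .refcount = 1 },
};

/* Lookup that takes no reference: the returned pointer is only valid
 * while the caller stays inside the read-side section.
 */
static struct peer *peer_lookup_by_src_rcu(uint32_t src)
{
	assert(rcu_depth > 0);

	for (size_t i = 0; i < NPEERS; i++)
		if (peers[i].vpn_ip == src)
			return &peers[i];

	return NULL;
}

/* RPF check: the inner source address must map back to the peer the
 * packet was received from. The looked-up pointer never leaves the
 * read-side section, so no reference is taken or released.
 */
bool rpf_check(const struct peer *rx_peer, uint32_t src)
{
	bool ok;

	rcu_read_lock();
	ok = peer_lookup_by_src_rcu(src) == rx_peer;
	rcu_read_unlock();

	return ok;
}
```

With this shape, rpf_check(&peers[0], 0x0a000001) passes while any other source address fails, and the refcounts are never touched.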

Thanks!

> 
> 
>> +	ret = ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
>> +drop:
>> +	if (likely(allowed_peer))
>> +		ovpn_peer_put(allowed_peer);
>> +
>> +	if (unlikely(ret < 0))
>> +		kfree_skb(skb);
>> +
>> +	return ret;
> 
> Mixing the drop/success returns looks kind of strange. This would be a
> bit simpler:
> 
> ovpn_peer_put(allowed_peer);
> return ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
> 
> drop:
> if (allowed_peer)
>      ovpn_peer_put(allowed_peer);
> kfree_skb(skb);
> return ret;

Honestly I have seen this pattern fairly often (and implemented it this 
way fairly often).

I presume it is mostly a matter of taste.

The idea is: when exiting a function 90% of the code is shared between 
success and failure, therefore let's just write it once and simply add a 
few branches based on ret.

This way we have less code and if we need to change something in the exit
path, we can change it once only.

A few examples:
* https://elixir.bootlin.com/linux/v6.9-rc7/source/net/batman-adv/translation-table.c#L813
* https://elixir.bootlin.com/linux/v6.9-rc7/source/net/batman-adv/routing.c#L269
* https://elixir.bootlin.com/linux/v6.9-rc7/source/net/mac80211/scan.c#L1344


ovpn code can be further simplified by setting skb to NULL in case of 
success (this way we avoid checking ret) and let ovpn_peer_put handle 
the case of peer == NULL (we avoid the NULL check before calling it).

What do you think?
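
To make the proposal concrete, here is a rough userspace sketch of the shared exit path. All names are hypothetical stand-ins (struct buf for the skb, queue_produce() for the ring, peer_put()/buf_free() for the real helpers); the only point is that both cleanup helpers tolerate NULL, so the exit path needs no branches on ret.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct peer { int refcount; };
struct buf { int len; };	/* stand-in for an skb */

static int bufs_freed;		/* instrumentation for the sketch */

/* Both cleanup helpers accept NULL, so callers need no checks. */
static void peer_put(struct peer *p)
{
	if (p)
		p->refcount--;
}

static void buf_free(struct buf *b)
{
	if (!b)
		return;
	bufs_freed++;
	free(b);
}

/* Hypothetical one-slot queue: owns the buffer on success (0), leaves
 * it with the caller on failure.
 */
static struct buf *queued;

static int queue_produce(struct buf *b)
{
	if (queued)
		return -ENOSPC;
	queued = b;
	return 0;
}

int process_one(struct peer *expected, struct peer *found, struct buf *b)
{
	int ret;

	assert(expected != NULL);

	if (found)
		found->refcount++;	/* models the reference a lookup takes */

	if (found != expected) {
		ret = -EPERM;
		goto out;
	}

	ret = queue_produce(b);
	if (ret == 0)
		b = NULL;		/* ownership moved to the queue */
out:
	peer_put(found);		/* fine with NULL */
	buf_free(b);			/* no-op after a successful produce */
	return ret;
}
```

A failed produce still frees the buffer here, because b is cleared only on success.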


> 
> 
>> diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h
>> index 7ed146f5932a..e14c9bf464f7 100644
>> --- a/drivers/net/ovpn/packet.h
>> +++ b/drivers/net/ovpn/packet.h
>> @@ -10,7 +10,7 @@
>>   #ifndef _NET_OVPN_PACKET_H_
>>   #define _NET_OVPN_PACKET_H_
>>   
>> -/* When the OpenVPN protocol is ran in AEAD mode, use
>> +/* When the OpenVPN protocol is run in AEAD mode, use
> 
> nit: that typo came in earlier

oops


Thanks!

> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics
  2024-05-12  8:47   ` Sabrina Dubroca
@ 2024-05-13  7:25     ` Antonio Quartulli
  2024-05-13  9:19       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13  7:25 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 12/05/2024 10:47, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:25 +0200, Antonio Quartulli wrote:
>> Byte/packet counters for in-tunnel and transport streams
>> are now initialized and updated as needed.
>>
>> To be exported via netlink.
>>
>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
>> ---
>>   drivers/net/ovpn/Makefile |  1 +
>>   drivers/net/ovpn/io.c     | 10 ++++++++
>>   drivers/net/ovpn/peer.c   |  3 +++
>>   drivers/net/ovpn/peer.h   | 13 +++++++---
>>   drivers/net/ovpn/stats.c  | 21 ++++++++++++++++
>>   drivers/net/ovpn/stats.h  | 52 +++++++++++++++++++++++++++++++++++++++
> 
> What I'm seeing in this patch are "success" counters. I don't see any
> stats for dropped packets that would help the user figure out why
> their VPN isn't working, or why their CPU is burning up decrypting
> packets that don't show up on the host, etc. You can guess there are
> issues by subtracting the link and vpn stats, but that's very limited.

These stats are just the bare minimum to make our current userspace happy :-)

But we can always extend the stats reporting later on, no?

> 
> For example:
>   - counter for packets dropped during the udp encap/decap
>   - counter for failed encrypt/decrypt (especially failed decrypt)
>   - counter for replay protection failures
>   - counter for malformed packets
> 
> Maybe not a separate counter for each of the prints you added in the
> rx/tx code, but at least enough of them to start figuring out what's
> going on without enabling all the prints and parsing dmesg.

Definitely a good suggestion! I'd just postpone it for later, unless you 
think it's a blocker.

> 
> 
>> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
>> index da41d711745c..b5ff59a4b40f 100644
>> --- a/drivers/net/ovpn/peer.h
>> +++ b/drivers/net/ovpn/peer.h
>> @@ -10,14 +10,15 @@
>>   #ifndef _NET_OVPN_OVPNPEER_H_
>>   #define _NET_OVPN_OVPNPEER_H_
>>   
>> +#include <linux/ptr_ring.h>
>> +#include <net/dst_cache.h>
>> +#include <uapi/linux/ovpn.h>
>> +
>>   #include "bind.h"
>>   #include "pktid.h"
>>   #include "crypto.h"
>>   #include "socket.h"
>> -
>> -#include <linux/ptr_ring.h>
>> -#include <net/dst_cache.h>
>> -#include <uapi/linux/ovpn.h>
>> +#include "stats.h"
> 
> Header reshuffling got squashed into the wrong patch?

indeed, darn. Juggling this many patches has been quite tedious

> 
> 
>> diff --git a/drivers/net/ovpn/stats.h b/drivers/net/ovpn/stats.h
>> new file mode 100644
>> index 000000000000..5134e49c0458
>> --- /dev/null
>> +++ b/drivers/net/ovpn/stats.h
>> @@ -0,0 +1,52 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*  OpenVPN data channel offload
>> + *
>> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
>> + *
>> + *  Author:	James Yonan <james@openvpn.net>
>> + *		Antonio Quartulli <antonio@openvpn.net>
>> + *		Lev Stipakov <lev@openvpn.net>
>> + */
>> +
>> +#ifndef _NET_OVPN_OVPNSTATS_H_
>> +#define _NET_OVPN_OVPNSTATS_H_
>> +
>> +//#include <linux/atomic.h>
>> +//#include <linux/jiffies.h>
> 
> Forgot a clean up before posting? :)

Yeah..I guess I'll write a small script to catch all these things..it's 
easy to lose them across the whole patchset.

Thanks for spotting them! I will make sure they all go away

> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-12 21:35   ` Sabrina Dubroca
@ 2024-05-13  7:37     ` Antonio Quartulli
  2024-05-13  9:36       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13  7:37 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 12/05/2024 23:35, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
>> +/* send skb to connected peer, if any */
>> +static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
>> +			   struct ovpn_peer *peer)
>> +{
>> +	int ret;
>> +
>> +	if (likely(!peer))
>> +		/* retrieve peer serving the destination IP of this packet */
>> +		peer = ovpn_peer_get_by_dst(ovpn, skb);
>> +	if (unlikely(!peer)) {
>> +		net_dbg_ratelimited("%s: no peer to send data to\n",
>> +				    ovpn->dev->name);
>> +		goto drop;
>> +	}
>> +
>> +	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
>> +	if (unlikely(ret < 0)) {
>> +		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
>> +				    peer->ovpn->dev->name);
>> +		goto drop;
>> +	}
>> +
>> +	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
>> +		ovpn_peer_put(peer);
> 
> I wanted to come back to this after going through the crypto patch,
> because this felt like a strange construct when I first looked at this
> patch.
> 
> Why are you using a workqueue here? Based on the kdoc for crypto_wq
> ("used to schedule crypto work that may sleep during TX/RX"), it's to
> deal with async crypto.
> 
> If so, why not use the more standard way of dealing with async crypto
> in contexts that cannot sleep, ie letting the crypto core call the
> "done" callback asynchronously? You need to do all the proper refcount
> handling, but IMO it's cleaner and simpler than this workqueue and
> ptr_ring. You can see an example of that in macsec (macsec_encrypt_*
> in drivers/net/macsec.c).

Aha! You don't know how happy I was when I found the doc describing how 
to convert the async code into sync-looking :-) With the detail that I 
had to move to a different context, as the code may want to sleep (hence 
the introduction of the workqueue).

It looks like I am a bit of a fan of WQs, while you are telling me to
avoid them if possible.

I presume that using WQs comes with a non-negligible cost, therefore if
we can get things done without having to use them, then I should just
not use them.

I think I could go back to no-workqueue encrypt/decrypt.
Do you think this may have any impact on any future multi-core 
optimization? Back then I also thought that going through workers may 
make improvements in this area easier. But I could just be wrong.

Regards,


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics
  2024-05-13  7:25     ` Antonio Quartulli
@ 2024-05-13  9:19       ` Sabrina Dubroca
  2024-05-13  9:33         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-13  9:19 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-13, 09:25:29 +0200, Antonio Quartulli wrote:
> On 12/05/2024 10:47, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:25 +0200, Antonio Quartulli wrote:
> > > Byte/packet counters for in-tunnel and transport streams
> > > are now initialized and updated as needed.
> > > 
> > > To be exported via netlink.
> > > 
> > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> > > ---
> > >   drivers/net/ovpn/Makefile |  1 +
> > >   drivers/net/ovpn/io.c     | 10 ++++++++
> > >   drivers/net/ovpn/peer.c   |  3 +++
> > >   drivers/net/ovpn/peer.h   | 13 +++++++---
> > >   drivers/net/ovpn/stats.c  | 21 ++++++++++++++++
> > >   drivers/net/ovpn/stats.h  | 52 +++++++++++++++++++++++++++++++++++++++
> > 
> > What I'm seeing in this patch are "success" counters. I don't see any
> > stats for dropped packets that would help the user figure out why
> > their VPN isn't working, or why their CPU is burning up decrypting
> > packets that don't show up on the host, etc. You can guess there are
> > issues by subtracting the link and vpn stats, but that's very limited.
> 
> These stats are just the bare minimum to make our current userspace happy :-)
> 
> But we can always extend the stats reporting later on, no?
> 
> > 
> > For example:
> >   - counter for packets dropped during the udp encap/decap
> >   - counter for failed encrypt/decrypt (especially failed decrypt)
> >   - counter for replay protection failures
> >   - counter for malformed packets
> > 
> > Maybe not a separate counter for each of the prints you added in the
> > rx/tx code, but at least enough of them to start figuring out what's
> > going on without enabling all the prints and parsing dmesg.
> 
> Definitely a good suggestion! I'd just postpone it for later, unless you
> think it's a blocker.

I'm not sure. It's not strictly necessary to make the driver work, but
from a user/admin's point of view, I think counters would be really
useful.

Maybe at least increment the rx_dropped/rx_errors/etc counters from
rtnl_link_stats on the netdevice?
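
As a rough illustration of what I mean, here is a userspace sketch. The enum, the structs and the split between rx_dropped and rx_errors are all assumptions made up for this example (struct link_stats merely mimics the rx_* fields of rtnl_link_stats); a real driver would presumably bump the netdevice counters rather than a private struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical drop reasons, mirroring the list above. */
enum drop_reason {
	DROP_ENCAP,	/* dropped during udp encap/decap */
	DROP_CRYPTO,	/* failed encrypt/decrypt */
	DROP_REPLAY,	/* replay protection failure */
	DROP_MALFORMED,	/* malformed packet */
	DROP_REASON_MAX,
};

/* Mimics the relevant rx_* fields of rtnl_link_stats. */
struct link_stats {
	uint64_t rx_packets;
	uint64_t rx_dropped;
	uint64_t rx_errors;
};

struct vpn_stats {
	struct link_stats link;
	uint64_t drops[DROP_REASON_MAX];	/* optional finer-grained view */
};

void stats_rx_ok(struct vpn_stats *s)
{
	s->link.rx_packets++;
}

void stats_rx_drop(struct vpn_stats *s, enum drop_reason r)
{
	assert(r < DROP_REASON_MAX);
	s->drops[r]++;

	/* Assumption for this sketch: encap/decap drops count as plain
	 * drops, everything else as an error.
	 */
	if (r == DROP_ENCAP)
		s->link.rx_dropped++;
	else
		s->link.rx_errors++;
}
```

Even the two coarse counters alone would tell an admin that packets are being lost on RX without having to parse dmesg.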


> indeed, darn. Juggling this many patches has been quite tedious
>
>
> 
> Yeah..I guess I'll write a small script to catch all these things..it's easy
> to lose them across the whole patchset.
> 
> Thanks for spotting them! I will make sure they all go away

Thanks. I know it's painful :(

-- 
Sabrina



* Re: [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-13  7:14     ` Antonio Quartulli
@ 2024-05-13  9:24       ` Sabrina Dubroca
  2024-05-13  9:31         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-13  9:24 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-13, 09:14:39 +0200, Antonio Quartulli wrote:
> On 12/05/2024 10:46, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:24 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
> > > index c1f842c06e32..7240d1036fb7 100644
> > > --- a/drivers/net/ovpn/bind.c
> > > +++ b/drivers/net/ovpn/bind.c
> > > @@ -13,6 +13,7 @@
> > >   #include "ovpnstruct.h"
> > >   #include "io.h"
> > >   #include "bind.h"
> > > +#include "packet.h"
> > >   #include "peer.h"
> > 
> > You have a few hunks like that in this patch, adding an include to a
> > file that is otherwise not being modified. That's odd.
> 
> Argh. The whole ovpn was originally a single patch, which I then went and
> divided into smaller changes for easier review.
> 
> As you may imagine, this process is prone to mistakes like this, especially
> when the number of patches is quite high...
> 
> I will go through all the patches and clean them up from issues like this
> and like the one below..
> 
> Sorry about that.

Yep, I understand.

> > > +struct ovpn_crypto_key_slot *
> > > +ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc)
> > > +{
> > > +	return ovpn_aead_crypto_key_slot_init(kc->cipher_alg,
> > > +					      kc->encrypt.cipher_key,
> > > +					      kc->encrypt.cipher_key_size,
> > > +					      kc->decrypt.cipher_key,
> > > +					      kc->decrypt.cipher_key_size,
> > > +					      kc->encrypt.nonce_tail,
> > > +					      kc->encrypt.nonce_tail_size,
> > > +					      kc->decrypt.nonce_tail,
> > > +					      kc->decrypt.nonce_tail_size,
> > > +					      kc->key_id);
> > > +}
> > 
> > Why the wrapper? You could just call ovpn_aead_crypto_key_slot_init
> > directly.
> 
> Mostly for aesthetic reasons, the call being very large.

But that wrapper doesn't really do anything.

In case my previous comment wasn't clear: I would keep the single
argument at the callsite (whether it's called _new or _init), and kill
the 10-args variant (it's too verbose and _very_ easy to mess up).
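
Roughly what I have in mind, as a userspace sketch. The types and names here (key_dir, key_config, key_slot, key_slot_new) are simplified, hypothetical stand-ins and not the ovpn definitions; the size limits are made up. The constructor takes the config struct and reads the fields itself, so there is no long positional argument list to get wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define KEY_MAX   32	/* made-up bounds for this sketch */
#define NONCE_MAX 8

/* Hypothetical per-direction key material. */
struct key_dir {
	uint8_t cipher_key[KEY_MAX];
	size_t cipher_key_size;
	uint8_t nonce_tail[NONCE_MAX];
	size_t nonce_tail_size;
};

struct key_config {
	struct key_dir encrypt;
	struct key_dir decrypt;
	uint8_t key_id;
};

struct key_slot {
	struct key_dir tx;
	struct key_dir rx;
	uint8_t key_id;
};

static int key_dir_valid(const struct key_dir *d)
{
	return d->cipher_key_size > 0 && d->cipher_key_size <= KEY_MAX &&
	       d->nonce_tail_size <= NONCE_MAX;
}

/* Single-argument constructor: the encrypt/decrypt pairing is decided
 * in one place instead of at every callsite.
 * Returns NULL on invalid sizes or allocation failure.
 */
struct key_slot *key_slot_new(const struct key_config *kc)
{
	struct key_slot *ks;

	assert(kc != NULL);

	if (!key_dir_valid(&kc->encrypt) || !key_dir_valid(&kc->decrypt))
		return NULL;

	ks = calloc(1, sizeof(*ks));
	if (!ks)
		return NULL;

	ks->tx = kc->encrypt;
	ks->rx = kc->decrypt;
	ks->key_id = kc->key_id;

	return ks;
}
```

With ten positional arguments, swapping e.g. the encrypt and decrypt nonce tails compiles silently; with the struct, that mistake can only be made in one function.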


> > > @@ -132,7 +157,81 @@ int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
> > >   static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
> > >   {
> > > -	return true;
> > 
> > I missed that in the RX patch, true isn't an int :)
> > Were you intending this function to be bool like ovpn_encrypt_one?
> > Since you're not actually using the returned value in the caller, it
> > would be reasonable, but you'd have to convert all the <0 error values
> > to bool.
> 
> Mhh, let me think about what's best and I will make this uniform.

Yes please. If you can make the returns consistent (currently, on
success, one returns true and the other returns 0), it would be nice.


> > > +	ret = ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
> > > +drop:
> > > +	if (likely(allowed_peer))
> > > +		ovpn_peer_put(allowed_peer);
> > > +
> > > +	if (unlikely(ret < 0))
> > > +		kfree_skb(skb);
> > > +
> > > +	return ret;
> > 
> > Mixing the drop/success returns looks kind of strange. This would be a
> > bit simpler:
> > 
> > ovpn_peer_put(allowed_peer);
> > return ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
> > 
> > drop:
> > if (allowed_peer)
> >      ovpn_peer_put(allowed_peer);
> > kfree_skb(skb);
> > return ret;

Scratch that, it's broken (we'd leak the skb if ptr_ring_produce_bh
fails). Let's keep your version.

> Honestly I have seen this pattern fairly often (and implemented it this way
> fairly often).
> 
> I presume it is mostly a matter of taste.

Maybe. As a reader I find it confusing to land into the "drop" label
on success and conditionally free the skb.

> The idea is: when exiting a function 90% of the code is shared between
> success and failure, therefore let's just write it once and simply add a few
> branches based on ret.

If it's 90%, yes. Here, it looked like very little common code.

> This way we have less code and if we need to change something in the exit
> path, we can change it once only.
> 
> A few examples:
> * https://elixir.bootlin.com/linux/v6.9-rc7/source/net/batman-adv/translation-table.c#L813
> * https://elixir.bootlin.com/linux/v6.9-rc7/source/net/batman-adv/routing.c#L269
> * https://elixir.bootlin.com/linux/v6.9-rc7/source/net/mac80211/scan.c#L1344
> 
> 
> ovpn code can be further simplified by setting skb to NULL in case of
> success (this way we avoid checking ret) and let ovpn_peer_put handle the
> case of peer == NULL (we avoid the NULL check before calling it).

That won't be needed if you don't take a reference. Anyway,
netif_rx_ring will be gone if you switch to gro_cells, so that code is
likely to change.

-- 
Sabrina



* Re: [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-13  9:24       ` Sabrina Dubroca
@ 2024-05-13  9:31         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13  9:31 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 13/05/2024 11:24, Sabrina Dubroca wrote:
>>>> +struct ovpn_crypto_key_slot *
>>>> +ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc)
>>>> +{
>>>> +	return ovpn_aead_crypto_key_slot_init(kc->cipher_alg,
>>>> +					      kc->encrypt.cipher_key,
>>>> +					      kc->encrypt.cipher_key_size,
>>>> +					      kc->decrypt.cipher_key,
>>>> +					      kc->decrypt.cipher_key_size,
>>>> +					      kc->encrypt.nonce_tail,
>>>> +					      kc->encrypt.nonce_tail_size,
>>>> +					      kc->decrypt.nonce_tail,
>>>> +					      kc->decrypt.nonce_tail_size,
>>>> +					      kc->key_id);
>>>> +}
>>>
>>> Why the wrapper? You could just call ovpn_aead_crypto_key_slot_init
>>> directly.
>>
>> Mostly for aesthetic reasons, the call being very large.
> 
> But that wrapper doesn't really do anything.
> 
> In case my previous comment wasn't clear: I would keep the single
> argument at the callsite (whether it's called _new or _init), and kill
> the 10-args variant (it's too verbose and _very_ easy to mess up).

Oh ok, then I misunderstood your earlier comment.

Now it's clear and I totally agree. Originally there was a crypto 
abstraction in ovpn, to allow more crypto families later on.

But I deemed it too complex and overkill.
This wrapper is a useless leftover of that approach.

Will get rid of this 10-args variant.

> 
> 
>>>> @@ -132,7 +157,81 @@ int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
>>>>    static int ovpn_decrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
>>>>    {
>>>> -	return true;
>>>
>>> I missed that in the RX patch, true isn't an int :)
>>> Were you intending this function to be bool like ovpn_encrypt_one?
>>> Since you're not actually using the returned value in the caller, it
>>> would be reasonable, but you'd have to convert all the <0 error values
>>> to bool.
>>
>> Mhh, let me think about what's best and I will make this uniform.
> 
> Yes please. If you can make the returns consistent (on success, one
> returns true and the other returns 0), it would be nice.

I am normally all for int, as I don't like failing with no exact code.
Will most likely go with that.

> 
> 
>>>> +	ret = ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
>>>> +drop:
>>>> +	if (likely(allowed_peer))
>>>> +		ovpn_peer_put(allowed_peer);
>>>> +
>>>> +	if (unlikely(ret < 0))
>>>> +		kfree_skb(skb);
>>>> +
>>>> +	return ret;
>>>
>>> Mixing the drop/success returns looks kind of strange. This would be a
>>> bit simpler:
>>>
>>> ovpn_peer_put(allowed_peer);
>>> return ptr_ring_produce_bh(&peer->netif_rx_ring, skb);
>>>
>>> drop:
>>> if (allowed_peer)
>>>       ovpn_peer_put(allowed_peer);
>>> kfree_skb(skb);
>>> return ret;
> 
> Scratch that, it's broken (we'd leak the skb if ptr_ring_produce_bh
> fails). Let's keep your version.

Right.

> 
>> Honestly I have seen this pattern fairly often (and implemented it this way
>> fairly often).
>>
>> I presume it is mostly a matter of taste.
> 
> Maybe. As a reader I find it confusing to land into the "drop" label
> on success and conditionally free the skb.
> 
>> The idea is: when exiting a function 90% of the code is shared between
>> success and failure, therefore let's just write it once and simply add a few
>> branches based on ret.
> 
> If it's 90%, yes. Here, it looked like very little common code.
> 
>> This way we have less code and if we need to change something in the exit
>> path, we can change it once only.
>>
>> A few examples:
>> * https://elixir.bootlin.com/linux/v6.9-rc7/source/net/batman-adv/translation-table.c#L813
>> * https://elixir.bootlin.com/linux/v6.9-rc7/source/net/batman-adv/routing.c#L269
>> * https://elixir.bootlin.com/linux/v6.9-rc7/source/net/mac80211/scan.c#L1344
>>
>>
>> ovpn code can be further simplified by setting skb to NULL in case of
>> success (this way we avoid checking ret) and let ovpn_peer_put handle the
>> case of peer == NULL (we avoid the NULL check before calling it).
> 
> That won't be needed if you don't take a reference. Anyway,
> netif_rx_ring will be gone if you switch to gro_cells, so that code is
> likely to change.

Yap, working on gro_cells right now!

Thanks

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics
  2024-05-13  9:19       ` Sabrina Dubroca
@ 2024-05-13  9:33         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13  9:33 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 13/05/2024 11:19, Sabrina Dubroca wrote:
> 2024-05-13, 09:25:29 +0200, Antonio Quartulli wrote:
>> On 12/05/2024 10:47, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:25 +0200, Antonio Quartulli wrote:
>>>> Byte/packet counters for in-tunnel and transport streams
>>>> are now initialized and updated as needed.
>>>>
>>>> To be exported via netlink.
>>>>
>>>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
>>>> ---
>>>>    drivers/net/ovpn/Makefile |  1 +
>>>>    drivers/net/ovpn/io.c     | 10 ++++++++
>>>>    drivers/net/ovpn/peer.c   |  3 +++
>>>>    drivers/net/ovpn/peer.h   | 13 +++++++---
>>>>    drivers/net/ovpn/stats.c  | 21 ++++++++++++++++
>>>>    drivers/net/ovpn/stats.h  | 52 +++++++++++++++++++++++++++++++++++++++
>>>
>>> What I'm seeing in this patch are "success" counters. I don't see any
>>> stats for dropped packets that would help the user figure out why
>>> their VPN isn't working, or why their CPU is burning up decrypting
>>> packets that don't show up on the host, etc. You can guess there are
>>> issues by subtracting the link and vpn stats, but that's very limited.
>>
>> These stats are just the bare minimum to make our current userspace happy :-)
>>
>> But we can always extend the stats reporting later on, no?
>>
>>>
>>> For example:
>>>    - counter for packets dropped during the udp encap/decap
>>>    - counter for failed encrypt/decrypt (especially failed decrypt)
>>>    - counter for replay protection failures
>>>    - counter for malformed packets
>>>
>>> Maybe not a separate counter for each of the prints you added in the
>>> rx/tx code, but at least enough of them to start figuring out what's
>>> going on without enabling all the prints and parsing dmesg.
>>
>> Definitely a good suggestion! I'd just postpone it for later, unless you
>> think it's a blocker.
> 
> I'm not sure. It's not strictly necessary to make the driver work, but
> from a user/admin's point of view, I think counters would be really
> useful.
> 
> Maybe at least increment the rx_dropped/rx_errors/etc counters from
> rtnl_link_stats on the netdevice?

Ok, will start with this and see how much work it is to add the err
counters right away.

Thanks for the hint!


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-13  7:37     ` Antonio Quartulli
@ 2024-05-13  9:36       ` Sabrina Dubroca
  2024-05-13  9:47         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-13  9:36 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-13, 09:37:06 +0200, Antonio Quartulli wrote:
> On 12/05/2024 23:35, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
> > > +/* send skb to connected peer, if any */
> > > +static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
> > > +			   struct ovpn_peer *peer)
> > > +{
> > > +	int ret;
> > > +
> > > +	if (likely(!peer))
> > > +		/* retrieve peer serving the destination IP of this packet */
> > > +		peer = ovpn_peer_get_by_dst(ovpn, skb);
> > > +	if (unlikely(!peer)) {
> > > +		net_dbg_ratelimited("%s: no peer to send data to\n",
> > > +				    ovpn->dev->name);
> > > +		goto drop;
> > > +	}
> > > +
> > > +	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
> > > +	if (unlikely(ret < 0)) {
> > > +		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
> > > +				    peer->ovpn->dev->name);
> > > +		goto drop;
> > > +	}
> > > +
> > > +	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
> > > +		ovpn_peer_put(peer);
> > 
> > I wanted to come back to this after going through the crypto patch,
> > because this felt like a strange construct when I first looked at this
> > patch.
> > 
> > Why are you using a workqueue here? Based on the kdoc for crypto_wq
> > ("used to schedule crypto work that may sleep during TX/RX"), it's to
> > deal with async crypto.
> > 
> > If so, why not use the more standard way of dealing with async crypto
> > in contexts that cannot sleep, ie letting the crypto core call the
> > "done" callback asynchronously? You need to do all the proper refcount
> > handling, but IMO it's cleaner and simpler than this workqueue and
> > ptr_ring. You can see an example of that in macsec (macsec_encrypt_*
> > in drivers/net/macsec.c).
> 
> Aha! You don't know how happy I was when I found the doc describing how to
> convert the async code into sync-looking :-) With the detail that I had to
> move to a different context, as the code may want to sleep (hence the
> introduction of the workqueue).
> 
> It looks like I am a bit of a fan of WQs, while you are telling me to avoid
> them if possible.

I'm mainly trying to simplify the code (get rid of some ptr_rings, get
rid of some ping-pong between functions and changes of context,
etc). And here, I'm also trying to make it look more like other
similar pieces of code, because I'm already familiar with a few kernel
implementations of protocols doing crypto (macsec, ipsec, tls).
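
To show the shape I'm talking about, here is a toy userspace model of the "done" callback pattern. None of this is kernel API: engine_submit(), engine_complete_pending() and the structs are hypothetical, and -EINPROGRESS is used here only as the "will complete later" marker. The caller handles inline completion and deferred completion through the same finish function, and holds a reference for as long as the request is in flight.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct request;
typedef void (*done_fn)(struct request *req, int err);

struct request {
	done_fn done;	/* called when a deferred request completes */
	void *data;	/* caller context */
};

/* Toy engine: when busy, it parks a single request for later. */
static bool engine_busy;
static struct request *pending;

/* Returns 0 if the work finished inline, -EINPROGRESS if it will be
 * completed later through req->done().
 */
int engine_submit(struct request *req)
{
	if (!engine_busy)
		return 0;

	assert(pending == NULL);	/* the toy holds one request only */
	pending = req;
	return -EINPROGRESS;
}

/* Models the engine finishing the parked request at some later time. */
void engine_complete_pending(int err)
{
	struct request *req = pending;

	pending = NULL;
	if (req)
		req->done(req, err);
}

struct packet {
	int sent;
	int refs;
};

/* Common tail for both the inline and the deferred case. */
static void encrypt_finish(struct packet *pkt, int err)
{
	if (!err)
		pkt->sent = 1;
	pkt->refs--;	/* drop the reference held for the crypto op */
}

static void encrypt_done(struct request *req, int err)
{
	encrypt_finish(req->data, err);
}

int encrypt_one(struct request *req, struct packet *pkt)
{
	int ret;

	pkt->refs++;	/* keep pkt alive while the request is in flight */
	req->done = encrypt_done;
	req->data = pkt;

	ret = engine_submit(req);
	if (ret == -EINPROGRESS)
		return 0;	/* the callback will finish the job */

	encrypt_finish(pkt, ret);
	return ret;
}
```

No queue and no context switch are needed in the inline case, and the deferred case costs one callback invocation.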

> I presume that using WQs comes with a non-negligible cost, therefore if we
> can just get things done without having to use them, then I should just
> not use them.

If you're using AESNI for your GCM implementation, the crypto API will
also be using a workqueue (see crypto/cryptd.c), but only when the
crypto can't be done immediately (ie, when the FPU is already busy).

In the case of crypto accelerators, there might be benefits from
queueing multiple requests and then letting them live their life,
instead of waiting for each request separately. I don't have access to
that HW so I cannot test this.

> I think I could go back to no-workqueue encrypt/decrypt.
> Do you think this may have any impact on any future multi-core optimization?
> Back then I also thought that going through workers may make improvements in
> this area easier. But I could just be wrong.

Without thinking about it too deeply, the workqueue looks more like a
bottleneck that a no-workqueue approach just wouldn't have. You would
probably need per-CPU WQs (probably separated for encrypt and
decrypt). cryptd_enqueue_request (crypto/cryptd.c) has an example of
that.

-- 
Sabrina



* Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)
  2024-05-13  9:36       ` Sabrina Dubroca
@ 2024-05-13  9:47         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13  9:47 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 13/05/2024 11:36, Sabrina Dubroca wrote:
> 2024-05-13, 09:37:06 +0200, Antonio Quartulli wrote:
>> On 12/05/2024 23:35, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
>>>> +/* send skb to connected peer, if any */
>>>> +static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
>>>> +			   struct ovpn_peer *peer)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	if (likely(!peer))
>>>> +		/* retrieve peer serving the destination IP of this packet */
>>>> +		peer = ovpn_peer_get_by_dst(ovpn, skb);
>>>> +	if (unlikely(!peer)) {
>>>> +		net_dbg_ratelimited("%s: no peer to send data to\n",
>>>> +				    ovpn->dev->name);
>>>> +		goto drop;
>>>> +	}
>>>> +
>>>> +	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
>>>> +	if (unlikely(ret < 0)) {
>>>> +		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
>>>> +				    peer->ovpn->dev->name);
>>>> +		goto drop;
>>>> +	}
>>>> +
>>>> +	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
>>>> +		ovpn_peer_put(peer);
>>>
>>> I wanted to come back to this after going through the crypto patch,
>>> because this felt like a strange construct when I first looked at this
>>> patch.
>>>
>>> Why are you using a workqueue here? Based on the kdoc for crypto_wq
>>> ("used to schedule crypto work that may sleep during TX/RX"), it's to
>>> deal with async crypto.
>>>
>>> If so, why not use the more standard way of dealing with async crypto
>>> in contexts that cannot sleep, ie letting the crypto core call the
>>> "done" callback asynchronously? You need to do all the proper refcount
>>> handling, but IMO it's cleaner and simpler than this workqueue and
>>> ptr_ring. You can see an example of that in macsec (macsec_encrypt_*
>>> in drivers/net/macsec.c).
>>
>> Aha! You don't know how happy I was when I found the doc describing how to
>> convert the async code into sync-looking :-) With the detail that I had to
>> move to a different context, as the code may want to sleep (hence the
>> introduction of the workqueue).
>>
>> It looks like I am a bit of a fan of WQs, while you are telling me to avoid them
>> if possible.
> 
> I'm mainly trying to simplify the code (get rid of some ptr_rings, get
> rid of some ping-pong between functions and changes of context,
> etc). And here, I'm also trying to make it look more like other
> similar pieces of code, because I'm already familiar with a few kernel
> implementations of protocols doing crypto (macsec, ipsec, tls).

Thanks for that, you have already helped me get rid of several constructs
that weren't really needed.

> 
>> I presume that using WQs comes with a non-negligible cost, therefore if we
>> can just get things done without having to use them, then I should just
>> avoid them.
> 
> If you're using AESNI for your GCM implementation, the crypto API will
> also be using a workqueue (see crypto/cryptd.c), but only when the
> crypto can't be done immediately (i.e., when the FPU is already busy).

Right.

> 
> In the case of crypto accelerators, there might be benefits from
> queueing multiple requests and then letting them live their life,
> instead of waiting for each request separately. I don't have access to
> that HW so I cannot test this.
> 
>> I think I could go back to no-workqueue encrypt/decrypt.
>> Do you think this may have any impact on any future multi-core optimization?
>> Back then I also thought that going through workers may make improvements in
>> this area easier. But I could just be wrong.
> 
> Without thinking about it too deeply, the workqueue looks more like a
> bottleneck that a no-workqueue approach just wouldn't have. You would
> probably need per-CPU WQs (probably separated for encrypt and
> decrypt). cryptd_enqueue_request (crypto/cryptd.c) has an example of
> that.

Ok. Sounds like something we can re-consider later on.

Let's start by killing the wq in the current code.

Thanks for the pointers.
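
For reference, the "done" callback approach mentioned above can be modelled
in plain userspace C. This is only a sketch under stated assumptions: the
names (engine_encrypt, encrypt_post, send_one, struct engine, ...) are
hypothetical and are not taken from the ovpn patches or from the crypto API.
The only convention borrowed from the kernel is that an encrypt call returns
0 when it completed synchronously and -EINPROGRESS when the completion
callback will be invoked later. The caller takes a reference before
submitting and drops it in the completion path, so no workqueue or ring is
needed between the two.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted peer object. */
struct peer {
	int refcount;
	int sent;	/* packets that finished encryption */
};

/* Hypothetical request carrying its own completion callback. */
struct request {
	struct peer *peer;
	void (*done)(struct request *req, int err);
};

/* Toy engine: completes inline unless busy, otherwise parks one request. */
struct engine {
	int busy;
	struct request *parked;
};

static int engine_encrypt(struct engine *e, struct request *req)
{
	if (!e->busy)
		return 0;		/* done synchronously, callback NOT run */

	assert(!e->parked);		/* the toy engine holds one request only */
	e->parked = req;
	return -EINPROGRESS;		/* callback will run later */
}

/* Simulates the engine finishing the parked request at a later time. */
static void engine_complete_parked(struct engine *e)
{
	struct request *req = e->parked;

	if (!req)
		return;
	e->parked = NULL;
	req->done(req, 0);
}

static void peer_hold(struct peer *p)
{
	p->refcount++;
}

static void peer_put(struct peer *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Common completion path, reached from both the sync and the async case. */
static void encrypt_post(struct request *req, int err)
{
	if (!err)
		req->peer->sent++;
	peer_put(req->peer);		/* drop the reference taken in send_one() */
}

static int send_one(struct engine *e, struct peer *p, struct request *req)
{
	int ret;

	peer_hold(p);			/* keep the peer alive until completion */
	req->peer = p;
	req->done = encrypt_post;

	ret = engine_encrypt(e, req);
	if (ret == -EINPROGRESS)
		return 0;		/* encrypt_post() runs from the engine */

	encrypt_post(req, ret);		/* synchronous completion */
	return ret;
}
```

In this model the sending context never sleeps: it either finishes the
packet immediately or returns and lets the engine call back.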


-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-06  1:16 ` [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object Antonio Quartulli
  2024-05-08 16:06   ` Sabrina Dubroca
@ 2024-05-13 10:09   ` Simon Horman
  2024-05-13 10:53     ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Simon Horman @ 2024-05-13 10:09 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On Mon, May 06, 2024 at 03:16:20AM +0200, Antonio Quartulli wrote:
> An ovpn_peer object holds the whole status of a remote peer
> (regardless whether it is a server or a client).
> 
> This includes status for crypto, tx/rx buffers, napi, etc.
> 
> Only support for one peer is introduced (P2P mode).
> Multi peer support is introduced with a later patch.
> 
> Along with the ovpn_peer, also the ovpn_bind object is introduced
> as the two are strictly related.
> An ovpn_bind object wraps a sockaddr representing the local
> coordinates being used to talk to a specific peer.
> 
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>

...

> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> index ee05b8a2c61d..b79d4f0474b0 100644
> --- a/drivers/net/ovpn/ovpnstruct.h
> +++ b/drivers/net/ovpn/ovpnstruct.h
> @@ -17,12 +17,19 @@
>   * @dev: the actual netdev representing the tunnel
>   * @registered: whether dev is still registered with netdev or not
>   * @mode: device operation mode (i.e. p2p, mp, ..)
> + * @lock: protect this object
> + * @event_wq: used to schedule generic events that may sleep and that need to be
> + *            performed outside of softirq context

nit: events_wq

> + * @peer: in P2P mode, this is the only remote peer
>   * @dev_list: entry for the module wide device list
>   */
>  struct ovpn_struct {
>  	struct net_device *dev;
>  	bool registered;
>  	enum ovpn_mode mode;
> +	spinlock_t lock; /* protect writing to the ovpn_struct object */
> +	struct workqueue_struct *events_wq;
> +	struct ovpn_peer __rcu *peer;
>  	struct list_head dev_list;
>  };
>  

...

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-13 10:09   ` Simon Horman
@ 2024-05-13 10:53     ` Antonio Quartulli
  2024-05-13 15:04       ` Simon Horman
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13 10:53 UTC (permalink / raw
  To: Simon Horman; +Cc: netdev

On 13/05/2024 12:09, Simon Horman wrote:
> On Mon, May 06, 2024 at 03:16:20AM +0200, Antonio Quartulli wrote:
>> An ovpn_peer object holds the whole status of a remote peer
>> (regardless whether it is a server or a client).
>>
>> This includes status for crypto, tx/rx buffers, napi, etc.
>>
>> Only support for one peer is introduced (P2P mode).
>> Multi peer support is introduced with a later patch.
>>
>> Along with the ovpn_peer, also the ovpn_bind object is introduced
>> as the two are strictly related.
>> An ovpn_bind object wraps a sockaddr representing the local
>> coordinates being used to talk to a specific peer.
>>
>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> 
> ...
> 
>> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
>> index ee05b8a2c61d..b79d4f0474b0 100644
>> --- a/drivers/net/ovpn/ovpnstruct.h
>> +++ b/drivers/net/ovpn/ovpnstruct.h
>> @@ -17,12 +17,19 @@
>>    * @dev: the actual netdev representing the tunnel
>>    * @registered: whether dev is still registered with netdev or not
>>    * @mode: device operation mode (i.e. p2p, mp, ..)
>> + * @lock: protect this object
>> + * @event_wq: used to schedule generic events that may sleep and that need to be
>> + *            performed outside of softirq context
> 
> nit: events_wq

Thanks for the report. I fixed this locally already.

You don't know how long I had to stare at the kdoc warning and the code
in order to realize that I missed an 's' :-S

Regards,

> 
>> + * @peer: in P2P mode, this is the only remote peer
>>    * @dev_list: entry for the module wide device list
>>    */
>>   struct ovpn_struct {
>>   	struct net_device *dev;
>>   	bool registered;
>>   	enum ovpn_mode mode;
>> +	spinlock_t lock; /* protect writing to the ovpn_struct object */
>> +	struct workqueue_struct *events_wq;
>> +	struct ovpn_peer __rcu *peer;
>>   	struct list_head dev_list;
>>   };
>>   
> 
> ...

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-06  1:16 ` [PATCH net-next v3 13/24] ovpn: implement TCP transport Antonio Quartulli
@ 2024-05-13 13:37   ` Antonio Quartulli
  2024-05-13 15:34     ` Jakub Kicinski
  2024-05-13 14:50   ` Sabrina Dubroca
  1 sibling, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13 13:37 UTC (permalink / raw
  To: Simon Horman
  Cc: Jakub Kicinski, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, netdev

Hi Simon,

On 06/05/2024 03:16, Antonio Quartulli wrote:
[...]
> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> index b5ff59a4b40f..ac4907705d98 100644
> --- a/drivers/net/ovpn/peer.h
> +++ b/drivers/net/ovpn/peer.h
> @@ -33,6 +33,16 @@
>    * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
>    * @napi: NAPI object
>    * @sock: the socket being used to talk to this peer
> + * @tcp.tx_ring: queue for packets to be forwarded to userspace (TCP only)
> + * @tcp.tx_work: work for processing outgoing socket data (TCP only)
> + * @tcp.rx_work: work for processing incoming socket data (TCP only)
> + * @tcp.raw_len: next packet length as read from the stream (TCP only)

Can you please help me with the following warning from kernel-doc?
As you can see below, raw_len is an array.

Could that be the reason why the script isn't picking it up correctly?

drivers/net/ovpn/peer.h:101: warning: Function parameter or struct member 'raw_len' not described in 'ovpn_peer'
drivers/net/ovpn/peer.h:101: warning: Excess struct member 'tcp.raw_len' description in 'ovpn_peer'

(line number may differ as I am in the middle of a rebase)

Regards,


> + * @tcp.skb: next packet being filled with data from the stream (TCP only)
> + * @tcp.offset: position of the next byte to write in the skb (TCP only)
> + * @tcp.data_len: next packet length converted to host order (TCP only)
> + * @tcp.sk_cb.sk_data_ready: pointer to original cb
> + * @tcp.sk_cb.sk_write_space: pointer to original cb
> + * @tcp.sk_cb.prot: pointer to original prot object
>    * @crypto: the crypto configuration (ciphers, keys, etc..)
>    * @dst_cache: cache for dst_entry used to send to peer
>    * @bind: remote peer binding
> @@ -59,6 +69,25 @@ struct ovpn_peer {
>   	struct ptr_ring netif_rx_ring;
>   	struct napi_struct napi;
>   	struct ovpn_socket *sock;
> +	/* state of the TCP reading. Needed to keep track of how much of a
> +	 * single packet has already been read from the stream and how much is
> +	 * missing
> +	 */
> +	struct {
> +		struct ptr_ring tx_ring;
> +		struct work_struct tx_work;
> +		struct work_struct rx_work;
> +
> +		u8 raw_len[sizeof(u16)];
> +		struct sk_buff *skb;
> +		u16 offset;
> +		u16 data_len;
> +		struct {
> +			void (*sk_data_ready)(struct sock *sk);
> +			void (*sk_write_space)(struct sock *sk);
> +			struct proto *prot;
> +		} sk_cb;
> +	} tcp;
>   	struct ovpn_crypto_state crypto;
>   	struct dst_cache dst_cache;
>   	struct ovpn_bind __rcu *bind;
> diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h
> new file mode 100644
> index 000000000000..ba92811e12ff
> --- /dev/null
> +++ b/drivers/net/ovpn/skb.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + *		James Yonan <james@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_SKB_H_
> +#define _NET_OVPN_SKB_H_
> +
> +#include <linux/in.h>
> +#include <linux/in6.h>
> +#include <linux/ip.h>
> +#include <linux/skbuff.h>
> +#include <linux/socket.h>
> +#include <linux/types.h>
> +
> +#define OVPN_SKB_CB(skb) ((struct ovpn_skb_cb *)&((skb)->cb))
> +
> +struct ovpn_skb_cb {
> +	union {
> +		struct in_addr ipv4;
> +		struct in6_addr ipv6;
> +	} local;
> +	sa_family_t sa_fam;
> +};
> +
> +/* Return IP protocol version from skb header.
> + * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
> + */
> +static inline __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
> +{
> +	__be16 proto = 0;
> +
> +	/* skb could be non-linear,
> +	 * make sure IP header is in non-fragmented part
> +	 */
> +	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
> +		return 0;
> +
> +	if (ip_hdr(skb)->version == 4)
> +		proto = htons(ETH_P_IP);
> +	else if (ip_hdr(skb)->version == 6)
> +		proto = htons(ETH_P_IPV6);
> +
> +	return proto;
> +}
> +
> +#endif /* _NET_OVPN_SKB_H_ */
> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> index e099a61b03fa..004db5b13663 100644
> --- a/drivers/net/ovpn/socket.c
> +++ b/drivers/net/ovpn/socket.c
> @@ -16,6 +16,7 @@
>   #include "packet.h"
>   #include "peer.h"
>   #include "socket.h"
> +#include "tcp.h"
>   #include "udp.h"
>   
>   /* Finalize release of socket, called after RCU grace period */
> @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
>   
>   	if (sock->sk->sk_protocol == IPPROTO_UDP)
>   		ovpn_udp_socket_detach(sock);
> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> +		ovpn_tcp_socket_detach(sock);
>   
>   	sockfd_put(sock);
>   }
> @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>   
>   	if (sock->sk->sk_protocol == IPPROTO_UDP)
>   		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> +		ret = ovpn_tcp_socket_attach(sock, peer);
>   
>   	return ret;
>   }
> @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
>   	ovpn_sock->sock = sock;
>   	kref_init(&ovpn_sock->refcount);
>   
> +	/* TCP sockets are per-peer, therefore they are linked to their unique
> +	 * peer
> +	 */
> +	if (sock->sk->sk_protocol == IPPROTO_TCP) {
> +		ovpn_sock->peer = peer;
> +		ret = ptr_ring_init(&ovpn_sock->recv_ring, OVPN_QUEUE_LEN,
> +				    GFP_KERNEL);
> +		if (ret < 0) {
> +			netdev_err(peer->ovpn->dev, "%s: cannot allocate TCP recv ring\n",
> +				   __func__);
> +			kfree(ovpn_sock);
> +			return ERR_PTR(ret);
> +		}
> +	}
> +
>   	rcu_assign_sk_user_data(sock->sk, ovpn_sock);
>   
>   	return ovpn_sock;
> diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
> index 0d23de5a9344..88c6271ba5c7 100644
> --- a/drivers/net/ovpn/socket.h
> +++ b/drivers/net/ovpn/socket.h
> @@ -21,12 +21,25 @@ struct ovpn_peer;
>   /**
>    * struct ovpn_socket - a kernel socket referenced in the ovpn code
>    * @ovpn: ovpn instance owning this socket (UDP only)
> + * @peer: unique peer transmitting over this socket (TCP only)
> + * @recv_ring: queue where non-data packets directed to userspace are stored
>    * @sock: the low level sock object
>    * @refcount: amount of contexts currently referencing this object
>    * @rcu: member used to schedule RCU destructor callback
>    */
>   struct ovpn_socket {
> -	struct ovpn_struct *ovpn;
> +	union {
> +		/* the VPN session object owning this socket (UDP only) */
> +		struct ovpn_struct *ovpn;
> +
> +		/* TCP only */
> +		struct {
> +			/** @peer: unique peer transmitting over this socket */
> +			struct ovpn_peer *peer;
> +			struct ptr_ring recv_ring;
> +		};
> +	};
> +
>   	struct socket *sock;
>   	struct kref refcount;
>   	struct rcu_head rcu;
> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> new file mode 100644
> index 000000000000..84ad7cd4fc4f
> --- /dev/null
> +++ b/drivers/net/ovpn/tcp.c
> @@ -0,0 +1,511 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2019-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#include <linux/ptr_ring.h>
> +#include <linux/skbuff.h>
> +#include <net/tcp.h>
> +#include <net/route.h>
> +
> +#include "ovpnstruct.h"
> +#include "main.h"
> +#include "io.h"
> +#include "packet.h"
> +#include "peer.h"
> +#include "proto.h"
> +#include "skb.h"
> +#include "socket.h"
> +#include "tcp.h"
> +
> +static struct proto ovpn_tcp_prot;
> +
> +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
> +			      unsigned int in_offset, size_t in_len)
> +{
> +	struct sock *sk = desc->arg.data;
> +	struct ovpn_socket *sock;
> +	struct ovpn_skb_cb *cb;
> +	struct ovpn_peer *peer;
> +	size_t chunk, copied = 0;
> +	void *data;
> +	u16 len;
> +	int st;
> +
> +	rcu_read_lock();
> +	sock = rcu_dereference_sk_user_data(sk);
> +	rcu_read_unlock();
> +
> +	if (unlikely(!sock || !sock->peer)) {
> +		pr_err("ovpn: read_sock triggered for socket with no metadata\n");
> +		desc->error = -EINVAL;
> +		return 0;
> +	}
> +
> +	peer = sock->peer;
> +
> +	while (in_len > 0) {
> +		/* no skb allocated means that we have to read (or finish
> +		 * reading) the 2 bytes prefix containing the actual packet
> +		 * size.
> +		 */
> +		if (!peer->tcp.skb) {
> +			chunk = min_t(size_t, in_len,
> +				      sizeof(u16) - peer->tcp.offset);
> +			WARN_ON(skb_copy_bits(in_skb, in_offset,
> +					      peer->tcp.raw_len +
> +					      peer->tcp.offset, chunk) < 0);
> +			peer->tcp.offset += chunk;
> +
> +			/* keep on reading until we got the whole packet size */
> +			if (peer->tcp.offset != sizeof(u16))
> +				goto next_read;
> +
> +			len = ntohs(*(__be16 *)peer->tcp.raw_len);
> +			/* invalid packet length: this is a fatal TCP error */
> +			if (!len) {
> +				netdev_err(peer->ovpn->dev,
> +					   "%s: received invalid packet length: %d\n",
> +					   __func__, len);
> +				desc->error = -EINVAL;
> +				goto err;
> +			}
> +
> +			/* add 2 bytes to allocated space (and immediately
> +			 * reserve them) for packet length prepending, in case
> +			 * the skb has to be forwarded to userspace
> +			 */
> +			peer->tcp.skb =
> +				netdev_alloc_skb_ip_align(peer->ovpn->dev,
> +							  len + sizeof(u16));
> +			if (!peer->tcp.skb) {
> +				desc->error = -ENOMEM;
> +				goto err;
> +			}
> +			skb_reserve(peer->tcp.skb, sizeof(u16));
> +
> +			peer->tcp.offset = 0;
> +			peer->tcp.data_len = len;
> +		} else {
> +			chunk = min_t(size_t, in_len,
> +				      peer->tcp.data_len - peer->tcp.offset);
> +
> +			/* extend skb to accommodate the new chunk and copy it
> +			 * from the input skb
> +			 */
> +			data = skb_put(peer->tcp.skb, chunk);
> +			WARN_ON(skb_copy_bits(in_skb, in_offset, data,
> +					      chunk) < 0);
> +			peer->tcp.offset += chunk;
> +
> +			/* keep on reading until we get the full packet */
> +			if (peer->tcp.offset != peer->tcp.data_len)
> +				goto next_read;
> +
> +			/* do not perform IP caching for TCP connections */
> +			cb = OVPN_SKB_CB(peer->tcp.skb);
> +			cb->sa_fam = AF_UNSPEC;
> +
> +			/* At this point we know the packet is from a configured
> +			 * peer.
> +			 * DATA_V2 packets are handled in kernel space, the rest
> +			 * goes to user space.
> +			 *
> +			 * Queue skb for sending to userspace via recvmsg on the
> +			 * socket
> +			 */
> +			if (likely(ovpn_opcode_from_skb(peer->tcp.skb, 0) ==
> +				   OVPN_DATA_V2)) {
> +				/* hold reference to peer as required by
> +				 * ovpn_recv().
> +				 *
> +				 * NOTE: in this context we should already be
> +				 * holding a reference to this peer, therefore
> +				 * ovpn_peer_hold() is not expected to fail
> +				 */
> +				WARN_ON(!ovpn_peer_hold(peer));
> +				st = ovpn_recv(peer->ovpn, peer, peer->tcp.skb);
> +				if (unlikely(st < 0))
> +					ovpn_peer_put(peer);
> +
> +			} else {
> +				/* prepend skb with packet len. this way
> +				 * userspace can parse the packet as if it just
> +				 * arrived from the remote endpoint
> +				 */
> +				void *raw_len = __skb_push(peer->tcp.skb,
> +							   sizeof(u16));
> +
> +				memcpy(raw_len, peer->tcp.raw_len, sizeof(u16));
> +
> +				st = ptr_ring_produce_bh(&peer->sock->recv_ring,
> +							 peer->tcp.skb);
> +				if (likely(!st))
> +					peer->tcp.sk_cb.sk_data_ready(sk);
> +			}
> +
> +			/* skb not consumed - free it now */
> +			if (unlikely(st < 0))
> +				kfree_skb(peer->tcp.skb);
> +
> +			peer->tcp.skb = NULL;
> +			peer->tcp.offset = 0;
> +			peer->tcp.data_len = 0;
> +		}
> +next_read:
> +		in_len -= chunk;
> +		in_offset += chunk;
> +		copied += chunk;
> +	}
> +
> +	return copied;
> +err:
> +	netdev_err(peer->ovpn->dev, "cannot process incoming TCP data: %d\n",
> +		   desc->error);
> +	ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
> +	return 0;
> +}
> +
> +static void ovpn_tcp_data_ready(struct sock *sk)
> +{
> +	struct socket *sock = sk->sk_socket;
> +	read_descriptor_t desc;
> +
> +	if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
> +		return;
> +
> +	desc.arg.data = sk;
> +	desc.error = 0;
> +	desc.count = 1;
> +
> +	sock->ops->read_sock(sk, &desc, ovpn_tcp_read_sock);
> +}
> +
> +static void ovpn_tcp_write_space(struct sock *sk)
> +{
> +	struct ovpn_socket *sock;
> +
> +	rcu_read_lock();
> +	sock = rcu_dereference_sk_user_data(sk);
> +	rcu_read_unlock();
> +
> +	if (!sock || !sock->peer)
> +		return;
> +
> +	queue_work(sock->peer->ovpn->events_wq, &sock->peer->tcp.tx_work);
> +}
> +
> +static bool ovpn_tcp_sock_is_readable(struct sock *sk)
> +
> +{
> +	struct ovpn_socket *sock;
> +
> +	rcu_read_lock();
> +	sock = rcu_dereference_sk_user_data(sk);
> +	rcu_read_unlock();
> +
> +	if (!sock || !sock->peer)
> +		return false;
> +
> +	return !ptr_ring_empty_bh(&sock->recv_ring);
> +}
> +
> +static int ovpn_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> +			    int flags, int *addr_len)
> +{
> +	bool tmp = flags & MSG_DONTWAIT;
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
> +	int ret, chunk, copied = 0;
> +	struct ovpn_socket *sock;
> +	struct sk_buff *skb;
> +	long timeo;
> +
> +	if (unlikely(flags & MSG_ERRQUEUE))
> +		return sock_recv_errqueue(sk, msg, len, SOL_IP, IP_RECVERR);
> +
> +	timeo = sock_rcvtimeo(sk, tmp);
> +
> +	rcu_read_lock();
> +	sock = rcu_dereference_sk_user_data(sk);
> +	rcu_read_unlock();
> +
> +	if (!sock || !sock->peer) {
> +		ret = -EBADF;
> +		goto unlock;
> +	}
> +
> +	while (ptr_ring_empty_bh(&sock->recv_ring)) {
> +		if (sk->sk_shutdown & RCV_SHUTDOWN)
> +			return 0;
> +
> +		if (sock_flag(sk, SOCK_DONE))
> +			return 0;
> +
> +		if (!timeo) {
> +			ret = -EAGAIN;
> +			goto unlock;
> +		}
> +
> +		add_wait_queue(sk_sleep(sk), &wait);
> +		sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
> +		sk_wait_event(sk, &timeo, !ptr_ring_empty_bh(&sock->recv_ring),
> +			      &wait);
> +		sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
> +		remove_wait_queue(sk_sleep(sk), &wait);
> +
> +		/* take care of signals */
> +		if (signal_pending(current)) {
> +			ret = sock_intr_errno(timeo);
> +			goto unlock;
> +		}
> +	}
> +
> +	while (len && (skb = __ptr_ring_peek(&sock->recv_ring))) {
> +		chunk = min_t(size_t, len, skb->len);
> +		ret = skb_copy_datagram_msg(skb, 0, msg, chunk);
> +		if (ret < 0) {
> +			pr_err("ovpn: cannot copy TCP data to userspace: %d\n",
> +			       ret);
> +			kfree_skb(skb);
> +			goto unlock;
> +		}
> +
> +		__skb_pull(skb, chunk);
> +
> +		if (!skb->len) {
> +			/* skb was entirely consumed and can now be removed from
> +			 * the ring
> +			 */
> +			__ptr_ring_discard_one(&sock->recv_ring);
> +			consume_skb(skb);
> +		}
> +
> +		len -= chunk;
> +		copied += chunk;
> +	}
> +	ret = copied;
> +
> +unlock:
> +	return ret ? : -EAGAIN;
> +}
> +
> +static void ovpn_destroy_skb(void *skb)
> +{
> +	consume_skb(skb);
> +}
> +
> +void ovpn_tcp_socket_detach(struct socket *sock)
> +{
> +	struct ovpn_socket *ovpn_sock;
> +	struct ovpn_peer *peer;
> +
> +	if (!sock)
> +		return;
> +
> +	rcu_read_lock();
> +	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
> +	rcu_read_unlock();
> +
> +	if (!ovpn_sock->peer)
> +		return;
> +
> +	peer = ovpn_sock->peer;
> +
> +	/* restore CBs that were saved in ovpn_sock_set_tcp_cb() */
> +	write_lock_bh(&sock->sk->sk_callback_lock);
> +	sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
> +	sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
> +	sock->sk->sk_prot = peer->tcp.sk_cb.prot;
> +	rcu_assign_sk_user_data(sock->sk, NULL);
> +	write_unlock_bh(&sock->sk->sk_callback_lock);
> +
> +	/* cancel any ongoing work. Done after removing the CBs so that these
> +	 * workers cannot be re-armed
> +	 */
> +	cancel_work_sync(&peer->tcp.tx_work);
> +
> +	ptr_ring_cleanup(&ovpn_sock->recv_ring, ovpn_destroy_skb);
> +	ptr_ring_cleanup(&peer->tcp.tx_ring, ovpn_destroy_skb);
> +}
> +
> +/* Try to send one skb (or part of it) over the TCP stream.
> + *
> + * Return 0 on success or a negative error code otherwise.
> + *
> + * Note that the skb is modified by putting away the data being sent, therefore
> + * the caller should check if skb->len is zero to understand if the full skb was
> + * sent or not.
> + */
> +static int ovpn_tcp_send_one(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
> +	struct kvec iv = { 0 };
> +	int ret;
> +
> +	if (skb_linearize(skb) < 0) {
> +		net_err_ratelimited("%s: can't linearize packet\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	/* initialize iv structure now as skb_linearize() may have changed
> +	 * skb->data
> +	 */
> +	iv.iov_base = skb->data;
> +	iv.iov_len = skb->len;
> +
> +	ret = kernel_sendmsg(peer->sock->sock, &msg, &iv, 1, iv.iov_len);
> +	if (ret > 0) {
> +		__skb_pull(skb, ret);
> +
> +		/* since we update per-cpu stats in process context,
> +		 * we need to disable softirqs
> +		 */
> +		local_bh_disable();
> +		dev_sw_netstats_tx_add(peer->ovpn->dev, 1, ret);
> +		local_bh_enable();
> +
> +		return 0;
> +	}
> +
> +	return ret;
> +}
> +
> +/* Process packets in TCP TX queue */
> +static void ovpn_tcp_tx_work(struct work_struct *work)
> +{
> +	struct ovpn_peer *peer;
> +	struct sk_buff *skb;
> +	int ret;
> +
> +	peer = container_of(work, struct ovpn_peer, tcp.tx_work);
> +	while ((skb = __ptr_ring_peek(&peer->tcp.tx_ring))) {
> +		ret = ovpn_tcp_send_one(peer, skb);
> +		if (ret < 0 && ret != -EAGAIN) {
> +			net_warn_ratelimited("%s: cannot send TCP packet to peer %u: %d\n",
> +					     __func__, peer->id, ret);
> +			/* in case of TCP error stop sending loop and delete
> +			 * peer
> +			 */
> +			ovpn_peer_del(peer,
> +				      OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
> +			break;
> +		} else if (!skb->len) {
> +			/* skb was entirely consumed and can now be removed from
> +			 * the ring
> +			 */
> +			__ptr_ring_discard_one(&peer->tcp.tx_ring);
> +			consume_skb(skb);
> +		}
> +
> +		/* give a chance to be rescheduled if needed */
> +		cond_resched();
> +	}
> +}
> +
> +/* Put packet into TCP TX queue and schedule a consumer */
> +void ovpn_queue_tcp_skb(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	int ret;
> +
> +	ret = ptr_ring_produce_bh(&peer->tcp.tx_ring, skb);
> +	if (ret < 0) {
> +		kfree_skb_list(skb);
> +		return;
> +	}
> +
> +	queue_work(peer->ovpn->events_wq, &peer->tcp.tx_work);
> +}
> +
> +/* Set TCP encapsulation callbacks */
> +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> +{
> +	void *old_data;
> +	int ret;
> +
> +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
> +
> +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
> +	if (ret < 0) {
> +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
> +		return ret;
> +	}
> +
> +	peer->tcp.skb = NULL;
> +	peer->tcp.offset = 0;
> +	peer->tcp.data_len = 0;
> +
> +	write_lock_bh(&sock->sk->sk_callback_lock);
> +
> +	/* make sure no pre-existing encapsulation handler exists */
> +	rcu_read_lock();
> +	old_data = rcu_dereference_sk_user_data(sock->sk);
> +	rcu_read_unlock();
> +	if (old_data) {
> +		netdev_err(peer->ovpn->dev,
> +			   "provided socket already taken by other user\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	/* sanity check */
> +	if (sock->sk->sk_protocol != IPPROTO_TCP) {
> +		netdev_err(peer->ovpn->dev,
> +			   "provided socket is UDP but expected TCP\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	/* only fully connected sockets are expected. Connection should be
> +	 * handled in userspace
> +	 */
> +	if (sock->sk->sk_state != TCP_ESTABLISHED) {
> +		netdev_err(peer->ovpn->dev,
> +			   "provided TCP socket is not in ESTABLISHED state: %d\n",
> +			   sock->sk->sk_state);
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	/* save current CBs so that they can be restored upon socket release */
> +	peer->tcp.sk_cb.sk_data_ready = sock->sk->sk_data_ready;
> +	peer->tcp.sk_cb.sk_write_space = sock->sk->sk_write_space;
> +	peer->tcp.sk_cb.prot = sock->sk->sk_prot;
> +
> +	/* assign our static CBs */
> +	sock->sk->sk_data_ready = ovpn_tcp_data_ready;
> +	sock->sk->sk_write_space = ovpn_tcp_write_space;
> +	sock->sk->sk_prot = &ovpn_tcp_prot;
> +
> +	write_unlock_bh(&sock->sk->sk_callback_lock);
> +
> +	return 0;
> +err:
> +	write_unlock_bh(&sock->sk->sk_callback_lock);
> +	ptr_ring_cleanup(&peer->tcp.tx_ring, NULL);
> +
> +	return ret;
> +}
> +
> +int __init ovpn_tcp_init(void)
> +{
> +	/* We need to substitute the recvmsg and the sock_is_readable
> +	 * callbacks in the sk_prot member of the sock object for TCP
> +	 * sockets.
> +	 *
> +	 * However sock->sk_prot is a pointer to a static variable and
> +	 * therefore we can't directly modify it, otherwise every socket
> +	 * pointing to it will be affected.
> +	 *
> +	 * For this reason we create our own static copy and modify what
> +	 * we need. Then we make sk_prot point to this copy
> +	 * (in ovpn_tcp_socket_attach())
> +	 */
> +	ovpn_tcp_prot = tcp_prot;
> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
> +	ovpn_tcp_prot.sock_is_readable = ovpn_tcp_sock_is_readable;
> +
> +	return 0;
> +}
> diff --git a/drivers/net/ovpn/tcp.h b/drivers/net/ovpn/tcp.h
> new file mode 100644
> index 000000000000..7e73f6e76e6c
> --- /dev/null
> +++ b/drivers/net/ovpn/tcp.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2019-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_TCP_H_
> +#define _NET_OVPN_TCP_H_
> +
> +#include <linux/net.h>
> +#include <linux/skbuff.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
> +
> +#include "peer.h"
> +
> +/* Initialize TCP static objects */
> +int __init ovpn_tcp_init(void);
> +
> +void ovpn_queue_tcp_skb(struct ovpn_peer *peer, struct sk_buff *skb);
> +
> +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer);
> +void ovpn_tcp_socket_detach(struct socket *sock);
> +
> +/* Prepare skb and enqueue it for sending to peer.
> + *
> + * Preparation consists of prepending the skb payload with its size.
> + * Required by the OpenVPN protocol in order to extract packets from
> + * the TCP stream on the receiver side.
> + */
> +static inline void ovpn_tcp_send_skb(struct ovpn_peer *peer,
> +				     struct sk_buff *skb)
> +{
> +	u16 len = skb->len;
> +
> +	*(__be16 *)__skb_push(skb, sizeof(u16)) = htons(len);
> +	ovpn_queue_tcp_skb(peer, skb);
> +}
> +
> +#endif /* _NET_OVPN_TCP_H_ */
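
To illustrate the framing that the quoted code relies on (each packet on the
TCP stream is preceded by a 2-byte big-endian length), here is a minimal
userspace sketch of the receiver-side reassembly. It is an assumption-laden
model, not the patch's code: struct framer, framer_feed(), struct collector
and collect() are hypothetical names, a fixed buffer replaces the skb, and
the input may be split at any byte boundary, including in the middle of the
length prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAMER_MAX_PKT 65535

/* Hypothetical reassembly state, loosely mirroring the tcp.* fields above. */
struct framer {
	uint8_t raw_len[2];		/* length prefix as read from the stream */
	uint8_t buf[FRAMER_MAX_PKT];	/* payload being filled */
	size_t offset;			/* bytes collected in the current stage */
	size_t data_len;		/* payload length in host order */
	int in_payload;			/* 0: reading prefix, 1: reading payload */
};

typedef void (*pkt_cb)(const uint8_t *pkt, size_t len, void *arg);

/* Hypothetical consumer that just records what it was handed. */
struct collector {
	int count;
	size_t last_len;
	uint8_t last_first;
};

static void collect(const uint8_t *pkt, size_t len, void *arg)
{
	struct collector *c = arg;

	c->count++;
	c->last_len = len;
	c->last_first = pkt[0];
}

/* Feed one chunk of stream data.
 * Returns the number of bytes consumed, or -1 on a zero-length prefix.
 */
static long framer_feed(struct framer *f, const uint8_t *in, size_t in_len,
			pkt_cb cb, void *arg)
{
	size_t consumed = 0;

	while (in_len > 0) {
		size_t chunk;

		if (!f->in_payload) {
			/* read (or finish reading) the 2-byte prefix */
			assert(f->offset < sizeof(f->raw_len));
			chunk = sizeof(f->raw_len) - f->offset;
			if (chunk > in_len)
				chunk = in_len;
			memcpy(f->raw_len + f->offset, in, chunk);
			f->offset += chunk;

			if (f->offset == sizeof(f->raw_len)) {
				/* convert from network (big-endian) order */
				f->data_len = ((size_t)f->raw_len[0] << 8) |
					      f->raw_len[1];
				if (f->data_len == 0)
					return -1;
				f->offset = 0;
				f->in_payload = 1;
			}
		} else {
			/* copy as much payload as this chunk provides */
			assert(f->offset < f->data_len);
			chunk = f->data_len - f->offset;
			if (chunk > in_len)
				chunk = in_len;
			memcpy(f->buf + f->offset, in, chunk);
			f->offset += chunk;

			if (f->offset == f->data_len) {
				/* full packet available: hand it over, reset */
				cb(f->buf, f->data_len, arg);
				f->offset = 0;
				f->data_len = 0;
				f->in_payload = 0;
			}
		}

		in += chunk;
		in_len -= chunk;
		consumed += chunk;
	}

	return (long)consumed;
}
```

Feeding the stream one byte at a time or all at once should yield the same
packets, which is the property the per-peer offset/data_len state exists to
guarantee.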

-- 
Antonio Quartulli
OpenVPN Inc.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-06  1:16 ` [PATCH net-next v3 13/24] ovpn: implement TCP transport Antonio Quartulli
  2024-05-13 13:37   ` Antonio Quartulli
@ 2024-05-13 14:50   ` Sabrina Dubroca
  2024-05-13 22:20     ` Antonio Quartulli
  1 sibling, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-13 14:50 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:26 +0200, Antonio Quartulli wrote:
> @@ -307,6 +308,7 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
>  /* Process packets in TX queue in a transport-specific way.
>   *
>   * UDP transport - encrypt and send across the tunnel.
> + * TCP transport - encrypt and put into TCP TX queue.
>   */
>  void ovpn_encrypt_work(struct work_struct *work)
>  {
> @@ -340,6 +342,9 @@ void ovpn_encrypt_work(struct work_struct *work)
>  					ovpn_udp_send_skb(peer->ovpn, peer,
>  							  curr);
>  					break;
> +				case IPPROTO_TCP:
> +					ovpn_tcp_send_skb(peer, curr);
> +					break;
>  				default:
>  					/* no transport configured yet */
>  					consume_skb(skb);
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index 9ae9844dd281..a04d6e55a473 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -23,6 +23,7 @@
>  #include "io.h"
>  #include "packet.h"
>  #include "peer.h"
> +#include "tcp.h"
>  
>  /* Driver info */
>  #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
> @@ -247,8 +248,14 @@ static struct pernet_operations ovpn_pernet_ops = {
>  
>  static int __init ovpn_init(void)
>  {
> -	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
> +	int err = ovpn_tcp_init();
>  
> +	if (err) {

ovpn_tcp_init cannot fail (and if it could, you'd need to clean up
when register_netdevice_notifier fails). I'd make ovpn_tcp_init void
and kill this check.

> +		pr_err("ovpn: cannot initialize TCP component: %d\n", err);
> +		return err;
> +	}
> +
> +	err = register_netdevice_notifier(&ovpn_netdev_notifier);
>  	if (err) {
>  		pr_err("ovpn: can't register netdevice notifier: %d\n", err);
>  		return err;
> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> index b5ff59a4b40f..ac4907705d98 100644
> --- a/drivers/net/ovpn/peer.h
> +++ b/drivers/net/ovpn/peer.h
> @@ -33,6 +33,16 @@
>   * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
>   * @napi: NAPI object
>   * @sock: the socket being used to talk to this peer
> + * @tcp.tx_ring: queue for packets to be forwarded to userspace (TCP only)
> + * @tcp.tx_work: work for processing outgoing socket data (TCP only)
> + * @tcp.rx_work: wok for processing incoming socket data (TCP only)

Never actually used.
If you keep it: s/wok/work/

> + * @tcp.raw_len: next packet length as read from the stream (TCP only)
> + * @tcp.skb: next packet being filled with data from the stream (TCP only)
> + * @tcp.offset: position of the next byte to write in the skb (TCP only)
> + * @tcp.data_len: next packet length converted to host order (TCP only)

It would be nice to add information about whether they're used for TX or RX.

> + * @tcp.sk_cb.sk_data_ready: pointer to original cb
> + * @tcp.sk_cb.sk_write_space: pointer to original cb
> + * @tcp.sk_cb.prot: pointer to original prot object
>   * @crypto: the crypto configuration (ciphers, keys, etc..)
>   * @dst_cache: cache for dst_entry used to send to peer
>   * @bind: remote peer binding
> @@ -59,6 +69,25 @@ struct ovpn_peer {
>  	struct ptr_ring netif_rx_ring;
>  	struct napi_struct napi;
>  	struct ovpn_socket *sock;
> +	/* state of the TCP reading. Needed to keep track of how much of a
> +	 * single packet has already been read from the stream and how much is
> +	 * missing
> +	 */
> +	struct {
> +		struct ptr_ring tx_ring;
> +		struct work_struct tx_work;
> +		struct work_struct rx_work;
> +
> +		u8 raw_len[sizeof(u16)];

Why not u16 or __be16 for this one?

> +		struct sk_buff *skb;
> +		u16 offset;
> +		u16 data_len;
> +		struct {
> +			void (*sk_data_ready)(struct sock *sk);
> +			void (*sk_write_space)(struct sock *sk);
> +			struct proto *prot;
> +		} sk_cb;
> +	} tcp;
>  	struct ovpn_crypto_state crypto;
>  	struct dst_cache dst_cache;
>  	struct ovpn_bind __rcu *bind;
> diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h
> new file mode 100644
> index 000000000000..ba92811e12ff
> --- /dev/null
> +++ b/drivers/net/ovpn/skb.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + *		James Yonan <james@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_SKB_H_
> +#define _NET_OVPN_SKB_H_
> +
> +#include <linux/in.h>
> +#include <linux/in6.h>
> +#include <linux/ip.h>
> +#include <linux/skbuff.h>
> +#include <linux/socket.h>
> +#include <linux/types.h>
> +
> +#define OVPN_SKB_CB(skb) ((struct ovpn_skb_cb *)&((skb)->cb))
> +
> +struct ovpn_skb_cb {
> +	union {
> +		struct in_addr ipv4;
> +		struct in6_addr ipv6;
> +	} local;
> +	sa_family_t sa_fam;
> +};
> +
> +/* Return IP protocol version from skb header.
> + * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
> + */
> +static inline __be16 ovpn_ip_check_protocol(struct sk_buff *skb)

A dupe of this function exists in drivers/net/ovpn/io.c. I guess you
can just introduce skb.h from the start (with only
ovpn_ip_check_protocol at first).

> +{
> +	__be16 proto = 0;
> +
> +	/* skb could be non-linear,
> +	 * make sure IP header is in non-fragmented part
> +	 */
> +	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
> +		return 0;
> +
> +	if (ip_hdr(skb)->version == 4)
> +		proto = htons(ETH_P_IP);
> +	else if (ip_hdr(skb)->version == 6)
> +		proto = htons(ETH_P_IPV6);
> +
> +	return proto;
> +}
> +
> +#endif /* _NET_OVPN_SKB_H_ */
> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> index e099a61b03fa..004db5b13663 100644
> --- a/drivers/net/ovpn/socket.c
> +++ b/drivers/net/ovpn/socket.c
> @@ -16,6 +16,7 @@
>  #include "packet.h"
>  #include "peer.h"
>  #include "socket.h"
> +#include "tcp.h"
>  #include "udp.h"
>  
>  /* Finalize release of socket, called after RCU grace period */
> @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
>  
>  	if (sock->sk->sk_protocol == IPPROTO_UDP)
>  		ovpn_udp_socket_detach(sock);
> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> +		ovpn_tcp_socket_detach(sock);
>  
>  	sockfd_put(sock);
>  }
> @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>  
>  	if (sock->sk->sk_protocol == IPPROTO_UDP)
>  		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> +		ret = ovpn_tcp_socket_attach(sock, peer);
>  
>  	return ret;
>  }
> @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
>  	ovpn_sock->sock = sock;

The line above this is:

    ovpn_sock->ovpn = peer->ovpn;

It's technically fine since you then overwrite this with peer in case
we're on TCP, but ovpn_sock->ovpn only exists on UDP since you moved
it into a union in this patch.

>  	kref_init(&ovpn_sock->refcount);
>  
> +	/* TCP sockets are per-peer, therefore they are linked to their unique
> +	 * peer
> +	 */
> +	if (sock->sk->sk_protocol == IPPROTO_TCP) {
> +		ovpn_sock->peer = peer;
> +		ret = ptr_ring_init(&ovpn_sock->recv_ring, OVPN_QUEUE_LEN,
> +				    GFP_KERNEL);
> +		if (ret < 0) {
> +			netdev_err(peer->ovpn->dev, "%s: cannot allocate TCP recv ring\n",
> +				   __func__);

Should you also call ovpn_socket_detach here? (as well when the
kzalloc for ovpn_sock fails a bit earlier)

> +			kfree(ovpn_sock);
> +			return ERR_PTR(ret);
> +		}
> +	}
> +
>  	rcu_assign_sk_user_data(sock->sk, ovpn_sock);
>  
>  	return ovpn_sock;
> diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
> index 0d23de5a9344..88c6271ba5c7 100644
> --- a/drivers/net/ovpn/socket.h
> +++ b/drivers/net/ovpn/socket.h
> @@ -21,12 +21,25 @@ struct ovpn_peer;
>  /**
>   * struct ovpn_socket - a kernel socket referenced in the ovpn code
>   * @ovpn: ovpn instance owning this socket (UDP only)
> + * @peer: unique peer transmitting over this socket (TCP only)
> + * @recv_ring: queue where non-data packets directed to userspace are stored
>   * @sock: the low level sock object
>   * @refcount: amount of contexts currently referencing this object
>   * @rcu: member used to schedule RCU destructor callback
>   */
>  struct ovpn_socket {
> -	struct ovpn_struct *ovpn;
> +	union {
> +		/* the VPN session object owning this socket (UDP only) */

nit: Probably not needed

> +		struct ovpn_struct *ovpn;
> +
> +		/* TCP only */
> +		struct {
> +			/** @peer: unique peer transmitting over this socket */

Is kdoc upset about peer but not recv_ring?

> +			struct ovpn_peer *peer;
> +			struct ptr_ring recv_ring;
> +		};
> +	};
> +
>  	struct socket *sock;
>  	struct kref refcount;
>  	struct rcu_head rcu;
> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> new file mode 100644
> index 000000000000..84ad7cd4fc4f
> --- /dev/null
> +++ b/drivers/net/ovpn/tcp.c
> @@ -0,0 +1,511 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2019-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#include <linux/ptr_ring.h>
> +#include <linux/skbuff.h>
> +#include <net/tcp.h>
> +#include <net/route.h>
> +
> +#include "ovpnstruct.h"
> +#include "main.h"
> +#include "io.h"
> +#include "packet.h"
> +#include "peer.h"
> +#include "proto.h"
> +#include "skb.h"
> +#include "socket.h"
> +#include "tcp.h"
> +
> +static struct proto ovpn_tcp_prot;
> +
> +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
> +			      unsigned int in_offset, size_t in_len)
> +{
> +	struct sock *sk = desc->arg.data;
> +	struct ovpn_socket *sock;
> +	struct ovpn_skb_cb *cb;
> +	struct ovpn_peer *peer;
> +	size_t chunk, copied = 0;
> +	void *data;
> +	u16 len;
> +	int st;
> +
> +	rcu_read_lock();
> +	sock = rcu_dereference_sk_user_data(sk);
> +	rcu_read_unlock();

You can't just release rcu_read_lock and keep using sock (here and in
the rest of this file). Either you keep rcu_read_lock, or you can take
a reference on the ovpn_socket.


Anyway, this looks like you're reinventing strparser. Overall this is
very similar to net/xfrm/espintcp.c, but the receive side of espintcp
uses strp and is much shorter (recv_ring looks equivalent to
ike_queue, both sending a few messages to userspace -- look for
strp_init, espintcp_rcv, espintcp_parse in that file).

> +/* Set TCP encapsulation callbacks */
> +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> +{
> +	void *old_data;
> +	int ret;
> +
> +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
> +
> +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
> +	if (ret < 0) {
> +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
> +		return ret;
> +	}
> +
> +	peer->tcp.skb = NULL;
> +	peer->tcp.offset = 0;
> +	peer->tcp.data_len = 0;
> +
> +	write_lock_bh(&sock->sk->sk_callback_lock);
> +
> +	/* make sure no pre-existing encapsulation handler exists */
> +	rcu_read_lock();
> +	old_data = rcu_dereference_sk_user_data(sock->sk);
> +	rcu_read_unlock();
> +	if (old_data) {
> +		netdev_err(peer->ovpn->dev,
> +			   "provided socket already taken by other user\n");
> +		ret = -EBUSY;
> +		goto err;

The UDP code differentiates "socket already owned by this interface"
from "already taken by other user". That doesn't apply to TCP?



> +int __init ovpn_tcp_init(void)
> +{
> +	/* We need to substitute the recvmsg and the sock_is_readable
> +	 * callbacks in the sk_prot member of the sock object for TCP
> +	 * sockets.
> +	 *
> +	 * However sock->sk_prot is a pointer to a static variable and
> +	 * therefore we can't directly modify it, otherwise every socket
> +	 * pointing to it will be affected.
> +	 *
> +	 * For this reason we create our own static copy and modify what
> +	 * we need. Then we make sk_prot point to this copy
> +	 * (in ovpn_tcp_socket_attach())
> +	 */
> +	ovpn_tcp_prot = tcp_prot;

Don't you need a separate variant for IPv6, like TLS does?

> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;

You don't need to replace ->sendmsg as well? The userspace client is
not expected to send messages?

-- 
Sabrina



* Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object
  2024-05-13 10:53     ` Antonio Quartulli
@ 2024-05-13 15:04       ` Simon Horman
  0 siblings, 0 replies; 111+ messages in thread
From: Simon Horman @ 2024-05-13 15:04 UTC (permalink / raw
  To: Antonio Quartulli; +Cc: netdev

On Mon, May 13, 2024 at 12:53:09PM +0200, Antonio Quartulli wrote:
> On 13/05/2024 12:09, Simon Horman wrote:
> > On Mon, May 06, 2024 at 03:16:20AM +0200, Antonio Quartulli wrote:
> > > An ovpn_peer object holds the whole status of a remote peer
> > > (regardless whether it is a server or a client).
> > > 
> > > This includes status for crypto, tx/rx buffers, napi, etc.
> > > 
> > > Only support for one peer is introduced (P2P mode).
> > > Multi peer support is introduced with a later patch.
> > > 
> > > Along with the ovpn_peer, also the ovpn_bind object is introcued
> > > as the two are strictly related.
> > > An ovpn_bind object wraps a sockaddr representing the local
> > > coordinates being used to talk to a specific peer.
> > > 
> > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> > 
> > ...
> > 
> > > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> > > index ee05b8a2c61d..b79d4f0474b0 100644
> > > --- a/drivers/net/ovpn/ovpnstruct.h
> > > +++ b/drivers/net/ovpn/ovpnstruct.h
> > > @@ -17,12 +17,19 @@
> > >    * @dev: the actual netdev representing the tunnel
> > >    * @registered: whether dev is still registered with netdev or not
> > >    * @mode: device operation mode (i.e. p2p, mp, ..)
> > > + * @lock: protect this object
> > > + * @event_wq: used to schedule generic events that may sleep and that need to be
> > > + *            performed outside of softirq context
> > 
> > nit: events_wq
> 
> Thanks for the report. I fixed this locally already.
> 
> You don't know how long I had to stare at the kdoc warning and the code in
> order to realize that I missed a 's' :-S

It took me more than one reading too :)


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-13 13:37   ` Antonio Quartulli
@ 2024-05-13 15:34     ` Jakub Kicinski
  0 siblings, 0 replies; 111+ messages in thread
From: Jakub Kicinski @ 2024-05-13 15:34 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: Simon Horman, Sergey Ryazanov, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Esben Haabendal, netdev

On Mon, 13 May 2024 15:37:54 +0200 Antonio Quartulli wrote:
> >    * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
> >    * @napi: NAPI object
> >    * @sock: the socket being used to talk to this peer
> > + * @tcp.tx_ring: queue for packets to be forwarded to userspace (TCP only)
> > + * @tcp.tx_work: work for processing outgoing socket data (TCP only)
> > + * @tcp.rx_work: wok for processing incoming socket data (TCP only)
> > + * @tcp.raw_len: next packet length as read from the stream (TCP only)  
> 
> can you please help me with the following warning from kerneldoc?
> As you can see below, raw_len is an array.
> 
> May that be the reason why the script isn't picking it up correctly?
> 
> drivers/net/ovpn/peer.h:101: warning: Function parameter or struct 
> member 'raw_len' not described in 'ovpn_peer'
> drivers/net/ovpn/peer.h:101: warning: Excess struct member 'tcp.raw_len' 
> description in 'ovpn_peer'
> 
> (line number may differ as I am in the middle of a rebase)

Hm, the script itself is a fairly simple file of Perl regexps.
You can try to tweak it and send a fix to the list.
I presume using sizeof() to declare an array is fairly uncommon.
Or forgo the sizeof() and use literal 2? :)


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-13 14:50   ` Sabrina Dubroca
@ 2024-05-13 22:20     ` Antonio Quartulli
  2024-05-14  8:58       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-13 22:20 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 13/05/2024 16:50, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:26 +0200, Antonio Quartulli wrote:
>> @@ -307,6 +308,7 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
>>   /* Process packets in TX queue in a transport-specific way.
>>    *
>>    * UDP transport - encrypt and send across the tunnel.
>> + * TCP transport - encrypt and put into TCP TX queue.
>>    */
>>   void ovpn_encrypt_work(struct work_struct *work)
>>   {
>> @@ -340,6 +342,9 @@ void ovpn_encrypt_work(struct work_struct *work)
>>   					ovpn_udp_send_skb(peer->ovpn, peer,
>>   							  curr);
>>   					break;
>> +				case IPPROTO_TCP:
>> +					ovpn_tcp_send_skb(peer, curr);
>> +					break;
>>   				default:
>>   					/* no transport configured yet */
>>   					consume_skb(skb);
>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>> index 9ae9844dd281..a04d6e55a473 100644
>> --- a/drivers/net/ovpn/main.c
>> +++ b/drivers/net/ovpn/main.c
>> @@ -23,6 +23,7 @@
>>   #include "io.h"
>>   #include "packet.h"
>>   #include "peer.h"
>> +#include "tcp.h"
>>   
>>   /* Driver info */
>>   #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
>> @@ -247,8 +248,14 @@ static struct pernet_operations ovpn_pernet_ops = {
>>   
>>   static int __init ovpn_init(void)
>>   {
>> -	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
>> +	int err = ovpn_tcp_init();
>>   
>> +	if (err) {
> 
> ovpn_tcp_init cannot fail (and if it could, you'd need to clean up
> when register_netdevice_notifier fails). I'd make ovpn_tcp_init void
> and kill this check.

I like to have all init functions returning int by design, even though 
they may not fail.

But I can understand this is not necessarily good practice (somebody will 
always ask "when does it fail?" and there will be no answer, which 
is confusing).

> 
>> +		pr_err("ovpn: cannot initialize TCP component: %d\n", err);
>> +		return err;
>> +	}
>> +
>> +	err = register_netdevice_notifier(&ovpn_netdev_notifier);
>>   	if (err) {
>>   		pr_err("ovpn: can't register netdevice notifier: %d\n", err);
>>   		return err;
>> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
>> index b5ff59a4b40f..ac4907705d98 100644
>> --- a/drivers/net/ovpn/peer.h
>> +++ b/drivers/net/ovpn/peer.h
>> @@ -33,6 +33,16 @@
>>    * @netif_rx_ring: queue of packets to be sent to the netdevice via NAPI
>>    * @napi: NAPI object
>>    * @sock: the socket being used to talk to this peer
>> + * @tcp.tx_ring: queue for packets to be forwarded to userspace (TCP only)
>> + * @tcp.tx_work: work for processing outgoing socket data (TCP only)
>> + * @tcp.rx_work: wok for processing incoming socket data (TCP only)
> 
> Never actually used.
> If you keep it: s/wok/work/

Indeed, I think this is another leftover.

> 
>> + * @tcp.raw_len: next packet length as read from the stream (TCP only)
>> + * @tcp.skb: next packet being filled with data from the stream (TCP only)
>> + * @tcp.offset: position of the next byte to write in the skb (TCP only)
>> + * @tcp.data_len: next packet length converted to host order (TCP only)
> 
> It would be nice to add information about whether they're used for TX or RX.

they are all about "from the stream" and "to the skb", meaning that we 
are doing RX.
Will make it more explicit.

> 
>> + * @tcp.sk_cb.sk_data_ready: pointer to original cb
>> + * @tcp.sk_cb.sk_write_space: pointer to original cb
>> + * @tcp.sk_cb.prot: pointer to original prot object
>>    * @crypto: the crypto configuration (ciphers, keys, etc..)
>>    * @dst_cache: cache for dst_entry used to send to peer
>>    * @bind: remote peer binding
>> @@ -59,6 +69,25 @@ struct ovpn_peer {
>>   	struct ptr_ring netif_rx_ring;
>>   	struct napi_struct napi;
>>   	struct ovpn_socket *sock;
>> +	/* state of the TCP reading. Needed to keep track of how much of a
>> +	 * single packet has already been read from the stream and how much is
>> +	 * missing
>> +	 */
>> +	struct {
>> +		struct ptr_ring tx_ring;
>> +		struct work_struct tx_work;
>> +		struct work_struct rx_work;
>> +
>> +		u8 raw_len[sizeof(u16)];
> 
> Why not u16 or __be16 for this one?

because in this array we are putting the bytes as we get them from the 
stream.
We may be at the point where one out of two bytes is available on the 
stream. For this reason I use an array to store this u16 byte by byte.

Once the two bytes are ready, we convert the content into an actual int 
and store it in "data_len" (a few lines below).
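
For illustration, the framing and the byte-by-byte length reassembly can be
sketched in plain userspace C. This is not the ovpn code: frame_packet(),
rx_feed_len(), struct rx_state and its raw_have counter are hypothetical
names, and only the 2-byte big-endian length prefix is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sender side (hypothetical helper): write a 2-byte big-endian length
 * followed by the payload. Returns the number of bytes written, or 0 if
 * the payload does not fit in 16 bits or the output buffer is too small.
 */
static size_t frame_packet(uint8_t *out, size_t out_cap,
			   const uint8_t *payload, size_t len)
{
	assert(out && payload);

	if (len > UINT16_MAX || out_cap < len + 2)
		return 0;

	out[0] = (uint8_t)(len >> 8);
	out[1] = (uint8_t)(len & 0xff);
	memcpy(out + 2, payload, len);

	return len + 2;
}

/* Receiver side state (hypothetical): the length prefix may arrive split
 * across two reads, so its bytes are stored one by one.
 */
struct rx_state {
	uint8_t raw_len[2];	/* length bytes as they arrive from the stream */
	size_t raw_have;	/* how many length bytes are stored so far */
	uint16_t data_len;	/* host-order length, valid once raw_have == 2 */
};

/* Consume only the bytes still needed to complete the length prefix.
 * Returns how many bytes of buf were consumed.
 */
static size_t rx_feed_len(struct rx_state *st, const uint8_t *buf, size_t n)
{
	size_t used = 0;

	while (st->raw_have < 2 && used < n)
		st->raw_len[st->raw_have++] = buf[used++];

	if (st->raw_have == 2)
		st->data_len = (uint16_t)((st->raw_len[0] << 8) |
					  st->raw_len[1]);

	return used;
}
```

If the stream delivers the first length byte alone, rx_feed_len() consumes
one byte and leaves raw_have at 1. The next call consumes exactly one more
byte and only then fills in data_len.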

> 
>> +		struct sk_buff *skb;
>> +		u16 offset;
>> +		u16 data_len;
>> +		struct {
>> +			void (*sk_data_ready)(struct sock *sk);
>> +			void (*sk_write_space)(struct sock *sk);
>> +			struct proto *prot;
>> +		} sk_cb;
>> +	} tcp;
>>   	struct ovpn_crypto_state crypto;
>>   	struct dst_cache dst_cache;
>>   	struct ovpn_bind __rcu *bind;
>> diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h
>> new file mode 100644
>> index 000000000000..ba92811e12ff
>> --- /dev/null
>> +++ b/drivers/net/ovpn/skb.h
>> @@ -0,0 +1,51 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*  OpenVPN data channel offload
>> + *
>> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
>> + *
>> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
>> + *		James Yonan <james@openvpn.net>
>> + */
>> +
>> +#ifndef _NET_OVPN_SKB_H_
>> +#define _NET_OVPN_SKB_H_
>> +
>> +#include <linux/in.h>
>> +#include <linux/in6.h>
>> +#include <linux/ip.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/socket.h>
>> +#include <linux/types.h>
>> +
>> +#define OVPN_SKB_CB(skb) ((struct ovpn_skb_cb *)&((skb)->cb))
>> +
>> +struct ovpn_skb_cb {
>> +	union {
>> +		struct in_addr ipv4;
>> +		struct in6_addr ipv6;
>> +	} local;
>> +	sa_family_t sa_fam;
>> +};
>> +
>> +/* Return IP protocol version from skb header.
>> + * Return 0 if protocol is not IPv4/IPv6 or cannot be read.
>> + */
>> +static inline __be16 ovpn_ip_check_protocol(struct sk_buff *skb)
> 
> A dupe of this function exists in drivers/net/ovpn/io.c. I guess you
> can just introduce skb.h from the start (with only
> ovpn_ip_check_protocol at first).

thanks. I think that was the idea, but something went horribly wrong.

> 
>> +{
>> +	__be16 proto = 0;
>> +
>> +	/* skb could be non-linear,
>> +	 * make sure IP header is in non-fragmented part
>> +	 */
>> +	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
>> +		return 0;
>> +
>> +	if (ip_hdr(skb)->version == 4)
>> +		proto = htons(ETH_P_IP);
>> +	else if (ip_hdr(skb)->version == 6)
>> +		proto = htons(ETH_P_IPV6);
>> +
>> +	return proto;
>> +}
>> +
>> +#endif /* _NET_OVPN_SKB_H_ */
>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
>> index e099a61b03fa..004db5b13663 100644
>> --- a/drivers/net/ovpn/socket.c
>> +++ b/drivers/net/ovpn/socket.c
>> @@ -16,6 +16,7 @@
>>   #include "packet.h"
>>   #include "peer.h"
>>   #include "socket.h"
>> +#include "tcp.h"
>>   #include "udp.h"
>>   
>>   /* Finalize release of socket, called after RCU grace period */
>> @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
>>   
>>   	if (sock->sk->sk_protocol == IPPROTO_UDP)
>>   		ovpn_udp_socket_detach(sock);
>> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
>> +		ovpn_tcp_socket_detach(sock);
>>   
>>   	sockfd_put(sock);
>>   }
>> @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>>   
>>   	if (sock->sk->sk_protocol == IPPROTO_UDP)
>>   		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
>> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
>> +		ret = ovpn_tcp_socket_attach(sock, peer);
>>   
>>   	return ret;
>>   }
>> @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
>>   	ovpn_sock->sock = sock;
> 
> The line above this is:
> 
>      ovpn_sock->ovpn = peer->ovpn;
> 
> It's technically fine since you then overwrite this with peer in case
> we're on TCP, but ovpn_sock->ovpn only exists on UDP since you moved
> it into a union in this patch.

Yeah, I did not want to make another branch, but having a UDP specific 
case will make code easier to read.

> 
>>   	kref_init(&ovpn_sock->refcount);
>>   
>> +	/* TCP sockets are per-peer, therefore they are linked to their unique
>> +	 * peer
>> +	 */
>> +	if (sock->sk->sk_protocol == IPPROTO_TCP) {
>> +		ovpn_sock->peer = peer;
>> +		ret = ptr_ring_init(&ovpn_sock->recv_ring, OVPN_QUEUE_LEN,
>> +				    GFP_KERNEL);
>> +		if (ret < 0) {
>> +			netdev_err(peer->ovpn->dev, "%s: cannot allocate TCP recv ring\n",
>> +				   __func__);
> 
> Should you also call ovpn_socket_detach here? (as well when the
> kzalloc for ovpn_sock fails a bit earlier)

Mh, the attach is performed as the first thing when we enter this function, 
therefore you are right: we must undo the attach in case of failure.

> 
>> +			kfree(ovpn_sock);
>> +			return ERR_PTR(ret);
>> +		}
>> +	}
>> +
>>   	rcu_assign_sk_user_data(sock->sk, ovpn_sock);
>>   
>>   	return ovpn_sock;
>> diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
>> index 0d23de5a9344..88c6271ba5c7 100644
>> --- a/drivers/net/ovpn/socket.h
>> +++ b/drivers/net/ovpn/socket.h
>> @@ -21,12 +21,25 @@ struct ovpn_peer;
>>   /**
>>    * struct ovpn_socket - a kernel socket referenced in the ovpn code
>>    * @ovpn: ovpn instance owning this socket (UDP only)
>> + * @peer: unique peer transmitting over this socket (TCP only)
>> + * @recv_ring: queue where non-data packets directed to userspace are stored
>>    * @sock: the low level sock object
>>    * @refcount: amount of contexts currently referencing this object
>>    * @rcu: member used to schedule RCU destructor callback
>>    */
>>   struct ovpn_socket {
>> -	struct ovpn_struct *ovpn;
>> +	union {
>> +		/* the VPN session object owning this socket (UDP only) */
> 
> nit: Probably not needed
> 
>> +		struct ovpn_struct *ovpn;
>> +
>> +		/* TCP only */
>> +		struct {
>> +			/** @peer: unique peer transmitting over this socket */
> 
> Is kdoc upset about peer but not recv_ring?

leftovers from before having the kdoc. I am removing them.

> 
>> +			struct ovpn_peer *peer;
>> +			struct ptr_ring recv_ring;
>> +		};
>> +	};
>> +
>>   	struct socket *sock;
>>   	struct kref refcount;
>>   	struct rcu_head rcu;
>> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
>> new file mode 100644
>> index 000000000000..84ad7cd4fc4f
>> --- /dev/null
>> +++ b/drivers/net/ovpn/tcp.c
>> @@ -0,0 +1,511 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*  OpenVPN data channel offload
>> + *
>> + *  Copyright (C) 2019-2024 OpenVPN, Inc.
>> + *
>> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
>> + */
>> +
>> +#include <linux/ptr_ring.h>
>> +#include <linux/skbuff.h>
>> +#include <net/tcp.h>
>> +#include <net/route.h>
>> +
>> +#include "ovpnstruct.h"
>> +#include "main.h"
>> +#include "io.h"
>> +#include "packet.h"
>> +#include "peer.h"
>> +#include "proto.h"
>> +#include "skb.h"
>> +#include "socket.h"
>> +#include "tcp.h"
>> +
>> +static struct proto ovpn_tcp_prot;
>> +
>> +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
>> +			      unsigned int in_offset, size_t in_len)
>> +{
>> +	struct sock *sk = desc->arg.data;
>> +	struct ovpn_socket *sock;
>> +	struct ovpn_skb_cb *cb;
>> +	struct ovpn_peer *peer;
>> +	size_t chunk, copied = 0;
>> +	void *data;
>> +	u16 len;
>> +	int st;
>> +
>> +	rcu_read_lock();
>> +	sock = rcu_dereference_sk_user_data(sk);
>> +	rcu_read_unlock();
> 
> You can't just release rcu_read_lock and keep using sock (here and in
> the rest of this file). Either you keep rcu_read_lock, or you can take
> a reference on the ovpn_socket.

I was just staring at this today, after having worked on the 
rcu_read_lock/unlock for the peer get()s..

I think the assumption was: if we are in this read_sock callback, it's 
impossible that the ovpn_socket was invalidated, because it gets 
invalidated upon detach, which also prevents any further calling of this 
callback. But this sounds racy, and I guess we should somehow hold a 
reference.
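
One way to close that window is a "get unless zero" reference, sketched here
in userspace C11 rather than kernel code. In the kernel the equivalent would
presumably be kref_get_unless_zero() on the ovpn_socket refcount while still
inside the rcu_read_lock() section. That mapping is an assumption, and
struct obj and both helpers below are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the socket wrapper */
struct obj {
	atomic_int refcount;
};

/* Take a reference only if the object is still alive (count > 0).
 * A reader that found the pointer calls this before leaving its
 * protected section; on failure it must not touch the object.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
		/* the failed exchange reloaded old, so just retry */
	}

	return false;
}

/* Drop a reference. Returns true when the caller released the last one
 * and is therefore responsible for freeing the object.
 */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

The point of the pattern is that a reader racing with the final put either
gets a valid reference or is told the object is already on its way out.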

> 
> 
> Anyway, this looks like you're reinventing strparser. Overall this is
> very similar to net/xfrm/espintcp.c, but the receive side of espintcp
> uses strp and is much shorter (recv_ring looks equivalent to
> ike_queue, both sending a few messages to userspace -- look for
> strp_init, espintcp_rcv, espintcp_parse in that file).

I think I did have a look at strparser once, but I wasn't sure I was 
grasping all the details.

Will have another look and see what I can re-use.

> 
>> +/* Set TCP encapsulation callbacks */
>> +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>> +{
>> +	void *old_data;
>> +	int ret;
>> +
>> +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
>> +
>> +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
>> +	if (ret < 0) {
>> +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
>> +		return ret;
>> +	}
>> +
>> +	peer->tcp.skb = NULL;
>> +	peer->tcp.offset = 0;
>> +	peer->tcp.data_len = 0;
>> +
>> +	write_lock_bh(&sock->sk->sk_callback_lock);
>> +
>> +	/* make sure no pre-existing encapsulation handler exists */
>> +	rcu_read_lock();
>> +	old_data = rcu_dereference_sk_user_data(sock->sk);
>> +	rcu_read_unlock();
>> +	if (old_data) {
>> +		netdev_err(peer->ovpn->dev,
>> +			   "provided socket already taken by other user\n");
>> +		ret = -EBUSY;
>> +		goto err;
> 
> The UDP code differentiates "socket already owned by this interface"
> from "already taken by other user". That doesn't apply to TCP?

This makes me wonder: how safe is it to interpret the user data as an 
object of type ovpn_socket?

When we find the user data already assigned, we don't know what was 
really stored in there, right?
Technically this socket could have gone through another module which 
assigned its own state.

Therefore I think that what UDP does [ dereferencing ((struct 
ovpn_socket *)user_data)->ovpn ] is probably not safe. Would you agree?

> 
> 
> 
>> +int __init ovpn_tcp_init(void)
>> +{
>> +	/* We need to substitute the recvmsg and the sock_is_readable
>> +	 * callbacks in the sk_prot member of the sock object for TCP
>> +	 * sockets.
>> +	 *
>> +	 * However sock->sk_prot is a pointer to a static variable and
>> +	 * therefore we can't directly modify it, otherwise every socket
>> +	 * pointing to it will be affected.
>> +	 *
>> +	 * For this reason we create our own static copy and modify what
>> +	 * we need. Then we make sk_prot point to this copy
>> +	 * (in ovpn_tcp_socket_attach())
>> +	 */
>> +	ovpn_tcp_prot = tcp_prot;
> 
> Don't you need a separate variant for IPv6, like TLS does?

Never did so far.

My wild wild wild guess: for the time this socket is owned by ovpn, we 
only use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make 
any difference.
When this socket is released, we reassign the original prot.
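
For comparison, the per-family approach raised in the question amounts to
keeping one modified copy per base table. The sketch below is userspace C
with a made-up struct ops standing in for struct proto. base_v4 and base_v6
stand in for the IPv4 and IPv6 TCP proto objects, and none of the names are
kernel API.

```c
#include <assert.h>

/* Hypothetical stand-in for a table of protocol callbacks */
struct ops {
	int (*recvmsg)(void);
	int (*connect)(void);
};

static int v4_recvmsg(void) { return 4; }
static int v4_connect(void) { return 40; }
static int v6_recvmsg(void) { return 6; }
static int v6_connect(void) { return 60; }
static int my_recvmsg(void) { return 99; }

/* Shared base tables: must never be modified in place */
static const struct ops base_v4 = { v4_recvmsg, v4_connect };
static const struct ops base_v6 = { v6_recvmsg, v6_connect };

enum { FAM_V4, FAM_V6, FAM_MAX };

/* One private copy per address family */
static struct ops my_ops[FAM_MAX];

/* Copy each base table, then override only the callbacks we care about.
 * Family-specific callbacks (connect here) keep their original values.
 */
static void build_ops(void)
{
	my_ops[FAM_V4] = base_v4;
	my_ops[FAM_V4].recvmsg = my_recvmsg;

	my_ops[FAM_V6] = base_v6;
	my_ops[FAM_V6].recvmsg = my_recvmsg;
}
```

With a single copy built from the IPv4 table, an IPv6 socket pointed at it
would pick up v4_connect() instead of v6_connect() in this model. Whether
any such family-specific callback is actually reached while ovpn owns the
socket is the open question in this exchange.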

> 
>> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
> 
> You don't need to replace ->sendmsg as well? The userspace client is
> not expected to send messages?

It is, but my assumption is that those packets will just go through the 
socket as usual. No need to be handled by ovpn (those packets are not 
encrypted/decrypted, like data traffic is).
And this is how it has worked so far.

Makes sense?

Thanks a lot!



-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-13 22:20     ` Antonio Quartulli
@ 2024-05-14  8:58       ` Sabrina Dubroca
  2024-05-14 22:11         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-14  8:58 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-14, 00:20:24 +0200, Antonio Quartulli wrote:
> On 13/05/2024 16:50, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:26 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> > > index 9ae9844dd281..a04d6e55a473 100644
> > > --- a/drivers/net/ovpn/main.c
> > > +++ b/drivers/net/ovpn/main.c
> > > @@ -23,6 +23,7 @@
> > >   #include "io.h"
> > >   #include "packet.h"
> > >   #include "peer.h"
> > > +#include "tcp.h"
> > >   /* Driver info */
> > >   #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
> > > @@ -247,8 +248,14 @@ static struct pernet_operations ovpn_pernet_ops = {
> > >   static int __init ovpn_init(void)
> > >   {
> > > -	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
> > > +	int err = ovpn_tcp_init();
> > > +	if (err) {
> > 
> > ovpn_tcp_init cannot fail (and if it could, you'd need to clean up
> > when register_netdevice_notifier fails). I'd make ovpn_tcp_init void
> > and kill this check.
> 
> I like to have all init functions returning int by design, even though they
> may not fail.
> 
> But I can understand this is not necessarily good practice (somebody will
> always ask "when does it fail?" and there will be no answer, which is
> confusing)

Yes, pretty much.


> > > diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> > > index b5ff59a4b40f..ac4907705d98 100644
> > > --- a/drivers/net/ovpn/peer.h
> > > +++ b/drivers/net/ovpn/peer.h
> > > + * @tcp.raw_len: next packet length as read from the stream (TCP only)
> > > + * @tcp.skb: next packet being filled with data from the stream (TCP only)
> > > + * @tcp.offset: position of the next byte to write in the skb (TCP only)
> > > + * @tcp.data_len: next packet length converted to host order (TCP only)
> > 
> > It would be nice to add information about whether they're used for TX or RX.
> 
> they are all about "from the stream" and "to the skb", meaning that we are
> doing RX.
> Will make it more explicit.

Maybe group them in a struct rx?

> > > + * @tcp.sk_cb.sk_data_ready: pointer to original cb
> > > + * @tcp.sk_cb.sk_write_space: pointer to original cb
> > > + * @tcp.sk_cb.prot: pointer to original prot object
> > >    * @crypto: the crypto configuration (ciphers, keys, etc..)
> > >    * @dst_cache: cache for dst_entry used to send to peer
> > >    * @bind: remote peer binding
> > > @@ -59,6 +69,25 @@ struct ovpn_peer {
> > >   	struct ptr_ring netif_rx_ring;
> > >   	struct napi_struct napi;
> > >   	struct ovpn_socket *sock;
> > > +	/* state of the TCP reading. Needed to keep track of how much of a
> > > +	 * single packet has already been read from the stream and how much is
> > > +	 * missing
> > > +	 */
> > > +	struct {
> > > +		struct ptr_ring tx_ring;
> > > +		struct work_struct tx_work;
> > > +		struct work_struct rx_work;
> > > +
> > > +		u8 raw_len[sizeof(u16)];
> > 
> > Why not u16 or __be16 for this one?
> 
> because in this array we are putting the bytes as we get them from the
> stream.
> We may be at the point where one out of two bytes is available on the
> stream. For this reason I use an array to store this u16 byte by byte.
> 
> Once the two bytes are ready, we convert the content into an actual int and
> store it in "data_len" (a few lines below).

Ok, I see. Hopefully you can switch to strparser and make this one go
away.
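
The byte-by-byte handling of the length prefix discussed above can be modelled in userspace roughly as follows. This is a minimal sketch, not the ovpn code: struct rx_state, rx_feed(), raw_have and last_len are hypothetical names, and only raw_len, data_len and offset mirror fields from the quoted patch. It assumes a 2-byte big-endian prefix followed by the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of the per-peer RX state. */
struct rx_state {
	uint8_t raw_len[sizeof(uint16_t)]; /* prefix bytes as they arrive */
	size_t raw_have;                   /* prefix bytes stored so far */
	uint16_t data_len;                 /* prefix in host order */
	size_t offset;                     /* payload bytes copied so far */
	uint16_t last_len;                 /* length of last completed message */
	uint8_t payload[UINT16_MAX];       /* payload being assembled */
};

/* Consume one chunk of the stream; return how many messages it completed. */
static unsigned int rx_feed(struct rx_state *st, const uint8_t *buf, size_t len)
{
	unsigned int done = 0;

	while (len > 0) {
		if (st->raw_have < sizeof(st->raw_len)) {
			/* prefix incomplete: store a single byte and re-check */
			st->raw_len[st->raw_have++] = *buf++;
			len--;
			if (st->raw_have < sizeof(st->raw_len))
				continue;
			/* both bytes present: network (big-endian) to host order */
			st->data_len = (uint16_t)((st->raw_len[0] << 8) | st->raw_len[1]);
			st->offset = 0;
		} else {
			/* copy as much payload as this chunk provides */
			size_t chunk = st->data_len - st->offset;

			if (chunk > len)
				chunk = len;
			memcpy(st->payload + st->offset, buf, chunk);
			st->offset += chunk;
			buf += chunk;
			len -= chunk;
		}

		assert(st->offset <= st->data_len);
		if (st->offset == st->data_len) {
			/* message complete: start over with a fresh prefix */
			st->last_len = st->data_len;
			st->raw_have = 0;
			done++;
		}
	}
	return done;
}
```

A caller zeroes the state once and then passes every chunk read from the socket to rx_feed(), however the stream happens to be split. A parser framework such as strparser would take over this bookkeeping, which is why raw_len could go away.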


> > > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> > > index e099a61b03fa..004db5b13663 100644
> > > --- a/drivers/net/ovpn/socket.c
> > > +++ b/drivers/net/ovpn/socket.c
> > > @@ -16,6 +16,7 @@
> > >   #include "packet.h"
> > >   #include "peer.h"
> > >   #include "socket.h"
> > > +#include "tcp.h"
> > >   #include "udp.h"
> > >   /* Finalize release of socket, called after RCU grace period */
> > > @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
> > >   	if (sock->sk->sk_protocol == IPPROTO_UDP)
> > >   		ovpn_udp_socket_detach(sock);
> > > +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> > > +		ovpn_tcp_socket_detach(sock);
> > >   	sockfd_put(sock);
> > >   }
> > > @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> > >   	if (sock->sk->sk_protocol == IPPROTO_UDP)
> > >   		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
> > > +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> > > +		ret = ovpn_tcp_socket_attach(sock, peer);
> > >   	return ret;
> > >   }
> > > @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
> > >   	ovpn_sock->sock = sock;
> > 
> > The line above this is:
> > 
> >      ovpn_sock->ovpn = peer->ovpn;
> > 
> > It's technically fine since you then overwrite this with peer in case
> > we're on TCP, but ovpn_sock->ovpn only exists on UDP since you moved
> > it into a union in this patch.
> 
> Yeah, I did not want to make another branch, but having a UDP specific case
> will make code easier to read.

Either that, or drop the union.


> > > diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> > > new file mode 100644
> > > index 000000000000..84ad7cd4fc4f
> > > --- /dev/null
> > > +++ b/drivers/net/ovpn/tcp.c
> > > @@ -0,0 +1,511 @@
> > > +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
> > > +			      unsigned int in_offset, size_t in_len)
> > > +{
> > > +	struct sock *sk = desc->arg.data;
> > > +	struct ovpn_socket *sock;
> > > +	struct ovpn_skb_cb *cb;
> > > +	struct ovpn_peer *peer;
> > > +	size_t chunk, copied = 0;
> > > +	void *data;
> > > +	u16 len;
> > > +	int st;
> > > +
> > > +	rcu_read_lock();
> > > +	sock = rcu_dereference_sk_user_data(sk);
> > > +	rcu_read_unlock();
> > 
> > You can't just release rcu_read_lock and keep using sock (here and in
> > the rest of this file). Either you keep rcu_read_lock, or you can take
> > a reference on the ovpn_socket.
> 
> I was just staring at this today, after having worked on the
> rcu_read_lock/unlock for the peer get()s..
> 
> I think the assumption was: if we are in this read_sock callback, it's
> impossible that the ovpn_socket was invalidated, because it gets invalidated
> upon detach, which also prevents any further calling of this callback. But
> this sounds racy, and I guess we should somehow hold a reference..

ovpn_tcp_read_sock starts

detach
kfree_rcu(ovpn_socket)
...
ovpn_socket actually freed
...
ovpn_tcp_read_sock continues with freed ovpn_socket


I don't think anything in the current code prevents this.
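
The "take a reference" option can be sketched with a userspace model of a get-unless-zero refcount. This is an illustration under assumptions, not the driver's code: struct sock_model and its helpers are hypothetical, the freed flag stands in for the real free, and C11 atomics stand in for the kernel's refcounting (where kref_get_unless_zero() called inside the RCU read-side section would be the closest analogue).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted ovpn_socket. */
struct sock_model {
	atomic_uint refcount;
	bool freed; /* set where the real code would free the object */
};

/* Take a reference only if the object is still alive (count > 0). */
static bool sock_model_get(struct sock_model *s)
{
	unsigned int old = atomic_load(&s->refcount);

	while (old != 0) {
		/* on failure 'old' is reloaded with the current count */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}
	return false; /* already on its way to being freed */
}

/* Drop a reference; the last put releases the object. */
static void sock_model_put(struct sock_model *s)
{
	unsigned int prev = atomic_fetch_sub(&s->refcount, 1);

	assert(prev > 0); /* a put without a matching get is a bug */
	if (prev == 1)
		s->freed = true;
}
```

With this pattern the reader calls sock_model_get() while still inside the read-side section, leaves the section, works on the object, and finally calls sock_model_put(). A concurrent detach then only drops the owner's reference, and the object survives until the reader is done.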


> > > +/* Set TCP encapsulation callbacks */
> > > +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> > > +{
> > > +	void *old_data;
> > > +	int ret;
> > > +
> > > +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
> > > +
> > > +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
> > > +	if (ret < 0) {
> > > +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
> > > +		return ret;
> > > +	}
> > > +
> > > +	peer->tcp.skb = NULL;
> > > +	peer->tcp.offset = 0;
> > > +	peer->tcp.data_len = 0;
> > > +
> > > +	write_lock_bh(&sock->sk->sk_callback_lock);
> > > +
> > > +	/* make sure no pre-existing encapsulation handler exists */
> > > +	rcu_read_lock();
> > > +	old_data = rcu_dereference_sk_user_data(sock->sk);
> > > +	rcu_read_unlock();
> > > +	if (old_data) {
> > > +		netdev_err(peer->ovpn->dev,
> > > +			   "provided socket already taken by other user\n");
> > > +		ret = -EBUSY;
> > > +		goto err;
> > 
> > The UDP code differentiates "socket already owned by this interface"
> > from "already taken by other user". That doesn't apply to TCP?
> 
> This makes me wonder: how safe is it to interpret the user data as an object
> of type ovpn_socket?
>
> When we find the user data already assigned, we don't know what was really
> stored in there, right?
> Technically this socket could have gone through another module which
> assigned its own state.
> 
> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
> *)user_data)->ovpn ] is probably not safe. Would you agree?

Hmmm, yeah, I think you're right. If you checked encap_type ==
UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
really your data. Basically call ovpn_from_udp_sock during attach if
you want to check something beyond EBUSY.

Once you're in your own callbacks, it should be safe. If some other
code sends packets with a non-ovpn socket to ovpn's ->encap_rcv,
something is really broken.

> > > +int __init ovpn_tcp_init(void)
> > > +{
> > > +	/* We need to substitute the recvmsg and the sock_is_readable
> > > +	 * callbacks in the sk_prot member of the sock object for TCP
> > > +	 * sockets.
> > > +	 *
> > > +	 * However sock->sk_prot is a pointer to a static variable and
> > > +	 * therefore we can't directly modify it, otherwise every socket
> > > +	 * pointing to it will be affected.
> > > +	 *
> > > +	 * For this reason we create our own static copy and modify what
> > > +	 * we need. Then we make sk_prot point to this copy
> > > +	 * (in ovpn_tcp_socket_attach())
> > > +	 */
> > > +	ovpn_tcp_prot = tcp_prot;
> > 
> > Don't you need a separate variant for IPv6, like TLS does?
> 
> Never did so far.
> 
> My wild wild wild guess: for the time this socket is owned by ovpn, we only
> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
> difference.
> When this socket is released, we reassign the original prot.

That seems a bit suspicious to me. For example, tcpv6_prot has a
different backlog_rcv. And you don't control if the socket is detached
before being closed, or which callbacks are needed. Your userspace
client doesn't use them, but someone else's might.
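
The concern can be illustrated with a small userspace model of per-family override tables. Everything here is hypothetical (struct proto_model, the *_model tables, the return values); it only shows why copying a single template loses the IPv6-specific callbacks, and how keeping one copy per address family avoids that.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for struct proto. */
struct proto_model {
	int (*recvmsg)(void);
	int (*backlog_rcv)(void);
};

static int plain_recvmsg(void) { return 0; }
static int v4_backlog_rcv(void) { return 4; }
static int v6_backlog_rcv(void) { return 6; }
static int ovpn_model_recvmsg(void) { return 100; }

/* The two base tables differ in backlog_rcv, as noted in the mail. */
static const struct proto_model tcp_v4_model = { plain_recvmsg, v4_backlog_rcv };
static const struct proto_model tcp_v6_model = { plain_recvmsg, v6_backlog_rcv };

static struct proto_model override_v4, override_v6;

/* Copy each base table and replace only the callback we care about. */
static void build_overrides(void)
{
	override_v4 = tcp_v4_model;
	override_v4.recvmsg = ovpn_model_recvmsg;

	override_v6 = tcp_v6_model;
	override_v6.recvmsg = ovpn_model_recvmsg;
}

/* Pick the copy that matches the socket's address family. */
static const struct proto_model *pick_override(int is_ipv6)
{
	/* build_overrides() must have run first */
	assert(override_v4.recvmsg && override_v6.recvmsg);
	return is_ipv6 ? &override_v6 : &override_v4;
}
```

Had override_v4 been installed on an IPv6 socket, its backlog_rcv would silently become the IPv4 one for as long as the override stays in place.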

> > > +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
> > 
> > You don't need to replace ->sendmsg as well? The userspace client is
> > not expected to send messages?
> 
> It is, but my assumption is that those packets will just go through the
> socket as usual. No need to be handled by ovpn (those packets are not
> encrypted/decrypted, like data traffic is).
> And this is how it has worked so far.
> 
> Makes sense?

Two things come to mind:

- userspace is expected to prefix the messages it inserts on the
  stream with the 2-byte length field? otherwise, the peer won't be
  able to parse them out of the stream

- I'm not convinced this would be safe wrt kernel writing partial
  messages. if ovpn_tcp_send_one doesn't send the full message, you
  could interleave two messages:

  +------+-------------------+------+--------+----------------+
  | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
  +------+-------------------+------+--------+----------------+

  and the RX side would parse that as:

  +------+-----------------------------------+------+---------
  | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...     
  +------+-------------------+---------------+------+---------

  and try to interpret some random bytes out of either msg1 or msg2 as
  a length prefix, resulting in a broken stream.
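
The interleaving scenario drawn above can be reproduced with a small userspace model. This is a sketch under assumptions: frame(), interleave() and parse_lengths() are hypothetical helpers, and the framing is the 2-byte big-endian length prefix discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 2-byte big-endian length prefix plus payload; return bytes written. */
static size_t frame(uint8_t *out, const uint8_t *msg, uint16_t len)
{
	out[0] = (uint8_t)(len >> 8);
	out[1] = (uint8_t)(len & 0xff);
	memcpy(out + 2, msg, len);
	return (size_t)len + 2;
}

/* Build: first 'cut' bytes of frame 1, all of frame 2, then the rest of frame 1. */
static size_t interleave(uint8_t *out, const uint8_t *f1, size_t n1, size_t cut,
			 const uint8_t *f2, size_t n2)
{
	assert(cut <= n1);
	memcpy(out, f1, cut);
	memcpy(out + cut, f2, n2);
	memcpy(out + cut + n2, f1 + cut, n1 - cut);
	return n1 + n2;
}

/* Walk the stream as a receiver would and record every length it believes
 * it saw. Stops at the first frame that runs past the end of the buffer.
 */
static size_t parse_lengths(const uint8_t *s, size_t n, uint16_t *lens, size_t max)
{
	size_t pos = 0, count = 0;

	while (pos + 2 <= n && count < max) {
		uint16_t len = (uint16_t)((s[pos] << 8) | s[pos + 1]);

		if (pos + 2 + len > n)
			break; /* frame claims more bytes than are available */
		lens[count++] = len;
		pos += 2 + (size_t)len;
	}
	return count;
}
```

Feeding parse_lengths() a stream where an 8-byte message is cut after 4 payload bytes and a 2-byte message is inserted makes the receiver accept one 8-byte frame whose payload contains the second message's header, then read two payload bytes of the first message as the next length.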


The stream format looks identical to ESP in TCP [1] (2B length prefix
followed by the actual message), so I think the espintcp code (both tx
and rx, except for actual protocol parsing) should look very
similar. The problems that need to be solved for both protocols are
pretty much the same.

[1] https://www.rfc-editor.org/rfc/rfc8229#section-3

-- 
Sabrina


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-14  8:58       ` Sabrina Dubroca
@ 2024-05-14 22:11         ` Antonio Quartulli
  2024-05-15 10:19           ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-14 22:11 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
>>>> index b5ff59a4b40f..ac4907705d98 100644
>>>> --- a/drivers/net/ovpn/peer.h
>>>> +++ b/drivers/net/ovpn/peer.h
>>>> + * @tcp.raw_len: next packet length as read from the stream (TCP only)
>>>> + * @tcp.skb: next packet being filled with data from the stream (TCP only)
>>>> + * @tcp.offset: position of the next byte to write in the skb (TCP only)
>>>> + * @tcp.data_len: next packet length converted to host order (TCP only)
>>>
>>> It would be nice to add information about whether they're used for TX or RX.
>>
>> they are all about "from the stream" and "to the skb", meaning that we are
>> doing RX.
>> Will make it more explicit.
> 
> Maybe group them in a struct rx?

yap, makes sense.

> 
>>>> + * @tcp.sk_cb.sk_data_ready: pointer to original cb
>>>> + * @tcp.sk_cb.sk_write_space: pointer to original cb
>>>> + * @tcp.sk_cb.prot: pointer to original prot object
>>>>     * @crypto: the crypto configuration (ciphers, keys, etc..)
>>>>     * @dst_cache: cache for dst_entry used to send to peer
>>>>     * @bind: remote peer binding
>>>> @@ -59,6 +69,25 @@ struct ovpn_peer {
>>>>    	struct ptr_ring netif_rx_ring;
>>>>    	struct napi_struct napi;
>>>>    	struct ovpn_socket *sock;
>>>> +	/* state of the TCP reading. Needed to keep track of how much of a
>>>> +	 * single packet has already been read from the stream and how much is
>>>> +	 * missing
>>>> +	 */
>>>> +	struct {
>>>> +		struct ptr_ring tx_ring;
>>>> +		struct work_struct tx_work;
>>>> +		struct work_struct rx_work;
>>>> +
>>>> +		u8 raw_len[sizeof(u16)];
>>>
>>> Why not u16 or __be16 for this one?
>>
>> because in this array we are putting the bytes as we get them from the
>> stream.
>> We may be at the point where one out of two bytes is available on the
>> stream. For this reason I use an array to store this u16 byte by byte.
>>
>> Once the two bytes are ready, we convert the content into an actual int and
>> store it in "data_len" (a few lines below).
> 
> Ok, I see. Hopefully you can switch to strparser and make this one go
> away.
> 
> 
>>>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
>>>> index e099a61b03fa..004db5b13663 100644
>>>> --- a/drivers/net/ovpn/socket.c
>>>> +++ b/drivers/net/ovpn/socket.c
>>>> @@ -16,6 +16,7 @@
>>>>    #include "packet.h"
>>>>    #include "peer.h"
>>>>    #include "socket.h"
>>>> +#include "tcp.h"
>>>>    #include "udp.h"
>>>>    /* Finalize release of socket, called after RCU grace period */
>>>> @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
>>>>    	if (sock->sk->sk_protocol == IPPROTO_UDP)
>>>>    		ovpn_udp_socket_detach(sock);
>>>> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
>>>> +		ovpn_tcp_socket_detach(sock);
>>>>    	sockfd_put(sock);
>>>>    }
>>>> @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>>>>    	if (sock->sk->sk_protocol == IPPROTO_UDP)
>>>>    		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
>>>> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
>>>> +		ret = ovpn_tcp_socket_attach(sock, peer);
>>>>    	return ret;
>>>>    }
>>>> @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
>>>>    	ovpn_sock->sock = sock;
>>>
>>> The line above this is:
>>>
>>>       ovpn_sock->ovpn = peer->ovpn;
>>>
>>> It's technically fine since you then overwrite this with peer in case
>>> we're on TCP, but ovpn_sock->ovpn only exists on UDP since you moved
>>> it into a union in this patch.
>>
>> Yeah, I did not want to make another branch, but having a UDP specific case
>> will make code easier to read.
> 
> Either that, or drop the union.

ACK

> 
> 
>>>> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
>>>> new file mode 100644
>>>> index 000000000000..84ad7cd4fc4f
>>>> --- /dev/null
>>>> +++ b/drivers/net/ovpn/tcp.c
>>>> @@ -0,0 +1,511 @@
>>>> +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
>>>> +			      unsigned int in_offset, size_t in_len)
>>>> +{
>>>> +	struct sock *sk = desc->arg.data;
>>>> +	struct ovpn_socket *sock;
>>>> +	struct ovpn_skb_cb *cb;
>>>> +	struct ovpn_peer *peer;
>>>> +	size_t chunk, copied = 0;
>>>> +	void *data;
>>>> +	u16 len;
>>>> +	int st;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	sock = rcu_dereference_sk_user_data(sk);
>>>> +	rcu_read_unlock();
>>>
>>> You can't just release rcu_read_lock and keep using sock (here and in
>>> the rest of this file). Either you keep rcu_read_lock, or you can take
>>> a reference on the ovpn_socket.
>>
>> I was just staring at this today, after having worked on the
>> rcu_read_lock/unlock for the peer get()s..
>>
>> I think the assumption was: if we are in this read_sock callback, it's
>> impossible that the ovpn_socket was invalidated, because it gets invalidated
>> upon detach, which also prevents any further calling of this callback. But
>> this sounds racy, and I guess we should somehow hold a reference..
> 
> ovpn_tcp_read_sock starts
> 
> detach
> kfree_rcu(ovpn_socket)
> ...
> ovpn_socket actually freed
> ...
> ovpn_tcp_read_sock continues with freed ovpn_socket
> 
> 
> I don't think anything in the current code prevents this.

mh yeah, if something like this happens right after having started the 
read_sock we are doomed.
Will fix this.


> 
> 
>>>> +/* Set TCP encapsulation callbacks */
>>>> +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>>>> +{
>>>> +	void *old_data;
>>>> +	int ret;
>>>> +
>>>> +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
>>>> +
>>>> +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
>>>> +	if (ret < 0) {
>>>> +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	peer->tcp.skb = NULL;
>>>> +	peer->tcp.offset = 0;
>>>> +	peer->tcp.data_len = 0;
>>>> +
>>>> +	write_lock_bh(&sock->sk->sk_callback_lock);
>>>> +
>>>> +	/* make sure no pre-existing encapsulation handler exists */
>>>> +	rcu_read_lock();
>>>> +	old_data = rcu_dereference_sk_user_data(sock->sk);
>>>> +	rcu_read_unlock();
>>>> +	if (old_data) {
>>>> +		netdev_err(peer->ovpn->dev,
>>>> +			   "provided socket already taken by other user\n");
>>>> +		ret = -EBUSY;
>>>> +		goto err;
>>>
>>> The UDP code differentiates "socket already owned by this interface"
>>> from "already taken by other user". That doesn't apply to TCP?
>>
>> This makes me wonder: how safe is it to interpret the user data as an object
>> of type ovpn_socket?
>>
>> When we find the user data already assigned, we don't know what was really
>> stored in there, right?
>> Technically this socket could have gone through another module which
>> assigned its own state.
>>
>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>> *)user_data)->ovpn ] is probably not safe. Would you agree?
> 
> Hmmm, yeah, I think you're right. If you checked encap_type ==
> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
> really your data. Basically call ovpn_from_udp_sock during attach if
> you want to check something beyond EBUSY.

right. Maybe we can live with simply reporting EBUSY and be done with
it, without adding extra checks and what not.

> 
> Once you're in your own callbacks, it should be safe. If some other
> code sends packets with a non-ovpn socket to ovpn's ->encap_rcv,
> something is really broken.

yup

> 
>>>> +int __init ovpn_tcp_init(void)
>>>> +{
>>>> +	/* We need to substitute the recvmsg and the sock_is_readable
>>>> +	 * callbacks in the sk_prot member of the sock object for TCP
>>>> +	 * sockets.
>>>> +	 *
>>>> +	 * However sock->sk_prot is a pointer to a static variable and
>>>> +	 * therefore we can't directly modify it, otherwise every socket
>>>> +	 * pointing to it will be affected.
>>>> +	 *
>>>> +	 * For this reason we create our own static copy and modify what
>>>> +	 * we need. Then we make sk_prot point to this copy
>>>> +	 * (in ovpn_tcp_socket_attach())
>>>> +	 */
>>>> +	ovpn_tcp_prot = tcp_prot;
>>>
>>> Don't you need a separate variant for IPv6, like TLS does?
>>
>> Never did so far.
>>
>> My wild wild wild guess: for the time this socket is owned by ovpn, we only
>> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
>> difference.
>> When this socket is released, we reassign the original prot.
> 
> That seems a bit suspicious to me. For example, tcpv6_prot has a
> different backlog_rcv. And you don't control if the socket is detached
> before being closed, or which callbacks are needed. Your userspace
> client doesn't use them, but someone else's might.
> 
>>>> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
>>>
>>> You don't need to replace ->sendmsg as well? The userspace client is
>>> not expected to send messages?
>>
>> It is, but my assumption is that those packets will just go through the
>> socket as usual. No need to be handled by ovpn (those packets are not
>> encrypted/decrypted, like data traffic is).
>> And this is how it has worked so far.
>>
>> Makes sense?
> 
> Two things come to mind:
> 
> - userspace is expected to prefix the messages it inserts on the
>    stream with the 2-byte length field? otherwise, the peer won't be
>    able to parse them out of the stream

correct. userspace sends those packets as if ovpn is not running, 
therefore this happens naturally.

> 
> - I'm not convinced this would be safe wrt kernel writing partial
>    messages. if ovpn_tcp_send_one doesn't send the full message, you
>    could interleave two messages:
> 
>    +------+-------------------+------+--------+----------------+
>    | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
>    +------+-------------------+------+--------+----------------+
> 
>    and the RX side would parse that as:
> 
>    +------+-----------------------------------+------+---------
>    | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
>    +------+-------------------+---------------+------+---------
> 
>    and try to interpret some random bytes out of either msg1 or msg2 as
>    a length prefix, resulting in a broken stream.

hm you are correct. if multiple sendmsg can overlap, then we might be in 
trouble, but are we sure this can truly happen?

> 
> 
> The stream format looks identical to ESP in TCP [1] (2B length prefix
> followed by the actual message), so I think the espintcp code (both tx
> and rx, except for actual protocol parsing) should look very
> similar. The problems that need to be solved for both protocols are
> pretty much the same.

ok, will have a look. maybe this will simplify the code even more and we 
will get rid of some of the issues we were discussing above.

Thanks!

> 
> [1] https://www.rfc-editor.org/rfc/rfc8229#section-3
> 

-- 
Antonio Quartulli
OpenVPN Inc.

* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-14 22:11         ` Antonio Quartulli
@ 2024-05-15 10:19           ` Sabrina Dubroca
  2024-05-15 12:54             ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-15 10:19 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
> On 14/05/2024 10:58, Sabrina Dubroca wrote:
> > > > The UDP code differentiates "socket already owned by this interface"
> > > > from "already taken by other user". That doesn't apply to TCP?
> > > 
> > > This makes me wonder: how safe is it to interpret the user data as an object
> > > of type ovpn_socket?
> > > 
> > > When we find the user data already assigned, we don't know what was really
> > > stored in there, right?
> > > Technically this socket could have gone through another module which
> > > assigned its own state.
> > > 
> > > Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
> > > *)user_data)->ovpn ] is probably not safe. Would you agree?
> > 
> > Hmmm, yeah, I think you're right. If you checked encap_type ==
> > UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
> > really your data. Basically call ovpn_from_udp_sock during attach if
> > you want to check something beyond EBUSY.
> 
> right. Maybe we can live with simply reporting EBUSY and be done with it,
> without adding extra checks and what not.

I don't know. What was the reason for the EALREADY handling in udp.c
and the corresponding refcount increase in ovpn_socket_new?


> > > > > +int __init ovpn_tcp_init(void)
> > > > > +{
> > > > > +	/* We need to substitute the recvmsg and the sock_is_readable
> > > > > +	 * callbacks in the sk_prot member of the sock object for TCP
> > > > > +	 * sockets.
> > > > > +	 *
> > > > > +	 * However sock->sk_prot is a pointer to a static variable and
> > > > > +	 * therefore we can't directly modify it, otherwise every socket
> > > > > +	 * pointing to it will be affected.
> > > > > +	 *
> > > > > +	 * For this reason we create our own static copy and modify what
> > > > > +	 * we need. Then we make sk_prot point to this copy
> > > > > +	 * (in ovpn_tcp_socket_attach())
> > > > > +	 */
> > > > > +	ovpn_tcp_prot = tcp_prot;
> > > > 
> > > > Don't you need a separate variant for IPv6, like TLS does?
> > > 
> > > Never did so far.
> > > 
> > > My wild wild wild guess: for the time this socket is owned by ovpn, we only
> > > use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
> > > difference.
> > > When this socket is released, we reassign the original prot.
> > 
> > That seems a bit suspicious to me. For example, tcpv6_prot has a
> > different backlog_rcv. And you don't control if the socket is detached
> > before being closed, or which callbacks are needed. Your userspace
> > client doesn't use them, but someone else's might.
> > 
> > > > > +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
> > > > 
> > > > You don't need to replace ->sendmsg as well? The userspace client is
> > > > not expected to send messages?
> > > 
> > > It is, but my assumption is that those packets will just go through the
> > > socket as usual. No need to be handled by ovpn (those packets are not
> > > encrypted/decrypted, like data traffic is).
> > > And this is how it has worked so far.
> > > 
> > > Makes sense?
> > 
> > Two things come to mind:
> > 
> > - userspace is expected to prefix the messages it inserts on the
> >    stream with the 2-byte length field? otherwise, the peer won't be
> >    able to parse them out of the stream
> 
> correct. userspace sends those packets as if ovpn is not running, therefore
> this happens naturally.

ok.


> > - I'm not convinced this would be safe wrt kernel writing partial
> >    messages. if ovpn_tcp_send_one doesn't send the full message, you
> >    could interleave two messages:
> > 
> >    +------+-------------------+------+--------+----------------+
> >    | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
> >    +------+-------------------+------+--------+----------------+
> > 
> >    and the RX side would parse that as:
> > 
> >    +------+-----------------------------------+------+---------
> >    | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
> >    +------+-------------------+---------------+------+---------
> > 
> >    and try to interpret some random bytes out of either msg1 or msg2 as
> >    a length prefix, resulting in a broken stream.
> 
> hm you are correct. if multiple sendmsg can overlap, then we might be in
> trouble, but are we sure this can truly happen?

What would prevent this? The kernel_sendmsg call in ovpn_tcp_send_one
could send a partial message, and then what would stop userspace from
sending its own message during the cond_resched from ovpn_tcp_tx_work?

> > The stream format looks identical to ESP in TCP [1] (2B length prefix
> > followed by the actual message), so I think the espintcp code (both tx
> > and rx, except for actual protocol parsing) should look very
> > similar. The problems that need to be solved for both protocols are
> > pretty much the same.
> 
> ok, will have a look. maybe this will simplify the code even more and we
> will get rid of some of the issues we were discussing above.

I doubt dealing with possible interleaving will make the code simpler,
but I think it has to be done.

-- 
Sabrina


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-15 10:19           ` Sabrina Dubroca
@ 2024-05-15 12:54             ` Antonio Quartulli
  2024-05-15 14:55               ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-15 12:54 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 15/05/2024 12:19, Sabrina Dubroca wrote:
> 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
>> On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>>> The UDP code differentiates "socket already owned by this interface"
>>>>> from "already taken by other user". That doesn't apply to TCP?
>>>>
>>>> This makes me wonder: how safe is it to interpret the user data as an object
>>>> of type ovpn_socket?
>>>>
>>>> When we find the user data already assigned, we don't know what was really
>>>> stored in there, right?
>>>> Technically this socket could have gone through another module which
>>>> assigned its own state.
>>>>
>>>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>>>> *)user_data)->ovpn ] is probably not safe. Would you agree?
>>>
>>> Hmmm, yeah, I think you're right. If you checked encap_type ==
>>> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
>>> really your data. Basically call ovpn_from_udp_sock during attach if
>>> you want to check something beyond EBUSY.
>>
>> right. Maybe we can live with simply reporting EBUSY and be done with it,
>> without adding extra checks and what not.
> 
> I don't know. What was the reason for the EALREADY handling in udp.c
> and the corresponding refcount increase in ovpn_socket_new?

it's just me that likes to be verbose when doing error reporting.
But eventually the exact error is ignored and we release the reference. 
From netlink.c:

	peer->sock = ovpn_socket_new(sock, peer);
	if (IS_ERR(peer->sock)) {
		sockfd_put(sock);
		peer->sock = NULL;
		ret = -ENOTSOCK;

so no added value in distinguishing the two cases.

> 
> 
>>>>>> +int __init ovpn_tcp_init(void)
>>>>>> +{
>>>>>> +	/* We need to substitute the recvmsg and the sock_is_readable
>>>>>> +	 * callbacks in the sk_prot member of the sock object for TCP
>>>>>> +	 * sockets.
>>>>>> +	 *
>>>>>> +	 * However sock->sk_prot is a pointer to a static variable and
>>>>>> +	 * therefore we can't directly modify it, otherwise every socket
>>>>>> +	 * pointing to it will be affected.
>>>>>> +	 *
>>>>>> +	 * For this reason we create our own static copy and modify what
>>>>>> +	 * we need. Then we make sk_prot point to this copy
>>>>>> +	 * (in ovpn_tcp_socket_attach())
>>>>>> +	 */
>>>>>> +	ovpn_tcp_prot = tcp_prot;
>>>>>
>>>>> Don't you need a separate variant for IPv6, like TLS does?
>>>>
>>>> Never did so far.
>>>>
>>>> My wild wild wild guess: for the time this socket is owned by ovpn, we only
>>>> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
>>>> difference.
>>>> When this socket is released, we reassign the original prot.
>>>
>>> That seems a bit suspicious to me. For example, tcpv6_prot has a
>>> different backlog_rcv. And you don't control if the socket is detached
>>> before being closed, or which callbacks are needed. Your userspace
>>> client doesn't use them, but someone else's might.
>>>
>>>>>> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
>>>>>
>>>>> You don't need to replace ->sendmsg as well? The userspace client is
>>>>> not expected to send messages?
>>>>
>>>> It is, but my assumption is that those packets will just go through the
>>>> socket as usual. No need to be handled by ovpn (those packets are not
>>>> encrypted/decrypted, like data traffic is).
>>>> And this is how it has worked so far.
>>>>
>>>> Makes sense?
>>>
>>> Two things come to mind:
>>>
>>> - userspace is expected to prefix the messages it inserts on the
>>>     stream with the 2-byte length field? otherwise, the peer won't be
>>>     able to parse them out of the stream
>>
>> correct. userspace sends those packets as if ovpn is not running, therefore
>> this happens naturally.
> 
> ok.
> 
> 
>>> - I'm not convinced this would be safe wrt kernel writing partial
>>>     messages. if ovpn_tcp_send_one doesn't send the full message, you
>>>     could interleave two messages:
>>>
>>>     +------+-------------------+------+--------+----------------+
>>>     | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
>>>     +------+-------------------+------+--------+----------------+
>>>
>>>     and the RX side would parse that as:
>>>
>>>     +------+-----------------------------------+------+---------
>>>     | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
>>>     +------+-------------------+---------------+------+---------
>>>
>>>     and try to interpret some random bytes out of either msg1 or msg2 as
>>>     a length prefix, resulting in a broken stream.
>>
>> hm you are correct. if multiple sendmsg can overlap, then we might be in
>> trouble, but are we sure this can truly happen?
> 
> What would prevent this? The kernel_sendmsg call in ovpn_tcp_send_one
> could send a partial message, and then what would stop userspace from
> sending its own message during the cond_resched from ovpn_tcp_tx_work?

I was under the impression that ovpn_tcp_send_one() would always send an 
entire packet, but this may not be the case. So you're definitely right.

We may end up having interleaving sendmsg from kernelspace and userspace.
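
A minimal userspace sketch of the problem, for illustration only. It
assumes the 2-byte length prefix is big-endian; put_frame(),
parse_frames(), build_clean() and build_interleaved() are hypothetical
helpers, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t msg1[8] = { 0xaa, 0xaa, 0xaa, 0xaa,
				 0xaa, 0xaa, 0xaa, 0xaa };
static const uint8_t msg2[4] = { 0xbb, 0xbb, 0xbb, 0xbb };

/* Append one frame (2-byte big-endian length + payload) at offset off
 * and return the offset just past it.
 */
size_t put_frame(uint8_t *buf, size_t off, const uint8_t *msg, uint16_t len)
{
	assert(buf && msg);
	buf[off] = len >> 8;
	buf[off + 1] = len & 0xff;
	memcpy(buf + off + 2, msg, len);
	return off + 2 + len;
}

/* Walk the stream frame by frame, storing up to max frame lengths in
 * out[]. Stops at the first frame that does not fit in the remaining
 * bytes and returns the number of complete frames seen.
 */
size_t parse_frames(const uint8_t *buf, size_t total, uint16_t *out,
		    size_t max)
{
	size_t off = 0, n = 0;

	while (n < max && off + 2 <= total) {
		uint16_t len = (uint16_t)(buf[off] << 8 | buf[off + 1]);

		if (off + 2 + len > total)
			break;
		out[n++] = len;
		off += 2 + len;
	}
	return n;
}

/* Both messages written atomically, one after the other. */
size_t build_clean(uint8_t *buf)
{
	size_t off = put_frame(buf, 0, msg1, sizeof(msg1));

	return put_frame(buf, off, msg2, sizeof(msg2));
}

/* msg1 is cut after 3 payload bytes, msg2 slips in, then the rest of
 * msg1 follows: the layout from the diagram above.
 */
size_t build_interleaved(uint8_t *buf)
{
	size_t off = 0;

	buf[off++] = 0;
	buf[off++] = sizeof(msg1);
	memcpy(buf + off, msg1, 3);
	off += 3;
	off = put_frame(buf, off, msg2, sizeof(msg2));
	memcpy(buf + off, msg1 + 3, sizeof(msg1) - 3);
	return off + sizeof(msg1) - 3;
}
```

With the clean stream the parser recovers both frames. With the
interleaved one it accepts len1, swallows len2 and part of msg2 as
payload, then reads 0xbbaa as the next length and stalls.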

> 
>>> The stream format looks identical to ESP in TCP [1] (2B length prefix
>>> followed by the actual message), so I think the espintcp code (both tx
>>> and rx, except for actual protocol parsing) should look very
>>> similar. The problems that need to be solved for both protocols are
>>> pretty much the same.
>>
>> ok, will have a look. maybe this will simplify the code even more and we
>> will get rid of some of the issues we were discussing above.
> 
> I doubt dealing with possible interleaving will make the code simpler,
> but I think it has to be done.

Yap.

Thanks a lot for pointing this out and for the pointers you gave me.

> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-15 12:54             ` Antonio Quartulli
@ 2024-05-15 14:55               ` Sabrina Dubroca
  2024-05-15 19:44                 ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-15 14:55 UTC
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-15, 14:54:49 +0200, Antonio Quartulli wrote:
> On 15/05/2024 12:19, Sabrina Dubroca wrote:
> > 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
> > > On 14/05/2024 10:58, Sabrina Dubroca wrote:
> > > > > > The UDP code differentiates "socket already owned by this interface"
> > > > > > from "already taken by other user". That doesn't apply to TCP?
> > > > > 
> > > > > This makes me wonder: how safe is it to interpret the user data as an object
> > > > > of type ovpn_socket?
> > > > > 
> > > > > When we find the user data already assigned, we don't know what was really
> > > > > stored in there, right?
> > > > > Technically this socket could have gone through another module which
> > > > > assigned its own state.
> > > > > 
> > > > > Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
> > > > > *)user_data)->ovpn ] is probably not safe. Would you agree?
> > > > 
> > > > Hmmm, yeah, I think you're right. If you checked encap_type ==
> > > > UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
> > > > really your data. Basically call ovpn_from_udp_sock during attach if
> > > > you want to check something beyond EBUSY.
> > > 
> > > right. Maybe we can live with simply reporting EBUSY and be done with it,
> > > without adding extra checks and what not.
> > 
> > I don't know. What was the reason for the EALREADY handling in udp.c
> > and the corresponding refcount increase in ovpn_socket_new?
> 
> it's just me that likes to be verbose when doing error reporting.

With the "already owned by this interface" message? Sure, I get that.

> But eventually the exact error is ignored and we release the reference. From
> netlink.c:
> 
>         peer->sock = ovpn_socket_new(sock, peer);
>         if (IS_ERR(peer->sock)) {
>                 sockfd_put(sock);
>                 peer->sock = NULL;
>                 ret = -ENOTSOCK;
> 
> so no added value in distinguishing the two cases.

But ovpn_socket_new currently turns EALREADY into a valid result, so
we won't go through the error handling here. That's the part I'm
unclear about.

-- 
Sabrina



* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-15 14:55               ` Sabrina Dubroca
@ 2024-05-15 19:44                 ` Antonio Quartulli
  2024-05-15 20:35                   ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-15 19:44 UTC
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 15/05/2024 16:55, Sabrina Dubroca wrote:
> 2024-05-15, 14:54:49 +0200, Antonio Quartulli wrote:
>> On 15/05/2024 12:19, Sabrina Dubroca wrote:
>>> 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
>>>> On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>>>>> The UDP code differentiates "socket already owned by this interface"
>>>>>>> from "already taken by other user". That doesn't apply to TCP?
>>>>>>
>>>>>> This makes me wonder: how safe is it to interpret the user data as an object
>>>>>> of type ovpn_socket?
>>>>>>
>>>>>> When we find the user data already assigned, we don't know what was really
>>>>>> stored in there, right?
>>>>>> Technically this socket could have gone through another module which
>>>>>> assigned its own state.
>>>>>>
>>>>>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>>>>>> *)user_data)->ovpn ] is probably not safe. Would you agree?
>>>>>
>>>>> Hmmm, yeah, I think you're right. If you checked encap_type ==
>>>>> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
>>>>> really your data. Basically call ovpn_from_udp_sock during attach if
>>>>> you want to check something beyond EBUSY.
>>>>
>>>> right. Maybe we can live with simply reporting EBUSY and be done with it,
>>>> without adding extra checks and what not.
>>>
>>> I don't know. What was the reason for the EALREADY handling in udp.c
>>> and the corresponding refcount increase in ovpn_socket_new?
>>
>> it's just me that likes to be verbose when doing error reporting.
> 
> With the "already owned by this interface" message? Sure, I get that.
> 
>> But eventually the exact error is ignored and we release the reference. From
>> netlink.c:
>>
>>         peer->sock = ovpn_socket_new(sock, peer);
>>         if (IS_ERR(peer->sock)) {
>>                 sockfd_put(sock);
>>                 peer->sock = NULL;
>>                 ret = -ENOTSOCK;
>>
>> so no added value in distinguishing the two cases.
> 
> But ovpn_socket_new currently turns EALREADY into a valid result, so
> we won't go through the error handling here. That's the part I'm
> unclear about.

you're right. I had forgotten a small but important detail.

With UDP, OpenVPN creates one socket and uses it for all peers.
With TCP, we necessarily need one socket per client.

Consequently, when a UDP socket is found to be used by our own instance,
we can happily increase the refcounter and use it as if it was free
(we are just attaching it to yet another peer).

In TCP this is not possible, so the socket must be unused, otherwise we 
can't attach it.

I hope it makes sense.

> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-15 19:44                 ` Antonio Quartulli
@ 2024-05-15 20:35                   ` Sabrina Dubroca
  2024-05-15 20:39                     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-15 20:35 UTC
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-15, 21:44:44 +0200, Antonio Quartulli wrote:
> On 15/05/2024 16:55, Sabrina Dubroca wrote:
> > 2024-05-15, 14:54:49 +0200, Antonio Quartulli wrote:
> > > On 15/05/2024 12:19, Sabrina Dubroca wrote:
> > > > 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
> > > > > On 14/05/2024 10:58, Sabrina Dubroca wrote:
> > > > > > > > The UDP code differentiates "socket already owned by this interface"
> > > > > > > > from "already taken by other user". That doesn't apply to TCP?
> > > > > > > 
> > > > > > > This makes me wonder: how safe is it to interpret the user data as an object
> > > > > > > of type ovpn_socket?
> > > > > > > 
> > > > > > > When we find the user data already assigned, we don't know what was really
> > > > > > > stored in there, right?
> > > > > > > Technically this socket could have gone through another module which
> > > > > > > assigned its own state.
> > > > > > > 
> > > > > > > Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
> > > > > > > *)user_data)->ovpn ] is probably not safe. Would you agree?
> > > > > > 
> > > > > > Hmmm, yeah, I think you're right. If you checked encap_type ==
> > > > > > UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
> > > > > > really your data. Basically call ovpn_from_udp_sock during attach if
> > > > > > you want to check something beyond EBUSY.
> > > > > 
> > > > > right. Maybe we can live with simply reporting EBUSY and be done with it,
> > > > > without adding extra checks and what not.
> > > > 
> > > > I don't know. What was the reason for the EALREADY handling in udp.c
> > > > and the corresponding refcount increase in ovpn_socket_new?
> > > 
> > > it's just me that likes to be verbose when doing error reporting.
> > 
> > With the "already owned by this interface" message? Sure, I get that.
> > 
> > > But eventually the exact error is ignored and we release the reference. From
> > > netlink.c:
> > > 
> > >         peer->sock = ovpn_socket_new(sock, peer);
> > >         if (IS_ERR(peer->sock)) {
> > >                 sockfd_put(sock);
> > >                 peer->sock = NULL;
> > >                 ret = -ENOTSOCK;
> > > 
> > > so no added value in distinguishing the two cases.
> > 
> > But ovpn_socket_new currently turns EALREADY into a valid result, so
> > we won't go through the error handling here. That's the part I'm
> > unclear about.
> 
> you're right. I had forgotten a small but important detail.
> 
> With UDP, OpenVPN creates one socket and uses it for all peers.
> With TCP, we necessarily need one socket per client.
> 
> Consequently, when a UDP socket is found to be used by our own instance, we
> can happily increase the refcounter and use it as if it was free (we are
> just attaching it to yet another peer).
> 
> In TCP this is not possible, so the socket must be unused, otherwise we
> can't attach it.
> 
> I hope it makes sense.

Yes, thanks. This behavior should be documented (for example, by
putting exactly what you just wrote in a comment above
ovpn_socket_new).

So for TCP you just need the existing check and EBUSY return. For UDP,
you need the EALREADY check, but with an extra encap_type test before
looking at the contents of the sk_user_data.
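
A userspace model of that policy, as a rough sketch only: fake_sk,
fake_ovpn_socket, fake_attach() and the ENCAP_* values are invented
stand-ins (ENCAP_OVPN plays the role of UDP_ENCAP_OVPNINUDP), and the
real code would also need the locking that is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum sk_kind { SK_UDP, SK_TCP };

#define ENCAP_NONE 0
#define ENCAP_OVPN 1	/* stand-in for UDP_ENCAP_OVPNINUDP */

struct fake_ovpn_socket {
	const void *ovpn;	/* owning instance */
	int refcnt;
};

struct fake_sk {
	enum sk_kind kind;
	int encap_type;
	void *user_data;	/* stand-in for sk_user_data */
};

/* Returns 0 if the socket was free and is now attached, -EALREADY if a
 * UDP socket already belongs to the same instance (refcount bumped, the
 * caller treats this as success), -EBUSY in every other case.
 */
int fake_attach(struct fake_sk *sk, struct fake_ovpn_socket *new_sock,
		const void *ovpn)
{
	struct fake_ovpn_socket *old;

	assert(sk && new_sock);

	if (!sk->user_data) {
		new_sock->ovpn = ovpn;
		new_sock->refcnt = 1;
		sk->user_data = new_sock;
		if (sk->kind == SK_UDP)
			sk->encap_type = ENCAP_OVPN;
		return 0;
	}

	/* TCP: one socket per peer, so any existing user means busy */
	if (sk->kind == SK_TCP)
		return -EBUSY;

	/* UDP: only trust user_data once encap_type says it is ours */
	if (sk->encap_type != ENCAP_OVPN)
		return -EBUSY;

	old = sk->user_data;
	if (old->ovpn != ovpn)
		return -EBUSY;

	old->refcnt++;
	return -EALREADY;
}
```

The point of the ordering is that user_data is never dereferenced
unless encap_type has already identified the socket as an ovpn one.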

-- 
Sabrina



* Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
  2024-05-15 20:35                   ` Sabrina Dubroca
@ 2024-05-15 20:39                     ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-15 20:39 UTC
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 15/05/2024 22:35, Sabrina Dubroca wrote:
> 2024-05-15, 21:44:44 +0200, Antonio Quartulli wrote:
>> On 15/05/2024 16:55, Sabrina Dubroca wrote:
>>> 2024-05-15, 14:54:49 +0200, Antonio Quartulli wrote:
>>>> On 15/05/2024 12:19, Sabrina Dubroca wrote:
>>>>> 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
>>>>>> On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>>>>>>> The UDP code differentiates "socket already owned by this interface"
>>>>>>>>> from "already taken by other user". That doesn't apply to TCP?
>>>>>>>>
>>>>>>>> This makes me wonder: how safe is it to interpret the user data as an object
>>>>>>>> of type ovpn_socket?
>>>>>>>>
>>>>>>>> When we find the user data already assigned, we don't know what was really
>>>>>>>> stored in there, right?
>>>>>>>> Technically this socket could have gone through another module which
>>>>>>>> assigned its own state.
>>>>>>>>
>>>>>>>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>>>>>>>> *)user_data)->ovpn ] is probably not safe. Would you agree?
>>>>>>>
>>>>>>> Hmmm, yeah, I think you're right. If you checked encap_type ==
>>>>>>> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
>>>>>>> really your data. Basically call ovpn_from_udp_sock during attach if
>>>>>>> you want to check something beyond EBUSY.
>>>>>>
>>>>>> right. Maybe we can live with simply reporting EBUSY and be done with it,
>>>>>> without adding extra checks and what not.
>>>>>
>>>>> I don't know. What was the reason for the EALREADY handling in udp.c
>>>>> and the corresponding refcount increase in ovpn_socket_new?
>>>>
>>>> it's just me that likes to be verbose when doing error reporting.
>>>
>>> With the "already owned by this interface" message? Sure, I get that.
>>>
>>>> But eventually the exact error is ignored and we release the reference. From
>>>> netlink.c:
>>>>
>>>>         peer->sock = ovpn_socket_new(sock, peer);
>>>>         if (IS_ERR(peer->sock)) {
>>>>                 sockfd_put(sock);
>>>>                 peer->sock = NULL;
>>>>                 ret = -ENOTSOCK;
>>>>
>>>> so no added value in distinguishing the two cases.
>>>
>>> But ovpn_socket_new currently turns EALREADY into a valid result, so
>>> we won't go through the error handling here. That's the part I'm
>>> unclear about.
>>
>> you're right. I had forgotten a small but important detail.
>>
>> With UDP, OpenVPN creates one socket and uses it for all peers.
>> With TCP, we necessarily need one socket per client.
>>
>> Consequently, when a UDP socket is found to be used by our own instance, we
>> can happily increase the refcounter and use it as if it was free (we are
>> just attaching it to yet another peer).
>>
>> In TCP this is not possible, so the socket must be unused, otherwise we
>> can't attach it.
>>
>> I hope it makes sense.
> 
> Yes, thanks. This behavior should be documented (for example, by
> putting exactly what you just wrote in a comment above
> ovpn_socket_new).

absolutely, will do.

> 
> So for TCP you just need the existing check and EBUSY return. For UDP,
> you need the EALREADY check, but with an extra encap_type test before
> looking at the contents of the sk_user_data.

ACK

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-12  8:46   ` Sabrina Dubroca
  2024-05-13  7:14     ` Antonio Quartulli
@ 2024-05-22 14:08     ` Antonio Quartulli
  2024-05-22 14:28       ` Andrew Lunn
  1 sibling, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-22 14:08 UTC
  To: Sabrina Dubroca, Andrew Lunn
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Esben Haabendal

On 12/05/2024 10:46, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:24 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
>> index c1f842c06e32..7240d1036fb7 100644
>> --- a/drivers/net/ovpn/bind.c
>> +++ b/drivers/net/ovpn/bind.c
>> @@ -13,6 +13,7 @@
>>   #include "ovpnstruct.h"
>>   #include "io.h"
>>   #include "bind.h"
>> +#include "packet.h"
>>   #include "peer.h"
> 
> You have a few hunks like that in this patch, adding an include to a
> file that is otherwise not being modified. That's odd.

I just went through this and there is a reason for these extra includes.

Basically this patch is modifying peer.h so that it now requires 
packet.h as dependency.

To reduce the includes complexity I am adding as many includes as 
possible to .c files only, therefore the dependency needs to appear in 
every .c file including peer.h, rather than adding the include to peer.h 
itself.

This was my interpretation of Andrew Lunn's suggestion, but I may have
taken it too far.

Opinions?

Regards,


-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 11/24] ovpn: implement packet processing
  2024-05-22 14:08     ` Antonio Quartulli
@ 2024-05-22 14:28       ` Andrew Lunn
  0 siblings, 0 replies; 111+ messages in thread
From: Andrew Lunn @ 2024-05-22 14:28 UTC
  To: Antonio Quartulli
  Cc: Sabrina Dubroca, netdev, Jakub Kicinski, Sergey Ryazanov,
	Paolo Abeni, Eric Dumazet, Esben Haabendal

On Wed, May 22, 2024 at 04:08:44PM +0200, Antonio Quartulli wrote:
> On 12/05/2024 10:46, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:24 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
> > > index c1f842c06e32..7240d1036fb7 100644
> > > --- a/drivers/net/ovpn/bind.c
> > > +++ b/drivers/net/ovpn/bind.c
> > > @@ -13,6 +13,7 @@
> > >   #include "ovpnstruct.h"
> > >   #include "io.h"
> > >   #include "bind.h"
> > > +#include "packet.h"
> > >   #include "peer.h"
> > 
> > You have a few hunks like that in this patch, adding an include to a
> > file that is otherwise not being modified. That's odd.
> 
> I just went through this and there is a reason for these extra includes.
> 
> Basically this patch is modifying peer.h so that it now requires packet.h as
> dependency.
> 
> To reduce the includes complexity I am adding as many includes as possible
> to .c files only, therefore the dependency needs to appear in every .c file
> including peer.h, rather than adding the include to peer.h itself.
> 
> This was my interpretation of Andrew Lunn's suggestion, but I may have
> taken it too far.

It becomes an issue when adding one include pulls in 10s to 100s of
other includes, 99% of which are not needed and just slow down the
compile. With our own local headers, this is probably not going to
happen.

Try using

make foo/bar/foobar.i

which will run cpp on foobar.c and produce foobar.i. You can then see
the effects of the additional include. If it is minimal, you don't
need to care much, and could always include it.

     Andrew



* Re: [PATCH net-next v3 14/24] ovpn: implement multi-peer support
  2024-05-06  1:16 ` [PATCH net-next v3 14/24] ovpn: implement multi-peer support Antonio Quartulli
@ 2024-05-28 14:44   ` Sabrina Dubroca
  2024-05-28 19:41     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-28 14:44 UTC
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

Hi Antonio, I took a little break but I'm looking at your patches
again now.

2024-05-06, 03:16:27 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> index 7414c2459fb9..58166fdeac63 100644
> --- a/drivers/net/ovpn/ovpnstruct.h
> +++ b/drivers/net/ovpn/ovpnstruct.h
> @@ -31,6 +35,12 @@ struct ovpn_struct {
>  	spinlock_t lock; /* protect writing to the ovpn_struct object */
>  	struct workqueue_struct *crypto_wq;
>  	struct workqueue_struct *events_wq;
> +	struct {
> +		DECLARE_HASHTABLE(by_id, 12);
> +		DECLARE_HASHTABLE(by_transp_addr, 12);
> +		DECLARE_HASHTABLE(by_vpn_addr, 12);

Those are really big. I guess for large servers they make sense, but
you're making clients hold 98kB in memory that they're not going to use.

Maybe they could be dynamically sized, but I think struct peers should
be allocated on demand (only for mode == MP) if you want this size.
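
For reference, the 98kB figure follows from three tables of 1 << 12
buckets, each bucket being a single pointer: on a 64-bit build that is
3 * 4096 * 8 = 98304 bytes. A userspace sketch of the on-demand
variant, with made-up names (struct bucket stands in for hlist_head,
struct instance for ovpn_struct) and calloc() where the kernel would
use its own allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PEER_HASH_BITS 12

struct bucket {
	void *first;	/* same shape as hlist_head: one pointer */
};

struct peer_tables {
	struct bucket by_id[1 << PEER_HASH_BITS];
	struct bucket by_transp_addr[1 << PEER_HASH_BITS];
	struct bucket by_vpn_addr[1 << PEER_HASH_BITS];
};

enum mode { MODE_P2P, MODE_MP };

struct instance {
	enum mode mode;
	struct peer_tables *peers;	/* NULL unless mode == MODE_MP */
};

/* Allocate the tables only for multi-peer instances, so a P2P client
 * pays for one pointer instead of the full set of tables.
 * Returns 0 on success, -1 if the allocation fails.
 */
int instance_init(struct instance *inst, enum mode mode)
{
	assert(inst);
	inst->mode = mode;
	inst->peers = NULL;

	if (mode != MODE_MP)
		return 0;

	inst->peers = calloc(1, sizeof(*inst->peers));
	return inst->peers ? 0 : -1;
}

void instance_destroy(struct instance *inst)
{
	assert(inst);
	free(inst->peers);
	inst->peers = NULL;
}
```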

> +		spinlock_t lock; /* protects writes to peers tables */
> +	} peers;
>  	struct ovpn_peer __rcu *peer;
>  	struct list_head dev_list;
>  };
> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
> index 99a2ae42a332..38a89595dade 100644
> --- a/drivers/net/ovpn/peer.c
> +++ b/drivers/net/ovpn/peer.c
> @@ -361,6 +362,91 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
>  	return peer;
>  }
>  
> +/**
> + * ovpn_peer_add_mp - add per to related tables in a MP instance
                             ^
                             s/per/peer/

> + * @ovpn: the instance to add the peer to
> + * @peer: the peer to add
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +static int ovpn_peer_add_mp(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
> +{
[...]
> +	index = ovpn_peer_index(ovpn->peers.by_id, &peer->id, sizeof(peer->id));
> +	hlist_add_head_rcu(&peer->hash_entry_id, &ovpn->peers.by_id[index]);
> +
> +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
> +					&peer->vpn_addrs.ipv4,
> +					sizeof(peer->vpn_addrs.ipv4));
> +		hlist_add_head_rcu(&peer->hash_entry_addr4,
> +				   &ovpn->peers.by_vpn_addr[index]);
> +	}
> +
> +	hlist_del_init_rcu(&peer->hash_entry_addr6);

Why are hash_entry_transp_addr and hash_entry_addr6 getting a
hlist_del_init_rcu() call, but not hash_entry_id and hash_entry_addr4?

> +	if (memcmp(&peer->vpn_addrs.ipv6, &in6addr_any,
> +		   sizeof(peer->vpn_addrs.ipv6))) {

!ipv6_addr_any(&peer->vpn_addrs.ipv6)

> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
> +					&peer->vpn_addrs.ipv6,
> +					sizeof(peer->vpn_addrs.ipv6));
> +		hlist_add_head_rcu(&peer->hash_entry_addr6,
> +				   &ovpn->peers.by_vpn_addr[index]);
> +	}
> +

-- 
Sabrina



* Re: [PATCH net-next v3 15/24] ovpn: implement peer lookup logic
  2024-05-06  1:16 ` [PATCH net-next v3 15/24] ovpn: implement peer lookup logic Antonio Quartulli
@ 2024-05-28 16:42   ` Sabrina Dubroca
  2024-05-28 20:09     ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-28 16:42 UTC
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-06, 03:16:28 +0200, Antonio Quartulli wrote:
> +static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
> +{
> +	struct rt6_info *rt = (struct rt6_info *)skb_rtable(skb);

skb_rt6_info?

> +
> +	if (!rt || !(rt->rt6i_flags & RTF_GATEWAY))
> +		return ipv6_hdr(skb)->daddr;
> +
> +	return rt->rt6i_gateway;
> +}
> +
> +/**
> + * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
> + * @head: list head to search
> + * @addr: VPN IPv4 to use as search key
> + *
> + * Return: the peer if found or NULL otherwise

The doc for all those ovpn_peer_get_* functions could indicate that on
success, a reference on the peer is held.


[...]
> +static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct hlist_head *head,
> +						    struct in6_addr *addr)
> +{
> +	struct ovpn_peer *tmp, *peer = NULL;
> +	int i;
> +
> +	rcu_read_lock();
> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_addr6) {
> +		for (i = 0; i < 4; i++) {
> +			if (addr->s6_addr32[i] !=
> +			    tmp->vpn_addrs.ipv6.s6_addr32[i])
> +				continue;
> +		}

ipv6_addr_equal
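
To spell out why: the "continue" sits in the inner for loop, so it only
moves on to the next 32-bit word and never skips the candidate peer. A
userspace sketch of the difference, assuming the code after the loop
(elided above) treats falling out of it as a match; struct addr6 and
both function names are invented for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct addr6 {
	uint32_t w[4];	/* stand-in for in6_addr.s6_addr32 */
};

/* Same shape as the quoted loop: a mismatching word only advances the
 * inner loop, so the function reports a match for any two addresses.
 */
bool match_inner_continue(const struct addr6 *a, const struct addr6 *b)
{
	int i;

	assert(a && b);
	for (i = 0; i < 4; i++) {
		if (a->w[i] != b->w[i])
			continue;
	}
	return true;
}

/* What a whole-address comparison has to do instead: reject on the
 * first differing word.
 */
bool match_fixed(const struct addr6 *a, const struct addr6 *b)
{
	int i;

	assert(a && b);
	for (i = 0; i < 4; i++) {
		if (a->w[i] != b->w[i])
			return false;
	}
	return true;
}
```

Using ipv6_addr_equal in the kernel code avoids the open-coded loop
altogether.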

[...]
> +	default:
> +		return NULL;
> +	}
> +
> +	index = ovpn_peer_index(ovpn->peers.by_transp_addr, &ss, sa_len);
> +	head = &ovpn->peers.by_transp_addr[index];

Maybe worth adding a get_bucket helper (with a better name :)) instead
of ovpn_peer_index, since all uses of ovpn_peer_index are followed by
a "head = TBL[index]" (or direct use in some hlist iterator), but the
index itself is not used later on, only the bucket.
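
Roughly what I have in mind, as a userspace sketch: get_bucket(),
toy_hash(), struct bucket and the table size are placeholders, and
FNV-1a is used only because the hash behind ovpn_peer_index isn't
shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_BITS 4
#define TBL_SIZE (1u << TBL_BITS)

struct bucket {
	void *first;	/* stand-in for hlist_head */
};

/* FNV-1a over the key bytes; a placeholder hash for the example */
uint32_t toy_hash(const void *key, size_t len)
{
	const uint8_t *p = key;
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash the key and hand back the bucket itself, so callers never see
 * the intermediate index.
 */
struct bucket *get_bucket(struct bucket *tbl, const void *key, size_t len)
{
	assert(tbl && key);
	return &tbl[toy_hash(key, len) & (TBL_SIZE - 1)];
}
```

Callers would then go straight from key to hlist iteration, without
the separate index variable.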

> +
> +	rcu_read_lock();
> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_transp_addr) {
> +		found = ovpn_peer_transp_match(tmp, &ss);
> +		if (!found)

nit: call ovpn_peer_transp_match directly and drop the found variable

> +			continue;
> +
> +		if (!ovpn_peer_hold(tmp))
> +			continue;
> +
> +		peer = tmp;
> +		break;
> +	}
> +	rcu_read_unlock();
>  
>  	return peer;
>  }
> @@ -303,10 +427,28 @@ static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
>  
>  struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
>  {
> -	struct ovpn_peer *peer = NULL;
> +	struct ovpn_peer *tmp, *peer = NULL;
> +	struct hlist_head *head;
> +	u32 index;
>  
>  	if (ovpn->mode == OVPN_MODE_P2P)
> -		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
> +		return ovpn_peer_get_by_id_p2p(ovpn, peer_id);
> +
> +	index = ovpn_peer_index(ovpn->peers.by_id, &peer_id, sizeof(peer_id));
> +	head = &ovpn->peers.by_id[index];
> +
> +	rcu_read_lock();
> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_id) {
> +		if (tmp->id != peer_id)
> +			continue;
> +
> +		if (!ovpn_peer_hold(tmp))
> +			continue;

Can there ever be multiple peers with the same id? (ie, is it worth
continuing the loop if this fails? the same question probably applies
to ovpn_peer_get_by_transp_addr as well)


> +		peer = tmp;
> +		break;
> +	}
> +	rcu_read_unlock();
>  
>  	return peer;
>  }
> @@ -328,6 +470,11 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
>  				       struct sk_buff *skb)
>  {
>  	struct ovpn_peer *tmp, *peer = NULL;
> +	struct hlist_head *head;
> +	sa_family_t sa_fam;
> +	struct in6_addr addr6;
> +	__be32 addr4;
> +	u32 index;
>  
>  	/* in P2P mode, no matter the destination, packets are always sent to
>  	 * the single peer listening on the other side
> @@ -338,15 +485,123 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
>  		if (likely(tmp && ovpn_peer_hold(tmp)))
>  			peer = tmp;
>  		rcu_read_unlock();
> +		return peer;
> +	}
> +
> +	sa_fam = skb_protocol_to_family(skb);
> +
> +	switch (sa_fam) {
> +	case AF_INET:
> +		addr4 = ovpn_nexthop_from_skb4(skb);
> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr4,
> +					sizeof(addr4));
> +		head = &ovpn->peers.by_vpn_addr[index];
> +
> +		peer = ovpn_peer_get_by_vpn_addr4(head, &addr4);
> +		break;
> +	case AF_INET6:
> +		addr6 = ovpn_nexthop_from_skb6(skb);
> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr6,
> +					sizeof(addr6));
> +		head = &ovpn->peers.by_vpn_addr[index];
> +
> +		peer = ovpn_peer_get_by_vpn_addr6(head, &addr6);

The index -> head -> peer code is identical in get_by_dst and
get_by_src, it could be stuffed into ovpn_peer_get_by_vpn_addr{4,6}.

> +		break;
>  	}
>  
>  	return peer;
>  }


[snip the _rt4 variant, comments apply to both]
> +/**
> + * ovpn_nexthop_from_rt6 - look up the IPv6 nexthop for the given destination

I'm a bit confused by this talk about "destination" when those two
functions are then used with the source address from the packet, from
a function called "get_by_src".

> + * @ovpn: the private data representing the current VPN session
> + * @dst: the destination to be looked up
> + *
> + * Looks up in the IPv6 system routing table the IO of the nexthop to be used

"the IO"?

> + * to reach the destination passed as argument. IF no nexthop can be found, the
> + * destination itself is returned as it probably has to be used as nexthop.
> + *
> + * Return: the IP of the next hop if found or the dst itself otherwise

"the dst" tends to refer to a dst_entry, maybe "or @dst otherwise"?
(though I'm not sure that's valid kdoc)

(also for ovpn_nexthop_from_rt4)

> + */
> +static struct in6_addr ovpn_nexthop_from_rt6(struct ovpn_struct *ovpn,
> +					     struct in6_addr dst)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> +	struct dst_entry *entry;
> +	struct rt6_info *rt;
> +	struct flowi6 fl = {
> +		.daddr = dst,
> +	};
> +
> +	entry = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ovpn->dev), NULL, &fl,
> +						NULL);
> +	if (IS_ERR(entry)) {
> +		net_dbg_ratelimited("%s: no route to host %pI6c\n", __func__,
> +				    &dst);
> +		/* if we end up here this packet is probably going to be
> +		 * thrown away later
> +		 */
> +		return dst;
> +	}
> +
> +	rt = container_of(entry, struct rt6_info, dst);

dst_rt6_info(entry)

> +
> +	if (!(rt->rt6i_flags & RTF_GATEWAY))
> +		goto out;
> +
> +	dst = rt->rt6i_gateway;
> +out:
> +	dst_release((struct dst_entry *)rt);
> +#endif
> +	return dst;
> +}
> +
>  struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
>  				       struct sk_buff *skb)
>  {
>  	struct ovpn_peer *tmp, *peer = NULL;
> +	struct hlist_head *head;
> +	sa_family_t sa_fam;
> +	struct in6_addr addr6;
> +	__be32 addr4;
> +	u32 index;
>  
>  	/* in P2P mode, no matter the destination, packets are always sent to
>  	 * the single peer listening on the other side
> @@ -357,6 +612,28 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
>  		if (likely(tmp && ovpn_peer_hold(tmp)))
>  			peer = tmp;
>  		rcu_read_unlock();
> +		return peer;
> +	}
> +
> +	sa_fam = skb_protocol_to_family(skb);
> +
> +	switch (sa_fam) {

nit:
	switch (skb_protocol_to_family(skb))
seems a bit more readable to me (also in ovpn_peer_get_by_dst) - and
saves you from reverse xmas tree complaints (sa_fam should have been
after addr6)

> +	case AF_INET:
> +		addr4 = ovpn_nexthop_from_rt4(ovpn, ip_hdr(skb)->saddr);
> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr4,
> +					sizeof(addr4));
> +		head = &ovpn->peers.by_vpn_addr[index];
> +
> +		peer = ovpn_peer_get_by_vpn_addr4(head, &addr4);
> +		break;
> +	case AF_INET6:
> +		addr6 = ovpn_nexthop_from_rt6(ovpn, ipv6_hdr(skb)->saddr);
> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr6,
> +					sizeof(addr6));
> +		head = &ovpn->peers.by_vpn_addr[index];
> +
> +		peer = ovpn_peer_get_by_vpn_addr6(head, &addr6);
> +		break;
>  	}
>  
>  	return peer;
> -- 
> 2.43.2
> 
> 

-- 
Sabrina



* Re: [PATCH net-next v3 14/24] ovpn: implement multi-peer support
  2024-05-28 14:44   ` Sabrina Dubroca
@ 2024-05-28 19:41     ` Antonio Quartulli
  2024-05-29 15:16       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-28 19:41 UTC
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 28/05/2024 16:44, Sabrina Dubroca wrote:
> Hi Antonio, I took a little break but I'm looking at your patches
> again now.

Thanks Sabrina! Meanwhile I have been working on all your suggested changes.
Right now I am familiarizing myself with the strparser.

> 
> 2024-05-06, 03:16:27 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
>> index 7414c2459fb9..58166fdeac63 100644
>> --- a/drivers/net/ovpn/ovpnstruct.h
>> +++ b/drivers/net/ovpn/ovpnstruct.h
>> @@ -31,6 +35,12 @@ struct ovpn_struct {
>>   	spinlock_t lock; /* protect writing to the ovpn_struct object */
>>   	struct workqueue_struct *crypto_wq;
>>   	struct workqueue_struct *events_wq;
>> +	struct {
>> +		DECLARE_HASHTABLE(by_id, 12);
>> +		DECLARE_HASHTABLE(by_transp_addr, 12);
>> +		DECLARE_HASHTABLE(by_vpn_addr, 12);
> 
> Those are really big. I guess for large servers they make sense, but
> you're making clients hold 98kB in memory that they're not going to use.

Right - for clients it doesn't make sense.

> 
> Maybe they could be dynamically sized, but I think struct peers should
> be allocated on demand (only for mode == MP) if you want this size.

Yeah, makes sense. I'll allocate it dynamically then.
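
For reference, the 98kB figure follows from the three DECLARE_HASHTABLE(..., 12)
tables: each has 2^12 buckets and each bucket is a struct hlist_head, i.e. a
single pointer. A minimal userspace sketch of that arithmetic, assuming 8-byte
pointers (64-bit build); peers_tables_bytes() is a hypothetical helper used
only for illustration, not part of the patch:

```c
#include <stddef.h>

/* Bytes used by `ntables` hash tables of 2^bits buckets each, where every
 * bucket is one pointer of `ptr_size` bytes (struct hlist_head holds a
 * single 'first' pointer).
 */
static size_t peers_tables_bytes(unsigned int bits, size_t ntables,
				 size_t ptr_size)
{
	return ntables * ((size_t)1 << bits) * ptr_size;
}
```

peers_tables_bytes(12, 3, 8) evaluates to 98304 bytes (96 KiB), which is the
amount a P2P client would carry without ever using it.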

> 
>> +		spinlock_t lock; /* protects writes to peers tables */
>> +	} peers;
>>   	struct ovpn_peer __rcu *peer;
>>   	struct list_head dev_list;
>>   };
>> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
>> index 99a2ae42a332..38a89595dade 100644
>> --- a/drivers/net/ovpn/peer.c
>> +++ b/drivers/net/ovpn/peer.c
>> @@ -361,6 +362,91 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
>>   	return peer;
>>   }
>>   
>> +/**
>> + * ovpn_peer_add_mp - add per to related tables in a MP instance
>                               ^
>                               s/per/peer/

ACK

> 
>> + * @ovpn: the instance to add the peer to
>> + * @peer: the peer to add
>> + *
>> + * Return: 0 on success or a negative error code otherwise
>> + */
>> +static int ovpn_peer_add_mp(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
>> +{
> [...]
>> +	index = ovpn_peer_index(ovpn->peers.by_id, &peer->id, sizeof(peer->id));
>> +	hlist_add_head_rcu(&peer->hash_entry_id, &ovpn->peers.by_id[index]);
>> +
>> +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
>> +					&peer->vpn_addrs.ipv4,
>> +					sizeof(peer->vpn_addrs.ipv4));
>> +		hlist_add_head_rcu(&peer->hash_entry_addr4,
>> +				   &ovpn->peers.by_vpn_addr[index]);
>> +	}
>> +
>> +	hlist_del_init_rcu(&peer->hash_entry_addr6);
> 
> Why are hash_entry_transp_addr and hash_entry_addr6 getting a
> hlist_del_init_rcu() call, but not hash_entry_id and hash_entry_addr4?

I think not calling del_init_rcu on hash_entry_addr4 was a mistake.

Calling del_init_rcu on addr4, addr6 and transp_addr is needed to put 
them in a known state in case they are not hashed.

While hash_entry_id always goes through hlist_add_head_rcu, therefore 
del_init_rcu is useless (to my understanding).

> 
>> +	if (memcmp(&peer->vpn_addrs.ipv6, &in6addr_any,
>> +		   sizeof(peer->vpn_addrs.ipv6))) {
> 
> !ipv6_addr_any(&peer->vpn_addrs.ipv6)

ACK

> 
>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
>> +					&peer->vpn_addrs.ipv6,
>> +					sizeof(peer->vpn_addrs.ipv6));
>> +		hlist_add_head_rcu(&peer->hash_entry_addr6,
>> +				   &ovpn->peers.by_vpn_addr[index]);
>> +	}
>> +
> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 15/24] ovpn: implement peer lookup logic
  2024-05-28 16:42   ` Sabrina Dubroca
@ 2024-05-28 20:09     ` Antonio Quartulli
  2024-05-29 16:42       ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-28 20:09 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 28/05/2024 18:42, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:28 +0200, Antonio Quartulli wrote:
>> +static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
>> +{
>> +	struct rt6_info *rt = (struct rt6_info *)skb_rtable(skb);
> 
> skb_rt6_info?

Yes! I have been looking for this guy all over the place in 
sk_buff.h....it was just in another header :) thanks!

> 
>> +
>> +	if (!rt || !(rt->rt6i_flags & RTF_GATEWAY))
>> +		return ipv6_hdr(skb)->daddr;
>> +
>> +	return rt->rt6i_gateway;
>> +}
>> +
>> +/**
>> + * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
>> + * @head: list head to search
>> + * @addr: VPN IPv4 to use as search key
>> + *
>> + * Return: the peer if found or NULL otherwise
> 
> The doc for all those ovpn_peer_get_* functions could indicate that on
> success, a reference on the peer is held.

ACK

> 
> 
> [...]
>> +static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct hlist_head *head,
>> +						    struct in6_addr *addr)
>> +{
>> +	struct ovpn_peer *tmp, *peer = NULL;
>> +	int i;
>> +
>> +	rcu_read_lock();
>> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_addr6) {
>> +		for (i = 0; i < 4; i++) {
>> +			if (addr->s6_addr32[i] !=
>> +			    tmp->vpn_addrs.ipv6.s6_addr32[i])
>> +				continue;
>> +		}
> 
> ipv6_addr_equal

Thanks
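
Note that in the quoted loop the `continue` applies to the inner `for`, so a
mismatching word only advances `i` rather than moving on to the next peer. A
single whole-address comparison avoids that. Below is a userspace stand-in for
what ipv6_addr_equal() provides; addr6_equal() is a hypothetical name and the
memcmp()-based body is only an illustration, not the kernel implementation:

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for ipv6_addr_equal(): true if all 16 bytes match */
static bool addr6_equal(const struct in6_addr *a, const struct in6_addr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

With this shape the lookup loop body reduces to one test per candidate peer,
followed by the usual hold-and-break.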

> 
> [...]
>> +	default:
>> +		return NULL;
>> +	}
>> +
>> +	index = ovpn_peer_index(ovpn->peers.by_transp_addr, &ss, sa_len);
>> +	head = &ovpn->peers.by_transp_addr[index];
> 
> Maybe worth adding a get_bucket helper (with a better name :)) instead
> of ovpn_peer_index, since all uses of ovpn_peer_index are followed by
> a "head = TBL[index]" (or direct use in some hlist iterator), but the
> index itself is not used later on, only the bucket.

yup, good idea
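
A rough userspace sketch of what such a helper could look like. Everything
here is an assumption made for illustration: get_bucket(), struct bucket and
the FNV-1a key_hash() are placeholders, and the real hash is whatever
ovpn_peer_index() uses in the patch:

```c
#include <stddef.h>
#include <stdint.h>

#define TBL_BITS 12
#define TBL_SIZE (1u << TBL_BITS)

struct bucket {
	void *first; /* stand-in for struct hlist_head */
};

/* FNV-1a over the key bytes; only a placeholder for whatever hash
 * ovpn_peer_index() uses in the patch.
 */
static uint32_t key_hash(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash the key and return the bucket directly, hiding the index */
static struct bucket *get_bucket(struct bucket *tbl, const void *key,
				 size_t len)
{
	return &tbl[key_hash(key, len) & (TBL_SIZE - 1)];
}
```

Callers would then write head = get_bucket(tbl, &key, sizeof(key)) and never
see the intermediate index.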

> 
>> +
>> +	rcu_read_lock();
>> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_transp_addr) {
>> +		found = ovpn_peer_transp_match(tmp, &ss);
>> +		if (!found)
> 
> nit: call ovpn_peer_transp_match directly and drop the found variable

ACK.
I presume it's a leftover from the past, otherwise it wouldn't make much 
sense.

> 
>> +			continue;
>> +
>> +		if (!ovpn_peer_hold(tmp))
>> +			continue;
>> +
>> +		peer = tmp;
>> +		break;
>> +	}
>> +	rcu_read_unlock();
>>   
>>   	return peer;
>>   }
>> @@ -303,10 +427,28 @@ static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
>>   
>>   struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
>>   {
>> -	struct ovpn_peer *peer = NULL;
>> +	struct ovpn_peer *tmp, *peer = NULL;
>> +	struct hlist_head *head;
>> +	u32 index;
>>   
>>   	if (ovpn->mode == OVPN_MODE_P2P)
>> -		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
>> +		return ovpn_peer_get_by_id_p2p(ovpn, peer_id);
>> +
>> +	index = ovpn_peer_index(ovpn->peers.by_id, &peer_id, sizeof(peer_id));
>> +	head = &ovpn->peers.by_id[index];
>> +
>> +	rcu_read_lock();
>> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_id) {
>> +		if (tmp->id != peer_id)
>> +			continue;
>> +
>> +		if (!ovpn_peer_hold(tmp))
>> +			continue;
> 
> Can there ever be multiple peers with the same id? (ie, is it worth
> continuing the loop if this fails? the same question probably applies
> to ovpn_peer_get_by_transp_addr as well)

Well, not at the same time, but theoretically we could re-use the ID of 
a peer that is being released (i.e. still in the list but refcnt at 0) 
because it won't be returned by this lookup.

This said, I truly believe it's impossible for a peer to have refcnt 0
and still be in the list:
Either
* delete on the peer was not yet called, thus peer is in the list and 
the last reference wasn't yet dropped
* delete on the peer was called, thus peer cannot be in the list anymore 
and refcnt may or may not be 0...
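
The lookup relies on ovpn_peer_hold() failing for a peer whose refcount has
already dropped to zero. Its implementation is not part of this hunk; the
general "hold unless zero" idea (in the kernel typically provided by
kref_get_unless_zero()) can be sketched in userspace with C11 atomics.
struct toy_peer and toy_peer_hold() are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_peer {
	atomic_uint refcnt; /* 0 means the peer is being released */
};

/* Take a reference only if the count is still non-zero; a lookup that gets
 * false here must treat the entry as absent.
 */
static bool toy_peer_hold(struct toy_peer *peer)
{
	unsigned int old = atomic_load(&peer->refcnt);

	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&peer->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

A peer at refcnt 0 is therefore never returned, which is why the loop can
safely `continue` past it.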


> 
> 
>> +		peer = tmp;
>> +		break;
>> +	}
>> +	rcu_read_unlock();
>>   
>>   	return peer;
>>   }
>> @@ -328,6 +470,11 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
>>   				       struct sk_buff *skb)
>>   {
>>   	struct ovpn_peer *tmp, *peer = NULL;
>> +	struct hlist_head *head;
>> +	sa_family_t sa_fam;
>> +	struct in6_addr addr6;
>> +	__be32 addr4;
>> +	u32 index;
>>   
>>   	/* in P2P mode, no matter the destination, packets are always sent to
>>   	 * the single peer listening on the other side
>> @@ -338,15 +485,123 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
>>   		if (likely(tmp && ovpn_peer_hold(tmp)))
>>   			peer = tmp;
>>   		rcu_read_unlock();
>> +		return peer;
>> +	}
>> +
>> +	sa_fam = skb_protocol_to_family(skb);
>> +
>> +	switch (sa_fam) {
>> +	case AF_INET:
>> +		addr4 = ovpn_nexthop_from_skb4(skb);
>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr4,
>> +					sizeof(addr4));
>> +		head = &ovpn->peers.by_vpn_addr[index];
>> +
>> +		peer = ovpn_peer_get_by_vpn_addr4(head, &addr4);
>> +		break;
>> +	case AF_INET6:
>> +		addr6 = ovpn_nexthop_from_skb6(skb);
>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr6,
>> +					sizeof(addr6));
>> +		head = &ovpn->peers.by_vpn_addr[index];
>> +
>> +		peer = ovpn_peer_get_by_vpn_addr6(head, &addr6);
> 
> The index -> head -> peer code is identical in get_by_dst and
> get_by_src, it could be stuffed into ovpn_peer_get_by_vpn_addr{4,6}.

hm yeah, you're right. I'll do it!

> 
>> +		break;
>>   	}
>>   
>>   	return peer;
>>   }
> 
> 
> [snip the _rt4 variant, comments apply to both]
>> +/**
>> + * ovpn_nexthop_from_rt6 - look up the IPv6 nexthop for the given destination
> 
> I'm a bit confused by this talk about "destination" when those two
> functions are then used with the source address from the packet, from
> a function called "get_by_src".

well, in my brain a next hop can exist only when I want to reach a
certain destination. Therefore, at a low level, the terms nexthop and
destination always need to go hand in hand.

This said, when implementing RPF (Reverse Path Filtering) I need to 
imagine that I want to route to the source IP of the incoming packet. If 
the nexthop I looked up matches the peer the packet came from, then 
everything is fine.

makes sense?

[FTR I have already renamed/changed get_by_src into check_by_src, 
because I don't need to truly extract a peer and get a reference, but I 
only need to perform the aforementioned comparison.]
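
To make the RPF idea concrete, here is a toy userspace model. It is a
simplified sketch, not the patch logic: addresses are plain host-order
uint32_t values, the route table is a first-match array rather than a real
longest-prefix FIB, and all toy_* names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy route: dst matches if (dst & mask) == prefix; gateway 0 = on-link */
struct toy_route {
	uint32_t prefix;
	uint32_t mask;
	uint32_t gateway;
};

/* Return the nexthop towards dst: the gateway of the first matching route,
 * or dst itself when the route is on-link or no route matches.
 */
static uint32_t toy_nexthop(const struct toy_route *rt, size_t n, uint32_t dst)
{
	for (size_t i = 0; i < n; i++) {
		if ((dst & rt[i].mask) != rt[i].prefix)
			continue;
		return rt[i].gateway ? rt[i].gateway : dst;
	}
	return dst;
}

/* RPF: accept the packet only if routing back to its source address
 * would go through the peer it was received from.
 */
static bool toy_rpf_ok(const struct toy_route *rt, size_t n, uint32_t src,
		       uint32_t peer_vpn_addr)
{
	return toy_nexthop(rt, n, src) == peer_vpn_addr;
}
```

The source address of the incoming packet is passed where a destination is
expected, which is exactly the "route back to the sender" step described
above.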

> 
>> + * @ovpn: the private data representing the current VPN session
>> + * @dst: the destination to be looked up
>> + *
>> + * Looks up in the IPv6 system routing table the IO of the nexthop to be used
> 
> "the IO"?

typ0: "the IP"

> 
>> + * to reach the destination passed as argument. IF no nexthop can be found, the
>> + * destination itself is returned as it probably has to be used as nexthop.
>> + *
>> + * Return: the IP of the next hop if found or the dst itself otherwise
> 
> "the dst" tends to refer to a dst_entry, maybe "or @dst otherwise"?

it refers to @dst (the function argument). That's basically the case 
where the destination is "onlink" and thus it is the nexthop (basically 
the destination is the connected peer).

> (though I'm not sure that's valid kdoc)
> 
> (also for ovpn_nexthop_from_rt4)
> 
>> + */
>> +static struct in6_addr ovpn_nexthop_from_rt6(struct ovpn_struct *ovpn,
>> +					     struct in6_addr dst)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +	struct dst_entry *entry;
>> +	struct rt6_info *rt;
>> +	struct flowi6 fl = {
>> +		.daddr = dst,
>> +	};
>> +
>> +	entry = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ovpn->dev), NULL, &fl,
>> +						NULL);
>> +	if (IS_ERR(entry)) {
>> +		net_dbg_ratelimited("%s: no route to host %pI6c\n", __func__,
>> +				    &dst);
>> +		/* if we end up here this packet is probably going to be
>> +		 * thrown away later
>> +		 */
>> +		return dst;
>> +	}
>> +
>> +	rt = container_of(entry, struct rt6_info, dst);
> 
> dst_rt6_info(entry)

Oh, I see this just came to life in 6.10-rc1. Thanks!

> 
>> +
>> +	if (!(rt->rt6i_flags & RTF_GATEWAY))
>> +		goto out;
>> +
>> +	dst = rt->rt6i_gateway;
>> +out:
>> +	dst_release((struct dst_entry *)rt);
>> +#endif
>> +	return dst;
>> +}
>> +
>>   struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
>>   				       struct sk_buff *skb)
>>   {
>>   	struct ovpn_peer *tmp, *peer = NULL;
>> +	struct hlist_head *head;
>> +	sa_family_t sa_fam;
>> +	struct in6_addr addr6;
>> +	__be32 addr4;
>> +	u32 index;
>>   
>>   	/* in P2P mode, no matter the destination, packets are always sent to
>>   	 * the single peer listening on the other side
>> @@ -357,6 +612,28 @@ struct ovpn_peer *ovpn_peer_get_by_src(struct ovpn_struct *ovpn,
>>   		if (likely(tmp && ovpn_peer_hold(tmp)))
>>   			peer = tmp;
>>   		rcu_read_unlock();
>> +		return peer;
>> +	}
>> +
>> +	sa_fam = skb_protocol_to_family(skb);
>> +
>> +	switch (sa_fam) {
> 
> nit:
> 	switch (skb_protocol_to_family(skb))
> seems a bit more readable to me (also in ovpn_peer_get_by_dst) - and
> saves you from reverse xmas tree complaints (sa_fam should have been
> after addr6)

ACK, thanks!

> 
>> +	case AF_INET:
>> +		addr4 = ovpn_nexthop_from_rt4(ovpn, ip_hdr(skb)->saddr);
>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr4,
>> +					sizeof(addr4));
>> +		head = &ovpn->peers.by_vpn_addr[index];
>> +
>> +		peer = ovpn_peer_get_by_vpn_addr4(head, &addr4);
>> +		break;
>> +	case AF_INET6:
>> +		addr6 = ovpn_nexthop_from_rt6(ovpn, ipv6_hdr(skb)->saddr);
>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr, &addr6,
>> +					sizeof(addr6));
>> +		head = &ovpn->peers.by_vpn_addr[index];
>> +
>> +		peer = ovpn_peer_get_by_vpn_addr6(head, &addr6);
>> +		break;
>>   	}
>>   
>>   	return peer;
>> -- 
>> 2.43.2
>>
>>
> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 14/24] ovpn: implement multi-peer support
  2024-05-28 19:41     ` Antonio Quartulli
@ 2024-05-29 15:16       ` Sabrina Dubroca
  2024-05-29 20:15         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-29 15:16 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-28, 21:41:15 +0200, Antonio Quartulli wrote:
> On 28/05/2024 16:44, Sabrina Dubroca wrote:
> > Hi Antonio, I took a little break but I'm looking at your patches
> > again now.
> 
> Thanks Sabrina! Meanwhile I have been working on all your suggested changes.
> Right now I am familiarizing myself with the strparser.

Cool :)

> > 2024-05-06, 03:16:27 +0200, Antonio Quartulli wrote:
> > > +	index = ovpn_peer_index(ovpn->peers.by_id, &peer->id, sizeof(peer->id));
> > > +	hlist_add_head_rcu(&peer->hash_entry_id, &ovpn->peers.by_id[index]);
> > > +
> > > +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
> > > +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
> > > +					&peer->vpn_addrs.ipv4,
> > > +					sizeof(peer->vpn_addrs.ipv4));
> > > +		hlist_add_head_rcu(&peer->hash_entry_addr4,
> > > +				   &ovpn->peers.by_vpn_addr[index]);
> > > +	}
> > > +
> > > +	hlist_del_init_rcu(&peer->hash_entry_addr6);
> > 
> > Why are hash_entry_transp_addr and hash_entry_addr6 getting a
> > hlist_del_init_rcu() call, but not hash_entry_id and hash_entry_addr4?
> 
> I think not calling del_init_rcu on hash_entry_addr4 was a mistake.
> 
> Calling del_init_rcu on addr4, addr6 and transp_addr is needed to put them
> in a known state in case they are not hashed.

hlist_del_init_rcu does nothing if node is not already on a list.
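
A simplified userspace model of that behaviour, for illustration only: it
drops RCU and WRITE_ONCE() entirely, uses m_-prefixed names to mark it as a
model rather than the kernel code, and assumes the node was initialized with
pprev == NULL (e.g. a zeroed allocation):

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace model of the kernel's hlist types (no RCU, no
 * WRITE_ONCE); names are prefixed with m_ to mark them as a model.
 */
struct m_hlist_node {
	struct m_hlist_node *next;
	struct m_hlist_node **pprev; /* NULL means "not on any list" */
};

struct m_hlist_head {
	struct m_hlist_node *first;
};

static bool m_hlist_unhashed(const struct m_hlist_node *n)
{
	return !n->pprev;
}

static void m_hlist_add_head(struct m_hlist_node *n, struct m_hlist_head *h)
{
	n->next = h->first;
	n->pprev = &h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
}

/* Mirrors the shape of hlist_del_init_rcu(): unlink only if hashed */
static void m_hlist_del_init(struct m_hlist_node *n)
{
	if (m_hlist_unhashed(n))
		return; /* not on a list: nothing to do */

	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->pprev = NULL; /* next is left alone, as in the RCU variant */
}
```

So calling it on a never-added node changes nothing; it does not put the node
into any state it was not already in.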

> While hash_entry_id always goes through hlist_add_head_rcu, therefore
> del_init_rcu is useless (to my understanding).

I'm probably missing something about how this all fits together. In
patch 19, I see ovpn_nl_set_peer_doit can re-add a peer that is
already added (but I'm not sure why, since you don't allow changing
the addresses, so it won't actually be re-hashed).

I don't think doing a 2nd add of the same element to peers.by_id (or
any of the other hashtables) is correct, so I'd say you need
hlist_del_init_rcu for all of them.

-- 
Sabrina



* Re: [PATCH net-next v3 15/24] ovpn: implement peer lookup logic
  2024-05-28 20:09     ` Antonio Quartulli
@ 2024-05-29 16:42       ` Sabrina Dubroca
  2024-05-29 20:19         ` Antonio Quartulli
  0 siblings, 1 reply; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-29 16:42 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-28, 22:09:37 +0200, Antonio Quartulli wrote:
> On 28/05/2024 18:42, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:28 +0200, Antonio Quartulli wrote:
> > > @@ -303,10 +427,28 @@ static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
> > >   struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
> > >   {
> > > -	struct ovpn_peer *peer = NULL;
> > > +	struct ovpn_peer *tmp, *peer = NULL;
> > > +	struct hlist_head *head;
> > > +	u32 index;
> > >   	if (ovpn->mode == OVPN_MODE_P2P)
> > > -		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
> > > +		return ovpn_peer_get_by_id_p2p(ovpn, peer_id);
> > > +
> > > +	index = ovpn_peer_index(ovpn->peers.by_id, &peer_id, sizeof(peer_id));
> > > +	head = &ovpn->peers.by_id[index];
> > > +
> > > +	rcu_read_lock();
> > > +	hlist_for_each_entry_rcu(tmp, head, hash_entry_id) {
> > > +		if (tmp->id != peer_id)
> > > +			continue;
> > > +
> > > +		if (!ovpn_peer_hold(tmp))
> > > +			continue;
> > 
> > Can there ever be multiple peers with the same id? (ie, is it worth
> > continuing the loop if this fails? the same question probably applies
> > to ovpn_peer_get_by_transp_addr as well)
> 
> Well, not at the same time, but theoretically we could re-use the ID of a
> peer that is being released (i.e. still in the list but refcnt at 0) because
> it won't be returned by this lookup.
> 
> This said, I truly believe it's impossible for a peer to have refcnt 0 and
> still be in the list:
> Either
> * delete on the peer was not yet called, thus peer is in the list and the
> last reference wasn't yet dropped
> * delete on the peer was called, thus peer cannot be in the list anymore and
> refcnt may or may not be 0...

Ok, thanks. Let's just keep this code.


> > > +/**
> > > + * ovpn_nexthop_from_rt6 - look up the IPv6 nexthop for the given destination
> > 
> > I'm a bit confused by this talk about "destination" when those two
> > functions are then used with the source address from the packet, from
> > a function called "get_by_src".
> 
> well, in my brain a next hop can exist only when I want to reach a certain
> destination. Therefore, at a low level, the terms nexthop and destination
> always need to go hand in hand.
> 
> This said, when implementing RPF (Reverse Path Filtering) I need to imagine
> that I want to route to the source IP of the incoming packet. If the nexthop
> I looked up matches the peer the packet came from, then everything is fine.
> 
> makes sense?

Yeah, that's fair.

> 
> [FTR I have already renamed/changed get_by_src into check_by_src, because I
> don't need to truly extract a peer and get a reference, but I only need to
> perform the aforementioned comparison.]

Ok.

> > > + * @ovpn: the private data representing the current VPN session
> > > + * @dst: the destination to be looked up
> > > + *
> > > + * Looks up in the IPv6 system routing table the IO of the nexthop to be used
> > 
> > "the IO"?
> 
> typ0: "the IP"
> 
> > 
> > > + * to reach the destination passed as argument. IF no nexthop can be found, the
> > > + * destination itself is returned as it probably has to be used as nexthop.
> > > + *
> > > + * Return: the IP of the next hop if found or the dst itself otherwise
> > 
> > "the dst" tends to refer to a dst_entry, maybe "or @dst otherwise"?
> 
> it refers to @dst (the function argument). That's basically the case where
> the destination is "onlink" and thus it is the nexthop (basically the
> destination is the connected peer).

I understand that, it's just the wording "the dst" that I'm
nitpicking. s/dst/addr/ would help easily-confused people like me (for
both "the dst" and my confusion with source vs destination in
caller/callee), but I can live with this.

-- 
Sabrina



* Re: [PATCH net-next v3 14/24] ovpn: implement multi-peer support
  2024-05-29 15:16       ` Sabrina Dubroca
@ 2024-05-29 20:15         ` Antonio Quartulli
  2024-05-29 20:45           ` Sabrina Dubroca
  0 siblings, 1 reply; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-29 20:15 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 29/05/2024 17:16, Sabrina Dubroca wrote:
> 2024-05-28, 21:41:15 +0200, Antonio Quartulli wrote:
>> On 28/05/2024 16:44, Sabrina Dubroca wrote:
>>> Hi Antonio, I took a little break but I'm looking at your patches
>>> again now.
>>
>> Thanks Sabrina! Meanwhile I have been working on all your suggested changes.
>> Right now I am familiarizing myself with the strparser.
> 
> Cool :)
> 
>>> 2024-05-06, 03:16:27 +0200, Antonio Quartulli wrote:
>>>> +	index = ovpn_peer_index(ovpn->peers.by_id, &peer->id, sizeof(peer->id));
>>>> +	hlist_add_head_rcu(&peer->hash_entry_id, &ovpn->peers.by_id[index]);
>>>> +
>>>> +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
>>>> +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
>>>> +					&peer->vpn_addrs.ipv4,
>>>> +					sizeof(peer->vpn_addrs.ipv4));
>>>> +		hlist_add_head_rcu(&peer->hash_entry_addr4,
>>>> +				   &ovpn->peers.by_vpn_addr[index]);
>>>> +	}
>>>> +
>>>> +	hlist_del_init_rcu(&peer->hash_entry_addr6);
>>>
>>> Why are hash_entry_transp_addr and hash_entry_addr6 getting a
>>> hlist_del_init_rcu() call, but not hash_entry_id and hash_entry_addr4?
>>
>> I think not calling del_init_rcu on hash_entry_addr4 was a mistake.
>>
>> Calling del_init_rcu on addr4, addr6 and transp_addr is needed to put them
>> in a known state in case they are not hashed.
> 
> hlist_del_init_rcu does nothing if node is not already on a list.

Mh you're right. I must have got confused for some reason.
Those del_init_rcu can go then.

> 
>> While hash_entry_id always goes through hlist_add_head_rcu, therefore
>> del_init_rcu is useless (to my understanding).
> 
> I'm probably missing something about how this all fits together. In
> patch 19, I see ovpn_nl_set_peer_doit can re-add a peer that is
> already added (but I'm not sure why, since you don't allow changing
> the addresses, so it won't actually be re-hashed).

Actually it's not a "re-add", but the intent is to "update" a peer that 
already exists. However, some fields are forbidden from being updated, 
like the address.

[NOTE: I found some issue with the "peer update" logic in 
ovpn_nl_set_peer_doit and it's being changed a bit]

> 
> I don't think doing a 2nd add of the same element to peers.by_id (or
> any of the other hashtables) is correct, so I'd say you need
> hlist_del_init_rcu for all of them.

This is exactly the bug I mentioned above: we should not go through the 
add again. Ideally we should just update the fields and be done with it, 
without re-hashing the object.

I hope it makes sense.

Cheers,

> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 15/24] ovpn: implement peer lookup logic
  2024-05-29 16:42       ` Sabrina Dubroca
@ 2024-05-29 20:19         ` Antonio Quartulli
  0 siblings, 0 replies; 111+ messages in thread
From: Antonio Quartulli @ 2024-05-29 20:19 UTC (permalink / raw
  To: Sabrina Dubroca
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

On 29/05/2024 18:42, Sabrina Dubroca wrote:
> 2024-05-28, 22:09:37 +0200, Antonio Quartulli wrote:
>> On 28/05/2024 18:42, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:28 +0200, Antonio Quartulli wrote:
>>>> @@ -303,10 +427,28 @@ static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
>>>>    struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
>>>>    {
>>>> -	struct ovpn_peer *peer = NULL;
>>>> +	struct ovpn_peer *tmp, *peer = NULL;
>>>> +	struct hlist_head *head;
>>>> +	u32 index;
>>>>    	if (ovpn->mode == OVPN_MODE_P2P)
>>>> -		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
>>>> +		return ovpn_peer_get_by_id_p2p(ovpn, peer_id);
>>>> +
>>>> +	index = ovpn_peer_index(ovpn->peers.by_id, &peer_id, sizeof(peer_id));
>>>> +	head = &ovpn->peers.by_id[index];
>>>> +
>>>> +	rcu_read_lock();
>>>> +	hlist_for_each_entry_rcu(tmp, head, hash_entry_id) {
>>>> +		if (tmp->id != peer_id)
>>>> +			continue;
>>>> +
>>>> +		if (!ovpn_peer_hold(tmp))
>>>> +			continue;
>>>
>>> Can there ever be multiple peers with the same id? (ie, is it worth
>>> continuing the loop if this fails? the same question probably applies
>>> to ovpn_peer_get_by_transp_addr as well)
>>
>> Well, not at the same time, but theoretically we could re-use the ID of a
>> peer that is being released (i.e. still in the list but refcnt at 0) because
>> it won't be returned by this lookup.
>>
>> This said, I truly believe it's impossible for a peer to have refcnt 0 and
>> still be in the list:
>> Either
>> * delete on the peer was not yet called, thus peer is in the list and the
>> last reference wasn't yet dropped
>> * delete on the peer was called, thus peer cannot be in the list anymore and
>> refcnt may or may not be 0...
> 
> Ok, thanks. Let's just keep this code.

ok

> 
> 
>>>> +/**
>>>> + * ovpn_nexthop_from_rt6 - look up the IPv6 nexthop for the given destination
>>>
>>> I'm a bit confused by this talk about "destination" when those two
>>> functions are then used with the source address from the packet, from
>>> a function called "get_by_src".
>>
>> well, in my brain a next hop can exist only when I want to reach a certain
>> destination. Therefore, at a low level, the terms nexthop and destination
>> always need to go hand in hand.
>>
>> This said, when implementing RPF (Reverse Path Filtering) I need to imagine
>> that I want to route to the source IP of the incoming packet. If the nexthop
>> I looked up matches the peer the packet came from, then everything is fine.
>>
>> makes sense?
> 
> Yeah, that's fair.
> 
>>
>> [FTR I have already renamed/changed get_by_src into check_by_src, because I
>> don't need to truly extract a peer and get a reference, but I only need to
>> perform the aforementioned comparison.]
> 
> Ok.
> 
>>>> + * @ovpn: the private data representing the current VPN session
>>>> + * @dst: the destination to be looked up
>>>> + *
>>>> + * Looks up in the IPv6 system routing table the IO of the nexthop to be used
>>>
>>> "the IO"?
>>
>> typ0: "the IP"
>>
>>>
>>>> + * to reach the destination passed as argument. IF no nexthop can be found, the
>>>> + * destination itself is returned as it probably has to be used as nexthop.
>>>> + *
>>>> + * Return: the IP of the next hop if found or the dst itself otherwise
>>>
>>> "the dst" tends to refer to a dst_entry, maybe "or @dst otherwise"?
>>
>> it refers to @dst (the function argument). That's basically the case where
>> the destination is "onlink" and thus it is the nexthop (basically the
>> destination is the connected peer).
> 
> I understand that, it's just the wording "the dst" that I'm
> nitpicking. s/dst/addr/ would help easily-confused people like me (for
> both "the dst" and my confusion with source vs destination in
> caller/callee), but I can live with this.

Oh ok, now I understand your concern.
I will reword this part a bit and add a comment in the caller to clarify 
why we invoke nexthop_from_rt4/6 passing the source address as param.

Cheers,

> 

-- 
Antonio Quartulli
OpenVPN Inc.


* Re: [PATCH net-next v3 14/24] ovpn: implement multi-peer support
  2024-05-29 20:15         ` Antonio Quartulli
@ 2024-05-29 20:45           ` Sabrina Dubroca
  0 siblings, 0 replies; 111+ messages in thread
From: Sabrina Dubroca @ 2024-05-29 20:45 UTC (permalink / raw
  To: Antonio Quartulli
  Cc: netdev, Jakub Kicinski, Sergey Ryazanov, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Esben Haabendal

2024-05-29, 22:15:27 +0200, Antonio Quartulli wrote:
> On 29/05/2024 17:16, Sabrina Dubroca wrote:
> > 2024-05-28, 21:41:15 +0200, Antonio Quartulli wrote:
> > > On 28/05/2024 16:44, Sabrina Dubroca wrote:
> > > > Hi Antonio, I took a little break but I'm looking at your patches
> > > > again now.
> > > 
> > > Thanks Sabrina! Meanwhile I have been working on all your suggested changes.
> > > Right now I am familiarizing myself with the strparser.
> > 
> > Cool :)
> > 
> > > > 2024-05-06, 03:16:27 +0200, Antonio Quartulli wrote:
> > > > > +	index = ovpn_peer_index(ovpn->peers.by_id, &peer->id, sizeof(peer->id));
> > > > > +	hlist_add_head_rcu(&peer->hash_entry_id, &ovpn->peers.by_id[index]);
> > > > > +
> > > > > +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
> > > > > +		index = ovpn_peer_index(ovpn->peers.by_vpn_addr,
> > > > > +					&peer->vpn_addrs.ipv4,
> > > > > +					sizeof(peer->vpn_addrs.ipv4));
> > > > > +		hlist_add_head_rcu(&peer->hash_entry_addr4,
> > > > > +				   &ovpn->peers.by_vpn_addr[index]);
> > > > > +	}
> > > > > +
> > > > > +	hlist_del_init_rcu(&peer->hash_entry_addr6);
> > > > 
> > > > Why are hash_entry_transp_addr and hash_entry_addr6 getting a
> > > > hlist_del_init_rcu() call, but not hash_entry_id and hash_entry_addr4?
> > > 
> > > I think not calling del_init_rcu on hash_entry_addr4 was a mistake.
> > > 
> > > Calling del_init_rcu on addr4, addr6 and transp_addr is needed to put them
> > > in a known state in case they are not hashed.
> > 
> > hlist_del_init_rcu does nothing if node is not already on a list.
> 
> Mh you're right. I must have got confused for some reason.
> Those del_init_rcu can go then.
> 
> > 
> > > While hash_entry_id always goes through hlist_add_head_rcu, therefore
> > > del_init_rcu is useless (to my understanding).
> > 
> > I'm probably missing something about how this all fits together. In
> > patch 19, I see ovpn_nl_set_peer_doit can re-add a peer that is
> > already added (but I'm not sure why, since you don't allow changing
> > the addresses, so it won't actually be re-hashed).
> 
> Actually it's not a "re-add", but the intent is to "update" a peer that
> already exists. However, some fields are forbidden from being updated, like
> the address.
> 
> [NOTE: I found some issue with the "peer update" logic in
> ovpn_nl_set_peer_doit and it's being changed a bit]
> 
> > 
> > I don't think doing a 2nd add of the same element to peers.by_id (or
> > any of the other hashtables) is correct, so I'd say you need
> > hlist_del_init_rcu for all of them.
> 
> This is exactly the bug I mentioned above: we should not go through the add
> again. Ideally we should just update the fields and be done with it, without
> re-hashing the object.

Ok, if you only call ovpn_peer_add for new peers, this looks fine and
the hlist_del_init_rcu can all be removed as you said.

Thanks.

-- 
Sabrina



End of thread.

Thread overview: 111+ messages
2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 01/24] netlink: add NLA_POLICY_MAX_LEN macro Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 02/24] net: introduce OpenVPN Data Channel Offload (ovpn) Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 03/24] ovpn: add basic netlink support Antonio Quartulli
2024-05-08  0:10   ` Jakub Kicinski
2024-05-08  7:42     ` Antonio Quartulli
2024-05-08 14:42   ` Sabrina Dubroca
2024-05-08 14:51     ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines Antonio Quartulli
2024-05-08  0:18   ` Jakub Kicinski
2024-05-08  7:53     ` Antonio Quartulli
2024-05-08 14:52   ` Sabrina Dubroca
2024-05-09  1:06     ` Jakub Kicinski
2024-05-09  8:25     ` Antonio Quartulli
2024-05-09 10:09       ` Sabrina Dubroca
2024-05-09 10:35         ` Antonio Quartulli
2024-05-09 12:16           ` Sabrina Dubroca
2024-05-09 13:25             ` Antonio Quartulli
2024-05-09 13:52               ` Sabrina Dubroca
2024-05-06  1:16 ` [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink Antonio Quartulli
2024-05-08  0:21   ` Jakub Kicinski
2024-05-08  9:49     ` Antonio Quartulli
2024-05-09  1:09       ` Jakub Kicinski
2024-05-09  8:30         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 06/24] ovpn: keep carrier always on Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object Antonio Quartulli
2024-05-08 16:06   ` Sabrina Dubroca
2024-05-08 20:31     ` Antonio Quartulli
2024-05-09 13:04       ` Sabrina Dubroca
2024-05-09 13:24         ` Andrew Lunn
2024-05-10 18:57           ` Antonio Quartulli
2024-05-11  0:28             ` Jakub Kicinski
2024-05-09 13:44         ` Antonio Quartulli
2024-05-09 13:55           ` Andrew Lunn
2024-05-09 14:17           ` Sabrina Dubroca
2024-05-09 14:36             ` Antonio Quartulli
2024-05-09 14:53               ` Antonio Quartulli
2024-05-10 10:30                 ` Sabrina Dubroca
2024-05-10 12:34                   ` Antonio Quartulli
2024-05-10 14:11                     ` Sabrina Dubroca
2024-05-13 10:09   ` Simon Horman
2024-05-13 10:53     ` Antonio Quartulli
2024-05-13 15:04       ` Simon Horman
2024-05-06  1:16 ` [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object Antonio Quartulli
2024-05-08 17:10   ` Sabrina Dubroca
2024-05-08 20:38     ` Antonio Quartulli
2024-05-09 13:32       ` Sabrina Dubroca
2024-05-09 13:46         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP) Antonio Quartulli
2024-05-10 13:01   ` Sabrina Dubroca
2024-05-10 13:39     ` Antonio Quartulli
2024-05-12 21:35   ` Sabrina Dubroca
2024-05-13  7:37     ` Antonio Quartulli
2024-05-13  9:36       ` Sabrina Dubroca
2024-05-13  9:47         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 10/24] ovpn: implement basic RX " Antonio Quartulli
2024-05-10 13:45   ` Sabrina Dubroca
2024-05-10 14:41     ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 11/24] ovpn: implement packet processing Antonio Quartulli
2024-05-12  8:46   ` Sabrina Dubroca
2024-05-13  7:14     ` Antonio Quartulli
2024-05-13  9:24       ` Sabrina Dubroca
2024-05-13  9:31         ` Antonio Quartulli
2024-05-22 14:08     ` Antonio Quartulli
2024-05-22 14:28       ` Andrew Lunn
2024-05-06  1:16 ` [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics Antonio Quartulli
2024-05-12  8:47   ` Sabrina Dubroca
2024-05-13  7:25     ` Antonio Quartulli
2024-05-13  9:19       ` Sabrina Dubroca
2024-05-13  9:33         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 13/24] ovpn: implement TCP transport Antonio Quartulli
2024-05-13 13:37   ` Antonio Quartulli
2024-05-13 15:34     ` Jakub Kicinski
2024-05-13 14:50   ` Sabrina Dubroca
2024-05-13 22:20     ` Antonio Quartulli
2024-05-14  8:58       ` Sabrina Dubroca
2024-05-14 22:11         ` Antonio Quartulli
2024-05-15 10:19           ` Sabrina Dubroca
2024-05-15 12:54             ` Antonio Quartulli
2024-05-15 14:55               ` Sabrina Dubroca
2024-05-15 19:44                 ` Antonio Quartulli
2024-05-15 20:35                   ` Sabrina Dubroca
2024-05-15 20:39                     ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 14/24] ovpn: implement multi-peer support Antonio Quartulli
2024-05-28 14:44   ` Sabrina Dubroca
2024-05-28 19:41     ` Antonio Quartulli
2024-05-29 15:16       ` Sabrina Dubroca
2024-05-29 20:15         ` Antonio Quartulli
2024-05-29 20:45           ` Sabrina Dubroca
2024-05-06  1:16 ` [PATCH net-next v3 15/24] ovpn: implement peer lookup logic Antonio Quartulli
2024-05-28 16:42   ` Sabrina Dubroca
2024-05-28 20:09     ` Antonio Quartulli
2024-05-29 16:42       ` Sabrina Dubroca
2024-05-29 20:19         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 16/24] ovpn: implement keepalive mechanism Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 17/24] ovpn: add support for updating local UDP endpoint Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 18/24] ovpn: add support for peer floating Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 19/24] ovpn: implement peer add/dump/delete via netlink Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 20/24] ovpn: implement key add/del/swap " Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 21/24] ovpn: kill key and notify userspace in case of IV exhaustion Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 22/24] ovpn: notify userspace when a peer is deleted Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 23/24] ovpn: add basic ethtool support Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module Antonio Quartulli
2024-05-07 23:55   ` Jakub Kicinski
2024-05-08  9:51     ` Antonio Quartulli
2024-05-09  0:50       ` Jakub Kicinski
2024-05-09  8:40         ` Antonio Quartulli
2024-05-07 23:48 ` [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Jakub Kicinski
2024-05-08  9:56   ` Antonio Quartulli
2024-05-09  0:53     ` Jakub Kicinski
2024-05-09  8:41       ` Antonio Quartulli
