* [RFC net-next 00/15] add basic PSP encryption for TCP connections
@ 2024-05-10  3:04 Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 01/15] psp: add documentation Jakub Kicinski
                   ` (15 more replies)
  0 siblings, 16 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Hi!

Add support for PSP encryption of TCP connections.

PSP is a protocol out of Google:
https://github.com/google/psp/blob/main/doc/PSP_Arch_Spec.pdf
which shares some similarities with IPsec. I added some more info
in the first patch so I'll keep it short here.

The protocol can work in multiple modes including tunneling.
But I'm mostly interested in using it as a TLS replacement because
of its superior offload characteristics. So this series does three
things:

 - it adds "core" PSP code
   PSP is offload-centric, and requires some additional care and
   feeding, so the first chunk of the code exposes device info.
   This part can be reused by PSP implementations in xfrm, tunneling etc.

 - TCP integration TLS style
   Reuse some of the existing concepts from TLS offload, such as
   attaching crypto state to a socket, marking skbs as "decrypted",
   egress validation. PSP does not prescribe key exchange protocols.
   To use PSP as a more efficient TLS offload we intend to perform
   a TLS handshake ("inline" in the same TCP connection) and negotiate
   switching to PSP based on capabilities of both endpoints.
   This is also why I'm not including a software implementation.
   Nobody would use it in production: software TLS is faster because
   it has larger crypto records.

 - mlx5 implementation
   That's mostly other people's work, not 100% sure those folks
   consider it ready hence the RFC in the title. But it works :)

Not posted, but queued on a branch [1], are follow-up pieces:
 - standard stats
 - netdevsim implementation and tests

[1] https://github.com/kuba-moo/linux/tree/psp

Jakub Kicinski (8):
  psp: add documentation
  psp: base PSP device support
  net: modify core data structures for PSP datapath support
  tcp: add datapath logic for PSP with inline key exchange
  psp: add op for rotation of secret state
  net: psp: add socket security association code
  net: psp: update the TCP MSS to reflect PSP packet overhead
  psp: track generations of secret state

Raed Salem (7):
  net/mlx5e: Support PSP offload functionality
  net/mlx5e: Implement PSP operations .assoc_add and .assoc_del
  net/mlx5e: Implement PSP Tx data path
  net/mlx5e: Add PSP steering in local NIC RX
  net/mlx5e: Configure PSP Rx flow steering rules
  net/mlx5e: Add Rx data path offload
  net/mlx5e: Implement PSP key_rotate operation

 Documentation/netlink/specs/psp.yaml          | 186 +++++
 Documentation/networking/index.rst            |   1 +
 Documentation/networking/psp.rst              | 138 ++++
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |  11 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   7 +-
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |   2 +-
 .../ethernet/mellanox/mlx5/core/en/params.c   |   4 +-
 .../mellanox/mlx5/core/en_accel/en_accel.h    |  50 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h  |   2 +-
 .../mellanox/mlx5/core/en_accel/nisp.c        | 209 +++++
 .../mellanox/mlx5/core/en_accel/nisp.h        |  55 ++
 .../mellanox/mlx5/core/en_accel/nisp_fs.c     | 737 ++++++++++++++++++
 .../mellanox/mlx5/core/en_accel/nisp_fs.h     |  30 +
 .../mlx5/core/en_accel/nisp_offload.c         |  52 ++
 .../mellanox/mlx5/core/en_accel/nisp_rxtx.c   | 304 ++++++++
 .../mellanox/mlx5/core/en_accel/nisp_rxtx.h   | 124 +++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   9 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  10 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |   6 +
 .../ethernet/mellanox/mlx5/core/lib/crypto.h  |   1 +
 .../mellanox/mlx5/core/lib/psp_defs.h         |  28 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 .../net/ethernet/mellanox/mlx5/core/nisp.c    |  24 +
 .../net/ethernet/mellanox/mlx5/core/nisp.h    |  15 +
 include/linux/mlx5/device.h                   |   4 +
 include/linux/mlx5/driver.h                   |   2 +
 include/linux/mlx5/mlx5_ifc.h                 |  98 ++-
 include/linux/netdevice.h                     |   4 +
 include/linux/skbuff.h                        |   3 +
 include/linux/tcp.h                           |   3 +
 include/net/dropreason-core.h                 |   6 +
 include/net/psp.h                             |  12 +
 include/net/psp/functions.h                   | 150 ++++
 include/net/psp/types.h                       | 182 +++++
 include/net/sock.h                            |   4 +
 include/uapi/linux/psp.h                      |  66 ++
 net/Kconfig                                   |   1 +
 net/Makefile                                  |   1 +
 net/core/gro.c                                |   2 +
 net/core/skbuff.c                             |   4 +
 net/core/sock.c                               |   2 +
 net/ipv4/inet_connection_sock.c               |   2 +
 net/ipv4/tcp.c                                |   2 +
 net/ipv4/tcp_ipv4.c                           |  13 +-
 net/ipv4/tcp_minisocks.c                      |  21 +-
 net/ipv4/tcp_output.c                         |  16 +-
 net/ipv6/ipv6_sockglue.c                      |   6 +-
 net/ipv6/tcp_ipv6.c                           |  22 +-
 net/mptcp/protocol.c                          |   2 +
 net/psp/Kconfig                               |  15 +
 net/psp/Makefile                              |   5 +
 net/psp/psp-nl-gen.c                          | 119 +++
 net/psp/psp-nl-gen.h                          |  39 +
 net/psp/psp.h                                 |  54 ++
 net/psp/psp_main.c                            | 144 ++++
 net/psp/psp_nl.c                              | 517 ++++++++++++
 net/psp/psp_sock.c                            | 276 +++++++
 tools/net/ynl/Makefile.deps                   |   1 +
 60 files changed, 3791 insertions(+), 32 deletions(-)
 create mode 100644 Documentation/netlink/specs/psp.yaml
 create mode 100644 Documentation/networking/psp.rst
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_offload.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/psp_defs.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/nisp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/nisp.h
 create mode 100644 include/net/psp.h
 create mode 100644 include/net/psp/functions.h
 create mode 100644 include/net/psp/types.h
 create mode 100644 include/uapi/linux/psp.h
 create mode 100644 net/psp/Kconfig
 create mode 100644 net/psp/Makefile
 create mode 100644 net/psp/psp-nl-gen.c
 create mode 100644 net/psp/psp-nl-gen.h
 create mode 100644 net/psp/psp.h
 create mode 100644 net/psp/psp_main.c
 create mode 100644 net/psp/psp_nl.c
 create mode 100644 net/psp/psp_sock.c

-- 
2.45.0



* [RFC net-next 01/15] psp: add documentation
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10 22:19   ` Saeed Mahameed
  2024-05-13  1:24   ` Willem de Bruijn
  2024-05-10  3:04 ` [RFC net-next 02/15] psp: base PSP device support Jakub Kicinski
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Add documentation of things which belong in the docs rather
than commit messages.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/psp.rst   | 138 +++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100644 Documentation/networking/psp.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 7664c0bfe461..0376029ecbdf 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -94,6 +94,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
    ppp_generic
    proc_net_tcp
    pse-pd/index
+   psp
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/psp.rst b/Documentation/networking/psp.rst
new file mode 100644
index 000000000000..a39b464813ab
--- /dev/null
+++ b/Documentation/networking/psp.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================
+PSP Security Protocol
+=====================
+
+Protocol
+========
+
+PSP Security Protocol (PSP) was defined at Google and published in:
+
+https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf
+
+This section briefly covers protocol aspects crucial for understanding
+the kernel API. Refer to the protocol specification for further details.
+
+Note that the kernel implementation and documentation use the term
+"secret state" in place of "master key"; it is both less confusing
+to an average developer and less likely to run afoul of any naming
+guidelines.
+
+Derived Rx keys
+---------------
+
+PSP borrows some terms and mechanisms from IPsec. PSP was designed
+with HW offloads in mind. The key feature of PSP is that Rx keys for every
+connection do not have to be stored by the receiver but can be derived
+from secret state and information present in packet headers.
+This makes it possible to implement receivers which require a constant
+amount of memory regardless of the number of connections (``O(1)`` scaling).
+
+Tx keys have to be stored like with any other protocol, but Tx is much
+less latency sensitive than Rx, and delays in fetching keys from slow
+memory are less likely to cause packet drops.
+
+Key rotation
+------------
+
+The secret state known only to the receiver is fundamental to the design.
+Per specification this state cannot be directly accessible (it must be
+impossible to read it out of the hardware of the receiver NIC).
+Moreover, it has to be "rotated" periodically (usually daily). Rotation
+means that new secret state gets generated (by a random number generator
+of the device), and used for all new connections. To avoid disrupting
+old connections the old secret state remains in the NIC. A phase bit
+carried in the packet headers indicates which generation of secret state
+the packet has been encrypted with.
+
+User facing API
+===============
+
+PSP is designed primarily for hardware offloads. There is currently
+no software fallback for systems which do not have PSP capable NICs.
+There is also no standard (or otherwise defined) way of establishing
+a PSP-secured connection or exchanging the symmetric keys.
+
+The expectation is that higher layer protocols will take care of
+protocol and key negotiation. For example one may use TLS key exchange,
+announce the PSP capability, and switch to PSP if both endpoints
+are PSP-capable.
+
+All configuration of PSP is performed via the PSP netlink family.
+
+Device discovery
+----------------
+
+The PSP netlink family defines operations to retrieve information
+about the PSP devices available on the system, configure them and
+access PSP related statistics.
+
+Securing a connection
+---------------------
+
+PSP encryption is currently only supported for TCP connections.
+Rx and Tx keys are allocated separately. First the ``rx-assoc``
+Netlink command needs to be issued, specifying a target TCP socket.
+The kernel will allocate a new PSP Rx key from the NIC and associate it
+with the given socket. At this stage the socket will accept both PSP-secured
+and plain text TCP packets.
+
+Tx keys are installed using the ``tx-assoc`` Netlink command.
+Once the Tx keys are installed all data read from the socket will
+be PSP-secured. In other words, the act of installing Tx keys has a secondary
+effect on the Rx direction, requiring all received packets to be encrypted.
+Since packet reception is asynchronous, to make it possible for the
+application to trust that any data read from the socket after the ``tx-assoc``
+call returns success has been encrypted, the kernel will scan the receive
+queue of the socket at ``tx-assoc`` time. If any enqueued packet was received
+in clear text the Tx association will fail, and the application should retry
+installing the Tx key after draining the socket (this should not be necessary
+if both endpoints are well behaved).
+
+Rotation notifications
+----------------------
+
+The rotations of secret state happen asynchronously and are usually
+performed by management daemons, not under application control.
+The PSP netlink family will generate a notification whenever keys
+are rotated. The applications are expected to re-establish connections
+before keys are rotated again.
+
+Kernel implementation
+=====================
+
+Driver notes
+------------
+
+Drivers are expected to start with no PSP enabled (``psp-versions-ena``
+in ``dev-get`` set to ``0``) whenever possible. The user space should
+not depend on this behavior, as future extensions may necessitate creation
+of devices with PSP already enabled; nonetheless drivers should not enable
+PSP by default. Enabling PSP should be the responsibility of the system
+component which also takes care of key rotation.
+
+Note that ``psp-versions-ena`` is expected to be used only for enabling
+receive processing. The device is not expected to reject transmit requests
+after ``psp-versions-ena`` has been disabled. User may also disable
+``psp-versions-ena`` while there are active associations, which will
+break all PSP Rx processing.
+
+Drivers are expected to ensure that secret state is usable upon init
+(working keys can be allocated), and that no duplicate keys may be generated
+(reuse of SPI without key rotation). Drivers may achieve this by rotating
+keys twice before registering the PSP device.
+
+Drivers must use ``psp_skb_get_assoc_rcu()`` to check if PSP Tx offload
+was requested for a given skb. On Rx, drivers should allocate and populate
+the ``SKB_EXT_PSP`` skb extension, and set the skb->decrypted bit to 1.
+
+Kernel implementation notes
+---------------------------
+
+PSP implementation follows the TLS offload more closely than the IPsec
+offload, with per-socket state, and the use of skb->decrypted to prevent
+clear text leaks.
+
+The PSP device is separate from netdev, to make it possible to "delegate"
+PSP offload capabilities to software devices (e.g. ``veth``).
-- 
2.45.0
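
For readers following along: the "Key rotation" section above says a phase
bit in the packet headers selects the generation of secret state. A minimal
userspace sketch of that decoding is below. The bit positions mirror
PSP_SPI_KEY_ID and PSP_SPI_KEY_PHASE from the types.h added later in this
series; the helper names and the two-slot model are hypothetical and are
not part of the posted code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as PSP_SPI_KEY_ID / PSP_SPI_KEY_PHASE in the series. */
#define SPI_KEY_ID_MASK		UINT32_C(0x7fffffff)	/* GENMASK(30, 0) */
#define SPI_KEY_PHASE_BIT	UINT32_C(0x80000000)	/* BIT(31) */

/* Hypothetical helpers; 'spi' must already be in host byte order. */
static inline bool spi_key_phase(uint32_t spi)
{
	return (spi & SPI_KEY_PHASE_BIT) != 0;
}

static inline uint32_t spi_key_id(uint32_t spi)
{
	return spi & SPI_KEY_ID_MASK;
}

/*
 * Assumption for illustration only: the receiver keeps two generations
 * of secret state and indexes them directly by the phase bit.
 */
static inline unsigned int spi_secret_slot(uint32_t spi)
{
	return spi_key_phase(spi) ? 1 : 0;
}
```

With this model an SPI of 0x80000001 selects slot 1 with key ID 1, while
0x00000001 selects slot 0 with the same key ID.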


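The tx-assoc behavior described under "Securing a connection" can be
modeled in a few lines of plain C. This is only a sketch of the rule as
documented (fail if any queued packet arrived in clear text); the struct
and function names are made up for illustration, and the real check works
on the socket receive queue, not on an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued skb; only the decrypted bit matters. */
struct model_pkt {
	bool decrypted;
};

/*
 * Returns true when installing the Tx key may succeed, i.e. every packet
 * already sitting in the receive queue was PSP-decrypted. An empty queue
 * trivially passes.
 */
static bool model_tx_assoc_allowed(const struct model_pkt *queue, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (!queue[i].decrypted)
			return false;
	}
	return true;
}
```

In this model an application that sees a failure would drain the queue and
retry, matching the retry advice in the documentation.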

* [RFC net-next 02/15] psp: base PSP device support
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 01/15] psp: add documentation Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 03/15] net: modify core data structures for PSP datapath support Jakub Kicinski
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Add a netlink family for PSP and allow drivers to register support.

The "PSP device" is its own object. This allows us to perform more
flexible reference counting / lifetime control than if PSP information
was part of net_device. In the future we should also be able
to "delegate" PSP access to software devices, such as *vlan, veth
or netkit more easily.
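
To illustrate the lifetime point, here is a rough userspace model of an
object that outlives its driver: unregistering clears the driver pointers
and drops the initial reference, but the memory is only freed when the
last holder lets go. All names are hypothetical, C11 atomics stand in for
refcount_t, and the locking and RCU used by the real code are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a separately refcounted device object. */
struct model_psd {
	atomic_int refcnt;
	const void *ops;	/* NULL once the driver has unregistered */
};

static struct model_psd *model_psd_create(const void *ops)
{
	struct model_psd *psd = calloc(1, sizeof(*psd));

	if (!psd)
		return NULL;
	atomic_init(&psd->refcnt, 1);	/* initial reference owned by the driver */
	psd->ops = ops;
	return psd;
}

static void model_psd_get(struct model_psd *psd)
{
	atomic_fetch_add(&psd->refcnt, 1);
}

/* Drops one reference; returns true if this call freed the object. */
static bool model_psd_put(struct model_psd *psd)
{
	if (atomic_fetch_sub(&psd->refcnt, 1) == 1) {
		free(psd);
		return true;
	}
	return false;
}

/* Driver goes away: detach the callbacks, then drop the driver reference. */
static bool model_psd_unregister(struct model_psd *psd)
{
	psd->ops = NULL;
	return model_psd_put(psd);
}
```

A holder that took an extra reference before unregister can still safely
look at the object afterwards and will find ops set to NULL.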

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/netlink/specs/psp.yaml |  94 +++++++++++
 include/linux/netdevice.h            |   4 +
 include/net/psp.h                    |  12 ++
 include/net/psp/functions.h          |  14 ++
 include/net/psp/types.h              | 102 ++++++++++++
 include/uapi/linux/psp.h             |  42 +++++
 net/Kconfig                          |   1 +
 net/Makefile                         |   1 +
 net/psp/Kconfig                      |  13 ++
 net/psp/Makefile                     |   5 +
 net/psp/psp-nl-gen.c                 |  65 ++++++++
 net/psp/psp-nl-gen.h                 |  30 ++++
 net/psp/psp.h                        |  31 ++++
 net/psp/psp_main.c                   | 130 ++++++++++++++++
 net/psp/psp_nl.c                     | 223 +++++++++++++++++++++++++++
 tools/net/ynl/Makefile.deps          |   1 +
 16 files changed, 768 insertions(+)
 create mode 100644 Documentation/netlink/specs/psp.yaml
 create mode 100644 include/net/psp.h
 create mode 100644 include/net/psp/functions.h
 create mode 100644 include/net/psp/types.h
 create mode 100644 include/uapi/linux/psp.h
 create mode 100644 net/psp/Kconfig
 create mode 100644 net/psp/Makefile
 create mode 100644 net/psp/psp-nl-gen.c
 create mode 100644 net/psp/psp-nl-gen.h
 create mode 100644 net/psp/psp.h
 create mode 100644 net/psp/psp_main.c
 create mode 100644 net/psp/psp_nl.c

diff --git a/Documentation/netlink/specs/psp.yaml b/Documentation/netlink/specs/psp.yaml
new file mode 100644
index 000000000000..dbb5ef148045
--- /dev/null
+++ b/Documentation/netlink/specs/psp.yaml
@@ -0,0 +1,94 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+name: psp
+
+doc:
+  PSP Security Protocol Generic Netlink family.
+
+definitions:
+  -
+    type: enum
+    name: version
+    entries: [ hdr0-aes-gcm-128, hdr0-aes-gcm-256,
+               hdr0-aes-gmac-128, hdr0-aes-gmac-256, ]
+
+attribute-sets:
+  -
+    name: dev
+    attributes:
+      -
+        name: id
+        doc: PSP device ID.
+        type: u32
+        checks:
+          min: 1
+      -
+        name: ifindex
+        doc: ifindex of the main netdevice linked to the PSP device.
+        type: u32
+      -
+        name: psp-versions-cap
+        doc: Bitmask of PSP versions supported by the device.
+        type: u32
+        enum: version
+        enum-as-flags: true
+      -
+        name: psp-versions-ena
+        doc: Bitmask of currently enabled (accepted on Rx) PSP versions.
+        type: u32
+        enum: version
+        enum-as-flags: true
+
+operations:
+  list:
+    -
+      name: dev-get
+      doc: Get / dump information about PSP capable devices on the system.
+      attribute-set: dev
+      do:
+        request:
+          attributes:
+            - id
+        reply: &dev-all
+          attributes:
+            - id
+            - ifindex
+            - psp-versions-cap
+            - psp-versions-ena
+        pre: psp-device-get-locked
+        post: psp-device-unlock
+      dump:
+        reply: *dev-all
+    -
+      name: dev-add-ntf
+      doc: Notification about device appearing.
+      notify: dev-get
+      mcgrp: mgmt
+    -
+      name: dev-del-ntf
+      doc: Notification about device disappearing.
+      notify: dev-get
+      mcgrp: mgmt
+    -
+      name: dev-set
+      doc: Set the configuration of a PSP device.
+      attribute-set: dev
+      do:
+        request:
+          attributes:
+            - id
+            - psp-versions-ena
+        reply:
+          attributes: []
+        pre: psp-device-get-locked
+        post: psp-device-unlock
+    -
+      name: dev-change-ntf
+      doc: Notification about device configuration being changed.
+      notify: dev-get
+      mcgrp: mgmt
+
+mcast-groups:
+  list:
+    -
+      name: mgmt
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf261fb89d73..7327ed157bc2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1874,6 +1874,7 @@ enum netdev_reg_state {
  *			 device struct
  *	@mpls_ptr:	mpls_dev struct pointer
  *	@mctp_ptr:	MCTP specific data
+ *	@psp_dev:	PSP crypto device registered for this netdev
  *
  *	@dev_addr:	Hw address (before bcast,
  *			because most packets are unicast)
@@ -2251,6 +2252,9 @@ struct net_device {
 #if IS_ENABLED(CONFIG_MCTP)
 	struct mctp_dev __rcu	*mctp_ptr;
 #endif
+#if IS_ENABLED(CONFIG_INET_PSP)
+	struct psp_dev __rcu	*psp_dev;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/include/net/psp.h b/include/net/psp.h
new file mode 100644
index 000000000000..33bb4d1dc46e
--- /dev/null
+++ b/include/net/psp.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __NET_PSP_ALL_H
+#define __NET_PSP_ALL_H
+
+#include <uapi/linux/psp.h>
+#include <net/psp/functions.h>
+#include <net/psp/types.h>
+
+/* Do not add any code here. Put it in the sub-headers instead. */
+
+#endif /* __NET_PSP_ALL_H */
diff --git a/include/net/psp/functions.h b/include/net/psp/functions.h
new file mode 100644
index 000000000000..074f9df9afc3
--- /dev/null
+++ b/include/net/psp/functions.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __NET_PSP_HELPERS_H
+#define __NET_PSP_HELPERS_H
+
+#include <net/psp/types.h>
+
+/* Driver-facing API */
+struct psp_dev *
+psp_dev_create(struct net_device *netdev, struct psp_dev_ops *psd_ops,
+	       struct psp_dev_caps *psd_caps, void *priv_ptr);
+void psp_dev_unregister(struct psp_dev *psd);
+
+#endif /* __NET_PSP_HELPERS_H */
diff --git a/include/net/psp/types.h b/include/net/psp/types.h
new file mode 100644
index 000000000000..dbc5423a53df
--- /dev/null
+++ b/include/net/psp/types.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __NET_PSP_H
+#define __NET_PSP_H
+
+#include <linux/mutex.h>
+#include <linux/refcount.h>
+
+struct netlink_ext_ack;
+
+#define PSP_DEFAULT_UDP_PORT	1000
+
+struct psphdr {
+	u8	nexthdr;
+	u8	hdrlen;
+	u8	crypt_offset;
+	u8	verfl;
+	__be32	spi;
+	__be64	iv;
+	__be64	vc[]; /* optional */
+};
+
+#define PSP_SPI_KEY_ID		GENMASK(30, 0)
+#define PSP_SPI_KEY_PHASE	BIT(31)
+
+#define PSPHDR_CRYPT_OFFSET	GENMASK(5, 0)
+
+#define PSPHDR_VERFL_SAMPLE	BIT(7)
+#define PSPHDR_VERFL_DROP	BIT(6)
+#define PSPHDR_VERFL_VERSION	GENMASK(5, 2)
+#define PSPHDR_VERFL_VIRT	BIT(1)
+#define PSPHDR_VERFL_ONE	BIT(0)
+
+#define PSP_HDRLEN_NOOPT	((sizeof(struct psphdr) - 8) / 8)
+
+/**
+ * struct psp_dev_config - PSP device configuration
+ * @versions: PSP versions enabled on the device
+ */
+struct psp_dev_config {
+	u32 versions;
+};
+
+/**
+ * struct psp_dev - PSP device struct
+ * @main_netdev: original netdevice of this PSP device
+ * @ops:	driver callbacks
+ * @caps:	device capabilities
+ * @drv_priv:	driver priv pointer
+ * @lock:	instance lock, protects all fields
+ * @refcnt:	reference count for the instance
+ * @id:		instance id
+ * @config:	current device configuration
+ *
+ * @rcu:	RCU head for freeing the structure
+ */
+struct psp_dev {
+	struct net_device *main_netdev;
+
+	struct psp_dev_ops *ops;
+	struct psp_dev_caps *caps;
+	void *drv_priv;
+
+	struct mutex lock;
+	refcount_t refcnt;
+
+	u32 id;
+
+	struct psp_dev_config config;
+
+	struct rcu_head rcu;
+};
+
+/**
+ * struct psp_dev_caps - PSP device capabilities
+ */
+struct psp_dev_caps {
+	/**
+	 * @versions: mask of supported PSP versions
+	 * Set this field to 0 to indicate PSP is not supported at all.
+	 */
+	u32 versions;
+};
+
+#define PSP_V0_KEY	16
+#define PSP_V1_KEY	32
+#define PSP_MAX_KEY	32
+
+/**
+ * struct psp_dev_ops - netdev driver facing PSP callbacks
+ */
+struct psp_dev_ops {
+	/**
+	 * @set_config: set configuration of a PSP device
+	 * Driver can inspect @psd->config for the previous configuration.
+	 * Core will update @psd->config with @conf on success.
+	 */
+	int (*set_config)(struct psp_dev *psd, struct psp_dev_config *conf,
+			  struct netlink_ext_ack *extack);
+};
+
+#endif /* __NET_PSP_H */
diff --git a/include/uapi/linux/psp.h b/include/uapi/linux/psp.h
new file mode 100644
index 000000000000..4a404f085190
--- /dev/null
+++ b/include/uapi/linux/psp.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/psp.yaml */
+/* YNL-GEN uapi header */
+
+#ifndef _UAPI_LINUX_PSP_H
+#define _UAPI_LINUX_PSP_H
+
+#define PSP_FAMILY_NAME		"psp"
+#define PSP_FAMILY_VERSION	1
+
+enum psp_version {
+	PSP_VERSION_HDR0_AES_GCM_128,
+	PSP_VERSION_HDR0_AES_GCM_256,
+	PSP_VERSION_HDR0_AES_GMAC_128,
+	PSP_VERSION_HDR0_AES_GMAC_256,
+};
+
+enum {
+	PSP_A_DEV_ID = 1,
+	PSP_A_DEV_IFINDEX,
+	PSP_A_DEV_PSP_VERSIONS_CAP,
+	PSP_A_DEV_PSP_VERSIONS_ENA,
+
+	__PSP_A_DEV_MAX,
+	PSP_A_DEV_MAX = (__PSP_A_DEV_MAX - 1)
+};
+
+enum {
+	PSP_CMD_DEV_GET = 1,
+	PSP_CMD_DEV_ADD_NTF,
+	PSP_CMD_DEV_DEL_NTF,
+	PSP_CMD_DEV_SET,
+	PSP_CMD_DEV_CHANGE_NTF,
+
+	__PSP_CMD_MAX,
+	PSP_CMD_MAX = (__PSP_CMD_MAX - 1)
+};
+
+#define PSP_MCGRP_MGMT	"mgmt"
+
+#endif /* _UAPI_LINUX_PSP_H */
diff --git a/net/Kconfig b/net/Kconfig
index f0a8692496ff..3079ed6711c0 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -69,6 +69,7 @@ config SKB_EXTENSIONS
 menu "Networking options"
 
 source "net/packet/Kconfig"
+source "net/psp/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/Makefile b/net/Makefile
index 65bb8c72a35e..c47b1ae4540d 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_TLS)		+= tls/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX)		+= unix/
+obj-$(CONFIG_INET_PSP)		+= psp/
 obj-y				+= ipv6/
 obj-$(CONFIG_PACKET)		+= packet/
 obj-$(CONFIG_NET_KEY)		+= key/
diff --git a/net/psp/Kconfig b/net/psp/Kconfig
new file mode 100644
index 000000000000..55f9dd87446b
--- /dev/null
+++ b/net/psp/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# PSP configuration
+#
+config INET_PSP
+	bool "PSP Security Protocol support"
+	depends on INET
+	help
+	Enable kernel support for the PSP protocol.
+	For more information see:
+	  https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf
+
+	If unsure, say N.
diff --git a/net/psp/Makefile b/net/psp/Makefile
new file mode 100644
index 000000000000..41b51d06e560
--- /dev/null
+++ b/net/psp/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_INET_PSP) += psp.o
+
+psp-y := psp_main.o psp_nl.o psp-nl-gen.o
diff --git a/net/psp/psp-nl-gen.c b/net/psp/psp-nl-gen.c
new file mode 100644
index 000000000000..859712e7c2c1
--- /dev/null
+++ b/net/psp/psp-nl-gen.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/psp.yaml */
+/* YNL-GEN kernel source */
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "psp-nl-gen.h"
+
+#include <uapi/linux/psp.h>
+
+/* PSP_CMD_DEV_GET - do */
+static const struct nla_policy psp_dev_get_nl_policy[PSP_A_DEV_ID + 1] = {
+	[PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+};
+
+/* PSP_CMD_DEV_SET - do */
+static const struct nla_policy psp_dev_set_nl_policy[PSP_A_DEV_PSP_VERSIONS_ENA + 1] = {
+	[PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+	[PSP_A_DEV_PSP_VERSIONS_ENA] = NLA_POLICY_MASK(NLA_U32, 0xf),
+};
+
+/* Ops table for psp */
+static const struct genl_split_ops psp_nl_ops[] = {
+	{
+		.cmd		= PSP_CMD_DEV_GET,
+		.pre_doit	= psp_device_get_locked,
+		.doit		= psp_nl_dev_get_doit,
+		.post_doit	= psp_device_unlock,
+		.policy		= psp_dev_get_nl_policy,
+		.maxattr	= PSP_A_DEV_ID,
+		.flags		= GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd	= PSP_CMD_DEV_GET,
+		.dumpit	= psp_nl_dev_get_dumpit,
+		.flags	= GENL_CMD_CAP_DUMP,
+	},
+	{
+		.cmd		= PSP_CMD_DEV_SET,
+		.pre_doit	= psp_device_get_locked,
+		.doit		= psp_nl_dev_set_doit,
+		.post_doit	= psp_device_unlock,
+		.policy		= psp_dev_set_nl_policy,
+		.maxattr	= PSP_A_DEV_PSP_VERSIONS_ENA,
+		.flags		= GENL_CMD_CAP_DO,
+	},
+};
+
+static const struct genl_multicast_group psp_nl_mcgrps[] = {
+	[PSP_NLGRP_MGMT] = { "mgmt", },
+};
+
+struct genl_family psp_nl_family __ro_after_init = {
+	.name		= PSP_FAMILY_NAME,
+	.version	= PSP_FAMILY_VERSION,
+	.netnsok	= true,
+	.parallel_ops	= true,
+	.module		= THIS_MODULE,
+	.split_ops	= psp_nl_ops,
+	.n_split_ops	= ARRAY_SIZE(psp_nl_ops),
+	.mcgrps		= psp_nl_mcgrps,
+	.n_mcgrps	= ARRAY_SIZE(psp_nl_mcgrps),
+};
diff --git a/net/psp/psp-nl-gen.h b/net/psp/psp-nl-gen.h
new file mode 100644
index 000000000000..a099686cab5d
--- /dev/null
+++ b/net/psp/psp-nl-gen.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/psp.yaml */
+/* YNL-GEN kernel header */
+
+#ifndef _LINUX_PSP_GEN_H
+#define _LINUX_PSP_GEN_H
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include <uapi/linux/psp.h>
+
+int psp_device_get_locked(const struct genl_split_ops *ops,
+			  struct sk_buff *skb, struct genl_info *info);
+void
+psp_device_unlock(const struct genl_split_ops *ops, struct sk_buff *skb,
+		  struct genl_info *info);
+
+int psp_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
+int psp_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int psp_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info);
+
+enum {
+	PSP_NLGRP_MGMT,
+};
+
+extern struct genl_family psp_nl_family;
+
+#endif /* _LINUX_PSP_GEN_H */
diff --git a/net/psp/psp.h b/net/psp/psp.h
new file mode 100644
index 000000000000..94d0cc31a61f
--- /dev/null
+++ b/net/psp/psp.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __PSP_PSP_H
+#define __PSP_PSP_H
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <net/netns/generic.h>
+#include <net/psp.h>
+#include <net/sock.h>
+
+extern struct xarray psp_devs;
+extern struct mutex psp_devs_lock;
+
+void psp_dev_destroy(struct psp_dev *psd);
+int psp_dev_check_access(struct psp_dev *psd, struct net *net);
+
+void psp_nl_notify_dev(struct psp_dev *psd, u32 cmd);
+
+static inline void psp_dev_get(struct psp_dev *psd)
+{
+	refcount_inc(&psd->refcnt);
+}
+
+static inline void psp_dev_put(struct psp_dev *psd)
+{
+	if (refcount_dec_and_test(&psd->refcnt))
+		psp_dev_destroy(psd);
+}
+
+#endif /* __PSP_PSP_H */
diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c
new file mode 100644
index 000000000000..cf463f757892
--- /dev/null
+++ b/net/psp/psp_main.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/xarray.h>
+#include <net/net_namespace.h>
+#include <net/psp.h>
+
+#include "psp.h"
+#include "psp-nl-gen.h"
+
+DEFINE_XARRAY_ALLOC1(psp_devs);
+struct mutex psp_devs_lock;
+
+/**
+ * DOC: PSP locking
+ *
+ * psp_devs_lock protects the psp_devs xarray.
+ * Lock ordering: take psp_devs_lock first, then the instance lock.
+ * Each instance is protected by RCU, and has a refcount.
+ * When the driver unregisters, the instance gets flushed, but the struct sticks around.
+ */
+
+/**
+ * psp_dev_check_access() - check if user in a given net ns can access PSP dev
+ * @psd:	PSP device structure user is trying to access
+ * @net:	net namespace user is in
+ *
+ * Return: 0 if PSP device should be visible in @net, errno otherwise.
+ */
+int psp_dev_check_access(struct psp_dev *psd, struct net *net)
+{
+	if (dev_net(psd->main_netdev) == net)
+		return 0;
+	return -ENOENT;
+}
+
+/**
+ * psp_dev_create() - create and register PSP device
+ * @netdev:	main netdevice
+ * @psd_ops:	driver callbacks
+ * @psd_caps:	device capabilities
+ * @priv_ptr:	back-pointer to driver private data
+ *
+ * Return: pointer to allocated PSP device, or ERR_PTR.
+ */
+struct psp_dev *
+psp_dev_create(struct net_device *netdev,
+	       struct psp_dev_ops *psd_ops, struct psp_dev_caps *psd_caps,
+	       void *priv_ptr)
+{
+	struct psp_dev *psd;
+	static u32 last_id;
+	int err;
+
+	if (WARN_ON(!psd_caps->versions ||
+		    !psd_ops->set_config))
+		return ERR_PTR(-EINVAL);
+
+	psd = kzalloc(sizeof(*psd), GFP_KERNEL);
+	if (!psd)
+		return ERR_PTR(-ENOMEM);
+
+	psd->main_netdev = netdev;
+	psd->ops = psd_ops;
+	psd->caps = psd_caps;
+	psd->drv_priv = priv_ptr;
+
+	mutex_init(&psd->lock);
+	refcount_set(&psd->refcnt, 1);
+
+	mutex_lock(&psp_devs_lock);
+	err = xa_alloc_cyclic(&psp_devs, &psd->id, psd, xa_limit_31b,
+			      &last_id, GFP_KERNEL);
+	if (err) {
+		mutex_unlock(&psp_devs_lock);
+		kfree(psd);
+		return ERR_PTR(err);
+	}
+	mutex_lock(&psd->lock);
+	mutex_unlock(&psp_devs_lock);
+
+	psp_nl_notify_dev(psd, PSP_CMD_DEV_ADD_NTF);
+
+	rcu_assign_pointer(netdev->psp_dev, psd);
+
+	mutex_unlock(&psd->lock);
+
+	return psd;
+}
+EXPORT_SYMBOL(psp_dev_create);
+
+void psp_dev_destroy(struct psp_dev *psd)
+{
+	mutex_destroy(&psd->lock);
+	kfree_rcu(psd, rcu);
+}
+
+/**
+ * psp_dev_unregister() - unregister PSP device
+ * @psd:	PSP device structure
+ */
+void psp_dev_unregister(struct psp_dev *psd)
+{
+	mutex_lock(&psp_devs_lock);
+	mutex_lock(&psd->lock);
+
+	psp_nl_notify_dev(psd, PSP_CMD_DEV_DEL_NTF);
+	xa_erase(&psp_devs, psd->id);
+	mutex_unlock(&psp_devs_lock);
+
+	rcu_assign_pointer(psd->main_netdev->psp_dev, NULL);
+
+	psd->ops = NULL;
+	psd->drv_priv = NULL;
+
+	mutex_unlock(&psd->lock);
+
+	psp_dev_put(psd);
+}
+EXPORT_SYMBOL(psp_dev_unregister);
+
+static int __init psp_init(void)
+{
+	mutex_init(&psp_devs_lock);
+
+	return genl_register_family(&psp_nl_family);
+}
+
+subsys_initcall(psp_init);
diff --git a/net/psp/psp_nl.c b/net/psp/psp_nl.c
new file mode 100644
index 000000000000..fda5ce800f82
--- /dev/null
+++ b/net/psp/psp_nl.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/skbuff.h>
+#include <linux/xarray.h>
+#include <net/genetlink.h>
+#include <net/psp.h>
+#include <net/sock.h>
+
+#include "psp-nl-gen.h"
+#include "psp.h"
+
+/* Netlink helpers */
+
+static struct sk_buff *psp_nl_reply_new(struct genl_info *info)
+{
+	struct sk_buff *rsp;
+	void *hdr;
+
+	rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!rsp)
+		return NULL;
+
+	hdr = genlmsg_iput(rsp, info);
+	if (!hdr) {
+		nlmsg_free(rsp);
+		return NULL;
+	}
+
+	return rsp;
+}
+
+static int psp_nl_reply_send(struct sk_buff *rsp, struct genl_info *info)
+{
+	/* Note that this *only* works with a single message per skb! */
+	nlmsg_end(rsp, (struct nlmsghdr *)rsp->data);
+
+	return genlmsg_reply(rsp, info);
+}
+
+/* Device stuff */
+
+static struct psp_dev *
+psp_device_get_and_lock(struct net *net, struct nlattr *dev_id)
+{
+	struct psp_dev *psd;
+	int err;
+
+	mutex_lock(&psp_devs_lock);
+	psd = xa_load(&psp_devs, nla_get_u32(dev_id));
+	if (!psd) {
+		mutex_unlock(&psp_devs_lock);
+		return ERR_PTR(-ENODEV);
+	}
+
+	mutex_lock(&psd->lock);
+	mutex_unlock(&psp_devs_lock);
+
+	err = psp_dev_check_access(psd, net);
+	if (err) {
+		mutex_unlock(&psd->lock);
+		return ERR_PTR(err);
+	}
+
+	return psd;
+}
+
+int psp_device_get_locked(const struct genl_split_ops *ops,
+			  struct sk_buff *skb, struct genl_info *info)
+{
+	if (GENL_REQ_ATTR_CHECK(info, PSP_A_DEV_ID))
+		return -EINVAL;
+
+	info->user_ptr[0] = psp_device_get_and_lock(genl_info_net(info),
+						    info->attrs[PSP_A_DEV_ID]);
+	return PTR_ERR_OR_ZERO(info->user_ptr[0]);
+}
+
+void
+psp_device_unlock(const struct genl_split_ops *ops, struct sk_buff *skb,
+		  struct genl_info *info)
+{
+	struct psp_dev *psd = info->user_ptr[0];
+
+	mutex_unlock(&psd->lock);
+}
+
+static int
+psp_nl_dev_fill(struct psp_dev *psd, struct sk_buff *rsp,
+		const struct genl_info *info)
+{
+	void *hdr;
+
+	hdr = genlmsg_iput(rsp, info);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(rsp, PSP_A_DEV_ID, psd->id) ||
+	    nla_put_u32(rsp, PSP_A_DEV_IFINDEX, psd->main_netdev->ifindex) ||
+	    nla_put_u32(rsp, PSP_A_DEV_PSP_VERSIONS_CAP, psd->caps->versions) ||
+	    nla_put_u32(rsp, PSP_A_DEV_PSP_VERSIONS_ENA, psd->config.versions))
+		goto err_cancel_msg;
+
+	genlmsg_end(rsp, hdr);
+	return 0;
+
+err_cancel_msg:
+	genlmsg_cancel(rsp, hdr);
+	return -EMSGSIZE;
+}
+
+void psp_nl_notify_dev(struct psp_dev *psd, u32 cmd)
+{
+	struct genl_info info;
+	struct sk_buff *ntf;
+
+	if (!genl_has_listeners(&psp_nl_family, dev_net(psd->main_netdev),
+				PSP_NLGRP_MGMT))
+		return;
+
+	ntf = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!ntf)
+		return;
+
+	genl_info_init_ntf(&info, &psp_nl_family, cmd);
+	if (psp_nl_dev_fill(psd, ntf, &info)) {
+		nlmsg_free(ntf);
+		return;
+	}
+
+	genlmsg_multicast_netns(&psp_nl_family, dev_net(psd->main_netdev), ntf,
+				0, PSP_NLGRP_MGMT, GFP_KERNEL);
+}
+
+int psp_nl_dev_get_doit(struct sk_buff *req, struct genl_info *info)
+{
+	struct psp_dev *psd = info->user_ptr[0];
+	struct sk_buff *rsp;
+	int err;
+
+	rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!rsp)
+		return -ENOMEM;
+
+	err = psp_nl_dev_fill(psd, rsp, info);
+	if (err)
+		goto err_free_msg;
+
+	return genlmsg_reply(rsp, info);
+
+err_free_msg:
+	nlmsg_free(rsp);
+	return err;
+}
+
+static int
+psp_nl_dev_get_dumpit_one(struct sk_buff *rsp, struct netlink_callback *cb,
+			  struct psp_dev *psd)
+{
+	if (psp_dev_check_access(psd, sock_net(rsp->sk)))
+		return 0;
+
+	return psp_nl_dev_fill(psd, rsp, genl_info_dump(cb));
+}
+
+int psp_nl_dev_get_dumpit(struct sk_buff *rsp, struct netlink_callback *cb)
+{
+	struct psp_dev *psd;
+	int err = 0;
+
+	mutex_lock(&psp_devs_lock);
+	xa_for_each_start(&psp_devs, cb->args[0], psd, cb->args[0]) {
+		mutex_lock(&psd->lock);
+		err = psp_nl_dev_get_dumpit_one(rsp, cb, psd);
+		mutex_unlock(&psd->lock);
+		if (err)
+			break;
+	}
+	mutex_unlock(&psp_devs_lock);
+
+	return err;
+}
+
+int psp_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct psp_dev *psd = info->user_ptr[0];
+	struct psp_dev_config new_config;
+	struct sk_buff *rsp;
+	int err;
+
+	memcpy(&new_config, &psd->config, sizeof(new_config));
+
+	if (info->attrs[PSP_A_DEV_PSP_VERSIONS_ENA]) {
+		new_config.versions =
+			nla_get_u32(info->attrs[PSP_A_DEV_PSP_VERSIONS_ENA]);
+		if (new_config.versions & ~psd->caps->versions) {
+			NL_SET_ERR_MSG(info->extack, "Requested PSP versions not supported by the device");
+			return -EINVAL;
+		}
+	} else {
+		NL_SET_ERR_MSG(info->extack, "No settings present");
+		return -EINVAL;
+	}
+
+	rsp = psp_nl_reply_new(info);
+	if (!rsp)
+		return -ENOMEM;
+
+	if (memcmp(&new_config, &psd->config, sizeof(new_config))) {
+		err = psd->ops->set_config(psd, &new_config, info->extack);
+		if (err)
+			goto err_free_rsp;
+
+		memcpy(&psd->config, &new_config, sizeof(new_config));
+	}
+
+	psp_nl_notify_dev(psd, PSP_CMD_DEV_CHANGE_NTF);
+
+	return psp_nl_reply_send(rsp, info);
+
+err_free_rsp:
+	nlmsg_free(rsp);
+	return err;
+}
diff --git a/tools/net/ynl/Makefile.deps b/tools/net/ynl/Makefile.deps
index f4e8eb79c1b8..e191ea3cefc0 100644
--- a/tools/net/ynl/Makefile.deps
+++ b/tools/net/ynl/Makefile.deps
@@ -25,3 +25,4 @@ CFLAGS_nfsd:=$(call get_hdr_inc,_LINUX_NFSD_NETLINK_H,nfsd_netlink.h)
 CFLAGS_ovs_datapath:=$(call get_hdr_inc,__LINUX_OPENVSWITCH_H,openvswitch.h)
 CFLAGS_ovs_flow:=$(call get_hdr_inc,__LINUX_OPENVSWITCH_H,openvswitch.h)
 CFLAGS_ovs_vport:=$(call get_hdr_inc,__LINUX_OPENVSWITCH_H,openvswitch.h)
+CFLAGS_psp:=$(call get_hdr_inc,_LINUX_PSP_H,psp.h)
-- 
2.45.0



* [RFC net-next 03/15] net: modify core data structures for PSP datapath support
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 01/15] psp: add documentation Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 02/15] psp: base PSP device support Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 04/15] tcp: add datapath logic for PSP with inline key exchange Jakub Kicinski
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC (permalink / raw
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Add pointers to PSP data structures to core networking structs,
and an SKB extension to carry the PSP information from the drivers
to the socket layer.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Split out to a separate patch for ease of review,
I will squash if that's not helpful.
---
 include/linux/skbuff.h          | 3 +++
 include/linux/tcp.h             | 3 +++
 include/net/psp/functions.h     | 5 +++++
 include/net/psp/types.h         | 7 +++++++
 include/net/sock.h              | 4 ++++
 net/core/skbuff.c               | 4 ++++
 net/core/sock.c                 | 2 ++
 net/ipv4/inet_connection_sock.c | 2 ++
 net/ipv4/tcp_minisocks.c        | 6 ++++--
 net/mptcp/protocol.c            | 2 ++
 10 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c0b97c93a6de..4689255c66d2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4659,6 +4659,9 @@ enum skb_ext_id {
 #endif
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	SKB_EXT_MCTP,
+#endif
+#if IS_ENABLED(CONFIG_INET_PSP)
+	SKB_EXT_PSP,
 #endif
 	SKB_EXT_NUM, /* must be last */
 };
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 6a5e08b937b3..368ea3a2b338 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -551,6 +551,9 @@ struct tcp_timewait_sock {
 #ifdef CONFIG_TCP_AO
 	struct tcp_ao_info	__rcu *ao_info;
 #endif
+#if IS_ENABLED(CONFIG_INET_PSP)
+	struct psp_assoc __rcu	  *psp_assoc;
+#endif
 };
 
 static inline struct tcp_timewait_sock *tcp_twsk(const struct sock *sk)
diff --git a/include/net/psp/functions.h b/include/net/psp/functions.h
index 074f9df9afc3..9ff0f2b5744f 100644
--- a/include/net/psp/functions.h
+++ b/include/net/psp/functions.h
@@ -5,10 +5,15 @@
 
 #include <net/psp/types.h>
 
+struct tcp_timewait_sock;
+
 /* Driver-facing API */
 struct psp_dev *
 psp_dev_create(struct net_device *netdev, struct psp_dev_ops *psd_ops,
 	       struct psp_dev_caps *psd_caps, void *priv_ptr);
 void psp_dev_unregister(struct psp_dev *psd);
 
+static inline void psp_sk_assoc_free(struct sock *sk) { }
+static inline void psp_twsk_assoc_free(struct tcp_timewait_sock *tw) { }
+
 #endif /* __NET_PSP_HELPERS_H */
diff --git a/include/net/psp/types.h b/include/net/psp/types.h
index dbc5423a53df..a23d9bd9ce96 100644
--- a/include/net/psp/types.h
+++ b/include/net/psp/types.h
@@ -86,6 +86,13 @@ struct psp_dev_caps {
 #define PSP_V1_KEY	32
 #define PSP_MAX_KEY	32
 
+struct psp_skb_ext {
+	__be32 spi;
+	/* generation and version are 8b but we don't want holes */
+	u16 generation;
+	u16 version;
+};
+
 /**
  * struct psp_dev_ops - netdev driver facing PSP callbacks
  */
diff --git a/include/net/sock.h b/include/net/sock.h
index 0450494a1766..dc4c46ac0984 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -249,6 +249,7 @@ struct sk_filter;
   *	@sk_dst_cache: destination cache
   *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
+  *	@psp_assoc: PSP association, if socket is PSP-secured
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
   *	@sk_tsq_flags: TCP Small Queues flags
@@ -436,6 +437,9 @@ struct sock {
 	struct mem_cgroup	*sk_memcg;
 #ifdef CONFIG_XFRM
 	struct xfrm_policy __rcu *sk_policy[2];
+#endif
+#if IS_ENABLED(CONFIG_INET_PSP)
+	struct psp_assoc __rcu	*psp_assoc;
 #endif
 	__cacheline_group_end(sock_read_rxtx);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466999a7515e..1b6821d8dede 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -77,6 +77,7 @@
 #include <net/mptcp.h>
 #include <net/mctp.h>
 #include <net/page_pool/helpers.h>
+#include <net/psp/types.h>
 #include <net/dropreason.h>
 
 #include <linux/uaccess.h>
@@ -4957,6 +4958,9 @@ static const u8 skb_ext_type_len[] = {
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	[SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow),
 #endif
+#if IS_ENABLED(CONFIG_INET_PSP)
+	[SKB_EXT_PSP] = SKB_EXT_CHUNKSIZEOF(struct psp_skb_ext),
+#endif
 };
 
 static __always_inline unsigned int skb_ext_total_length(void)
diff --git a/net/core/sock.c b/net/core/sock.c
index 8d6e638b5426..24e9113e0417 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -142,6 +142,7 @@
 #include <trace/events/sock.h>
 
 #include <net/tcp.h>
+#include <net/psp.h>
 #include <net/busy_poll.h>
 #include <net/phonet/phonet.h>
 
@@ -3757,6 +3758,7 @@ void sk_common_release(struct sock *sk)
 	sock_orphan(sk);
 
 	xfrm_sk_free_policy(sk);
+	psp_sk_assoc_free(sk);
 
 	sock_put(sk);
 }
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 3b38610958ee..10d4be66046a 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -21,6 +21,7 @@
 #include <net/xfrm.h>
 #include <net/tcp.h>
 #include <net/sock_reuseport.h>
+#include <net/psp.h>
 #include <net/addrconf.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1226,6 +1227,7 @@ void inet_csk_destroy_sock(struct sock *sk)
 	sk_stream_kill_queues(sk);
 
 	xfrm_sk_free_policy(sk);
+	psp_sk_assoc_free(sk);
 
 	this_cpu_dec(*sk->sk_prot->orphan_count);
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 7d543569a180..660e890f3c74 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -23,6 +23,7 @@
 #include <net/xfrm.h>
 #include <net/busy_poll.h>
 #include <net/rstreason.h>
+#include <net/psp.h>
 
 static bool tcp_in_window(u32 seq, u32 end_seq, u32 s_win, u32 e_win)
 {
@@ -377,15 +378,16 @@ static void tcp_md5_twsk_free_rcu(struct rcu_head *head)
 
 void tcp_twsk_destructor(struct sock *sk)
 {
+	struct tcp_timewait_sock *twsk = tcp_twsk(sk);
+
 #ifdef CONFIG_TCP_MD5SIG
 	if (static_branch_unlikely(&tcp_md5_needed.key)) {
-		struct tcp_timewait_sock *twsk = tcp_twsk(sk);
-
 		if (twsk->tw_md5_key)
 			call_rcu(&twsk->tw_md5_key->rcu, tcp_md5_twsk_free_rcu);
 	}
 #endif
 	tcp_ao_destroy_sock(sk, true);
+	psp_twsk_assoc_free(twsk);
 }
 EXPORT_SYMBOL_GPL(tcp_twsk_destructor);
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index bb8f96f2b86f..cd79bcecebc2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -23,6 +23,7 @@
 #include <net/hotdata.h>
 #include <net/xfrm.h>
 #include <asm/ioctls.h>
+#include <net/psp.h>
 #include "protocol.h"
 #include "mib.h"
 
@@ -3010,6 +3011,7 @@ static void __mptcp_destroy_sock(struct sock *sk)
 	WARN_ON_ONCE(msk->rmem_released);
 	sk_stream_kill_queues(sk);
 	xfrm_sk_free_policy(sk);
+	psp_sk_assoc_free(sk);
 
 	sock_put(sk);
 }
-- 
2.45.0



* [RFC net-next 04/15] tcp: add datapath logic for PSP with inline key exchange
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (2 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 03/15] net: modify core data structures for PSP datapath support Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 05/15] psp: add op for rotation of secret state Jakub Kicinski
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC (permalink / raw
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Add validation points and state propagation to support PSP key
exchange inline, on TCP connections. The expectation is that the
application will use some well-established mechanism like the TLS
handshake to establish a secure channel over the connection and,
if both endpoints are PSP-capable, exchange and install PSP keys.
Because the connection can exist in both PSP-unsecured and
PSP-secured states, we need to make sure that there are no race
conditions or retransmission leaks.

On Tx - mark packets with the skb->decrypted bit when a PSP key
is present at enqueue time. Drivers should only encrypt packets with
this bit set. This prevents retransmissions getting encrypted when
original transmission was not. Similarly to TLS, we'll use
sk->sk_validate_xmit_skb to make sure PSP skbs can't "escape"
via a PSP-unaware device without being encrypted.
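
To illustrate the Tx rule, here is a rough userspace sketch (not kernel
code). The toy_* types and helpers are hypothetical stand-ins for
struct sock, struct sk_buff and psp_enqueue_set_decrypted(); the only
point is that the bit is latched once at enqueue, so a packet queued
before the key was installed stays cleartext even if it is
retransmitted afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock / struct sk_buff. */
struct toy_sock {
	bool psp_key_installed;	/* models a non-NULL sk->psp_assoc */
};

struct toy_skb {
	bool decrypted;		/* models the skb->decrypted bit */
};

/* Called exactly once, when the skb is first queued on the socket. */
static void toy_enqueue_set_decrypted(const struct toy_sock *sk,
				      struct toy_skb *skb)
{
	assert(sk != NULL && skb != NULL);
	skb->decrypted = sk->psp_key_installed;
}

/* Device-side decision: looks only at the bit, never at socket state,
 * so a retransmission is treated the same as the original transmission.
 */
static bool toy_dev_should_encrypt(const struct toy_skb *skb)
{
	return skb->decrypted;
}
```

In this model an skb enqueued before key installation keeps
decrypted == false, regardless of what happens to the socket later.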

On Rx - validation is done under socket lock. This moves the validation
point later than xfrm, for example. Please see the documentation patch
for more details on the flow of securing a connection, but for
the purpose of this patch what's important is that we want to
enforce the invariant that once connection is secured any skb
in the receive queue has been encrypted with PSP.
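
The Rx invariant can be sketched the same way. This is a simplified
model under stated assumptions, not the enforcement logic of this
series (which only adds empty stubs at this point): the toy_* names
are hypothetical, and the handling of PSP-decrypted packets arriving
on a not-yet-secured socket is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the two outcomes we care about. */
enum toy_drop_reason {
	TOY_NOT_DROPPED = 0,
	TOY_DROP_PSP_INPUT,	/* models SKB_DROP_REASON_PSP_INPUT */
};

struct toy_rx_sock {
	bool secured;		/* true once PSP keys are installed */
};

struct toy_rx_skb {
	bool psp_decrypted;	/* device decrypted and authenticated it */
};

/* Assumed to run under the socket lock, before the skb is queued. */
static enum toy_drop_reason
toy_rx_policy_check(const struct toy_rx_sock *sk,
		    const struct toy_rx_skb *skb)
{
	assert(sk != NULL && skb != NULL);

	/* Once secured, cleartext must never reach the receive queue. */
	if (sk->secured && !skb->psp_decrypted)
		return TOY_DROP_PSP_INPUT;

	return TOY_NOT_DROPPED;
}
```

Returning 0 for "accept" matches how the callers added in this patch
test the result (if (reason) goto err_discard).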

Add trivialities like GRO and coalescing checks.

This change only adds the validation points, for ease of review.
A subsequent change will add the ability to install keys, and flesh
out the enforcement logic.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/net/dropreason-core.h |  6 +++
 include/net/psp/functions.h   | 74 +++++++++++++++++++++++++++++++++++
 net/core/gro.c                |  2 +
 net/ipv4/tcp.c                |  2 +
 net/ipv4/tcp_ipv4.c           |  9 ++++-
 net/ipv4/tcp_minisocks.c      | 15 +++++++
 net/ipv4/tcp_output.c         | 16 +++++---
 net/ipv6/tcp_ipv6.c           | 10 +++++
 net/psp/Kconfig               |  1 +
 9 files changed, 128 insertions(+), 7 deletions(-)

diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index 9707ab54fdd5..29f1de9f5148 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -92,6 +92,8 @@
 	FN(PACKET_SOCK_ERROR)		\
 	FN(TC_CHAIN_NOTFOUND)		\
 	FN(TC_RECLASSIFY_LOOP)		\
+	FN(PSP_INPUT)			\
+	FN(PSP_OUTPUT)			\
 	FNe(MAX)
 
 /**
@@ -418,6 +420,10 @@ enum skb_drop_reason {
 	 * iterations.
 	 */
 	SKB_DROP_REASON_TC_RECLASSIFY_LOOP,
+	/** @SKB_DROP_REASON_PSP_INPUT: PSP input checks failed */
+	SKB_DROP_REASON_PSP_INPUT,
+	/** @SKB_DROP_REASON_PSP_OUTPUT: PSP output checks failed */
+	SKB_DROP_REASON_PSP_OUTPUT,
 	/**
 	 * @SKB_DROP_REASON_MAX: the maximum of core drop reasons, which
 	 * shouldn't be used as a real 'reason' - only for tracing code gen
diff --git a/include/net/psp/functions.h b/include/net/psp/functions.h
index 9ff0f2b5744f..7a3b897ecc69 100644
--- a/include/net/psp/functions.h
+++ b/include/net/psp/functions.h
@@ -3,6 +3,8 @@
 #ifndef __NET_PSP_HELPERS_H
 #define __NET_PSP_HELPERS_H
 
+#include <linux/skbuff.h>
+#include <net/sock.h>
 #include <net/psp/types.h>
 
 struct tcp_timewait_sock;
@@ -13,7 +15,79 @@ psp_dev_create(struct net_device *netdev, struct psp_dev_ops *psd_ops,
 	       struct psp_dev_caps *psd_caps, void *priv_ptr);
 void psp_dev_unregister(struct psp_dev *psd);
 
+/* Kernel-facing API */
+#if IS_ENABLED(CONFIG_INET_PSP)
 static inline void psp_sk_assoc_free(struct sock *sk) { }
+static inline void
+psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk) { }
 static inline void psp_twsk_assoc_free(struct tcp_timewait_sock *tw) { }
 
+static inline void
+psp_enqueue_set_decrypted(struct sock *sk, struct sk_buff *skb)
+{
+}
+
+static inline unsigned long
+__psp_skb_coalesce_diff(const struct sk_buff *one, const struct sk_buff *two,
+			unsigned long diffs)
+{
+	return diffs;
+}
+
+static inline enum skb_drop_reason
+psp_sk_rx_policy_check(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline enum skb_drop_reason
+psp_twsk_rx_policy_check(struct tcp_timewait_sock *tw, struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline struct psp_assoc *psp_skb_get_assoc_rcu(struct sk_buff *skb)
+{
+	return NULL;
+}
+#else
+static inline void psp_sk_assoc_free(struct sock *sk) { }
+static inline void
+psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk) { }
+static inline void psp_twsk_assoc_free(struct tcp_timewait_sock *tw) { }
+
+static inline void
+psp_enqueue_set_decrypted(struct sock *sk, struct sk_buff *skb) { }
+
+static inline unsigned long
+__psp_skb_coalesce_diff(const struct sk_buff *one, const struct sk_buff *two,
+			unsigned long diffs)
+{
+	return diffs;
+}
+
+static inline enum skb_drop_reason
+psp_sk_rx_policy_check(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline enum skb_drop_reason
+psp_twsk_rx_policy_check(struct tcp_timewait_sock *tw, struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline struct psp_assoc *psp_skb_get_assoc_rcu(struct sk_buff *skb)
+{
+	return NULL;
+}
+#endif
+
+static inline unsigned long
+psp_skb_coalesce_diff(const struct sk_buff *one, const struct sk_buff *two)
+{
+	return __psp_skb_coalesce_diff(one, two, 0);
+}
+
 #endif /* __NET_PSP_HELPERS_H */
diff --git a/net/core/gro.c b/net/core/gro.c
index e2f84947fb74..56c6c0e43a11 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
+#include <net/psp.h>
 #include <net/gro.h>
 #include <net/dst_metadata.h>
 #include <net/busy_poll.h>
@@ -387,6 +388,7 @@ static void gro_list_prepare(const struct list_head *head,
 			diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb);
 
 			diffs |= gro_list_prepare_tc_ext(skb, p, diffs);
+			diffs |= __psp_skb_coalesce_diff(skb, p, diffs);
 		}
 
 		NAPI_GRO_CB(p)->same_flow = !diffs;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 06aab937d60a..b81447bcb884 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -275,6 +275,7 @@
 #include <net/proto_memory.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
+#include <net/psp.h>
 #include <net/sock.h>
 #include <net/rstreason.h>
 
@@ -674,6 +675,7 @@ void tcp_skb_entail(struct sock *sk, struct sk_buff *skb)
 	tcb->seq     = tcb->end_seq = tp->write_seq;
 	tcb->tcp_flags = TCPHDR_ACK;
 	__skb_header_release(skb);
+	psp_enqueue_set_decrypted(sk, skb);
 	tcp_add_write_queue_tail(sk, skb);
 	sk_wmem_queued_add(sk, skb->truesize);
 	sk_mem_charge(sk, skb->truesize);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 95e3d28b83b8..9539c4a7b55d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -71,6 +71,7 @@
 #include <net/secure_seq.h>
 #include <net/busy_poll.h>
 #include <net/rstreason.h>
+#include <net/psp.h>
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
@@ -1895,6 +1896,10 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 	enum skb_drop_reason reason;
 	struct sock *rsk;
 
+	reason = psp_sk_rx_policy_check(sk, skb);
+	if (reason)
+		goto err_discard;
+
 	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
 		struct dst_entry *dst;
 
@@ -1956,6 +1961,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 	reason = SKB_DROP_REASON_TCP_CSUM;
 	trace_tcp_bad_csum(skb);
 	TCP_INC_STATS(sock_net(sk), TCP_MIB_CSUMERRORS);
+err_discard:
 	TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
 	goto discard;
 }
@@ -2057,7 +2063,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 	    !mptcp_skb_can_collapse(tail, skb) ||
 	    skb_cmp_decrypted(tail, skb) ||
 	    thtail->doff != th->doff ||
-	    memcmp(thtail + 1, th + 1, hdrlen - sizeof(*th)))
+	    memcmp(thtail + 1, th + 1, hdrlen - sizeof(*th)) ||
+	    psp_skb_coalesce_diff(tail, skb))
 		goto no_coalesce;
 
 	__skb_pull(skb, hdrlen);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 660e890f3c74..501b8bc8573e 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -101,8 +101,11 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 {
 	struct tcp_options_received tmp_opt;
 	struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
+	enum skb_drop_reason psp_drop;
 	bool paws_reject = false;
 
+	psp_drop = psp_twsk_rx_policy_check(tcptw, skb);
+
 	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
 		tcp_parse_options(twsk_net(tw), skb, &tmp_opt, 0, NULL);
@@ -119,6 +122,9 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	if (tw->tw_substate == TCP_FIN_WAIT2) {
 		/* Just repeat all the checks of tcp_rcv_state_process() */
 
+		if (psp_drop)
+			goto out_put;
+
 		/* Out of window, send ACK */
 		if (paws_reject ||
 		    !tcp_in_window(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
@@ -183,6 +189,9 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	     (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq || th->rst))) {
 		/* In window segment, it may be only reset or bare ack. */
 
+		if (psp_drop)
+			goto out_put;
+
 		if (th->rst) {
 			/* This is TIME_WAIT assassination, in two flavors.
 			 * Oh well... nobody has a sufficient solution to this
@@ -234,6 +243,9 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 		return TCP_TW_SYN;
 	}
 
+	if (psp_drop)
+		goto out_put;
+
 	if (paws_reject)
 		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_PAWSESTABREJECTED);
 
@@ -250,6 +262,8 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 		return tcp_timewait_check_oow_rate_limit(
 			tw, skb, LINUX_MIB_TCPACKSKIPPEDTIMEWAIT);
 	}
+
+out_put:
 	inet_twsk_put(tw);
 	return TCP_TW_SUCCESS;
 }
@@ -332,6 +346,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
 
 		tcp_time_wait_init(sk, tcptw);
 		tcp_ao_time_wait(tcptw, tp);
+		psp_twsk_init(tcptw, sk);
 
 		/* Get the TIME_WAIT timeout firing. */
 		if (timeo < rto)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95caf8aaa8be..90d0a1d759ae 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -40,6 +40,7 @@
 #include <net/tcp.h>
 #include <net/mptcp.h>
 #include <net/proto_memory.h>
+#include <net/psp.h>
 
 #include <linux/compiler.h>
 #include <linux/gfp.h>
@@ -400,13 +401,15 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 /* Constructs common control bits of non-data skb. If SYN/FIN is present,
  * auto increment end seqno.
  */
-static void tcp_init_nondata_skb(struct sk_buff *skb, u32 seq, u8 flags)
+static void tcp_init_nondata_skb(struct sk_buff *skb, struct sock *sk,
+				 u32 seq, u8 flags)
 {
 	skb->ip_summed = CHECKSUM_PARTIAL;
 
 	TCP_SKB_CB(skb)->tcp_flags = flags;
 
 	tcp_skb_pcount_set(skb, 1);
+	psp_enqueue_set_decrypted(sk, skb);
 
 	TCP_SKB_CB(skb)->seq = seq;
 	if (flags & (TCPHDR_SYN | TCPHDR_FIN))
@@ -1497,6 +1500,7 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff *skb)
 	/* Advance write_seq and place onto the write_queue. */
 	WRITE_ONCE(tp->write_seq, TCP_SKB_CB(skb)->end_seq);
 	__skb_header_release(skb);
+	psp_enqueue_set_decrypted(sk, skb);
 	tcp_add_write_queue_tail(sk, skb);
 	sk_wmem_queued_add(sk, skb->truesize);
 	sk_mem_charge(sk, skb->truesize);
@@ -3611,7 +3615,7 @@ void tcp_send_fin(struct sock *sk)
 		skb_reserve(skb, MAX_TCP_HEADER);
 		sk_forced_mem_schedule(sk, skb->truesize);
 		/* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */
-		tcp_init_nondata_skb(skb, tp->write_seq,
+		tcp_init_nondata_skb(skb, sk, tp->write_seq,
 				     TCPHDR_ACK | TCPHDR_FIN);
 		tcp_queue_skb(sk, skb);
 	}
@@ -3639,7 +3643,7 @@ void tcp_send_active_reset(struct sock *sk, gfp_t priority,
 
 	/* Reserve space for headers and prepare control bits. */
 	skb_reserve(skb, MAX_TCP_HEADER);
-	tcp_init_nondata_skb(skb, tcp_acceptable_seq(sk),
+	tcp_init_nondata_skb(skb, sk, tcp_acceptable_seq(sk),
 			     TCPHDR_ACK | TCPHDR_RST);
 	tcp_mstamp_refresh(tcp_sk(sk));
 	/* Send it off. */
@@ -4129,7 +4133,7 @@ int tcp_connect(struct sock *sk)
 	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
+	tcp_init_nondata_skb(buff, sk, tp->write_seq++, TCPHDR_SYN);
 	tcp_mstamp_refresh(tp);
 	tp->retrans_stamp = tcp_time_stamp_ts(tp);
 	tcp_connect_queue_skb(sk, buff);
@@ -4261,7 +4265,7 @@ void __tcp_send_ack(struct sock *sk, u32 rcv_nxt)
 
 	/* Reserve space for headers and prepare control bits. */
 	skb_reserve(buff, MAX_TCP_HEADER);
-	tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK);
+	tcp_init_nondata_skb(buff, sk, tcp_acceptable_seq(sk), TCPHDR_ACK);
 
 	/* We do not want pure acks influencing TCP Small Queues or fq/pacing
 	 * too much.
@@ -4307,7 +4311,7 @@ static int tcp_xmit_probe_skb(struct sock *sk, int urgent, int mib)
 	 * end to send an ack.  Don't queue or clone SKB, just
 	 * send it.
 	 */
-	tcp_init_nondata_skb(skb, tp->snd_una - !urgent, TCPHDR_ACK);
+	tcp_init_nondata_skb(skb, sk, tp->snd_una - !urgent, TCPHDR_ACK);
 	NET_INC_STATS(sock_net(sk), mib);
 	return tcp_transmit_skb(sk, skb, 0, (__force gfp_t)0);
 }
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 37201c4fb393..6991464511c3 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -61,6 +61,7 @@
 #include <net/hotdata.h>
 #include <net/busy_poll.h>
 #include <net/rstreason.h>
+#include <net/psp.h>
 
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -1607,6 +1608,10 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 	if (skb->protocol == htons(ETH_P_IP))
 		return tcp_v4_do_rcv(sk, skb);
 
+	reason = psp_sk_rx_policy_check(sk, skb);
+	if (reason)
+		goto err_discard;
+
 	/*
 	 *	socket locking is here for SMP purposes as backlog rcv
 	 *	is currently called with bh processing disabled.
@@ -1688,6 +1693,7 @@ int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 	reason = SKB_DROP_REASON_TCP_CSUM;
 	trace_tcp_bad_csum(skb);
 	TCP_INC_STATS(sock_net(sk), TCP_MIB_CSUMERRORS);
+err_discard:
 	TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
 	goto discard;
 
@@ -1992,6 +1998,10 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 			__this_cpu_write(tcp_tw_isn, isn);
 			goto process;
 		}
+
+		drop_reason = psp_twsk_rx_policy_check(tcp_twsk(sk), skb);
+		if (drop_reason)
+			break;
 	}
 		/* to ACK */
 		fallthrough;
diff --git a/net/psp/Kconfig b/net/psp/Kconfig
index 55f9dd87446b..5e3908a40945 100644
--- a/net/psp/Kconfig
+++ b/net/psp/Kconfig
@@ -5,6 +5,7 @@
 config INET_PSP
 	bool "PSP Security Protocol support"
 	depends on INET
+	select SKB_DECRYPTED
 	help
 	Enable kernel support for the PSP protocol.
 	For more information see:
-- 
2.45.0



* [RFC net-next 05/15] psp: add op for rotation of secret state
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (3 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 04/15] tcp: add datapath logic for PSP with inline key exchange Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-16 19:59   ` Lance Richardson
  2024-05-10  3:04 ` [RFC net-next 06/15] net: psp: add socket security association code Jakub Kicinski
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC (permalink / raw
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Rotating the secret state is a key part of the PSP protocol design.
Some external daemon needs to do it once a day, or so.
Add a netlink op to perform this operation.
Add a notification group for informing users that the key has been
rotated and they should rekey (the next rotation will cut them off).
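
A toy model of the "rekey before the next rotation" rule, for
illustration only. It assumes the device keeps the current and the
previous secret state usable, which is what the "next rotation will
cut them off" wording implies; the generation counter and all toy_*
names are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state: counts how many rotations have happened. */
struct toy_psp_dev {
	uint32_t generation;
};

/* Models the key-rotate op: the oldest secret state is retired. */
static void toy_key_rotate(struct toy_psp_dev *dev)
{
	assert(dev != NULL);
	dev->generation++;
}

/* A key derived under key_gen works for the current and the previous
 * generation only (unsigned subtraction handles counter wrap-around).
 */
static bool toy_key_usable(const struct toy_psp_dev *dev, uint32_t key_gen)
{
	return dev->generation - key_gen <= 1;
}

/* What a listener on the "use" group would conclude after a notification. */
static bool toy_key_should_rekey(const struct toy_psp_dev *dev,
				 uint32_t key_gen)
{
	return dev->generation != key_gen;
}
```

After one rotation an old key is still usable but should be replaced;
after a second rotation it no longer works.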

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/netlink/specs/psp.yaml | 21 +++++++++++++++
 include/net/psp/types.h              |  5 ++++
 include/uapi/linux/psp.h             |  3 +++
 net/psp/psp-nl-gen.c                 | 15 +++++++++++
 net/psp/psp-nl-gen.h                 |  2 ++
 net/psp/psp_main.c                   |  3 ++-
 net/psp/psp_nl.c                     | 40 ++++++++++++++++++++++++++++
 7 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/psp.yaml b/Documentation/netlink/specs/psp.yaml
index dbb5ef148045..6ae598a60b3f 100644
--- a/Documentation/netlink/specs/psp.yaml
+++ b/Documentation/netlink/specs/psp.yaml
@@ -88,7 +88,28 @@ name: psp
       notify: dev-get
       mcgrp: mgmt
 
+    -
+      name: key-rotate
+      doc: Rotate the secret state.
+      attribute-set: dev
+      do:
+        request:
+          attributes:
+            - id
+        reply:
+          attributes:
+            - id
+        pre: psp-device-get-locked
+        post: psp-device-unlock
+    -
+      name: key-rotate-ntf
+      doc: Notification about secret state getting rotated.
+      notify: key-rotate
+      mcgrp: use
+
 mcast-groups:
   list:
     -
       name: mgmt
+    -
+      name: use
diff --git a/include/net/psp/types.h b/include/net/psp/types.h
index a23d9bd9ce96..43950caf70f1 100644
--- a/include/net/psp/types.h
+++ b/include/net/psp/types.h
@@ -104,6 +104,11 @@ struct psp_dev_ops {
 	 */
 	int (*set_config)(struct psp_dev *psd, struct psp_dev_config *conf,
 			  struct netlink_ext_ack *extack);
+
+	/**
+	 * @key_rotate: rotate the secret state
+	 */
+	int (*key_rotate)(struct psp_dev *psd, struct netlink_ext_ack *extack);
 };
 
 #endif /* __NET_PSP_H */
diff --git a/include/uapi/linux/psp.h b/include/uapi/linux/psp.h
index 4a404f085190..cbfbf3f0f364 100644
--- a/include/uapi/linux/psp.h
+++ b/include/uapi/linux/psp.h
@@ -32,11 +32,14 @@ enum {
 	PSP_CMD_DEV_DEL_NTF,
 	PSP_CMD_DEV_SET,
 	PSP_CMD_DEV_CHANGE_NTF,
+	PSP_CMD_KEY_ROTATE,
+	PSP_CMD_KEY_ROTATE_NTF,
 
 	__PSP_CMD_MAX,
 	PSP_CMD_MAX = (__PSP_CMD_MAX - 1)
 };
 
 #define PSP_MCGRP_MGMT	"mgmt"
+#define PSP_MCGRP_USE	"use"
 
 #endif /* _UAPI_LINUX_PSP_H */
diff --git a/net/psp/psp-nl-gen.c b/net/psp/psp-nl-gen.c
index 859712e7c2c1..7f49577ac72f 100644
--- a/net/psp/psp-nl-gen.c
+++ b/net/psp/psp-nl-gen.c
@@ -21,6 +21,11 @@ static const struct nla_policy psp_dev_set_nl_policy[PSP_A_DEV_PSP_VERSIONS_ENA
 	[PSP_A_DEV_PSP_VERSIONS_ENA] = NLA_POLICY_MASK(NLA_U32, 0xf),
 };
 
+/* PSP_CMD_KEY_ROTATE - do */
+static const struct nla_policy psp_key_rotate_nl_policy[PSP_A_DEV_ID + 1] = {
+	[PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+};
+
 /* Ops table for psp */
 static const struct genl_split_ops psp_nl_ops[] = {
 	{
@@ -46,10 +51,20 @@ static const struct genl_split_ops psp_nl_ops[] = {
 		.maxattr	= PSP_A_DEV_PSP_VERSIONS_ENA,
 		.flags		= GENL_CMD_CAP_DO,
 	},
+	{
+		.cmd		= PSP_CMD_KEY_ROTATE,
+		.pre_doit	= psp_device_get_locked,
+		.doit		= psp_nl_key_rotate_doit,
+		.post_doit	= psp_device_unlock,
+		.policy		= psp_key_rotate_nl_policy,
+		.maxattr	= PSP_A_DEV_ID,
+		.flags		= GENL_CMD_CAP_DO,
+	},
 };
 
 static const struct genl_multicast_group psp_nl_mcgrps[] = {
 	[PSP_NLGRP_MGMT] = { "mgmt", },
+	[PSP_NLGRP_USE] = { "use", },
 };
 
 struct genl_family psp_nl_family __ro_after_init = {
diff --git a/net/psp/psp-nl-gen.h b/net/psp/psp-nl-gen.h
index a099686cab5d..00a2d4ec59e4 100644
--- a/net/psp/psp-nl-gen.h
+++ b/net/psp/psp-nl-gen.h
@@ -20,9 +20,11 @@ psp_device_unlock(const struct genl_split_ops *ops, struct sk_buff *skb,
 int psp_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
 int psp_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int psp_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info);
+int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info);
 
 enum {
 	PSP_NLGRP_MGMT,
+	PSP_NLGRP_USE,
 };
 
 extern struct genl_family psp_nl_family;
diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c
index cf463f757892..536f50d82fb2 100644
--- a/net/psp/psp_main.c
+++ b/net/psp/psp_main.c
@@ -54,7 +54,8 @@ psp_dev_create(struct net_device *netdev,
 	int err;
 
 	if (WARN_ON(!psd_caps->versions ||
-		    !psd_ops->set_config))
+		    !psd_ops->set_config ||
+		    !psd_ops->key_rotate))
 		return ERR_PTR(-EINVAL);
 
 	psd = kzalloc(sizeof(*psd), GFP_KERNEL);
diff --git a/net/psp/psp_nl.c b/net/psp/psp_nl.c
index fda5ce800f82..b7006e50dc87 100644
--- a/net/psp/psp_nl.c
+++ b/net/psp/psp_nl.c
@@ -221,3 +221,43 @@ int psp_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info)
 	nlmsg_free(rsp);
 	return err;
 }
+
+int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct psp_dev *psd = info->user_ptr[0];
+	struct genl_info ntf_info;
+	struct sk_buff *ntf, *rsp;
+	int err;
+
+	rsp = psp_nl_reply_new(info);
+	if (!rsp)
+		return -ENOMEM;
+
+	genl_info_init_ntf(&ntf_info, &psp_nl_family, PSP_CMD_KEY_ROTATE_NTF);
+	ntf = psp_nl_reply_new(&ntf_info);
+	if (!ntf) {
+		err = -ENOMEM;
+		goto err_free_rsp;
+	}
+
+	if (nla_put_u32(rsp, PSP_A_DEV_ID, psd->id) ||
+	    nla_put_u32(ntf, PSP_A_DEV_ID, psd->id)) {
+		err = -EMSGSIZE;
+		goto err_free_ntf;
+	}
+
+	err = psd->ops->key_rotate(psd, info->extack);
+	if (err)
+		goto err_free_ntf;
+
+	nlmsg_end(ntf, (struct nlmsghdr *)ntf->data);
+	genlmsg_multicast_netns(&psp_nl_family, dev_net(psd->main_netdev), ntf,
+				0, PSP_NLGRP_USE, GFP_KERNEL);
+	return psp_nl_reply_send(rsp, info);
+
+err_free_ntf:
+	nlmsg_free(ntf);
+err_free_rsp:
+	nlmsg_free(rsp);
+	return err;
+}
-- 
2.45.0



* [RFC net-next 06/15] net: psp: add socket security association code
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (4 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 05/15] psp: add op for rotation of secret state Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead Jakub Kicinski
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Add the ability to install PSP Rx and Tx crypto keys on TCP
connections. Netlink ops are provided for both operations.
The Rx op combines allocating a new Rx key and installing it
on the socket. In theory these are separate actions, but in
practice they will always be used one after the other.
We can add distinct "alloc" and "install" ops later.
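
As a reading aid, the ordering rules and the Rx policy check in this
patch can be sketched in plain userspace C. This is a simplified
model, not the kernel code: struct model_assoc, struct model_pkt_meta
and the model_* helpers are hypothetical stand-ins for struct
psp_assoc, struct psp_skb_ext, __psp_sk_rx_policy_check() and
psp_sock_assoc_set_tx(). Locking, RCU, the receive queue scan and the
driver calls are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet PSP metadata. */
struct model_pkt_meta {
	uint32_t spi;
	uint8_t generation;
	uint8_t version;
};

/* Hypothetical stand-in for the per-socket association. */
struct model_assoc {
	uint32_t rx_spi;
	uint32_t tx_spi;	/* 0 means no Tx key installed yet */
	uint8_t generation;
	uint8_t version;
	bool rx_required;
};

/* Returns true when the packet should be dropped.
 * meta == NULL models a packet that arrived without PSP protection.
 */
static bool model_rx_drop(const struct model_pkt_meta *meta,
			  const struct model_assoc *assoc)
{
	if (!meta)
		return assoc && assoc->rx_required;

	return !(assoc && assoc->rx_spi == meta->spi &&
		 assoc->generation == meta->generation &&
		 assoc->version == meta->version);
}

/* Tx install: needs an existing Rx association, allowed only once.
 * From then on plain-text packets are no longer accepted.
 */
static int model_set_tx(struct model_assoc *assoc, uint32_t tx_spi)
{
	assert(tx_spi != 0);	/* 0 is reserved for "not set" here */

	if (!assoc)
		return -EINVAL;	/* "Socket has no Rx key" */
	if (assoc->tx_spi)
		return -EBUSY;	/* "Tx key already set" */

	assoc->rx_required = true;
	assoc->tx_spi = tx_spi;
	return 0;
}
```

The point of the model: between the Rx and Tx steps the socket accepts
both plain-text and matching PSP packets, which leaves room for the
in-band key exchange to complete. Once the Tx key is installed only
matching PSP packets pass.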

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/netlink/specs/psp.yaml |  71 ++++++++
 include/net/psp/functions.h          |  69 ++++++--
 include/net/psp/types.h              |  55 ++++++
 include/uapi/linux/psp.h             |  21 +++
 net/psp/Kconfig                      |   1 +
 net/psp/Makefile                     |   2 +-
 net/psp/psp-nl-gen.c                 |  39 ++++
 net/psp/psp-nl-gen.h                 |   7 +
 net/psp/psp.h                        |  22 +++
 net/psp/psp_main.c                   |  11 +-
 net/psp/psp_nl.c                     | 244 +++++++++++++++++++++++++
 net/psp/psp_sock.c                   | 255 +++++++++++++++++++++++++++
 12 files changed, 783 insertions(+), 14 deletions(-)
 create mode 100644 net/psp/psp_sock.c

diff --git a/Documentation/netlink/specs/psp.yaml b/Documentation/netlink/specs/psp.yaml
index 6ae598a60b3f..a81f19e7b20d 100644
--- a/Documentation/netlink/specs/psp.yaml
+++ b/Documentation/netlink/specs/psp.yaml
@@ -38,6 +38,44 @@ name: psp
         type: u32
         enum: version
         enum-as-flags: true
+  -
+    name: assoc
+    attributes:
+      -
+        name: dev-id
+        doc: PSP device ID.
+        type: u32
+        checks:
+          min: 1
+      -
+        name: version
+        doc: |
+          PSP version (AEAD and protocol version) used by this association;
+          dictates the size of the key.
+        type: u32
+        enum: version
+      -
+        name: rx-key
+        type: nest
+        nested-attributes: keys
+      -
+        name: tx-key
+        type: nest
+        nested-attributes: keys
+      -
+        name: sock-fd
+        doc: Sockets which should be bound to the association immediately.
+        type: u32
+  -
+    name: keys
+    attributes:
+      -
+        name: key
+        type: binary
+      -
+        name: spi
+        doc: Security Parameters Index (SPI) of the association.
+        type: u32
 
 operations:
   list:
@@ -107,6 +145,39 @@ name: psp
       notify: key-rotate
       mcgrp: use
 
+    -
+      name: rx-assoc
+      doc: Allocate a new Rx key + SPI pair, associate it with a socket.
+      attribute-set: assoc
+      do:
+        request:
+          attributes:
+            - dev-id
+            - version
+            - sock-fd
+        reply:
+          attributes:
+            - dev-id
+            - version
+            - rx-key
+        pre: psp-assoc-device-get-locked
+        post: psp-device-unlock
+    -
+      name: tx-assoc
+      doc: Add a PSP Tx association.
+      attribute-set: assoc
+      do:
+        request:
+          attributes:
+            - dev-id
+            - version
+            - tx-key
+            - sock-fd
+        reply:
+          attributes: []
+        pre: psp-assoc-device-get-locked
+        post: psp-device-unlock
+
 mcast-groups:
   list:
     -
diff --git a/include/net/psp/functions.h b/include/net/psp/functions.h
index 7a3b897ecc69..ad81322dd4ed 100644
--- a/include/net/psp/functions.h
+++ b/include/net/psp/functions.h
@@ -4,6 +4,7 @@
 #define __NET_PSP_HELPERS_H
 
 #include <linux/skbuff.h>
+#include <linux/rcupdate.h>
 #include <net/sock.h>
 #include <net/psp/types.h>
 
@@ -16,39 +17,78 @@ psp_dev_create(struct net_device *netdev, struct psp_dev_ops *psd_ops,
 void psp_dev_unregister(struct psp_dev *psd);
 
 /* Kernel-facing API */
+void psp_assoc_put(struct psp_assoc *pas);
+
+static inline void *psp_assoc_drv_data(struct psp_assoc *pas)
+{
+	return pas->drv_data;
+}
+
 #if IS_ENABLED(CONFIG_INET_PSP)
-static inline void psp_sk_assoc_free(struct sock *sk) { }
-static inline void
-psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk) { }
-static inline void psp_twsk_assoc_free(struct tcp_timewait_sock *tw) { }
+void psp_sk_assoc_free(struct sock *sk);
+void psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk);
+void psp_twsk_assoc_free(struct tcp_timewait_sock *tw);
+enum skb_drop_reason
+psp_twsk_rx_policy_check(struct tcp_timewait_sock *tw, struct sk_buff *skb);
+
+static inline struct psp_assoc *psp_sk_assoc(struct sock *sk)
+{
+	return rcu_dereference_check(sk->psp_assoc, lockdep_sock_is_held(sk));
+}
 
 static inline void
 psp_enqueue_set_decrypted(struct sock *sk, struct sk_buff *skb)
 {
+	struct psp_assoc *pas;
+
+	pas = psp_sk_assoc(sk);
+	if (pas && pas->tx.spi)
+		skb->decrypted = 1;
 }
 
 static inline unsigned long
 __psp_skb_coalesce_diff(const struct sk_buff *one, const struct sk_buff *two,
 			unsigned long diffs)
 {
+	struct psp_skb_ext *a, *b;
+
+	a = skb_ext_find(one, SKB_EXT_PSP);
+	b = skb_ext_find(two, SKB_EXT_PSP);
+
+	diffs |= (!!a) ^ (!!b);
+	if (!diffs && unlikely(a))
+		diffs |= memcmp(a, b, sizeof(*a));
 	return diffs;
 }
 
+static inline enum skb_drop_reason
+__psp_sk_rx_policy_check(struct psp_skb_ext *pse, struct psp_assoc *pas)
+{
+	if (!pse) {
+		if (pas && READ_ONCE(pas->rx_required))
+			return SKB_DROP_REASON_PSP_INPUT;
+		return 0;
+	}
+
+	if (pas && pas->rx.spi == pse->spi &&
+	    pas->generation == pse->generation &&
+	    pas->version == pse->version)
+		return 0;
+	return SKB_DROP_REASON_PSP_INPUT;
+}
+
 static inline enum skb_drop_reason
 psp_sk_rx_policy_check(struct sock *sk, struct sk_buff *skb)
 {
-	return 0;
-}
-
-static inline enum skb_drop_reason
-psp_twsk_rx_policy_check(struct tcp_timewait_sock *tw, struct sk_buff *skb)
-{
-	return 0;
+	return __psp_sk_rx_policy_check(skb_ext_find(skb, SKB_EXT_PSP),
+					psp_sk_assoc(sk));
 }
 
 static inline struct psp_assoc *psp_skb_get_assoc_rcu(struct sk_buff *skb)
 {
-	return NULL;
+	if (!skb->decrypted || !skb->sk || !sk_fullsock(skb->sk))
+		return NULL;
+	return rcu_dereference(skb->sk->psp_assoc);
 }
 #else
 static inline void psp_sk_assoc_free(struct sock *sk) { }
@@ -56,6 +96,11 @@ static inline void
 psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk) { }
 static inline void psp_twsk_assoc_free(struct tcp_timewait_sock *tw) { }
 
+static inline struct psp_assoc *psp_sk_assoc(struct sock *sk)
+{
+	return NULL;
+}
+
 static inline void
 psp_enqueue_set_decrypted(struct sock *sk, struct sk_buff *skb) { }
 
diff --git a/include/net/psp/types.h b/include/net/psp/types.h
index 43950caf70f1..e39abf10c03c 100644
--- a/include/net/psp/types.h
+++ b/include/net/psp/types.h
@@ -51,6 +51,7 @@ struct psp_dev_config {
  * @refcnt:	reference count for the instance
  * @id:		instance id
  * @config:	current device configuration
+ * @active_assocs:	list of registered associations
  *
  * @rcu:	RCU head for freeing the structure
  */
@@ -68,6 +69,8 @@ struct psp_dev {
 
 	struct psp_dev_config config;
 
+	struct list_head active_assocs;
+
 	struct rcu_head rcu;
 };
 
@@ -80,6 +83,12 @@ struct psp_dev_caps {
 	 * Set this field to 0 to indicate PSP is not supported at all.
 	 */
 	u32 versions;
+
+	/**
+	 * @assoc_drv_spc: size of driver-specific state in Tx assoc
+	 * Determines the size of struct psp_assoc::drv_data
+	 */
+	u32 assoc_drv_spc;
 };
 
 #define PSP_V0_KEY	16
@@ -93,6 +102,30 @@ struct psp_skb_ext {
 	u16 version;
 };
 
+struct psp_key_parsed {
+	__be32 spi;
+	u8 key[PSP_MAX_KEY];
+};
+
+struct psp_assoc {
+	struct psp_dev *psd;
+
+	u8 generation;
+	u8 version;
+	u8 key_sz;
+	u8 rx_required;
+
+	struct psp_key_parsed tx;
+	struct psp_key_parsed rx;
+
+	refcount_t refcnt;
+	struct rcu_head rcu;
+	struct work_struct work;
+	struct list_head assocs_list;
+
+	u8 drv_data[] __aligned(8);
+};
+
 /**
  * struct psp_dev_ops - netdev driver facing PSP callbacks
  */
@@ -109,6 +142,28 @@ struct psp_dev_ops {
 	 * @key_rotate: rotate the secret state
 	 */
 	int (*key_rotate)(struct psp_dev *psd, struct netlink_ext_ack *extack);
+
+	/**
+	 * @rx_spi_alloc: allocate an Rx SPI+key pair
+	 * Allocate an Rx SPI and resulting derived key.
+	 * This key should remain valid until key rotation.
+	 */
+	int (*rx_spi_alloc)(struct psp_dev *psd, u32 version,
+			    struct psp_key_parsed *assoc,
+			    struct netlink_ext_ack *extack);
+
+	/**
+	 * @tx_key_add: add a Tx key to the device
+	 * Install an association in the device. Core will allocate space
+	 * for the driver to use at drv_data.
+	 */
+	int (*tx_key_add)(struct psp_dev *psd, struct psp_assoc *pas,
+			  struct netlink_ext_ack *extack);
+	/**
+	 * @tx_key_del: remove a Tx key from the device
+	 * Remove an association from the device.
+	 */
+	void (*tx_key_del)(struct psp_dev *psd, struct psp_assoc *pas);
 };
 
 #endif /* __NET_PSP_H */
diff --git a/include/uapi/linux/psp.h b/include/uapi/linux/psp.h
index cbfbf3f0f364..607c42c39ba5 100644
--- a/include/uapi/linux/psp.h
+++ b/include/uapi/linux/psp.h
@@ -26,6 +26,25 @@ enum {
 	PSP_A_DEV_MAX = (__PSP_A_DEV_MAX - 1)
 };
 
+enum {
+	PSP_A_ASSOC_DEV_ID = 1,
+	PSP_A_ASSOC_VERSION,
+	PSP_A_ASSOC_RX_KEY,
+	PSP_A_ASSOC_TX_KEY,
+	PSP_A_ASSOC_SOCK_FD,
+
+	__PSP_A_ASSOC_MAX,
+	PSP_A_ASSOC_MAX = (__PSP_A_ASSOC_MAX - 1)
+};
+
+enum {
+	PSP_A_KEYS_KEY = 1,
+	PSP_A_KEYS_SPI,
+
+	__PSP_A_KEYS_MAX,
+	PSP_A_KEYS_MAX = (__PSP_A_KEYS_MAX - 1)
+};
+
 enum {
 	PSP_CMD_DEV_GET = 1,
 	PSP_CMD_DEV_ADD_NTF,
@@ -34,6 +53,8 @@ enum {
 	PSP_CMD_DEV_CHANGE_NTF,
 	PSP_CMD_KEY_ROTATE,
 	PSP_CMD_KEY_ROTATE_NTF,
+	PSP_CMD_RX_ASSOC,
+	PSP_CMD_TX_ASSOC,
 
 	__PSP_CMD_MAX,
 	PSP_CMD_MAX = (__PSP_CMD_MAX - 1)
diff --git a/net/psp/Kconfig b/net/psp/Kconfig
index 5e3908a40945..a7d24691a7e1 100644
--- a/net/psp/Kconfig
+++ b/net/psp/Kconfig
@@ -6,6 +6,7 @@ config INET_PSP
 	bool "PSP Security Protocol support"
 	depends on INET
 	select SKB_DECRYPTED
+	select SOCK_VALIDATE_XMIT
 	help
 	Enable kernel support for the PSP protocol.
 	For more information see:
diff --git a/net/psp/Makefile b/net/psp/Makefile
index 41b51d06e560..eb5ff3c5bfb2 100644
--- a/net/psp/Makefile
+++ b/net/psp/Makefile
@@ -2,4 +2,4 @@
 
 obj-$(CONFIG_INET_PSP) += psp.o
 
-psp-y := psp_main.o psp_nl.o psp-nl-gen.o
+psp-y := psp_main.o psp_nl.o psp_sock.o psp-nl-gen.o
diff --git a/net/psp/psp-nl-gen.c b/net/psp/psp-nl-gen.c
index 7f49577ac72f..9fdd6f831803 100644
--- a/net/psp/psp-nl-gen.c
+++ b/net/psp/psp-nl-gen.c
@@ -10,6 +10,12 @@
 
 #include <uapi/linux/psp.h>
 
+/* Common nested types */
+const struct nla_policy psp_keys_nl_policy[PSP_A_KEYS_SPI + 1] = {
+	[PSP_A_KEYS_KEY] = { .type = NLA_BINARY, },
+	[PSP_A_KEYS_SPI] = { .type = NLA_U32, },
+};
+
 /* PSP_CMD_DEV_GET - do */
 static const struct nla_policy psp_dev_get_nl_policy[PSP_A_DEV_ID + 1] = {
 	[PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
@@ -26,6 +32,21 @@ static const struct nla_policy psp_key_rotate_nl_policy[PSP_A_DEV_ID + 1] = {
 	[PSP_A_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
 };
 
+/* PSP_CMD_RX_ASSOC - do */
+static const struct nla_policy psp_rx_assoc_nl_policy[PSP_A_ASSOC_SOCK_FD + 1] = {
+	[PSP_A_ASSOC_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+	[PSP_A_ASSOC_VERSION] = NLA_POLICY_MAX(NLA_U32, 3),
+	[PSP_A_ASSOC_SOCK_FD] = { .type = NLA_U32, },
+};
+
+/* PSP_CMD_TX_ASSOC - do */
+static const struct nla_policy psp_tx_assoc_nl_policy[PSP_A_ASSOC_SOCK_FD + 1] = {
+	[PSP_A_ASSOC_DEV_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+	[PSP_A_ASSOC_VERSION] = NLA_POLICY_MAX(NLA_U32, 3),
+	[PSP_A_ASSOC_TX_KEY] = NLA_POLICY_NESTED(psp_keys_nl_policy),
+	[PSP_A_ASSOC_SOCK_FD] = { .type = NLA_U32, },
+};
+
 /* Ops table for psp */
 static const struct genl_split_ops psp_nl_ops[] = {
 	{
@@ -60,6 +81,24 @@ static const struct genl_split_ops psp_nl_ops[] = {
 		.maxattr	= PSP_A_DEV_ID,
 		.flags		= GENL_CMD_CAP_DO,
 	},
+	{
+		.cmd		= PSP_CMD_RX_ASSOC,
+		.pre_doit	= psp_assoc_device_get_locked,
+		.doit		= psp_nl_rx_assoc_doit,
+		.post_doit	= psp_device_unlock,
+		.policy		= psp_rx_assoc_nl_policy,
+		.maxattr	= PSP_A_ASSOC_SOCK_FD,
+		.flags		= GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= PSP_CMD_TX_ASSOC,
+		.pre_doit	= psp_assoc_device_get_locked,
+		.doit		= psp_nl_tx_assoc_doit,
+		.post_doit	= psp_device_unlock,
+		.policy		= psp_tx_assoc_nl_policy,
+		.maxattr	= PSP_A_ASSOC_SOCK_FD,
+		.flags		= GENL_CMD_CAP_DO,
+	},
 };
 
 static const struct genl_multicast_group psp_nl_mcgrps[] = {
diff --git a/net/psp/psp-nl-gen.h b/net/psp/psp-nl-gen.h
index 00a2d4ec59e4..25268ed11fb5 100644
--- a/net/psp/psp-nl-gen.h
+++ b/net/psp/psp-nl-gen.h
@@ -11,8 +11,13 @@
 
 #include <uapi/linux/psp.h>
 
+/* Common nested types */
+extern const struct nla_policy psp_keys_nl_policy[PSP_A_KEYS_SPI + 1];
+
 int psp_device_get_locked(const struct genl_split_ops *ops,
 			  struct sk_buff *skb, struct genl_info *info);
+int psp_assoc_device_get_locked(const struct genl_split_ops *ops,
+				struct sk_buff *skb, struct genl_info *info);
 void
 psp_device_unlock(const struct genl_split_ops *ops, struct sk_buff *skb,
 		  struct genl_info *info);
@@ -21,6 +26,8 @@ int psp_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
 int psp_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int psp_nl_dev_set_doit(struct sk_buff *skb, struct genl_info *info);
 int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info);
+int psp_nl_rx_assoc_doit(struct sk_buff *skb, struct genl_info *info);
+int psp_nl_tx_assoc_doit(struct sk_buff *skb, struct genl_info *info);
 
 enum {
 	PSP_NLGRP_MGMT,
diff --git a/net/psp/psp.h b/net/psp/psp.h
index 94d0cc31a61f..b4092936bc64 100644
--- a/net/psp/psp.h
+++ b/net/psp/psp.h
@@ -4,6 +4,7 @@
 #define __PSP_PSP_H
 
 #include <linux/list.h>
+#include <linux/lockdep.h>
 #include <linux/mutex.h>
 #include <net/netns/generic.h>
 #include <net/psp.h>
@@ -17,15 +18,36 @@ int psp_dev_check_access(struct psp_dev *psd, struct net *net);
 
 void psp_nl_notify_dev(struct psp_dev *psd, u32 cmd);
 
+struct psp_assoc *psp_assoc_create(struct psp_dev *psd);
+struct psp_dev *psd_get_for_sock(struct sock *sk);
+void psp_dev_tx_key_del(struct psp_dev *psd, struct psp_assoc *pas);
+int psp_sock_assoc_set_rx(struct sock *sk, struct psp_assoc *pas,
+			  struct psp_key_parsed *key,
+			  struct netlink_ext_ack *extack);
+int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
+			  u32 version, struct psp_key_parsed *key,
+			  struct netlink_ext_ack *extack);
+
 static inline void psp_dev_get(struct psp_dev *psd)
 {
 	refcount_inc(&psd->refcnt);
 }
 
+static inline bool psp_dev_tryget(struct psp_dev *psd)
+{
+	return refcount_inc_not_zero(&psd->refcnt);
+}
+
 static inline void psp_dev_put(struct psp_dev *psd)
 {
 	if (refcount_dec_and_test(&psd->refcnt))
 		psp_dev_destroy(psd);
 }
 
+static inline bool psp_dev_is_registered(struct psp_dev *psd)
+{
+	lockdep_assert_held(&psd->lock);
+	return !!psd->ops;
+}
+
 #endif /* __PSP_PSP_H */
diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c
index 536f50d82fb2..59066c4db048 100644
--- a/net/psp/psp_main.c
+++ b/net/psp/psp_main.c
@@ -55,7 +55,10 @@ psp_dev_create(struct net_device *netdev,
 
 	if (WARN_ON(!psd_caps->versions ||
 		    !psd_ops->set_config ||
-		    !psd_ops->key_rotate))
+		    !psd_ops->key_rotate ||
+		    !psd_ops->rx_spi_alloc ||
+		    !psd_ops->tx_key_add ||
+		    !psd_ops->tx_key_del))
 		return ERR_PTR(-EINVAL);
 
 	psd = kzalloc(sizeof(*psd), GFP_KERNEL);
@@ -68,6 +71,7 @@ psp_dev_create(struct net_device *netdev,
 	psd->drv_priv = priv_ptr;
 
 	mutex_init(&psd->lock);
+	INIT_LIST_HEAD(&psd->active_assocs);
 	refcount_set(&psd->refcnt, 1);
 
 	mutex_lock(&psp_devs_lock);
@@ -103,6 +107,8 @@ void psp_dev_destroy(struct psp_dev *psd)
  */
 void psp_dev_unregister(struct psp_dev *psd)
 {
+	struct psp_assoc *pas, *next;
+
 	mutex_lock(&psp_devs_lock);
 	mutex_lock(&psd->lock);
 
@@ -110,6 +116,9 @@ void psp_dev_unregister(struct psp_dev *psd)
 	xa_erase(&psp_devs, psd->id);
 	mutex_unlock(&psp_devs_lock);
 
+	list_for_each_entry_safe(pas, next, &psd->active_assocs, assocs_list)
+		psp_dev_tx_key_del(psd, pas);
+
 	rcu_assign_pointer(psd->main_netdev->psp_dev, NULL);
 
 	psd->ops = NULL;
diff --git a/net/psp/psp_nl.c b/net/psp/psp_nl.c
index b7006e50dc87..58508e642185 100644
--- a/net/psp/psp_nl.c
+++ b/net/psp/psp_nl.c
@@ -79,9 +79,12 @@ void
 psp_device_unlock(const struct genl_split_ops *ops, struct sk_buff *skb,
 		  struct genl_info *info)
 {
+	struct socket *socket = info->user_ptr[1];
 	struct psp_dev *psd = info->user_ptr[0];
 
 	mutex_unlock(&psd->lock);
+	if (socket)
+		sockfd_put(socket);
 }
 
 static int
@@ -261,3 +264,244 @@ int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info)
 	nlmsg_free(rsp);
 	return err;
 }
+
+/* Key etc. */
+
+int psp_assoc_device_get_locked(const struct genl_split_ops *ops,
+				struct sk_buff *skb, struct genl_info *info)
+{
+	struct socket *socket;
+	struct psp_dev *psd;
+	struct nlattr *id;
+	struct sock *sk;
+	int fd, err;
+
+	if (GENL_REQ_ATTR_CHECK(info, PSP_A_ASSOC_SOCK_FD))
+		return -EINVAL;
+
+	fd = nla_get_u32(info->attrs[PSP_A_ASSOC_SOCK_FD]);
+	socket = sockfd_lookup(fd, &err);
+	if (!socket)
+		return err;
+
+	sk = socket->sk;
+	if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) {
+		NL_SET_ERR_MSG_ATTR(info->extack,
+				    info->attrs[PSP_A_ASSOC_SOCK_FD],
+				    "Unsupported socket family");
+		err = -EOPNOTSUPP;
+		goto err_sock_put;
+	}
+
+	psd = psd_get_for_sock(socket->sk);
+	if (psd) {
+		err = psp_dev_check_access(psd, genl_info_net(info));
+		if (err) {
+			psp_dev_put(psd);
+			psd = NULL;
+		}
+	}
+
+	if (!psd && GENL_REQ_ATTR_CHECK(info, PSP_A_ASSOC_DEV_ID)) {
+		err = -EINVAL;
+		goto err_sock_put;
+	}
+
+	id = info->attrs[PSP_A_ASSOC_DEV_ID];
+	if (psd) {
+		mutex_lock(&psd->lock);
+		if (id && psd->id != nla_get_u32(id)) {
+			mutex_unlock(&psd->lock);
+			NL_SET_ERR_MSG_ATTR(info->extack, id,
+					    "Device id vs socket mismatch");
+			err = -EINVAL;
+			goto err_psd_put;
+		}
+
+		psp_dev_put(psd);
+	} else {
+		psd = psp_device_get_and_lock(genl_info_net(info), id);
+		if (IS_ERR(psd)) {
+			err = PTR_ERR(psd);
+			goto err_sock_put;
+		}
+	}
+
+	info->user_ptr[0] = psd;
+	info->user_ptr[1] = socket;
+
+	return 0;
+
+err_psd_put:
+	psp_dev_put(psd);
+err_sock_put:
+	sockfd_put(socket);
+	return err;
+}
+
+static unsigned int psp_nl_assoc_key_size(u32 version)
+{
+	switch (version) {
+	case PSP_VERSION_HDR0_AES_GCM_128:
+	case PSP_VERSION_HDR0_AES_GMAC_128:
+		return 16;
+	case PSP_VERSION_HDR0_AES_GCM_256:
+	case PSP_VERSION_HDR0_AES_GMAC_256:
+		return 32;
+	default:
+		/* Netlink policies should prevent us from getting here */
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
+static int
+psp_nl_parse_key(struct genl_info *info, u32 attr, struct psp_key_parsed *key,
+		 unsigned int key_sz)
+{
+	struct nlattr *nest = info->attrs[attr];
+	struct nlattr *tb[PSP_A_KEYS_SPI + 1];
+	int err;
+
+	err = nla_parse_nested(tb, ARRAY_SIZE(tb) - 1, nest,
+			       psp_keys_nl_policy, info->extack);
+	if (err)
+		return err;
+
+	if (NL_REQ_ATTR_CHECK(info->extack, nest, tb, PSP_A_KEYS_KEY) ||
+	    NL_REQ_ATTR_CHECK(info->extack, nest, tb, PSP_A_KEYS_SPI))
+		return -EINVAL;
+
+	if (nla_len(tb[PSP_A_KEYS_KEY]) != key_sz) {
+		NL_SET_ERR_MSG_ATTR(info->extack, tb[PSP_A_KEYS_KEY],
+				    "incorrect key length");
+		return -EINVAL;
+	}
+
+	key->spi = cpu_to_be32(nla_get_u32(tb[PSP_A_KEYS_SPI]));
+	memcpy(key->key, nla_data(tb[PSP_A_KEYS_KEY]), key_sz);
+
+	return 0;
+}
+
+static int
+psp_nl_put_key(struct sk_buff *skb, u32 attr, u32 version,
+	       struct psp_key_parsed *key)
+{
+	int key_sz = psp_nl_assoc_key_size(version);
+	void *nest;
+
+	nest = nla_nest_start(skb, attr);
+
+	if (nla_put_u32(skb, PSP_A_KEYS_SPI, be32_to_cpu(key->spi)) ||
+	    nla_put(skb, PSP_A_KEYS_KEY, key_sz, key->key)) {
+		nla_nest_cancel(skb, nest);
+		return -EMSGSIZE;
+	}
+
+	nla_nest_end(skb, nest);
+
+	return 0;
+}
+
+int psp_nl_rx_assoc_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct socket *socket = info->user_ptr[1];
+	struct psp_dev *psd = info->user_ptr[0];
+	struct psp_key_parsed key;
+	struct psp_assoc *pas;
+	struct sk_buff *rsp;
+	u32 version;
+	int err;
+
+	if (GENL_REQ_ATTR_CHECK(info, PSP_A_ASSOC_VERSION))
+		return -EINVAL;
+
+	version = nla_get_u32(info->attrs[PSP_A_ASSOC_VERSION]);
+	if (!(psd->caps->versions & (1 << version))) {
+		NL_SET_BAD_ATTR(info->extack, info->attrs[PSP_A_ASSOC_VERSION]);
+		return -EOPNOTSUPP;
+	}
+
+	rsp = psp_nl_reply_new(info);
+	if (!rsp)
+		return -ENOMEM;
+
+	pas = psp_assoc_create(psd);
+	if (!pas) {
+		err = -ENOMEM;
+		goto err_free_rsp;
+	}
+	pas->version = version;
+	pas->key_sz = psp_nl_assoc_key_size(version);
+
+	err = psd->ops->rx_spi_alloc(psd, version, &key, info->extack);
+	if (err)
+		goto err_free_pas;
+
+	if (nla_put_u32(rsp, PSP_A_ASSOC_DEV_ID, psd->id) ||
+	    nla_put_u32(rsp, PSP_A_ASSOC_VERSION, version) ||
+	    psp_nl_put_key(rsp, PSP_A_ASSOC_RX_KEY, version, &key)) {
+		err = -EMSGSIZE;
+		goto err_free_pas;
+	}
+
+	err = psp_sock_assoc_set_rx(socket->sk, pas, &key, info->extack);
+	if (err) {
+		NL_SET_BAD_ATTR(info->extack, info->attrs[PSP_A_ASSOC_SOCK_FD]);
+		goto err_free_pas;
+	}
+	psp_assoc_put(pas);
+
+	return psp_nl_reply_send(rsp, info);
+
+err_free_pas:
+	psp_assoc_put(pas);
+err_free_rsp:
+	nlmsg_free(rsp);
+	return err;
+}
+
+int psp_nl_tx_assoc_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct socket *socket = info->user_ptr[1];
+	struct psp_dev *psd = info->user_ptr[0];
+	struct psp_key_parsed key;
+	struct sk_buff *rsp;
+	unsigned int key_sz;
+	u32 version;
+	int err;
+
+	if (GENL_REQ_ATTR_CHECK(info, PSP_A_ASSOC_VERSION) ||
+	    GENL_REQ_ATTR_CHECK(info, PSP_A_ASSOC_TX_KEY))
+		return -EINVAL;
+
+	version = nla_get_u32(info->attrs[PSP_A_ASSOC_VERSION]);
+	if (!(psd->caps->versions & (1 << version))) {
+		NL_SET_BAD_ATTR(info->extack, info->attrs[PSP_A_ASSOC_VERSION]);
+		return -EOPNOTSUPP;
+	}
+
+	key_sz = psp_nl_assoc_key_size(version);
+	if (!key_sz)
+		return -EINVAL;
+
+	err = psp_nl_parse_key(info, PSP_A_ASSOC_TX_KEY, &key, key_sz);
+	if (err < 0)
+		return err;
+
+	rsp = psp_nl_reply_new(info);
+	if (!rsp)
+		return -ENOMEM;
+
+	err = psp_sock_assoc_set_tx(socket->sk, psd, version, &key,
+				    info->extack);
+	if (err)
+		goto err_free_msg;
+
+	return psp_nl_reply_send(rsp, info);
+
+err_free_msg:
+	nlmsg_free(rsp);
+	return err;
+}
diff --git a/net/psp/psp_sock.c b/net/psp/psp_sock.c
new file mode 100644
index 000000000000..42b881e681b9
--- /dev/null
+++ b/net/psp/psp_sock.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/file.h>
+#include <linux/net.h>
+#include <linux/rcupdate.h>
+#include <linux/tcp.h>
+
+#include <net/psp.h>
+#include "psp.h"
+
+struct psp_dev *psd_get_for_sock(struct sock *sk)
+{
+	struct dst_entry *dst;
+	struct psp_dev *psd;
+
+	dst = sk_dst_get(sk);
+	if (!dst)
+		return NULL;
+
+	rcu_read_lock();
+	psd = rcu_dereference(dst->dev->psp_dev);
+	if (psd && !psp_dev_tryget(psd))
+		psd = NULL;
+	rcu_read_unlock();
+
+	dst_release(dst);
+
+	return psd;
+}
+
+static struct sk_buff *
+psp_validate_xmit(struct sock *sk, struct net_device *dev, struct sk_buff *skb)
+{
+	struct psp_assoc *pas;
+	bool good;
+
+	rcu_read_lock();
+	pas = psp_skb_get_assoc_rcu(skb);
+	good = !pas || rcu_access_pointer(dev->psp_dev) == pas->psd;
+	rcu_read_unlock();
+	if (!good) {
+		kfree_skb_reason(skb, SKB_DROP_REASON_PSP_OUTPUT);
+		return NULL;
+	}
+
+	return skb;
+}
+
+struct psp_assoc *psp_assoc_create(struct psp_dev *psd)
+{
+	struct psp_assoc *pas;
+
+	lockdep_assert_held(&psd->lock);
+
+	pas = kzalloc(struct_size(pas, drv_data, psd->caps->assoc_drv_spc),
+		      GFP_KERNEL_ACCOUNT);
+	if (!pas)
+		return NULL;
+
+	pas->psd = psd;
+	psp_dev_get(psd);
+	refcount_set(&pas->refcnt, 1);
+
+	list_add_tail(&pas->assocs_list, &psd->active_assocs);
+
+	return pas;
+}
+
+static struct psp_assoc *psp_assoc_dummy(struct psp_assoc *pas)
+{
+	struct psp_dev *psd = pas->psd;
+	size_t sz;
+
+	lockdep_assert_held(&psd->lock);
+
+	sz = struct_size(pas, drv_data, psd->caps->assoc_drv_spc);
+	return kmemdup(pas, sz, GFP_KERNEL);
+}
+
+static int psp_dev_tx_key_add(struct psp_dev *psd, struct psp_assoc *pas,
+			      struct netlink_ext_ack *extack)
+{
+	return psd->ops->tx_key_add(psd, pas, extack);
+}
+
+void psp_dev_tx_key_del(struct psp_dev *psd, struct psp_assoc *pas)
+{
+	if (pas->tx.spi)
+		psd->ops->tx_key_del(psd, pas);
+	list_del(&pas->assocs_list);
+}
+
+static void psp_assoc_free(struct work_struct *work)
+{
+	struct psp_assoc *pas = container_of(work, struct psp_assoc, work);
+	struct psp_dev *psd = pas->psd;
+
+	mutex_lock(&psd->lock);
+	if (psd->ops)
+		psp_dev_tx_key_del(psd, pas);
+	mutex_unlock(&psd->lock);
+	psp_dev_put(psd);
+	kfree(pas);
+}
+
+static void psp_assoc_free_queue(struct rcu_head *head)
+{
+	struct psp_assoc *pas = container_of(head, struct psp_assoc, rcu);
+
+	INIT_WORK(&pas->work, psp_assoc_free);
+	schedule_work(&pas->work);
+}
+
+/**
+ * psp_assoc_put() - release a reference on a PSP association
+ * @pas: association to release
+ */
+void psp_assoc_put(struct psp_assoc *pas)
+{
+	if (pas && refcount_dec_and_test(&pas->refcnt))
+		call_rcu(&pas->rcu, psp_assoc_free_queue);
+}
+
+void psp_sk_assoc_free(struct sock *sk)
+{
+	rcu_read_lock();
+	psp_assoc_put(rcu_dereference(sk->psp_assoc));
+	rcu_assign_pointer(sk->psp_assoc, NULL);
+	rcu_read_unlock();
+}
+
+int psp_sock_assoc_set_rx(struct sock *sk, struct psp_assoc *pas,
+			  struct psp_key_parsed *key,
+			  struct netlink_ext_ack *extack)
+{
+	int err;
+
+	memcpy(&pas->rx, key, sizeof(*key));
+
+	lock_sock(sk);
+
+	if (psp_sk_assoc(sk)) {
+		NL_SET_ERR_MSG(extack, "Socket already has PSP state");
+		err = -EBUSY;
+		goto exit_unlock;
+	}
+
+	refcount_inc(&pas->refcnt);
+	rcu_assign_pointer(sk->psp_assoc, pas);
+	err = 0;
+
+exit_unlock:
+	release_sock(sk);
+
+	return err;
+}
+
+static int psp_sock_recv_queue_check(struct sock *sk)
+{
+	struct sk_buff *skb;
+
+	skb_queue_walk(&sk->sk_receive_queue, skb) {
+		if (psp_sk_rx_policy_check(sk, skb))
+			return -EBUSY;
+	}
+	return 0;
+}
+
+int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
+			  u32 version, struct psp_key_parsed *key,
+			  struct netlink_ext_ack *extack)
+{
+	struct psp_assoc *pas, *dummy;
+	int err;
+
+	lock_sock(sk);
+
+	pas = psp_sk_assoc(sk);
+	if (!pas) {
+		NL_SET_ERR_MSG(extack, "Socket has no Rx key");
+		err = -EINVAL;
+		goto exit_unlock;
+	}
+	if (pas->psd != psd) {
+		NL_SET_ERR_MSG(extack, "Rx key from different device");
+		err = -EINVAL;
+		goto exit_unlock;
+	}
+	if (pas->version != version) {
+		NL_SET_ERR_MSG(extack,
+			       "PSP version mismatch with existing state");
+		err = -EINVAL;
+		goto exit_unlock;
+	}
+	if (pas->tx.spi) {
+		NL_SET_ERR_MSG(extack, "Tx key already set");
+		err = -EBUSY;
+		goto exit_unlock;
+	}
+
+	WRITE_ONCE(pas->rx_required, 1);
+	err = psp_sock_recv_queue_check(sk);
+	if (err) {
+		NL_SET_ERR_MSG(extack, "Socket has incompatible segments already in the recv queue");
+		goto exit_clear_rx;
+	}
+
+	/* Pass a fake association to drivers to make sure they don't
+	 * try to store pointers to it. For re-keying we'll need to
+	 * re-allocate the assoc structures.
+	 */
+	dummy = psp_assoc_dummy(pas);
+	memcpy(&dummy->tx, key, sizeof(*key));
+	err = psp_dev_tx_key_add(psd, dummy, extack);
+	if (err)
+		goto exit_free_dummy;
+
+	memcpy(pas->drv_data, dummy->drv_data, psd->caps->assoc_drv_spc);
+	memcpy(&pas->tx, key, sizeof(*key));
+
+	WRITE_ONCE(sk->sk_validate_xmit_skb, psp_validate_xmit);
+
+exit_free_dummy:
+	kfree(dummy);
+exit_clear_rx:
+	if (err)
+		WRITE_ONCE(pas->rx_required, 0);
+exit_unlock:
+	release_sock(sk);
+	return err;
+}
+
+void psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk)
+{
+	struct psp_assoc *pas = psp_sk_assoc(sk);
+
+	if (pas)
+		refcount_inc(&pas->refcnt);
+	rcu_assign_pointer(tw->psp_assoc, pas);
+}
+
+void psp_twsk_assoc_free(struct tcp_timewait_sock *tw)
+{
+	rcu_read_lock();
+	psp_assoc_put(rcu_dereference(tw->psp_assoc));
+	rcu_assign_pointer(tw->psp_assoc, NULL);
+	rcu_read_unlock();
+}
+
+enum skb_drop_reason
+psp_twsk_rx_policy_check(struct tcp_timewait_sock *tw, struct sk_buff *skb)
+{
+	return __psp_sk_rx_policy_check(skb_ext_find(skb, SKB_EXT_PSP),
+					rcu_dereference(tw->psp_assoc));
+}
-- 
2.45.0



* [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (5 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 06/15] net: psp: add socket security association code Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-13  1:47   ` Willem de Bruijn
  2024-05-10  3:04 ` [RFC net-next 08/15] psp: track generations of secret state Jakub Kicinski
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

PSP adds 32B of per-packet overhead (16B header plus 16B trailer).
Adjust the MSS accordingly.

We can either modify tcp_mtu_to_mss() / tcp_mss_to_mtu()
or reuse icsk_ext_hdr_len. The former option is more
TCP-specific and has runtime overhead. The latter is a bit
of a hack as PSP is not an ext_hdr. If one squints hard
enough, UDP encap is just a more practical version of
IPv6 exthdr, so go with the latter. Happy to change.
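
To make the arithmetic concrete, here is a minimal userspace sketch of the
effect on MSS. It is not the kernel's tcp_mtu_to_mss(); the ex_* helpers and
the EX_* constants are hypothetical, and the model assumes IPv4 with no IP or
TCP options. It only accounts for the 32B that this patch adds through
icsk_ext_hdr_len.

```c
#include <assert.h>

/* Sizes taken from this patch. */
#define PSP_HDR_SIZE	16	/* no optional fields */
#define PSP_TRL_SIZE	16	/* AES-GCM/GMAC trailer */

/* Illustrative constants, not kernel definitions. */
#define EX_IPV4_HDR_LEN	20
#define EX_TCP_HDR_LEN	20

/* Hypothetical mirror of psp_sk_overhead(): 32B when PSP state is attached. */
static unsigned int ex_psp_overhead(int has_psp)
{
	return has_psp ? PSP_HDR_SIZE + PSP_TRL_SIZE : 0;
}

/* Hypothetical helper: payload bytes left in one segment once the fixed
 * headers and the "extension header" bytes are subtracted from the MTU.
 */
static unsigned int ex_mtu_to_mss(unsigned int mtu, unsigned int ext_hdr_len)
{
	unsigned int fixed = EX_IPV4_HDR_LEN + EX_TCP_HDR_LEN + ext_hdr_len;

	assert(mtu > fixed);
	return mtu - fixed;
}
```

Under these assumptions, ex_mtu_to_mss(1500, ex_psp_overhead(1)) is 32 bytes
smaller than ex_mtu_to_mss(1500, ex_psp_overhead(0)).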

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/net/psp/functions.h | 12 ++++++++++++
 include/net/psp/types.h     |  3 +++
 net/ipv4/tcp_ipv4.c         |  4 ++--
 net/ipv6/ipv6_sockglue.c    |  6 +++++-
 net/ipv6/tcp_ipv6.c         | 12 ++++++------
 net/psp/psp_sock.c          |  5 +++++
 6 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/net/psp/functions.h b/include/net/psp/functions.h
index ad81322dd4ed..be4f78dec425 100644
--- a/include/net/psp/functions.h
+++ b/include/net/psp/functions.h
@@ -90,6 +90,13 @@ static inline struct psp_assoc *psp_skb_get_assoc_rcu(struct sk_buff *skb)
 		return NULL;
 	return rcu_dereference(skb->sk->psp_assoc);
 }
+
+static inline unsigned int psp_sk_overhead(const struct sock *sk)
+{
+	bool has_psp = rcu_access_pointer(sk->psp_assoc);
+
+	return has_psp ? PSP_HDR_SIZE + PSP_TRL_SIZE : 0;
+}
 #else
 static inline void psp_sk_assoc_free(struct sock *sk) { }
 static inline void
@@ -127,6 +134,11 @@ static inline struct psp_assoc *psp_skb_get_assoc_rcu(struct sk_buff *skb)
 {
 	return NULL;
 }
+
+static inline unsigned int psp_sk_overhead(const struct sock *sk)
+{
+	return 0;
+}
 #endif
 
 static inline unsigned long
diff --git a/include/net/psp/types.h b/include/net/psp/types.h
index e39abf10c03c..aad836c1c2ca 100644
--- a/include/net/psp/types.h
+++ b/include/net/psp/types.h
@@ -95,6 +95,9 @@ struct psp_dev_caps {
 #define PSP_V1_KEY	32
 #define PSP_MAX_KEY	32
 
+#define PSP_HDR_SIZE	16	/* We don't support optional fields, yet */
+#define PSP_TRL_SIZE	16	/* AES-GCM/GMAC trailer size */
+
 struct psp_skb_ext {
 	__be32 spi;
 	/* generation and version are 8b but we don't want holes */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9539c4a7b55d..2a602cf51009 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -279,9 +279,9 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	inet->inet_dport = usin->sin_port;
 	sk_daddr_set(sk, daddr);
 
-	inet_csk(sk)->icsk_ext_hdr_len = 0;
+	inet_csk(sk)->icsk_ext_hdr_len = psp_sk_overhead(sk);
 	if (inet_opt)
-		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
+		inet_csk(sk)->icsk_ext_hdr_len += inet_opt->opt.optlen;
 
 	tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index d4c28ec1bc51..b4505bbb9e2c 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -49,6 +49,7 @@
 #include <net/xfrm.h>
 #include <net/compat.h>
 #include <net/seg6.h>
+#include <net/psp.h>
 
 #include <linux/uaccess.h>
 
@@ -107,7 +108,10 @@ struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
 		    !((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
 		    inet_sk(sk)->inet_daddr != LOOPBACK4_IPV6) {
 			struct inet_connection_sock *icsk = inet_csk(sk);
-			icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
+
+			icsk->icsk_ext_hdr_len =
+				psp_sk_overhead(sk) +
+				opt->opt_flen + opt->opt_nflen;
 			icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
 		}
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6991464511c3..c67700fc49a1 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -299,10 +299,10 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	sk->sk_gso_type = SKB_GSO_TCPV6;
 	ip6_dst_store(sk, dst, NULL, NULL);
 
-	icsk->icsk_ext_hdr_len = 0;
+	icsk->icsk_ext_hdr_len = psp_sk_overhead(sk);
 	if (opt)
-		icsk->icsk_ext_hdr_len = opt->opt_flen +
-					 opt->opt_nflen;
+		icsk->icsk_ext_hdr_len += opt->opt_flen +
+					  opt->opt_nflen;
 
 	tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
 
@@ -1500,10 +1500,10 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
 		opt = ipv6_dup_options(newsk, opt);
 		RCU_INIT_POINTER(newnp->opt, opt);
 	}
-	inet_csk(newsk)->icsk_ext_hdr_len = 0;
+	inet_csk(newsk)->icsk_ext_hdr_len = psp_sk_overhead(sk);
 	if (opt)
-		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
-						    opt->opt_flen;
+		inet_csk(newsk)->icsk_ext_hdr_len += opt->opt_nflen +
+						     opt->opt_flen;
 
 	tcp_ca_openreq_child(newsk, dst);
 
diff --git a/net/psp/psp_sock.c b/net/psp/psp_sock.c
index 42b881e681b9..bcef042cb8a5 100644
--- a/net/psp/psp_sock.c
+++ b/net/psp/psp_sock.c
@@ -170,6 +170,7 @@ int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
 			  u32 version, struct psp_key_parsed *key,
 			  struct netlink_ext_ack *extack)
 {
+	struct inet_connection_sock *icsk;
 	struct psp_assoc *pas, *dummy;
 	int err;
 
@@ -220,6 +221,10 @@ int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
 
 	WRITE_ONCE(sk->sk_validate_xmit_skb, psp_validate_xmit);
 
+	icsk = inet_csk(sk);
+	icsk->icsk_ext_hdr_len += psp_sk_overhead(sk);
+	icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
+
 exit_free_dummy:
 	kfree(dummy);
 exit_clear_rx:
-- 
2.45.0



* [RFC net-next 08/15] psp: track generations of secret state
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (6 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 09/15] net/mlx5e: Support PSP offload functionality Jakub Kicinski
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

There is a (somewhat theoretical in the absence of multi-host support)
possibility that another entity will rotate the key and we won't
know. This may lead to accepting packets with matching SPI but
which used different crypto keys than we expected. Maintain and
compare "key generation" per PSP spec.

Since we're tracking "key generations" more explicitly now,
maintain different lists for associations from different generations.
This way we can catch stale associations (user space should
listen to rotation notifications and change the keys).

Drivers can "opt out" of generation tracking by setting
the generation value to 0.
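
The generation arithmetic can be sketched in plain C as below. This is a
userspace illustration only; the ex_* helpers are hypothetical names, and
just PSP_GEN_VALID_MASK and the two expressions (increment under the mask,
OR-ing in the bits outside the mask) are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PSP_GEN_VALID_MASK	0x7f

/* Generation number the core suggests for the next rotation. */
static uint8_t ex_next_generation(uint8_t prev_gen)
{
	assert(!(prev_gen & ~PSP_GEN_VALID_MASK));
	return (prev_gen + 1) & PSP_GEN_VALID_MASK;
}

/* Mark an association's generation as stale by setting the bit that lies
 * outside the valid mask, so it can never equal a live generation.
 */
static uint8_t ex_mark_stale(uint8_t gen)
{
	return gen | (uint8_t)~PSP_GEN_VALID_MASK;
}

/* A generation is usable only if no bit outside the valid mask is set. */
static int ex_gen_is_valid(uint8_t gen)
{
	return !(gen & ~PSP_GEN_VALID_MASK);
}
```

Note that in this sketch the suggested value wraps from 0x7f back to 0,
which is also the value drivers use to opt out of tracking; the patch lets
the driver override the suggestion.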

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/net/psp/types.h | 10 ++++++++++
 net/psp/psp.h           |  1 +
 net/psp/psp_main.c      |  6 +++++-
 net/psp/psp_nl.c        | 10 ++++++++++
 net/psp/psp_sock.c      | 16 ++++++++++++++++
 5 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/net/psp/types.h b/include/net/psp/types.h
index aad836c1c2ca..a9e406c979a8 100644
--- a/include/net/psp/types.h
+++ b/include/net/psp/types.h
@@ -50,8 +50,12 @@ struct psp_dev_config {
  * @lock:	instance lock, protects all fields
  * @refcnt:	reference count for the instance
  * @id:		instance id
+ * @generation:	current generation of the secret state
  * @config:	current device configuration
  * @active_assocs:	list of registered associations
+ * @prev_assocs:	associations which use old (but still usable)
+ *			secret state
+ * @stale_assocs:	associations which use a rotated out key
  *
  * @rcu:	RCU head for freeing the structure
  */
@@ -67,13 +71,19 @@ struct psp_dev {
 
 	u32 id;
 
+	u8 generation;
+
 	struct psp_dev_config config;
 
 	struct list_head active_assocs;
+	struct list_head prev_assocs;
+	struct list_head stale_assocs;
 
 	struct rcu_head rcu;
 };
 
+#define PSP_GEN_VALID_MASK	0x7f
+
 /**
  * struct psp_dev_caps - PSP device capabilities
  */
diff --git a/net/psp/psp.h b/net/psp/psp.h
index b4092936bc64..a511ec85e1c7 100644
--- a/net/psp/psp.h
+++ b/net/psp/psp.h
@@ -27,6 +27,7 @@ int psp_sock_assoc_set_rx(struct sock *sk, struct psp_assoc *pas,
 int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
 			  u32 version, struct psp_key_parsed *key,
 			  struct netlink_ext_ack *extack);
+void psp_assocs_key_rotated(struct psp_dev *psd);
 
 static inline void psp_dev_get(struct psp_dev *psd)
 {
diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c
index 59066c4db048..f9c5ee57df78 100644
--- a/net/psp/psp_main.c
+++ b/net/psp/psp_main.c
@@ -72,6 +72,8 @@ psp_dev_create(struct net_device *netdev,
 
 	mutex_init(&psd->lock);
 	INIT_LIST_HEAD(&psd->active_assocs);
+	INIT_LIST_HEAD(&psd->prev_assocs);
+	INIT_LIST_HEAD(&psd->stale_assocs);
 	refcount_set(&psd->refcnt, 1);
 
 	mutex_lock(&psp_devs_lock);
@@ -116,7 +118,9 @@ void psp_dev_unregister(struct psp_dev *psd)
 	xa_erase(&psp_devs, psd->id);
 	mutex_unlock(&psp_devs_lock);
 
-	list_for_each_entry_safe(pas, next, &psd->active_assocs, assocs_list)
+	list_splice_init(&psd->active_assocs, &psd->prev_assocs);
+	list_splice_init(&psd->prev_assocs, &psd->stale_assocs);
+	list_for_each_entry_safe(pas, next, &psd->stale_assocs, assocs_list)
 		psp_dev_tx_key_del(psd, pas);
 
 	rcu_assign_pointer(psd->main_netdev->psp_dev, NULL);
diff --git a/net/psp/psp_nl.c b/net/psp/psp_nl.c
index 58508e642185..7b8a1d390cde 100644
--- a/net/psp/psp_nl.c
+++ b/net/psp/psp_nl.c
@@ -230,6 +230,7 @@ int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info)
 	struct psp_dev *psd = info->user_ptr[0];
 	struct genl_info ntf_info;
 	struct sk_buff *ntf, *rsp;
+	u8 prev_gen;
 	int err;
 
 	rsp = psp_nl_reply_new(info);
@@ -249,10 +250,19 @@ int psp_nl_key_rotate_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_free_ntf;
 	}
 
+	/* suggest the next gen number, driver can override */
+	prev_gen = psd->generation;
+	psd->generation = (prev_gen + 1) & PSP_GEN_VALID_MASK;
+
 	err = psd->ops->key_rotate(psd, info->extack);
 	if (err)
 		goto err_free_ntf;
 
+	WARN_ON_ONCE((psd->generation && psd->generation == prev_gen) ||
+		     psd->generation & ~PSP_GEN_VALID_MASK);
+
+	psp_assocs_key_rotated(psd);
+
 	nlmsg_end(ntf, (struct nlmsghdr *)ntf->data);
 	genlmsg_multicast_netns(&psp_nl_family, dev_net(psd->main_netdev), ntf,
 				0, PSP_NLGRP_USE, GFP_KERNEL);
diff --git a/net/psp/psp_sock.c b/net/psp/psp_sock.c
index bcef042cb8a5..7a791703850c 100644
--- a/net/psp/psp_sock.c
+++ b/net/psp/psp_sock.c
@@ -58,6 +58,7 @@ struct psp_assoc *psp_assoc_create(struct psp_dev *psd)
 		return NULL;
 
 	pas->psd = psd;
+	pas->generation = psd->generation;
 	psp_dev_get(psd);
 	refcount_set(&pas->refcnt, 1);
 
@@ -235,6 +236,21 @@ int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
 	return err;
 }
 
+void psp_assocs_key_rotated(struct psp_dev *psd)
+{
+	struct psp_assoc *pas, *next;
+
+	/* Mark the stale associations as invalid, they will no longer
+	 * be able to Rx any traffic.
+	 */
+	list_for_each_entry_safe(pas, next, &psd->prev_assocs, assocs_list)
+		pas->generation |= ~PSP_GEN_VALID_MASK;
+	list_splice_init(&psd->prev_assocs, &psd->stale_assocs);
+	list_splice_init(&psd->active_assocs, &psd->prev_assocs);
+
+	/* TODO: we should inform the sockets that got shut down */
+}
+
 void psp_twsk_init(struct tcp_timewait_sock *tw, struct sock *sk)
 {
 	struct psp_assoc *pas = psp_sk_assoc(sk);
-- 
2.45.0



* [RFC net-next 09/15] net/mlx5e: Support PSP offload functionality
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (7 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 08/15] psp: track generations of secret state Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 10/15] net/mlx5e: Implement PSP operations .assoc_add and .assoc_del Jakub Kicinski
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

Add PSP offload related IFC structs, layouts, and enumerations. Implement
the .set_config and .rx_spi_alloc PSP device operations. The driver does
not need to make use of the .set_config operation. Stub out the .assoc_add
and .assoc_del PSP operations.

Introduce the MLX5_EN_PSP configuration option for enabling PSP offload
support on mlx5 devices.
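
As a rough illustration of how .rx_spi_alloc and the capability setup in
this patch relate PSP versions to key sizes, here is a standalone sketch.
The enum and the ex_* helpers are hypothetical stand-ins rather than the
kernel or uAPI definitions; only the 16B/32B key lengths and the rule that
AES-GCM-256 is advertised when both encrypt and decrypt caps are present
come from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PSP version identifiers. */
enum ex_psp_version {
	EX_PSP_VERSION_AES_GCM_128,
	EX_PSP_VERSION_AES_GCM_256,
	EX_PSP_VERSION_UNSUPPORTED,
};

/* Key length in bytes for a version, or -1 if the version is not handled. */
static int ex_key_bytes(enum ex_psp_version version)
{
	switch (version) {
	case EX_PSP_VERSION_AES_GCM_128:
		return 16;
	case EX_PSP_VERSION_AES_GCM_256:
		return 32;
	default:
		return -1;
	}
}

/* Build the versions bitmask: AES-GCM-128 is always set, AES-GCM-256 only
 * when the device reports both the encrypt and the decrypt capability.
 */
static unsigned int ex_caps_versions(int has_256_encrypt, int has_256_decrypt)
{
	unsigned int versions = 1u << EX_PSP_VERSION_AES_GCM_128;

	if (has_256_encrypt && has_256_decrypt)
		versions |= 1u << EX_PSP_VERSION_AES_GCM_256;
	return versions;
}

/* Test one version bit in the capability mask. */
static int ex_version_supported(unsigned int caps, enum ex_psp_version version)
{
	assert((unsigned int)version < 32);	/* keep the shift well defined */
	return !!(caps & (1u << version));
}
```

A caller would first check ex_version_supported() against the mask and then
use ex_key_bytes() to size the key buffer.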

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |  11 ++
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   3 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |   4 +-
 .../mellanox/mlx5/core/en_accel/nisp.c        | 149 ++++++++++++++++++
 .../mellanox/mlx5/core/en_accel/nisp.h        |  53 +++++++
 .../mlx5/core/en_accel/nisp_offload.c         |  52 ++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   9 ++
 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |   6 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 .../net/ethernet/mellanox/mlx5/core/nisp.c    |  24 +++
 .../net/ethernet/mellanox/mlx5/core/nisp.h    |  15 ++
 include/linux/mlx5/device.h                   |   4 +
 include/linux/mlx5/driver.h                   |   2 +
 include/linux/mlx5/mlx5_ifc.h                 |  98 +++++++++++-
 15 files changed, 431 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_offload.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/nisp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/nisp.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 685335832a93..0dc9665a9557 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -197,3 +197,14 @@ config MLX5_DPLL
 	help
 	  DPLL support in Mellanox Technologies ConnectX NICs.
 
+config MLX5_EN_PSP
+	bool "Mellanox Technologies support for PSP cryptography-offload acceleration"
+	depends on INET_PSP
+	depends on MLX5_CORE_EN
+	default y
+	help
+	  mlx5 device offload support for Google PSP Security Protocol offload.
+	  Adds support for PSP encryption offload and for SPI and key generation
+	  interfaces to PSP Stack which supports PSP crypto offload.
+
+	  If unsure, say Y.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 76dc5a9b9648..c17a5e343603 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -17,7 +17,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \
 		lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
 		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o diag/reporter_vnic.o \
-		fw_reset.o qos.o lib/tout.o lib/aso.o
+		fw_reset.o qos.o lib/tout.o lib/aso.o nisp.o
 
 #
 # Netdev basic
@@ -109,6 +109,8 @@ mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/ktls_stats.o \
 				   en_accel/fs_tcp.o en_accel/ktls.o en_accel/ktls_txrx.o \
 				   en_accel/ktls_tx.o en_accel/ktls_rx.o
 
+mlx5_core-$(CONFIG_MLX5_EN_PSP) += en_accel/nisp.o en_accel/nisp_offload.o
+
 mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o \
 					steering/dr_matcher.o steering/dr_rule.o \
 					steering/dr_icm_pool.o steering/dr_buddy.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f8bd9dbf59cd..b7ceb8011a92 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -941,6 +941,9 @@ struct mlx5e_priv {
 #ifdef CONFIG_MLX5_EN_IPSEC
 	struct mlx5e_ipsec        *ipsec;
 #endif
+#ifdef CONFIG_MLX5_EN_PSP
+	struct mlx5e_nisp        *nisp;
+#endif
 #ifdef CONFIG_MLX5_EN_TLS
 	struct mlx5e_tls          *tls;
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index ec819dfc98be..3141abb33ff7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -6,6 +6,7 @@
 #include "en/port.h"
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec.h"
+#include "en_accel/nisp.h"
 #include <linux/dim.h>
 #include <net/page_pool/types.h>
 #include <net/xdp_sock_drv.h>
@@ -1005,7 +1006,8 @@ void mlx5e_build_sq_param(struct mlx5_core_dev *mdev,
 	bool allow_swp;
 
 	allow_swp = mlx5_geneve_tx_allowed(mdev) ||
-		    (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_CRYPTO);
+		    (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_CRYPTO) ||
+		    mlx5_is_nisp_device(mdev);
 	mlx5e_build_sq_param_common(mdev, param);
 	MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
 	MLX5_SET(sqc, sqc, allow_swp, allow_swp);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
new file mode 100644
index 000000000000..eff7906b3764
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+#include <linux/mlx5/device.h>
+#include <net/psp.h>
+#include <linux/psp.h>
+#include "mlx5_core.h"
+#include "../nisp.h"
+#include "lib/crypto.h"
+#include "en_accel/nisp.h"
+
+static int
+mlx5e_psp_set_config(struct psp_dev *psd, struct psp_dev_config *conf,
+		     struct netlink_ext_ack *extack)
+{
+	return 0; /* TODO: this should actually do things to the device */
+}
+
+static int
+mlx5e_psp_rx_spi_alloc(struct psp_dev *psd, u32 version,
+		       struct psp_key_parsed *assoc,
+		       struct netlink_ext_ack *extack)
+{
+	struct mlx5e_priv *priv = netdev_priv(psd->main_netdev);
+	enum mlx5_nisp_gen_spi_in_key_size keysz;
+	struct nisp_key_spi key_spi = {};
+	u8 keysz_bytes;
+	int err;
+
+	switch (version) {
+	case PSP_VERSION_HDR0_AES_GCM_128:
+		keysz = MLX5_NISP_GEN_SPI_IN_KEY_SIZE_128;
+		keysz_bytes = 16;
+		break;
+	case PSP_VERSION_HDR0_AES_GCM_256:
+		keysz = MLX5_NISP_GEN_SPI_IN_KEY_SIZE_256;
+		keysz_bytes = 32;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	err = mlx5e_nisp_generate_key_spi(priv->mdev, keysz, keysz_bytes,
+					  &key_spi);
+	if (err)
+		return err;
+
+	assoc->spi = cpu_to_be32(key_spi.spi);
+	memcpy(assoc->key, key_spi.key, keysz_bytes);
+	return 0;
+}
+
+static int mlx5e_psp_assoc_add(struct psp_dev *psd, struct psp_assoc *pas,
+			       struct netlink_ext_ack *extack)
+{
+	struct mlx5e_priv *priv = netdev_priv(psd->main_netdev);
+
+	mlx5_core_dbg(priv->mdev, "PSP assoc add: rx: %u, tx: %u\n",
+		      be32_to_cpu(pas->rx.spi), be32_to_cpu(pas->tx.spi));
+
+	return -EINVAL;
+}
+
+static void mlx5e_psp_assoc_del(struct psp_dev *psd, struct psp_assoc *tas)
+{
+}
+
+static struct psp_dev_ops mlx5_psp_ops = {
+	.set_config   = mlx5e_psp_set_config,
+	.rx_spi_alloc = mlx5e_psp_rx_spi_alloc,
+	.tx_key_add   = mlx5e_psp_assoc_add,
+	.tx_key_del   = mlx5e_psp_assoc_del,
+};
+
+void mlx5e_nisp_unregister(struct mlx5e_priv *priv)
+{
+	if (!priv->nisp || !priv->nisp->psp)
+		return;
+
+	psp_dev_unregister(priv->nisp->psp);
+}
+
+void mlx5e_nisp_register(struct mlx5e_priv *priv)
+{
+	/* FW Caps missing */
+	if (!priv->nisp)
+		return;
+
+	priv->nisp->caps.assoc_drv_spc = sizeof(u32);
+	priv->nisp->caps.versions = 1 << PSP_VERSION_HDR0_AES_GCM_128;
+	if (MLX5_CAP_NISP(priv->mdev, nisp_crypto_esp_aes_gcm_256_encrypt) &&
+	    MLX5_CAP_NISP(priv->mdev, nisp_crypto_esp_aes_gcm_256_decrypt))
+		priv->nisp->caps.versions |= 1 << PSP_VERSION_HDR0_AES_GCM_256;
+
+	priv->nisp->psp = psp_dev_create(priv->netdev, &mlx5_psp_ops,
+					 &priv->nisp->caps, NULL);
+	if (IS_ERR(priv->nisp->psp))
+		mlx5_core_err(priv->mdev, "PSP failed to register due to %pe\n",
+			      priv->nisp->psp);
+}
+
+int mlx5e_nisp_init(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_nisp *nisp;
+
+	if (!mlx5_is_nisp_device(mdev)) {
+		mlx5_core_dbg(mdev, "NISP offload not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (!MLX5_CAP_ETH(mdev, swp)) {
+		mlx5_core_dbg(mdev, "SWP not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (!MLX5_CAP_ETH(mdev, swp_csum)) {
+		mlx5_core_dbg(mdev, "SWP checksum not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (!MLX5_CAP_ETH(mdev, swp_csum_l4_partial)) {
+		mlx5_core_dbg(mdev, "SWP L4 partial checksum not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (!MLX5_CAP_ETH(mdev, swp_lso)) {
+		mlx5_core_dbg(mdev, "SWP LSO not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	nisp = kzalloc(sizeof(*nisp), GFP_KERNEL);
+	if (!nisp)
+		return -ENOMEM;
+
+	priv->nisp = nisp;
+	mlx5_core_dbg(priv->mdev, "NISP attached to netdevice\n");
+	return 0;
+}
+
+void mlx5e_nisp_cleanup(struct mlx5e_priv *priv)
+{
+	struct mlx5e_nisp *nisp = priv->nisp;
+
+	if (!nisp)
+		return;
+
+	priv->nisp = NULL;
+	kfree(nisp);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
new file mode 100644
index 000000000000..93eaea8b6f77
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5E_ACCEL_NISP_H__
+#define __MLX5E_ACCEL_NISP_H__
+#if IS_ENABLED(CONFIG_MLX5_EN_PSP)
+#include <net/psp/types.h>
+#include "en.h"
+
+struct mlx5e_nisp {
+	struct psp_dev *psp;
+	struct psp_dev_caps caps;
+};
+
+struct nisp_key_spi {
+	u32 spi;
+	__be32 key[PSP_MAX_KEY / sizeof(u32)];
+	u16 keysz;
+};
+
+static inline bool mlx5_is_nisp_device(struct mlx5_core_dev *mdev)
+{
+	if (!MLX5_CAP_GEN(mdev, psp))
+		return false;
+
+	if (!MLX5_CAP_NISP(mdev, nisp_crypto_esp_aes_gcm_128_encrypt) ||
+	    !MLX5_CAP_NISP(mdev, nisp_crypto_esp_aes_gcm_128_decrypt))
+		return false;
+
+	return true;
+}
+
+void mlx5e_nisp_register(struct mlx5e_priv *priv);
+void mlx5e_nisp_unregister(struct mlx5e_priv *priv);
+int mlx5e_nisp_init(struct mlx5e_priv *priv);
+void mlx5e_nisp_cleanup(struct mlx5e_priv *priv);
+int mlx5e_nisp_rotate_key(struct mlx5_core_dev *mdev);
+int mlx5e_nisp_generate_key_spi(struct mlx5_core_dev *mdev,
+				enum mlx5_nisp_gen_spi_in_key_size keysz,
+				unsigned int keysz_bytes,
+				struct nisp_key_spi *keys);
+#else
+static inline bool mlx5_is_nisp_device(struct mlx5_core_dev *mdev)
+{
+	return false;
+}
+
+static inline void mlx5e_nisp_register(struct mlx5e_priv *priv) { }
+static inline void mlx5e_nisp_unregister(struct mlx5e_priv *priv) { }
+static inline int mlx5e_nisp_init(struct mlx5e_priv *priv) { return 0; }
+static inline void mlx5e_nisp_cleanup(struct mlx5e_priv *priv) { }
+#endif /* CONFIG_MLX5_EN_PSP */
+#endif /* __MLX5E_ACCEL_NISP_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_offload.c
new file mode 100644
index 000000000000..fc3268884dc9
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_offload.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+#include <linux/workqueue.h>
+#include <net/psp/types.h>
+#include "mlx5_core.h"
+#include "en_accel/nisp.h"
+
+int mlx5e_nisp_rotate_key(struct mlx5_core_dev *mdev)
+{
+	u32 in[MLX5_ST_SZ_DW(nisp_rotate_key_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(nisp_rotate_key_out)];
+
+	MLX5_SET(nisp_rotate_key_in, in, opcode,
+		 MLX5_CMD_OP_NISP_ROTATE_KEY);
+
+	return mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5e_nisp_generate_key_spi(struct mlx5_core_dev *mdev,
+				enum mlx5_nisp_gen_spi_in_key_size keysz,
+				unsigned int keysz_bytes,
+				struct nisp_key_spi *keys)
+{
+	u32 in[MLX5_ST_SZ_DW(nisp_gen_spi_in)] = {};
+	int err, outlen, i;
+	void *out, *outkey;
+
+	WARN_ON_ONCE(keysz_bytes > PSP_MAX_KEY);
+
+	outlen = MLX5_ST_SZ_BYTES(nisp_gen_spi_out) + MLX5_ST_SZ_BYTES(key_spi);
+	out = kzalloc(outlen, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	MLX5_SET(nisp_gen_spi_in, in, opcode, MLX5_CMD_OP_NISP_GEN_SPI);
+	MLX5_SET(nisp_gen_spi_in, in, key_size, keysz);
+	MLX5_SET(nisp_gen_spi_in, in, num_of_spi, 1);
+	err = mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen);
+	if (err)
+		goto out;
+
+	outkey = MLX5_ADDR_OF(nisp_gen_spi_out, out, key_spi);
+	keys->keysz = keysz_bytes * BITS_PER_BYTE;
+	keys->spi = MLX5_GET(key_spi, outkey, spi);
+	for (i = 0; i < keysz_bytes / sizeof(*keys->key); ++i)
+		keys->key[i] = cpu_to_be32(MLX5_GET(key_spi,
+						    outkey + (32 - keysz_bytes), key[i]));
+
+out:
+	kfree(out);
+	return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ffe8919494d5..4948e19c3f3f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -49,6 +49,7 @@
 #include "en_tc.h"
 #include "en_rep.h"
 #include "en_accel/ipsec.h"
+#include "en_accel/nisp.h"
 #include "en_accel/macsec.h"
 #include "en_accel/en_accel.h"
 #include "en_accel/ktls.h"
@@ -5506,6 +5507,7 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 	if (take_rtnl)
 		rtnl_lock();
 
+	mlx5e_nisp_register(priv);
 	/* update XDP supported features */
 	mlx5e_set_xdp_feature(netdev);
 
@@ -5518,6 +5520,7 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 {
 	mlx5e_health_destroy_reporters(priv);
+	mlx5e_nisp_unregister(priv);
 	mlx5e_ktls_cleanup(priv);
 	mlx5e_fs_cleanup(priv->fs);
 	debugfs_remove_recursive(priv->dfs_root);
@@ -5645,6 +5648,10 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	if (err)
 		mlx5_core_err(mdev, "MACsec initialization failed, %d\n", err);
 
+	err = mlx5e_nisp_init(priv);
+	if (err)
+		mlx5_core_err(mdev, "PSP initialization failed, %d\n", err);
+
 	/* Marking the link as currently not needed by the Driver */
 	if (!netif_running(netdev))
 		mlx5e_modify_admin_state(mdev, MLX5_PORT_DOWN);
@@ -5702,6 +5709,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5e_disable_async_events(priv);
 	mlx5_lag_remove_netdev(mdev, priv->netdev);
 	mlx5_vxlan_reset_to_default(mdev->vxlan);
+	mlx5e_nisp_cleanup(priv);
 	mlx5e_macsec_cleanup(priv);
 	mlx5e_ipsec_cleanup(priv);
 }
@@ -6337,6 +6345,7 @@ static void _mlx5e_remove(struct auxiliary_device *adev)
 
 	mlx5_core_uplink_netdev_set(mdev, NULL);
 	mlx5e_dcbnl_delete_app(priv);
+	mlx5e_nisp_unregister(priv);
 	unregister_netdev(priv->netdev);
 	_mlx5e_suspend(adev);
 	priv->profile->cleanup(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 2d95a9b7b44e..eee318d64d98 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -280,6 +280,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 			return err;
 	}
 
+	if (MLX5_CAP_GEN(dev, psp)) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_NISP);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 331ce47f51a1..8625d593ed89 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -74,6 +74,7 @@
 #include "mlx5_irq.h"
 #include "hwmon.h"
 #include "lag/lag.h"
+#include "nisp.h"
 
 MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
@@ -1013,6 +1014,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 
 	dev->vxlan = mlx5_vxlan_create(dev);
 	dev->geneve = mlx5_geneve_create(dev);
+	dev->nisp = mlx5_nisp_create(dev);
 
 	err = mlx5_init_rl_table(dev);
 	if (err) {
@@ -1095,6 +1097,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 err_rl_cleanup:
 	mlx5_cleanup_rl_table(dev);
 err_tables_cleanup:
+	mlx5_nisp_destroy(dev->nisp);
 	mlx5_geneve_destroy(dev->geneve);
 	mlx5_vxlan_destroy(dev->vxlan);
 	mlx5_cleanup_clock(dev);
@@ -1129,6 +1132,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_sriov_cleanup(dev);
 	mlx5_mpfs_cleanup(dev);
 	mlx5_cleanup_rl_table(dev);
+	mlx5_nisp_destroy(dev->nisp);
 	mlx5_geneve_destroy(dev->geneve);
 	mlx5_vxlan_destroy(dev->vxlan);
 	mlx5_cleanup_clock(dev);
@@ -1763,6 +1767,7 @@ static const int types[] = {
 	MLX5_CAP_VDPA_EMULATION,
 	MLX5_CAP_IPSEC,
 	MLX5_CAP_PORT_SELECTION,
+	MLX5_CAP_NISP,
 	MLX5_CAP_MACSEC,
 	MLX5_CAP_ADV_VIRTUALIZATION,
 	MLX5_CAP_CRYPTO,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/nisp.c b/drivers/net/ethernet/mellanox/mlx5/core/nisp.c
new file mode 100644
index 000000000000..f82734df8bf0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/nisp.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include "nisp.h"
+
+struct mlx5_nisp *mlx5_nisp_create(struct mlx5_core_dev *mdev)
+{
+	struct mlx5_nisp *nisp = kzalloc(sizeof(*nisp), GFP_KERNEL);
+
+	if (!nisp)
+		return ERR_PTR(-ENOMEM);
+
+	nisp->mdev = mdev;
+
+	return nisp;
+}
+
+void mlx5_nisp_destroy(struct mlx5_nisp *nisp)
+{
+	if (IS_ERR_OR_NULL(nisp))
+		return;
+
+	kfree(nisp);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/nisp.h b/drivers/net/ethernet/mellanox/mlx5/core/nisp.h
new file mode 100644
index 000000000000..c15e50e7ada7
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/nisp.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5_NISP_H__
+#define __MLX5_NISP_H__
+#include <linux/mlx5/driver.h>
+
+struct mlx5_nisp {
+	struct mlx5_core_dev *mdev;
+};
+
+struct mlx5_nisp *mlx5_nisp_create(struct mlx5_core_dev *mdev);
+void mlx5_nisp_destroy(struct mlx5_nisp *nisp);
+
+#endif /* __MLX5_NISP_H__ */
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index d7bb31d9a446..6c00c78bad53 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1223,6 +1223,7 @@ enum mlx5_cap_type {
 	MLX5_CAP_DEV_EVENT = 0x14,
 	MLX5_CAP_IPSEC,
 	MLX5_CAP_CRYPTO = 0x1a,
+	MLX5_CAP_NISP = 0x1e,
 	MLX5_CAP_MACSEC = 0x1f,
 	MLX5_CAP_GENERAL_2 = 0x20,
 	MLX5_CAP_PORT_SELECTION = 0x25,
@@ -1435,6 +1436,9 @@ enum mlx5_qcam_feature_groups {
 #define MLX5_CAP_MACSEC(mdev, cap)\
 	MLX5_GET(macsec_cap, (mdev)->caps.hca[MLX5_CAP_MACSEC]->cur, cap)
 
+#define MLX5_CAP_NISP(mdev, cap)\
+	MLX5_GET(nisp_cap, (mdev)->caps.hca[MLX5_CAP_NISP]->cur, cap)
+
 enum {
 	MLX5_CMD_STAT_OK			= 0x0,
 	MLX5_CMD_STAT_INT_ERR			= 0x1,
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index bf9324a31ae9..8d6060866163 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -515,6 +515,7 @@ struct mlx5_sf_dev_table;
 struct mlx5_sf_hw_table;
 struct mlx5_sf_table;
 struct mlx5_crypto_dek_priv;
+struct mlx5_nisp;
 
 struct mlx5_rate_limit {
 	u32			rate;
@@ -824,6 +825,7 @@ struct mlx5_core_dev {
 #endif
 	u64 num_ipsec_offloads;
 	struct mlx5_sd          *sd;
+	struct mlx5_nisp        *nisp;
 };
 
 struct mlx5_db {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index f468763478ae..a48e6a293602 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -309,6 +309,8 @@ enum {
 	MLX5_CMD_OP_CREATE_UMEM                   = 0xa08,
 	MLX5_CMD_OP_DESTROY_UMEM                  = 0xa0a,
 	MLX5_CMD_OP_SYNC_STEERING                 = 0xb00,
+	MLX5_CMD_OP_NISP_GEN_SPI                  = 0xb10,
+	MLX5_CMD_OP_NISP_ROTATE_KEY               = 0xb11,
 	MLX5_CMD_OP_QUERY_VHCA_STATE              = 0xb0d,
 	MLX5_CMD_OP_MODIFY_VHCA_STATE             = 0xb0e,
 	MLX5_CMD_OP_SYNC_CRYPTO                   = 0xb12,
@@ -477,12 +479,14 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
 	u8         execute_aso[0x1];
 	u8         reserved_at_47[0x19];
 
-	u8         reserved_at_60[0x2];
+	u8         reformat_l2_to_l3_nisp_tunnel[0x1];
+	u8         reformat_l3_nisp_tunnel_to_l2[0x1];
 	u8         reformat_insert[0x1];
 	u8         reformat_remove[0x1];
 	u8         macsec_encrypt[0x1];
 	u8         macsec_decrypt[0x1];
-	u8         reserved_at_66[0x2];
+	u8         nisp_encrypt[0x1];
+	u8         nisp_decrypt[0x1];
 	u8         reformat_add_macsec[0x1];
 	u8         reformat_remove_macsec[0x1];
 	u8         reserved_at_6a[0xe];
@@ -670,7 +674,7 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 
 	u8         metadata_reg_a[0x20];
 
-	u8         reserved_at_1a0[0x8];
+	u8         nisp_syndrome[0x8];
 
 	u8         macsec_syndrome[0x8];
 	u8         ipsec_syndrome[0x8];
@@ -1093,7 +1097,8 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8         tunnel_stateless_ip_over_ip_tx[0x1];
 	u8         reserved_at_2e[0x2];
 	u8         max_vxlan_udp_ports[0x8];
-	u8         reserved_at_38[0x6];
+	u8         swp_csum_l4_partial[0x1];
+	u8         reserved_at_39[0x5];
 	u8         max_geneve_opt_len[0x1];
 	u8         tunnel_stateless_geneve_rx[0x1];
 
@@ -1390,6 +1395,19 @@ struct mlx5_ifc_macsec_cap_bits {
 	u8    reserved_at_40[0x7c0];
 };
 
+struct mlx5_ifc_nisp_cap_bits {
+	u8         reserved_at_0[0x1];
+	u8         nisp_crypto_offload[0x1]; /* Set by the driver */
+	u8         reserved_at_2[0x1];
+	u8         nisp_crypto_esp_aes_gcm_256_encrypt[0x1];
+	u8         nisp_crypto_esp_aes_gcm_128_encrypt[0x1];
+	u8         nisp_crypto_esp_aes_gcm_256_decrypt[0x1];
+	u8         nisp_crypto_esp_aes_gcm_128_decrypt[0x1];
+	u8         reserved_at_7[0x4];
+	u8         log_max_num_of_nisp_spi[0x5];
+	u8         reserved_at_10[0x7f0];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST  = 0x0,
 	MLX5_WQ_TYPE_CYCLIC       = 0x1,
@@ -1521,7 +1539,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reg_c_preserve[0x1];
 	u8         reserved_at_aa[0x1];
 	u8         log_max_srq[0x5];
-	u8         reserved_at_b0[0x1];
+	u8	   reserved_at_b0[0x1];
 	u8         uplink_follow[0x1];
 	u8         ts_cqe_to_dest_cqn[0x1];
 	u8         reserved_at_b3[0x6];
@@ -1744,7 +1762,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_280[0x10];
 	u8         max_wqe_sz_sq[0x10];
 
-	u8         reserved_at_2a0[0x10];
+	u8         reserved_at_2a0[0xa];
+	u8         psp[0x1];
+	u8         reserved_at_2b1[0x5];
 	u8         max_wqe_sz_rq[0x10];
 
 	u8         max_flow_counter_31_16[0x10];
@@ -3519,6 +3539,7 @@ union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_macsec_cap_bits macsec_cap;
 	struct mlx5_ifc_crypto_cap_bits crypto_cap;
 	struct mlx5_ifc_ipsec_cap_bits ipsec_cap;
+	struct mlx5_ifc_nisp_cap_bits nisp_cap;
 	u8         reserved_at_0[0x8000];
 };
 
@@ -3548,6 +3569,7 @@ enum {
 enum {
 	MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_IPSEC   = 0x0,
 	MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_MACSEC  = 0x1,
+	MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_NISP    = 0x2,
 };
 
 struct mlx5_ifc_vlan_bits {
@@ -6747,6 +6769,8 @@ enum mlx5_reformat_ctx_type {
 	MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT_OVER_UDP = 0xa,
 	MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 = 0xb,
 	MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV6 = 0xc,
+	MLX5_REFORMAT_TYPE_ADD_NISP_TUNNEL = 0xd,
+	MLX5_REFORMAT_TYPE_DEL_NISP_TUNNEL = 0xe,
 	MLX5_REFORMAT_TYPE_INSERT_HDR = 0xf,
 	MLX5_REFORMAT_TYPE_REMOVE_HDR = 0x10,
 	MLX5_REFORMAT_TYPE_ADD_MACSEC = 0x11,
@@ -6873,6 +6897,7 @@ enum {
 	MLX5_ACTION_IN_FIELD_IPSEC_SYNDROME    = 0x5D,
 	MLX5_ACTION_IN_FIELD_OUT_EMD_47_32     = 0x6F,
 	MLX5_ACTION_IN_FIELD_OUT_EMD_31_0      = 0x70,
+	MLX5_ACTION_IN_FIELD_NISP_SYNDROME     = 0x71,
 };
 
 struct mlx5_ifc_alloc_modify_header_context_out_bits {
@@ -12452,6 +12477,7 @@ enum {
 	MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_TLS = 0x1,
 	MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_IPSEC = 0x2,
 	MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_MACSEC = 0x4,
+	MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_NISP = 0x6,
 };
 
 struct mlx5_ifc_tls_static_params_bits {
@@ -12786,4 +12812,64 @@ struct mlx5_ifc_msees_reg_bits {
 	u8         reserved_at_80[0x180];
 };
 
+struct mlx5_ifc_nisp_rotate_key_in_bits {
+	u8         opcode[0x10];
+	u8         uid[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_nisp_rotate_key_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x40];
+};
+
+enum mlx5_nisp_gen_spi_in_key_size {
+	MLX5_NISP_GEN_SPI_IN_KEY_SIZE_128 = 0x0,
+	MLX5_NISP_GEN_SPI_IN_KEY_SIZE_256 = 0x1,
+};
+
+struct mlx5_ifc_key_spi_bits {
+	u8         spi[0x20];
+
+	u8         reserved_at_20[0x60];
+
+	u8         key[8][0x20];
+};
+
+struct mlx5_ifc_nisp_gen_spi_in_bits {
+	u8         opcode[0x10];
+	u8         uid[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         key_size[0x2];
+	u8         reserved_at_62[0xe];
+	u8         num_of_spi[0x10];
+};
+
+struct mlx5_ifc_nisp_gen_spi_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x10];
+	u8         num_of_spi[0x10];
+
+	u8         reserved_at_60[0x20];
+
+	struct mlx5_ifc_key_spi_bits key_spi[0];
+};
+
 #endif /* MLX5_IFC_H */
-- 
2.45.0



* [RFC net-next 10/15] net/mlx5e: Implement PSP operations .assoc_add and .assoc_del
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (8 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 09/15] net/mlx5e: Support PSP offload functionality Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 11/15] net/mlx5e: Implement PSP Tx data path Jakub Kicinski
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC (permalink / raw
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

Implement .assoc_add and .assoc_del PSP operations used in the tx control
path. Allocate the relevant hardware resources when a new key is registered
using .assoc_add. Destroy the key when .assoc_del is called. Use an atomic
counter to keep track of the current number of keys being used by the
device.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
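Illustration only, not part of the patch: a minimal userspace sketch of the
key accounting described above, assuming C11 atomics in place of the kernel's
atomic_t. Every demo_* name is hypothetical, and demo_create_key() only stands
in for the real hardware key allocation. The point it shows is that the
counter is incremented only after key creation succeeded, so a failed add
leaves the count unchanged, and a delete always balances one successful add.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-device PSP state. */
struct demo_nisp {
	atomic_int tx_key_cnt;	/* Tx keys currently installed */
};

/* Hypothetical stand-in for hardware key creation; 0 means success. */
static int demo_create_key(int simulate_failure, unsigned int *id)
{
	static unsigned int next_id = 1;

	if (simulate_failure)
		return -1;
	*id = next_id++;
	return 0;
}

/* Hypothetical stand-in for hardware key destruction. */
static void demo_destroy_key(unsigned int id)
{
	(void)id;
}

/* Like .assoc_add: count the key only once it really exists. */
static int demo_assoc_add(struct demo_nisp *nisp, int simulate_failure,
			  unsigned int *key_id)
{
	int err = demo_create_key(simulate_failure, key_id);

	if (err)
		return err;

	atomic_fetch_add(&nisp->tx_key_cnt, 1);
	return 0;
}

/* Like .assoc_del: release the key, then drop it from the count. */
static void demo_assoc_del(struct demo_nisp *nisp, unsigned int key_id)
{
	assert(atomic_load(&nisp->tx_key_cnt) > 0);
	demo_destroy_key(key_id);
	atomic_fetch_sub(&nisp->tx_key_cnt, 1);
}

static int demo_key_count(struct demo_nisp *nisp)
{
	return atomic_load(&nisp->tx_key_cnt);
}
```

A non-zero count at teardown would indicate a leaked key, which is the
condition the WARN_ON() in mlx5e_nisp_cleanup() checks for.
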
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../mellanox/mlx5/core/en_accel/en_accel.h    |   8 +
 .../mellanox/mlx5/core/en_accel/nisp.c        |  57 ++++-
 .../mellanox/mlx5/core/en_accel/nisp.h        |   2 +
 .../mellanox/mlx5/core/en_accel/nisp_fs.c     | 234 ++++++++++++++++++
 .../mellanox/mlx5/core/en_accel/nisp_fs.h     |  23 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  10 +-
 .../ethernet/mellanox/mlx5/core/lib/crypto.h  |   1 +
 8 files changed, 327 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index c17a5e343603..5ce78b84c763 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -109,7 +109,7 @@ mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/ktls_stats.o \
 				   en_accel/fs_tcp.o en_accel/ktls.o en_accel/ktls_txrx.o \
 				   en_accel/ktls_tx.o en_accel/ktls_rx.o
 
-mlx5_core-$(CONFIG_MLX5_EN_PSP) += en_accel/nisp.o en_accel/nisp_offload.o
+mlx5_core-$(CONFIG_MLX5_EN_PSP) += en_accel/nisp.o en_accel/nisp_offload.o en_accel/nisp_fs.o
 
 mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o \
 					steering/dr_matcher.o steering/dr_rule.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
index caa34b9c161e..c15e48b0724c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -42,6 +42,7 @@
 #include <en_accel/macsec.h>
 #include "en.h"
 #include "en/txrx.h"
+#include "en_accel/nisp_fs.h"
 
 #if IS_ENABLED(CONFIG_GENEVE)
 #include <net/geneve.h>
@@ -212,11 +213,18 @@ static inline void mlx5e_accel_cleanup_rx(struct mlx5e_priv *priv)
 
 static inline int mlx5e_accel_init_tx(struct mlx5e_priv *priv)
 {
+	int err;
+
+	err = mlx5_accel_nisp_fs_init_tx_tables(priv);
+	if (err)
+		return err;
+
 	return mlx5e_ktls_init_tx(priv);
 }
 
 static inline void mlx5e_accel_cleanup_tx(struct mlx5e_priv *priv)
 {
 	mlx5e_ktls_cleanup_tx(priv);
+	mlx5_accel_nisp_fs_cleanup_tx_tables(priv);
 }
 #endif /* __MLX5E_EN_ACCEL_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
index eff7906b3764..1131aa6e9b3d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
@@ -7,6 +7,12 @@
 #include "../nisp.h"
 #include "lib/crypto.h"
 #include "en_accel/nisp.h"
+#include "en_accel/nisp_fs.h"
+
+struct mlx5e_nisp_sa_entry {
+	struct mlx5e_accel_nisp_rule *nisp_rule;
+	u32 enc_key_id;
+};
 
 static int
 mlx5e_psp_set_config(struct psp_dev *psd, struct psp_dev_config *conf,
@@ -49,19 +55,45 @@ mlx5e_psp_rx_spi_alloc(struct psp_dev *psd, u32 version,
 	return 0;
 }
 
+struct nisp_key {
+	u32 id;
+};
+
 static int mlx5e_psp_assoc_add(struct psp_dev *psd, struct psp_assoc *pas,
 			       struct netlink_ext_ack *extack)
 {
 	struct mlx5e_priv *priv = netdev_priv(psd->main_netdev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_nisp *nisp = priv->nisp;
+	struct psp_key_parsed *tx = &pas->tx;
+	struct nisp_key *nkey;
+	int err;
 
-	mlx5_core_dbg(priv->mdev, "PSP assoc add: rx: %u, tx: %u\n",
-		      be32_to_cpu(pas->rx.spi), be32_to_cpu(pas->tx.spi));
+	mdev = priv->mdev;
+	nkey = (struct nisp_key *)pas->drv_data;
 
-	return -EINVAL;
+	err = mlx5_create_encryption_key(mdev, tx->key,
+					 pas->key_sz,
+					 MLX5_ACCEL_OBJ_NISP_KEY,
+					 &nkey->id);
+	if (err) {
+		mlx5_core_err(mdev, "Failed to create encryption key (err = %d)\n", err);
+		return err;
+	}
+
+	atomic_inc(&nisp->tx_key_cnt);
+	return 0;
 }
 
-static void mlx5e_psp_assoc_del(struct psp_dev *psd, struct psp_assoc *tas)
+static void mlx5e_psp_assoc_del(struct psp_dev *psd, struct psp_assoc *pas)
 {
+	struct mlx5e_priv *priv = netdev_priv(psd->main_netdev);
+	struct mlx5e_nisp *nisp = priv->nisp;
+	struct nisp_key *nkey;
+
+	nkey = (struct nisp_key *)pas->drv_data;
+	mlx5_destroy_encryption_key(priv->mdev, nkey->id);
+	atomic_dec(&nisp->tx_key_cnt);
 }
 
 static struct psp_dev_ops mlx5_psp_ops = {
@@ -101,7 +133,9 @@ void mlx5e_nisp_register(struct mlx5e_priv *priv)
 int mlx5e_nisp_init(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_nisp_fs *fs;
 	struct mlx5e_nisp *nisp;
+	int err;
 
 	if (!mlx5_is_nisp_device(mdev)) {
 		mlx5_core_dbg(mdev, "NISP offload not supported\n");
@@ -133,8 +167,21 @@ int mlx5e_nisp_init(struct mlx5e_priv *priv)
 		return -ENOMEM;
 
 	priv->nisp = nisp;
+	fs = mlx5e_accel_nisp_fs_init(priv);
+	if (IS_ERR(fs)) {
+		err = PTR_ERR(fs);
+		goto out_err;
+	}
+
+	nisp->fs = fs;
+
 	mlx5_core_dbg(priv->mdev, "NISP attached to netdevice\n");
 	return 0;
+
+out_err:
+	priv->nisp = NULL;
+	kfree(nisp);
+	return err;
 }
 
 void mlx5e_nisp_cleanup(struct mlx5e_priv *priv)
@@ -144,6 +191,8 @@ void mlx5e_nisp_cleanup(struct mlx5e_priv *priv)
 	if (!nisp)
 		return;
 
+	WARN_ON(atomic_read(&nisp->tx_key_cnt));
+	mlx5e_accel_nisp_fs_cleanup(nisp->fs);
 	priv->nisp = NULL;
 	kfree(nisp);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
index 93eaea8b6f77..14e5813367a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.h
@@ -10,6 +10,8 @@
 struct mlx5e_nisp {
 	struct psp_dev *psp;
 	struct psp_dev_caps caps;
+	struct mlx5e_nisp_fs *fs;
+	atomic_t tx_key_cnt;
 };
 
 struct nisp_key_spi {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
new file mode 100644
index 000000000000..5d2ce83db7cc
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include <linux/netdevice.h>
+#include <linux/mlx5/fs.h>
+#include "en.h"
+#include "fs_core.h"
+#include "en_accel/nisp_fs.h"
+#include "en_accel/nisp.h"
+
+struct mlx5e_nisp_tx {
+	struct mlx5_flow_namespace *ns;
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_group *fg;
+	struct mlx5_flow_handle *rule;
+	struct mutex mutex; /* Protect NISP TX steering */
+	u32 refcnt;
+};
+
+struct mlx5e_nisp_fs {
+	struct mlx5_core_dev *mdev;
+	struct mlx5e_nisp_tx *tx_fs;
+	struct mlx5e_flow_steering *fs;
+};
+
+enum accel_nisp_rule_action {
+	ACCEL_NISP_RULE_ACTION_ENCRYPT,
+};
+
+struct mlx5e_accel_nisp_rule {
+	struct mlx5_flow_handle *rule;
+	u8 action;
+};
+
+static void setup_fte_udp_psp(struct mlx5_flow_spec *spec, u16 udp_port)
+{
+	spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
+	MLX5_SET(fte_match_set_lyr_2_4, spec->match_criteria, udp_dport, 0xffff);
+	MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, udp_dport, udp_port);
+	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, spec->match_criteria, ip_protocol);
+	MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, ip_protocol, IPPROTO_UDP);
+}
+
+static int accel_nisp_fs_tx_create_ft_table(struct mlx5e_nisp_fs *fs)
+{
+	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_core_dev *mdev = fs->mdev;
+	struct mlx5_flow_act flow_act = {};
+	u32 *in, *mc, *outer_headers_c;
+	struct mlx5_flow_handle *rule;
+	struct mlx5_flow_spec *spec;
+	struct mlx5e_nisp_tx *tx_fs;
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_group *fg;
+	int err = 0;
+
+	spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
+	in = kvzalloc(inlen, GFP_KERNEL);
+	if (!spec || !in) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	ft_attr.max_fte = 1;
+#define MLX5E_NISP_PRIO 0
+	ft_attr.prio = MLX5E_NISP_PRIO;
+#define MLX5E_NISP_LEVEL 0
+	ft_attr.level = MLX5E_NISP_LEVEL;
+	ft_attr.autogroup.max_num_groups = 1;
+
+	tx_fs = fs->tx_fs;
+	ft = mlx5_create_flow_table(tx_fs->ns, &ft_attr);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		mlx5_core_err(mdev, "PSP: fail to add psp tx flow table, err = %d\n", err);
+		goto out;
+	}
+
+	mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria);
+	outer_headers_c = MLX5_ADDR_OF(fte_match_param, mc, outer_headers);
+	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, outer_headers_c, ip_protocol);
+	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, outer_headers_c, udp_dport);
+	MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS);
+	fg = mlx5_create_flow_group(ft, in);
+	if (IS_ERR(fg)) {
+		err = PTR_ERR(fg);
+		mlx5_core_err(mdev, "PSP: fail to add psp tx flow group, err = %d\n", err);
+		goto err_create_fg;
+	}
+
+	setup_fte_udp_psp(spec, PSP_DEFAULT_UDP_PORT);
+	flow_act.crypto.type = MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_NISP;
+	flow_act.flags |= FLOW_ACT_NO_APPEND;
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW |
+			  MLX5_FLOW_CONTEXT_ACTION_CRYPTO_ENCRYPT;
+	rule = mlx5_add_flow_rules(ft, spec, &flow_act, NULL, 0);
+	if (IS_ERR(rule)) {
+		err = PTR_ERR(rule);
+		mlx5_core_err(mdev, "PSP: fail to add psp tx flow rule, err = %d\n", err);
+		goto err_add_flow_rule;
+	}
+
+	tx_fs->ft = ft;
+	tx_fs->fg = fg;
+	tx_fs->rule = rule;
+	goto out;
+
+err_add_flow_rule:
+	mlx5_destroy_flow_group(fg);
+err_create_fg:
+	mlx5_destroy_flow_table(ft);
+out:
+	kvfree(in);
+	kvfree(spec);
+	return err;
+}
+
+static void accel_nisp_fs_tx_destroy(struct mlx5e_nisp_tx *tx_fs)
+{
+	if (!tx_fs->ft)
+		return;
+
+	mlx5_del_flow_rules(tx_fs->rule);
+	mlx5_destroy_flow_group(tx_fs->fg);
+	mlx5_destroy_flow_table(tx_fs->ft);
+}
+
+static int accel_nisp_fs_tx_ft_get(struct mlx5e_nisp_fs *fs)
+{
+	struct mlx5e_nisp_tx *tx_fs = fs->tx_fs;
+	int err = 0;
+
+	mutex_lock(&tx_fs->mutex);
+	if (tx_fs->refcnt++)
+		goto out;
+
+	err = accel_nisp_fs_tx_create_ft_table(fs);
+	if (err)
+		tx_fs->refcnt--;
+out:
+	mutex_unlock(&tx_fs->mutex);
+	return err;
+}
+
+static void accel_nisp_fs_tx_ft_put(struct mlx5e_nisp_fs *fs)
+{
+	struct mlx5e_nisp_tx *tx_fs = fs->tx_fs;
+
+	mutex_lock(&tx_fs->mutex);
+	if (--tx_fs->refcnt)
+		goto out;
+
+	accel_nisp_fs_tx_destroy(tx_fs);
+out:
+	mutex_unlock(&tx_fs->mutex);
+}
+
+static void accel_nisp_fs_cleanup_tx(struct mlx5e_nisp_fs *fs)
+{
+	struct mlx5e_nisp_tx *tx_fs = fs->tx_fs;
+
+	if (!tx_fs)
+		return;
+
+	mutex_destroy(&tx_fs->mutex);
+	WARN_ON(tx_fs->refcnt);
+	kfree(tx_fs);
+	fs->tx_fs = NULL;
+}
+
+static int accel_nisp_fs_init_tx(struct mlx5e_nisp_fs *fs)
+{
+	struct mlx5_flow_namespace *ns;
+	struct mlx5e_nisp_tx *tx_fs;
+
+	ns = mlx5_get_flow_namespace(fs->mdev,
+				     MLX5_FLOW_NAMESPACE_EGRESS_IPSEC);
+	if (!ns)
+		return -EOPNOTSUPP;
+
+	tx_fs = kzalloc(sizeof(*tx_fs), GFP_KERNEL);
+	if (!tx_fs)
+		return -ENOMEM;
+
+	mutex_init(&tx_fs->mutex);
+	tx_fs->ns = ns;
+	fs->tx_fs = tx_fs;
+	return 0;
+}
+
+void mlx5_accel_nisp_fs_cleanup_tx_tables(struct mlx5e_priv *priv)
+{
+	if (!priv->nisp)
+		return;
+
+	accel_nisp_fs_tx_ft_put(priv->nisp->fs);
+}
+
+int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv)
+{
+	if (!priv->nisp)
+		return 0;
+
+	return accel_nisp_fs_tx_ft_get(priv->nisp->fs);
+}
+
+void mlx5e_accel_nisp_fs_cleanup(struct mlx5e_nisp_fs *fs)
+{
+	accel_nisp_fs_cleanup_tx(fs);
+	kfree(fs);
+}
+
+struct mlx5e_nisp_fs *mlx5e_accel_nisp_fs_init(struct mlx5e_priv *priv)
+{
+	struct mlx5e_nisp_fs *fs;
+	int err = 0;
+
+	fs = kzalloc(sizeof(*fs), GFP_KERNEL);
+	if (!fs)
+		return ERR_PTR(-ENOMEM);
+
+	fs->mdev = priv->mdev;
+	err = accel_nisp_fs_init_tx(fs);
+	if (err)
+		goto err_tx;
+
+	fs->fs = priv->fs;
+
+	return fs;
+err_tx:
+	kfree(fs);
+	return ERR_PTR(err);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h
new file mode 100644
index 000000000000..11cdc447a401
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5_NISP_FS_H__
+#define __MLX5_NISP_FS_H__
+
+#ifdef CONFIG_MLX5_EN_PSP
+
+struct mlx5e_nisp_fs;
+
+struct mlx5e_nisp_fs *mlx5e_accel_nisp_fs_init(struct mlx5e_priv *priv);
+void mlx5e_accel_nisp_fs_cleanup(struct mlx5e_nisp_fs *fs);
+int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv);
+void mlx5_accel_nisp_fs_cleanup_tx_tables(struct mlx5e_priv *priv);
+#else
+static inline int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv)
+{
+	return 0;
+}
+
+static inline void mlx5_accel_nisp_fs_cleanup_tx_tables(struct mlx5e_priv *priv) { }
+#endif /* CONFIG_MLX5_EN_PSP */
+#endif /* __MLX5_NISP_FS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4948e19c3f3f..38e0c4786b1c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5495,6 +5495,10 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 	}
 	priv->fs = fs;
 
+	err = mlx5e_nisp_init(priv);
+	if (err)
+		mlx5_core_err(mdev, "PSP initialization failed, %d\n", err);
+
 	err = mlx5e_ktls_init(priv);
 	if (err)
 		mlx5_core_err(mdev, "TLS initialization failed, %d\n", err);
@@ -5522,6 +5526,7 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 	mlx5e_health_destroy_reporters(priv);
 	mlx5e_nisp_unregister(priv);
 	mlx5e_ktls_cleanup(priv);
+	mlx5e_nisp_cleanup(priv);
 	mlx5e_fs_cleanup(priv->fs);
 	debugfs_remove_recursive(priv->dfs_root);
 	priv->fs = NULL;
@@ -5648,10 +5653,6 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	if (err)
 		mlx5_core_err(mdev, "MACsec initialization failed, %d\n", err);
 
-	err = mlx5e_nisp_init(priv);
-	if (err)
-		mlx5_core_err(mdev, "PSP initialization failed, %d\n", err);
-
 	/* Marking the link as currently not needed by the Driver */
 	if (!netif_running(netdev))
 		mlx5e_modify_admin_state(mdev, MLX5_PORT_DOWN);
@@ -5709,7 +5710,6 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5e_disable_async_events(priv);
 	mlx5_lag_remove_netdev(mdev, priv->netdev);
 	mlx5_vxlan_reset_to_default(mdev->vxlan);
-	mlx5e_nisp_cleanup(priv);
 	mlx5e_macsec_cleanup(priv);
 	mlx5e_ipsec_cleanup(priv);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.h
index c819c047bb9c..f257dfcf45d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.h
@@ -8,6 +8,7 @@ enum {
 	MLX5_ACCEL_OBJ_TLS_KEY = MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_TLS,
 	MLX5_ACCEL_OBJ_IPSEC_KEY = MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_IPSEC,
 	MLX5_ACCEL_OBJ_MACSEC_KEY = MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_MACSEC,
+	MLX5_ACCEL_OBJ_NISP_KEY = MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_PURPOSE_NISP,
 	MLX5_ACCEL_OBJ_TYPE_KEY_NUM,
 };
 
-- 
2.45.0



* [RFC net-next 11/15] net/mlx5e: Implement PSP Tx data path
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (9 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 10/15] net/mlx5e: Implement PSP operations .assoc_add and .assoc_del Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 12/15] net/mlx5e: Add PSP steering in local NIC RX Jakub Kicinski
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC (permalink / raw
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

Set up PSP offload on the Tx data path based on whether the skb indicates
that it is intended for PSP. Support driver-side encapsulation of the UDP
headers, PSP headers, and PSP trailer for the PSP traffic that will be
encrypted by the NIC.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
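Illustration only, not part of the patch: a small sketch of the MTU arithmetic
implied by the MLX5E_ETH_HARD_MTU change in en.h, which reserves room for the
PSP encapsulation on top of the Ethernet overhead. The PSP sizes below are
assumptions (an 8-byte UDP header, a 16-byte base PSP header without optional
fields, and a 16-byte trailer); the authoritative values are the
PSP_ENCAP_HLEN and PSP_TRL_SIZE definitions used by the series. All demo_*
and DEMO_* names are hypothetical.

```c
#include <assert.h>

enum {
	DEMO_ETH_HLEN     = 14,	/* Ethernet header */
	DEMO_VLAN_HLEN    = 4,	/* one VLAN tag */
	DEMO_ETH_FCS_LEN  = 4,	/* frame check sequence */
	DEMO_UDP_HLEN     = 8,	/* assumed: UDP header */
	DEMO_PSP_HLEN     = 16,	/* assumed: base PSP header */
	DEMO_PSP_TRL_SIZE = 16,	/* assumed: PSP trailer */
};

/* Assumed composition of the encapsulation header: UDP plus PSP. */
static unsigned int demo_psp_encap_hlen(void)
{
	return DEMO_UDP_HLEN + DEMO_PSP_HLEN;
}

/* Same sum as the patched MLX5E_ETH_HARD_MTU, with the assumed sizes. */
static unsigned int demo_hard_mtu(void)
{
	return DEMO_ETH_HLEN + demo_psp_encap_hlen() + DEMO_PSP_TRL_SIZE +
	       DEMO_VLAN_HLEN + DEMO_ETH_FCS_LEN;
}

/* Counterparts of MLX5E_SW2HW_MTU() and MLX5E_HW2SW_MTU(). */
static unsigned int demo_sw2hw_mtu(unsigned int swmtu)
{
	return swmtu + demo_hard_mtu();
}

static unsigned int demo_hw2sw_mtu(unsigned int hwmtu)
{
	assert(hwmtu >= demo_hard_mtu());
	return hwmtu - demo_hard_mtu();
}
```

Under these assumptions the per-frame overhead grows from 22 to 62 bytes, so a
1500-byte software MTU maps to a 1562-byte hardware MTU whether or not a given
flow actually uses PSP.
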
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   4 +-
 .../mellanox/mlx5/core/en_accel/en_accel.h    |  28 +++
 .../mellanox/mlx5/core/en_accel/nisp_rxtx.c   | 225 ++++++++++++++++++
 .../mellanox/mlx5/core/en_accel/nisp_rxtx.h   |  96 ++++++++
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |  10 +-
 .../mellanox/mlx5/core/lib/psp_defs.h         |  28 +++
 7 files changed, 390 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/psp_defs.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 5ce78b84c763..858ab2e7cb1f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -109,7 +109,8 @@ mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/ktls_stats.o \
 				   en_accel/fs_tcp.o en_accel/ktls.o en_accel/ktls_txrx.o \
 				   en_accel/ktls_tx.o en_accel/ktls_rx.o
 
-mlx5_core-$(CONFIG_MLX5_EN_PSP) += en_accel/nisp.o en_accel/nisp_offload.o en_accel/nisp_fs.o
+mlx5_core-$(CONFIG_MLX5_EN_PSP) += en_accel/nisp.o en_accel/nisp_offload.o en_accel/nisp_fs.o \
+				   en_accel/nisp_rxtx.o
 
 mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o \
 					steering/dr_matcher.o steering/dr_rule.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b7ceb8011a92..92e2554d6271 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -47,6 +47,7 @@
 #include <linux/rhashtable.h>
 #include <net/udp_tunnel.h>
 #include <net/switchdev.h>
+#include <net/psp/types.h>
 #include <net/xdp.h>
 #include <linux/dim.h>
 #include <linux/bits.h>
@@ -61,6 +62,7 @@
 #include "en/rx_res.h"
 #include "en/selq.h"
 #include "lib/sd.h"
+#include "lib/psp_defs.h"
 
 extern const struct net_device_ops mlx5e_netdev_ops;
 struct page_pool;
@@ -68,7 +70,7 @@ struct page_pool;
 #define MLX5E_METADATA_ETHER_TYPE (0x8CE4)
 #define MLX5E_METADATA_ETHER_LEN 8
 
-#define MLX5E_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
+#define MLX5E_ETH_HARD_MTU (ETH_HLEN + PSP_ENCAP_HLEN + PSP_TRL_SIZE + VLAN_HLEN + ETH_FCS_LEN)
 
 #define MLX5E_HW2SW_MTU(params, hwmtu) ((hwmtu) - ((params)->hard_mtu))
 #define MLX5E_SW2HW_MTU(params, swmtu) ((swmtu) + ((params)->hard_mtu))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
index c15e48b0724c..cea997847fa4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -43,6 +43,7 @@
 #include "en.h"
 #include "en/txrx.h"
 #include "en_accel/nisp_fs.h"
+#include "en_accel/nisp_rxtx.h"
 
 #if IS_ENABLED(CONFIG_GENEVE)
 #include <net/geneve.h>
@@ -114,6 +115,9 @@ struct mlx5e_accel_tx_state {
 #ifdef CONFIG_MLX5_EN_IPSEC
 	struct mlx5e_accel_tx_ipsec_state ipsec;
 #endif
+#ifdef CONFIG_MLX5_EN_PSP
+	struct mlx5e_accel_tx_nisp_state nisp_st;
+#endif
 };
 
 static inline bool mlx5e_accel_tx_begin(struct net_device *dev,
@@ -132,6 +136,13 @@ static inline bool mlx5e_accel_tx_begin(struct net_device *dev,
 			return false;
 #endif
 
+#ifdef CONFIG_MLX5_EN_PSP
+	if (mlx5e_psp_is_offload(skb, dev)) {
+		if (unlikely(!mlx5e_nisp_handle_tx_skb(dev, skb, &state->nisp_st)))
+			return false;
+	}
+#endif
+
 #ifdef CONFIG_MLX5_EN_IPSEC
 	if (test_bit(MLX5E_SQ_STATE_IPSEC, &sq->state) && xfrm_offload(skb)) {
 		if (unlikely(!mlx5e_ipsec_handle_tx_skb(dev, skb, &state->ipsec)))
@@ -152,8 +163,14 @@ static inline bool mlx5e_accel_tx_begin(struct net_device *dev,
 }
 
 static inline unsigned int mlx5e_accel_tx_ids_len(struct mlx5e_txqsq *sq,
+						  struct sk_buff *skb,
 						  struct mlx5e_accel_tx_state *state)
 {
+#ifdef CONFIG_MLX5_EN_PSP
+	if (mlx5e_psp_is_offload_state(&state->nisp_st))
+		return mlx5e_nisp_tx_ids_len(&state->nisp_st);
+#endif
+
 #ifdef CONFIG_MLX5_EN_IPSEC
 	if (test_bit(MLX5E_SQ_STATE_IPSEC, &sq->state))
 		return mlx5e_ipsec_tx_ids_len(&state->ipsec);
@@ -167,8 +184,14 @@ static inline unsigned int mlx5e_accel_tx_ids_len(struct mlx5e_txqsq *sq,
 
 static inline void mlx5e_accel_tx_eseg(struct mlx5e_priv *priv,
 				       struct sk_buff *skb,
+				       struct mlx5e_accel_tx_state *accel,
 				       struct mlx5_wqe_eth_seg *eseg, u16 ihs)
 {
+#ifdef CONFIG_MLX5_EN_PSP
+	if (mlx5e_psp_is_offload_state(&accel->nisp_st))
+		mlx5e_nisp_tx_build_eseg(priv, skb, &accel->nisp_st, eseg);
+#endif
+
 #ifdef CONFIG_MLX5_EN_IPSEC
 	if (xfrm_offload(skb))
 		mlx5e_ipsec_tx_build_eseg(priv, skb, eseg);
@@ -194,6 +217,11 @@ static inline void mlx5e_accel_tx_finish(struct mlx5e_txqsq *sq,
 	mlx5e_ktls_handle_tx_wqe(&wqe->ctrl, &state->tls);
 #endif
 
+#ifdef CONFIG_MLX5_EN_PSP
+	if (mlx5e_psp_is_offload_state(&state->nisp_st))
+		mlx5e_nisp_handle_tx_wqe(wqe, &state->nisp_st, inlseg);
+#endif
+
 #ifdef CONFIG_MLX5_EN_IPSEC
 	if (test_bit(MLX5E_SQ_STATE_IPSEC, &sq->state) &&
 	    state->ipsec.xo && state->ipsec.tailen)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
new file mode 100644
index 000000000000..c719b2916677
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <net/protocol.h>
+#include <net/udp.h>
+#include <net/ip6_checksum.h>
+#include <net/psp/types.h>
+
+#include "en.h"
+#include "../nisp.h"
+#include "en_accel/nisp_rxtx.h"
+#include "en_accel/nisp.h"
+#include "lib/psp_defs.h"
+
+static void mlx5e_nisp_set_swp(struct sk_buff *skb,
+			       struct mlx5e_accel_tx_nisp_state *nisp_st,
+			       struct mlx5_wqe_eth_seg *eseg)
+{
+	/* Tunnel Mode:
+	 * SWP:      OutL3       InL3  InL4
+	 * Pkt: MAC  IP     ESP  IP    L4
+	 *
+	 * Transport Mode:
+	 * SWP:      OutL3       OutL4
+	 * Pkt: MAC  IP     ESP  L4
+	 *
+	 * Tunnel(VXLAN TCP/UDP) over Transport Mode
+	 * SWP:      OutL3                   InL3  InL4
+	 * Pkt: MAC  IP     ESP  UDP  VXLAN  IP    L4
+	 */
+	u8 inner_ipproto = 0;
+	struct ethhdr *eth;
+
+	/* Shared settings */
+	eseg->swp_outer_l3_offset = skb_network_offset(skb) / 2;
+	if (skb->protocol == htons(ETH_P_IPV6))
+		eseg->swp_flags |= MLX5_ETH_WQE_SWP_OUTER_L3_IPV6;
+
+	if (skb->inner_protocol_type == ENCAP_TYPE_IPPROTO) {
+		inner_ipproto = skb->inner_ipproto;
+		/* Set SWP additional flags for packet of type IP|UDP|PSP|[ TCP | UDP ] */
+		switch (inner_ipproto) {
+		case IPPROTO_UDP:
+			eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L4_UDP;
+			fallthrough;
+		case IPPROTO_TCP:
+			eseg->swp_inner_l4_offset = skb_inner_transport_offset(skb) / 2;
+			break;
+		default:
+			break;
+		}
+	} else {
+		/* IP in IP tunneling like vxlan*/
+		if (skb->inner_protocol_type != ENCAP_TYPE_ETHER)
+			return;
+
+		eth = (struct ethhdr *)skb_inner_mac_header(skb);
+		switch (ntohs(eth->h_proto)) {
+		case ETH_P_IP:
+			inner_ipproto = ((struct iphdr *)((char *)skb->data +
+					 skb_inner_network_offset(skb)))->protocol;
+			break;
+		case ETH_P_IPV6:
+			inner_ipproto = ((struct ipv6hdr *)((char *)skb->data +
+					 skb_inner_network_offset(skb)))->nexthdr;
+			break;
+		default:
+			break;
+		}
+
+		/* Tunnel(VXLAN TCP/UDP) over Transport Mode PSP i.e. PSP payload is vxlan tunnel */
+		switch (inner_ipproto) {
+		case IPPROTO_UDP:
+			eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L4_UDP;
+			fallthrough;
+		case IPPROTO_TCP:
+			eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
+			eseg->swp_inner_l4_offset =
+				(skb->csum_start + skb->head - skb->data) / 2;
+			if (skb->protocol == htons(ETH_P_IPV6))
+				eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
+			break;
+		default:
+			break;
+		}
+
+		nisp_st->inner_ipproto = inner_ipproto;
+	}
+}
+
+static bool mlx5e_nisp_set_state(struct mlx5e_priv *priv,
+				 struct sk_buff *skb,
+				 struct mlx5e_accel_tx_nisp_state *nisp_st)
+{
+	struct psp_assoc *pas;
+	bool ret = false;
+
+	rcu_read_lock();
+	pas = psp_skb_get_assoc_rcu(skb);
+	if (!pas)
+		goto out;
+
+	ret = true;
+	nisp_st->tailen = PSP_TRL_SIZE;
+	nisp_st->spi = pas->tx.spi;
+	nisp_st->ver = pas->version;
+	memcpy(&nisp_st->keyid, pas->drv_data, sizeof(nisp_st->keyid));
+
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
+void mlx5e_nisp_tx_build_eseg(struct mlx5e_priv *priv, struct sk_buff *skb,
+			      struct mlx5e_accel_tx_nisp_state *nisp_st,
+			      struct mlx5_wqe_eth_seg *eseg)
+{
+	if (!mlx5_is_nisp_device(priv->mdev))
+		return;
+
+	if (unlikely(skb->protocol != htons(ETH_P_IP) &&
+		     skb->protocol != htons(ETH_P_IPV6)))
+		return;
+
+	mlx5e_nisp_set_swp(skb, nisp_st, eseg);
+	/* Special WA for PSP LSO in ConnectX7 */
+	eseg->swp_outer_l3_offset = 0;
+	eseg->swp_inner_l3_offset = 0;
+
+	eseg->flow_table_metadata |= cpu_to_be32(nisp_st->keyid);
+	eseg->trailer |= cpu_to_be32(MLX5_ETH_WQE_INSERT_TRAILER) |
+			 cpu_to_be32(MLX5_ETH_WQE_TRAILER_HDR_OUTER_L4_ASSOC);
+}
+
+void mlx5e_nisp_handle_tx_wqe(struct mlx5e_tx_wqe *wqe,
+			      struct mlx5e_accel_tx_nisp_state *nisp_st,
+			      struct mlx5_wqe_inline_seg *inlseg)
+{
+	inlseg->byte_count = cpu_to_be32(nisp_st->tailen | MLX5_INLINE_SEG);
+}
+
+static void psp_write_headers(struct net *net, struct sk_buff *skb,
+			      __be32 spi, u8 ver, unsigned int udp_len,
+			      __be16 sport)
+{
+	struct udphdr *uh = udp_hdr(skb);
+	struct psphdr *psph = (struct psphdr *)(uh + 1);
+
+	uh->dest = htons(PSP_DEFAULT_UDP_PORT);
+	uh->source = udp_flow_src_port(net, skb, 0, 0, false);
+	uh->check = 0;
+	uh->len = htons(udp_len);
+
+	psph->nexthdr = IPPROTO_TCP;
+	psph->hdrlen = PSP_HDRLEN_NOOPT;
+	psph->crypt_offset = 0;
+	psph->verfl = FIELD_PREP(PSPHDR_VERFL_VERSION, ver) |
+		      FIELD_PREP(PSPHDR_VERFL_ONE, 1);
+	psph->spi = spi;
+	memset(&psph->iv, 0, sizeof(psph->iv));
+}
+
+/* Encapsulate a TCP packet with PSP by adding the UDP+PSP headers and filling
+ * them in.
+ */
+static bool psp_encapsulate(struct net *net, struct sk_buff *skb,
+			    __be32 spi, u8 ver, __be16 sport)
+{
+	u32 network_len = skb_network_header_len(skb);
+	u32 ethr_len = skb_mac_header_len(skb);
+	u32 bufflen = ethr_len + network_len;
+	struct ipv6hdr *ip6;
+
+	if (skb_cow_head(skb, PSP_ENCAP_HLEN))
+		return false;
+
+	skb_push(skb, PSP_ENCAP_HLEN);
+	skb->mac_header		-= PSP_ENCAP_HLEN;
+	skb->network_header	-= PSP_ENCAP_HLEN;
+	skb->transport_header	-= PSP_ENCAP_HLEN;
+	memmove(skb->data, skb->data + PSP_ENCAP_HLEN, bufflen);
+
+	ip6 = ipv6_hdr(skb);
+	skb_set_inner_ipproto(skb, IPPROTO_TCP);
+	ip6->nexthdr = IPPROTO_UDP;
+	be16_add_cpu(&ip6->payload_len, PSP_ENCAP_HLEN);
+
+	skb_set_inner_transport_header(skb, skb_transport_offset(skb) + PSP_ENCAP_HLEN);
+	skb->encapsulation = 1;
+	psp_write_headers(net, skb, spi, ver,
+			  skb->len - skb_transport_offset(skb), sport);
+
+	return true;
+}
+
+bool mlx5e_nisp_handle_tx_skb(struct net_device *netdev,
+			      struct sk_buff *skb,
+			      struct mlx5e_accel_tx_nisp_state *nisp_st)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct net *net = sock_net(skb->sk);
+	const struct ipv6hdr *ip6;
+	struct tcphdr *th;
+
+	if (!mlx5e_nisp_set_state(priv, skb, nisp_st))
+		return true;
+
+	/* psp_encap of the packet */
+	if (!psp_encapsulate(net, skb, nisp_st->spi, nisp_st->ver, 0)) {
+		kfree_skb_reason(skb, SKB_DROP_REASON_PSP_OUTPUT);
+		return false;
+	}
+	if (skb_is_gso(skb)) {
+		ip6 = ipv6_hdr(skb);
+		th = inner_tcp_hdr(skb);
+
+		th->check = ~tcp_v6_check(skb_shinfo(skb)->gso_size + inner_tcp_hdrlen(skb), &ip6->saddr,
+				&ip6->daddr, 0);
+	}
+
+	return true;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
new file mode 100644
index 000000000000..1350a73c2019
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5E_NISP_RXTX_H__
+#define __MLX5E_NISP_RXTX_H__
+
+#include <linux/skbuff.h>
+#include <net/xfrm.h>
+#include <net/psp.h>
+#include "en.h"
+#include "en/txrx.h"
+
+struct mlx5e_accel_tx_nisp_state {
+	u32 tailen;
+	u32 keyid;
+	__be32 spi;
+	u8 inner_ipproto;
+	u8 ver;
+};
+
+#ifdef CONFIG_MLX5_EN_PSP
+static inline bool mlx5e_psp_is_offload_state(struct mlx5e_accel_tx_nisp_state *nisp_state)
+{
+	return (nisp_state->tailen != 0);
+}
+
+static inline bool mlx5e_psp_is_offload(struct sk_buff *skb, struct net_device *netdev)
+{
+	bool ret;
+
+	rcu_read_lock();
+	ret = !!psp_skb_get_assoc_rcu(skb);
+	rcu_read_unlock();
+	return ret;
+}
+
+bool mlx5e_nisp_handle_tx_skb(struct net_device *netdev,
+			      struct sk_buff *skb,
+			      struct mlx5e_accel_tx_nisp_state *nisp_st);
+
+void mlx5e_nisp_tx_build_eseg(struct mlx5e_priv *priv, struct sk_buff *skb,
+			      struct mlx5e_accel_tx_nisp_state *nisp_st,
+			      struct mlx5_wqe_eth_seg *eseg);
+
+void mlx5e_nisp_handle_tx_wqe(struct mlx5e_tx_wqe *wqe,
+			      struct mlx5e_accel_tx_nisp_state *nisp_st,
+			      struct mlx5_wqe_inline_seg *inlseg);
+
+static inline bool mlx5e_nisp_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+						    struct mlx5e_accel_tx_nisp_state *nisp_st,
+						    struct mlx5_wqe_eth_seg *eseg)
+{
+	u8 inner_ipproto;
+
+	if (!mlx5e_psp_is_offload_state(nisp_st))
+		return false;
+
+	inner_ipproto = nisp_st->inner_ipproto;
+	eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
+	if (inner_ipproto) {
+		eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM;
+		if (inner_ipproto == IPPROTO_TCP || inner_ipproto == IPPROTO_UDP)
+			eseg->cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM;
+		if (likely(skb->ip_summed == CHECKSUM_PARTIAL))
+			sq->stats->csum_partial_inner++;
+	} else if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
+		eseg->cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM;
+		sq->stats->csum_partial_inner++;
+	}
+
+	return true;
+}
+
+static inline unsigned int mlx5e_nisp_tx_ids_len(struct mlx5e_accel_tx_nisp_state *nisp_st)
+{
+	return nisp_st->tailen;
+}
+#else
+static inline bool mlx5e_psp_is_offload_state(struct mlx5e_accel_tx_nisp_state *nisp_state)
+{
+	return false;
+}
+
+static inline bool mlx5e_psp_is_offload(struct sk_buff *skb, struct net_device *netdev)
+{
+	return false;
+}
+
+static inline bool mlx5e_nisp_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+						    struct mlx5e_accel_tx_nisp_state *nisp_st,
+						    struct mlx5_wqe_eth_seg *eseg)
+{
+	return false;
+}
+#endif
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 099bf1078889..cc4d236a976f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -39,6 +39,7 @@
 #include "ipoib/ipoib.h"
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec_rxtx.h"
+#include "en_accel/nisp_rxtx.h"
 #include "en_accel/macsec.h"
 #include "en/ptp.h"
 #include <net/ipv6.h>
@@ -120,6 +121,11 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 			    struct mlx5e_accel_tx_state *accel,
 			    struct mlx5_wqe_eth_seg *eseg)
 {
+#ifdef CONFIG_MLX5_EN_PSP
+	if (unlikely(mlx5e_nisp_txwqe_build_eseg_csum(sq, skb, &accel->nisp_st, eseg)))
+		return;
+#endif
+
 	if (unlikely(mlx5e_ipsec_txwqe_build_eseg_csum(sq, skb, eseg)))
 		return;
 
@@ -294,7 +300,7 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		stats->packets++;
 	}
 
-	attr->insz = mlx5e_accel_tx_ids_len(sq, accel);
+	attr->insz = mlx5e_accel_tx_ids_len(sq, skb, accel);
 	stats->bytes += attr->num_bytes;
 }
 
@@ -663,7 +669,7 @@ static void mlx5e_txwqe_build_eseg(struct mlx5e_priv *priv, struct mlx5e_txqsq *
 				   struct sk_buff *skb, struct mlx5e_accel_tx_state *accel,
 				   struct mlx5_wqe_eth_seg *eseg, u16 ihs)
 {
-	mlx5e_accel_tx_eseg(priv, skb, eseg, ihs);
+	mlx5e_accel_tx_eseg(priv, skb, accel, eseg, ihs);
 	mlx5e_txwqe_build_eseg_csum(sq, skb, accel, eseg);
 	if (unlikely(sq->ptpsq))
 		mlx5e_cqe_ts_id_eseg(sq->ptpsq, skb, eseg);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/psp_defs.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/psp_defs.h
new file mode 100644
index 000000000000..7dd2aa90ed62
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/psp_defs.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef _LIB_PSP_DEFS_H
+#define _LIB_PSP_DEFS_H
+
+/*  PSP Security Payload (PSP) Header
+ *
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  Next Header  |  Hdr Ext Len  |  Crypt Offset | R |Version|V|1|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                Security Parameters Index (SPI)                |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                                                               |
+ * +                  Initialization Vector (IV)                   +
+ * |                                                               |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                Virtualization Key (VK) [Optional]             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                  Pad to 8*N bytes [if needed]                 |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* total length of headers for PSP encapsulation (UDP + PSP) */
+#define PSP_ENCAP_HLEN (sizeof(struct udphdr) + sizeof(struct psphdr))
+
+#endif  /* _LIB_PSP_DEFS_H */
-- 
2.45.0



* [RFC net-next 12/15] net/mlx5e: Add PSP steering in local NIC RX
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (10 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 11/15] net/mlx5e: Implement PSP Tx data path Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-13  1:52   ` Willem de Bruijn
  2024-05-10  3:04 ` [RFC net-next 13/15] net/mlx5e: Configure PSP Rx flow steering rules Jakub Kicinski
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC (permalink / raw
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

Introduce the decrypt FT, the RX error FT, and the default rules.

The PSP RX decrypt flow table is pointed to by the TTC (Traffic Type
Classifier) UDP steering rules.
The decrypt flow table has two flow groups. The first flow group holds
the decrypt steering rule, which is always programmed and recognizes PSP
packets by the dedicated UDP destination port number 1000. If the packet
is decrypted, a PSP marker is set in metadata_regB[30].
The second flow group has a default rule that forwards all non-offloaded
PSP packets to the TTC UDP default RSS TIR.

The RX error flow table is the destination of the decrypt steering rule in
the PSP RX decrypt flow table. It has two fixed rules. The first has a
single copy action that copies nisp_syndrome to metadata_regB[23:29]; its
destination is the TTC UDP default RSS TIR. The PSP marker and syndrome
are used to filter out non-nisp packets and to return the PSP crypto
offload status in the Rx flow. The driver uses the marker to identify
such packets so that it can set the SKB PSP metadata. The second rule
drops packets that failed to decrypt (for example, when an illegal or
expired SPI is used).
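
As an illustration only (not part of this patch), the metadata layout
described above could be decoded on the receive side roughly as follows.
All ex_* names are hypothetical, and the sketch assumes the 32-bit regB
value has already been read from the completion in CPU byte order. The
syndrome values mirror the accel_nisp_syndrome enum added in this patch.
Since the error table only forwards packets whose syndrome is OK, a
non-zero syndrome is not expected to reach software in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the commit message above. */
#define EX_PSP_MARKER_BIT     30u   /* metadata_regB[30]: PSP marker */
#define EX_PSP_SYNDROME_SHIFT 23u   /* metadata_regB[23:29]: syndrome */
#define EX_PSP_SYNDROME_MASK  0x7fu /* 7 bits wide */

/* The syndrome field ends right below the marker bit. */
static_assert(EX_PSP_SYNDROME_SHIFT + 7u == EX_PSP_MARKER_BIT,
	      "syndrome must occupy bits 23..29");

enum ex_psp_syndrome {
	EX_PSP_OK = 0,
	EX_PSP_ICV_FAIL,
	EX_PSP_BAD_TRAILER,
};

/* True if the decrypt rule marked this packet as PSP. */
static inline bool ex_psp_marked(uint32_t reg_b)
{
	return (reg_b >> EX_PSP_MARKER_BIT) & 1u;
}

/* Extract the 7-bit syndrome copied in by the error flow table. */
static inline unsigned int ex_psp_syndrome(uint32_t reg_b)
{
	return (reg_b >> EX_PSP_SYNDROME_SHIFT) & EX_PSP_SYNDROME_MASK;
}

/* Marked as PSP and decrypted without error. */
static inline bool ex_psp_decrypted_ok(uint32_t reg_b)
{
	return ex_psp_marked(reg_b) && ex_psp_syndrome(reg_b) == EX_PSP_OK;
}
```

A driver would presumably call something like ex_psp_decrypted_ok()
before setting the SKB PSP metadata, and treat unmarked packets as
ordinary UDP traffic.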

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |   2 +-
 .../mellanox/mlx5/core/en_accel/nisp_fs.c     | 481 +++++++++++++++++-
 2 files changed, 476 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index 4d6225e0eec7..23af74e4f8c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -83,7 +83,7 @@ enum {
 #ifdef CONFIG_MLX5_EN_ARFS
 	MLX5E_ARFS_FT_LEVEL = MLX5E_INNER_TTC_FT_LEVEL + 1,
 #endif
-#ifdef CONFIG_MLX5_EN_IPSEC
+#if defined(CONFIG_MLX5_EN_IPSEC) || defined(CONFIG_MLX5_EN_PSP)
 	MLX5E_ACCEL_FS_POL_FT_LEVEL = MLX5E_INNER_TTC_FT_LEVEL + 1,
 	MLX5E_ACCEL_FS_ESP_FT_LEVEL,
 	MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
index 5d2ce83db7cc..11f583d13bdd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
@@ -8,6 +8,12 @@
 #include "en_accel/nisp_fs.h"
 #include "en_accel/nisp.h"
 
+enum accel_fs_nisp_type {
+       ACCEL_FS_NISP4,
+       ACCEL_FS_NISP6,
+       ACCEL_FS_NISP_NUM_TYPES,
+};
+
 struct mlx5e_nisp_tx {
 	struct mlx5_flow_namespace *ns;
 	struct mlx5_flow_table *ft;
@@ -17,14 +23,15 @@ struct mlx5e_nisp_tx {
 	u32 refcnt;
 };
 
-struct mlx5e_nisp_fs {
-	struct mlx5_core_dev *mdev;
-	struct mlx5e_nisp_tx *tx_fs;
-	struct mlx5e_flow_steering *fs;
-};
-
 enum accel_nisp_rule_action {
 	ACCEL_NISP_RULE_ACTION_ENCRYPT,
+	ACCEL_NISP_RULE_ACTION_DECRYPT,
+};
+
+enum accel_nisp_syndrome {
+	NISP_OK = 0,
+	NISP_ICV_FAIL,
+	NISP_BAD_TRAILER,
 };
 
 struct mlx5e_accel_nisp_rule {
@@ -32,6 +39,216 @@ struct mlx5e_accel_nisp_rule {
 	u8 action;
 };
 
+struct mlx5e_nisp_rx_err {
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_handle *rule;
+	struct mlx5_flow_handle *drop_rule;
+	struct mlx5_modify_hdr *copy_modify_hdr;
+};
+
+struct mlx5e_accel_fs_nisp_prot {
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_group *miss_group;
+	struct mlx5_flow_handle *miss_rule;
+	struct mlx5_flow_destination default_dest;
+	struct mlx5e_nisp_rx_err rx_err;
+	u32 refcnt;
+	struct mutex prot_mutex; /* protect ESP4/ESP6 protocol */
+	struct mlx5_flow_handle *def_rule;
+};
+
+struct mlx5e_accel_fs_nisp {
+	struct mlx5e_accel_fs_nisp_prot fs_prot[ACCEL_FS_NISP_NUM_TYPES];
+};
+
+struct mlx5e_nisp_fs {
+	struct mlx5_core_dev *mdev;
+	struct mlx5e_nisp_tx *tx_fs;
+	/* Rx manage */
+	struct mlx5e_flow_steering *fs;
+	struct mlx5e_accel_fs_nisp *rx_fs;
+};
+
+/* NISP RX flow steering */
+static enum mlx5_traffic_types fs_nisp2tt(enum accel_fs_nisp_type i)
+{
+	if (i == ACCEL_FS_NISP4)
+		return MLX5_TT_IPV4_UDP;
+
+	return MLX5_TT_IPV6_UDP;
+}
+
+static void accel_nisp_fs_rx_err_del_rules(struct mlx5e_nisp_fs *fs,
+					  struct mlx5e_nisp_rx_err *rx_err)
+{
+	if (rx_err->drop_rule) {
+		mlx5_del_flow_rules(rx_err->drop_rule);
+		rx_err->drop_rule = NULL;
+	}
+
+	if (rx_err->rule) {
+		mlx5_del_flow_rules(rx_err->rule);
+		rx_err->rule = NULL;
+	}
+
+	if (rx_err->copy_modify_hdr) {
+		mlx5_modify_header_dealloc(fs->mdev, rx_err->copy_modify_hdr);
+		rx_err->copy_modify_hdr = NULL;
+	}
+}
+
+static void accel_nisp_fs_rx_err_destroy_ft(struct mlx5e_nisp_fs *fs,
+					    struct mlx5e_nisp_rx_err *rx_err)
+{
+	accel_nisp_fs_rx_err_del_rules(fs, rx_err);
+
+	if (rx_err->ft) {
+		mlx5_destroy_flow_table(rx_err->ft);
+		rx_err->ft = NULL;
+	}
+}
+
+static void accel_nisp_setup_syndrome_match(struct mlx5_flow_spec *spec,
+		enum accel_nisp_syndrome syndrome)
+{
+	void *misc_params_2;
+
+	spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
+	misc_params_2 = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters_2);
+	MLX5_SET_TO_ONES(fte_match_set_misc2, misc_params_2, nisp_syndrome);
+	misc_params_2 = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters_2);
+	MLX5_SET(fte_match_set_misc2, misc_params_2, nisp_syndrome, syndrome);
+}
+
+static int accel_nisp_fs_rx_err_add_rule(struct mlx5e_nisp_fs *fs,
+		struct mlx5e_accel_fs_nisp_prot *fs_prot,
+		struct mlx5e_nisp_rx_err *rx_err)
+{
+	u8 action[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)] = {};
+	struct mlx5_core_dev *mdev = fs->mdev;
+	struct mlx5_flow_act flow_act = {};
+	struct mlx5_modify_hdr *modify_hdr;
+	struct mlx5_flow_handle *fte;
+	struct mlx5_flow_spec *spec;
+	int err = 0;
+
+	spec = kzalloc(sizeof(*spec), GFP_KERNEL);
+	if (!spec)
+		return -ENOMEM;
+
+	/* Action to copy 7 bit nisp_syndrome to regB[23:29] */
+	MLX5_SET(copy_action_in, action, action_type, MLX5_ACTION_TYPE_COPY);
+	MLX5_SET(copy_action_in, action, src_field, MLX5_ACTION_IN_FIELD_NISP_SYNDROME);
+	MLX5_SET(copy_action_in, action, src_offset, 0);
+	MLX5_SET(copy_action_in, action, length, 7);
+	MLX5_SET(copy_action_in, action, dst_field, MLX5_ACTION_IN_FIELD_METADATA_REG_B);
+	MLX5_SET(copy_action_in, action, dst_offset, 23);
+
+	modify_hdr = mlx5_modify_header_alloc(mdev, MLX5_FLOW_NAMESPACE_KERNEL,
+			1, action);
+	if (IS_ERR(modify_hdr)) {
+		err = PTR_ERR(modify_hdr);
+		mlx5_core_err(mdev,
+			      "fail to alloc nisp copy modify_header_id err=%d\n", err);
+		goto out_spec;
+	}
+
+	accel_nisp_setup_syndrome_match(spec, NISP_OK);
+	/* create fte */
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
+		MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	flow_act.modify_hdr = modify_hdr;
+	fte = mlx5_add_flow_rules(rx_err->ft, spec, &flow_act,
+			&fs_prot->default_dest, 1);
+	if (IS_ERR(fte)) {
+		err = PTR_ERR(fte);
+		mlx5_core_err(mdev, "fail to add nisp rx err copy rule err=%d\n", err);
+		goto out;
+	}
+	rx_err->rule = fte;
+
+	/* add default drop rule */
+	memset(spec, 0, sizeof(*spec));
+	memset(&flow_act, 0, sizeof(flow_act));
+	/* create fte */
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP;
+	fte = mlx5_add_flow_rules(rx_err->ft, spec, &flow_act, NULL, 0);
+	if (IS_ERR(fte)) {
+		err = PTR_ERR(fte);
+		mlx5_core_err(mdev, "fail to add nisp rx err drop rule err=%d\n", err);
+		goto out_drop_rule;
+	}
+	rx_err->drop_rule = fte;
+	rx_err->copy_modify_hdr = modify_hdr;
+
+	goto out_spec;
+
+out_drop_rule:
+	mlx5_del_flow_rules(rx_err->rule);
+	rx_err->rule = NULL;
+out:
+	mlx5_modify_header_dealloc(mdev, modify_hdr);
+out_spec:
+	kfree(spec);
+	return err;
+}
+
+static int accel_nisp_fs_rx_err_create_ft(struct mlx5e_nisp_fs *fs,
+					  struct mlx5e_accel_fs_nisp_prot *fs_prot,
+					  struct mlx5e_nisp_rx_err *rx_err)
+{
+	struct mlx5_flow_namespace *ns = mlx5e_fs_get_ns(fs->fs, false);
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_table *ft;
+	int err;
+
+	ft_attr.max_fte = 2;
+	ft_attr.autogroup.max_num_groups = 2;
+	ft_attr.level = MLX5E_ACCEL_FS_ESP_FT_ERR_LEVEL; // MLX5E_ACCEL_FS_TCP_FT_LEVEL
+	ft_attr.prio = MLX5E_NIC_PRIO;
+	ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		mlx5_core_err(fs->mdev, "fail to create nisp rx inline ft err=%d\n", err);
+		return err;
+	}
+
+	rx_err->ft = ft;
+	err = accel_nisp_fs_rx_err_add_rule(fs, fs_prot, rx_err);
+	if (err)
+		goto out_err;
+
+	return 0;
+
+out_err:
+	mlx5_destroy_flow_table(ft);
+	rx_err->ft = NULL;
+	return err;
+}
+
+static void accel_nisp_fs_rx_fs_destroy(struct mlx5e_accel_fs_nisp_prot *fs_prot)
+{
+	if (fs_prot->def_rule) {
+		mlx5_del_flow_rules(fs_prot->def_rule);
+		fs_prot->def_rule = NULL;
+	}
+
+	if (fs_prot->miss_rule) {
+		mlx5_del_flow_rules(fs_prot->miss_rule);
+		fs_prot->miss_rule = NULL;
+	}
+
+	if (fs_prot->miss_group) {
+		mlx5_destroy_flow_group(fs_prot->miss_group);
+		fs_prot->miss_group = NULL;
+	}
+
+	if (fs_prot->ft) {
+		mlx5_destroy_flow_table(fs_prot->ft);
+		fs_prot->ft = NULL;
+	}
+}
+
 static void setup_fte_udp_psp(struct mlx5_flow_spec *spec, u16 udp_port)
 {
 	spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
@@ -41,6 +258,251 @@ static void setup_fte_udp_psp(struct mlx5_flow_spec *spec, u16 udp_port)
 	MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, ip_protocol, IPPROTO_UDP);
 }
 
+static int accel_nisp_fs_rx_create_ft(struct mlx5e_nisp_fs *fs,
+				      struct mlx5e_accel_fs_nisp_prot *fs_prot)
+{
+	struct mlx5_flow_namespace *ns = mlx5e_fs_get_ns(fs->fs, false);
+	u8 action[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)] = {};
+	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+	struct mlx5_modify_hdr *modify_hdr = NULL;
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_destination dest = {};
+	struct mlx5_core_dev *mdev = fs->mdev;
+	struct mlx5_flow_group *miss_group;
+	MLX5_DECLARE_FLOW_ACT(flow_act);
+	struct mlx5_flow_handle *rule;
+	struct mlx5_flow_spec *spec;
+	struct mlx5_flow_table *ft;
+	u32 *flow_group_in;
+	int err = 0;
+
+	flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+	spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
+	if (!flow_group_in || !spec) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* Create FT */
+	ft_attr.max_fte = 2;
+	ft_attr.level = MLX5E_ACCEL_FS_ESP_FT_LEVEL;
+	ft_attr.prio = MLX5E_NIC_PRIO;
+	ft_attr.autogroup.num_reserved_entries = 1;
+	ft_attr.autogroup.max_num_groups = 1;
+	ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		mlx5_core_err(mdev, "fail to create nisp rx ft err=%d\n", err);
+		goto out_err;
+	}
+	fs_prot->ft = ft;
+
+	/* Create miss_group */
+	MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ft->max_fte - 1);
+	MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, ft->max_fte - 1);
+	miss_group = mlx5_create_flow_group(ft, flow_group_in);
+	if (IS_ERR(miss_group)) {
+		err = PTR_ERR(miss_group);
+		mlx5_core_err(mdev, "fail to create nisp rx miss_group err=%d\n", err);
+		goto out_err;
+	}
+	fs_prot->miss_group = miss_group;
+
+	/* Create miss rule */
+	rule = mlx5_add_flow_rules(ft, spec, &flow_act, &fs_prot->default_dest, 1);
+	if (IS_ERR(rule)) {
+		err = PTR_ERR(rule);
+		mlx5_core_err(mdev, "fail to create nisp rx miss_rule err=%d\n", err);
+		goto out_err;
+	}
+	fs_prot->miss_rule = rule;
+
+	/* Add default Rx Nisp rule */
+	setup_fte_udp_psp(spec, PSP_DEFAULT_UDP_PORT);
+	flow_act.crypto.type = MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_NISP;
+	/* Set bit[31, 30] NISP marker */
+	/* Set bit[29-23] nisp_syndrome is set in error FT */
+#define MLX5E_NISP_MARKER_BIT (BIT(30) | BIT(31))
+	MLX5_SET(set_action_in, action, action_type, MLX5_ACTION_TYPE_SET);
+	MLX5_SET(set_action_in, action, field, MLX5_ACTION_IN_FIELD_METADATA_REG_B);
+	MLX5_SET(set_action_in, action, data, MLX5E_NISP_MARKER_BIT);
+	MLX5_SET(set_action_in, action, offset, 0);
+	MLX5_SET(set_action_in, action, length, 32);
+
+	modify_hdr = mlx5_modify_header_alloc(mdev, MLX5_FLOW_NAMESPACE_KERNEL, 1, action);
+	if (IS_ERR(modify_hdr)) {
+		err = PTR_ERR(modify_hdr);
+		mlx5_core_err(mdev, "fail to alloc nisp set modify_header_id err=%d\n", err);
+		modify_hdr = NULL;
+		goto out_err;
+	}
+
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+		MLX5_FLOW_CONTEXT_ACTION_CRYPTO_DECRYPT |
+		MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+	flow_act.modify_hdr = modify_hdr;
+	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+	dest.ft = fs_prot->rx_err.ft;
+	rule = mlx5_add_flow_rules(fs_prot->ft, spec, &flow_act, &dest, 1);
+	if (IS_ERR(rule)) {
+		err = PTR_ERR(rule);
+		mlx5_core_err(mdev,
+			      "fail to add nisp rule Rx dycrption, err=%d, flow_act.action = %#04X\n",
+			      err, flow_act.action);
+		goto out_err;
+	}
+
+	fs_prot->def_rule = rule;
+	goto out;
+
+out_err:
+	accel_nisp_fs_rx_fs_destroy(fs_prot);
+out:
+	kvfree(flow_group_in);
+	kvfree(spec);
+	return err;
+}
+
+static int accel_nisp_fs_rx_destroy(struct mlx5e_nisp_fs *fs, enum accel_fs_nisp_type type)
+{
+	struct mlx5e_accel_fs_nisp_prot *fs_prot;
+	struct mlx5e_accel_fs_nisp *accel_nisp;
+
+	accel_nisp = fs->rx_fs;
+
+	/* The netdev unreg already happened, so all offloaded rules are already removed */
+	fs_prot = &accel_nisp->fs_prot[type];
+
+	accel_nisp_fs_rx_fs_destroy(fs_prot);
+
+	accel_nisp_fs_rx_err_destroy_ft(fs, &fs_prot->rx_err);
+
+	return 0;
+}
+
+static int accel_nisp_fs_rx_create(struct mlx5e_nisp_fs *fs, enum accel_fs_nisp_type type)
+{
+	struct mlx5_ttc_table *ttc = mlx5e_fs_get_ttc(fs->fs, false);
+	struct mlx5e_accel_fs_nisp_prot *fs_prot;
+	struct mlx5e_accel_fs_nisp *accel_nisp;
+	int err;
+
+	accel_nisp = fs->rx_fs;
+	fs_prot = &accel_nisp->fs_prot[type];
+
+	fs_prot->default_dest = mlx5_ttc_get_default_dest(ttc, fs_nisp2tt(type));
+
+	err = accel_nisp_fs_rx_err_create_ft(fs, fs_prot, &fs_prot->rx_err);
+	if (err)
+		return err;
+
+	err = accel_nisp_fs_rx_create_ft(fs, fs_prot);
+	if (err)
+		accel_nisp_fs_rx_err_destroy_ft(fs, &fs_prot->rx_err);
+
+	return err;
+}
+
+static int accel_nisp_fs_rx_ft_get(struct mlx5e_nisp_fs *fs, enum accel_fs_nisp_type type)
+{
+	struct mlx5_ttc_table *ttc = mlx5e_fs_get_ttc(fs->fs, false);
+	struct mlx5e_accel_fs_nisp_prot *fs_prot;
+	struct mlx5_flow_destination dest = {};
+	struct mlx5e_accel_fs_nisp *accel_nisp;
+	int err = 0;
+
+	if (!fs || !fs->rx_fs)
+		return -EINVAL;
+
+	accel_nisp = fs->rx_fs;
+	fs_prot = &accel_nisp->fs_prot[type];
+	mutex_lock(&fs_prot->prot_mutex);
+	if (fs_prot->refcnt++)
+		goto out;
+
+	/* create FT */
+	err = accel_nisp_fs_rx_create(fs, type);
+	if (err) {
+		fs_prot->refcnt--;
+		goto out;
+	}
+
+	/* connect */
+	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+	dest.ft = fs_prot->ft;
+	mlx5_ttc_fwd_dest(ttc, fs_nisp2tt(type), &dest);
+
+out:
+	mutex_unlock(&fs_prot->prot_mutex);
+	return err;
+}
+
+static void accel_nisp_fs_rx_ft_put(struct mlx5e_nisp_fs *fs, enum accel_fs_nisp_type type)
+{
+	struct mlx5_ttc_table *ttc = mlx5e_fs_get_ttc(fs->fs, false);
+	struct mlx5e_accel_fs_nisp_prot *fs_prot;
+	struct mlx5e_accel_fs_nisp *accel_nisp;
+
+	accel_nisp = fs->rx_fs;
+	fs_prot = &accel_nisp->fs_prot[type];
+	mutex_lock(&fs_prot->prot_mutex);
+	if (--fs_prot->refcnt)
+		goto out;
+
+	/* disconnect */
+	mlx5_ttc_fwd_default_dest(ttc, fs_nisp2tt(type));
+
+	/* remove FT */
+	accel_nisp_fs_rx_destroy(fs, type);
+
+out:
+	mutex_unlock(&fs_prot->prot_mutex);
+}
+
+static void accel_nisp_fs_cleanup_rx(struct mlx5e_nisp_fs *fs)
+{
+	struct mlx5e_accel_fs_nisp_prot *fs_prot;
+	struct mlx5e_accel_fs_nisp *accel_nisp;
+	enum accel_fs_nisp_type i;
+
+	if (!fs->rx_fs)
+		return;
+
+	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++)
+		accel_nisp_fs_rx_ft_put(fs, i);
+
+	accel_nisp = fs->rx_fs;
+	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++) {
+		fs_prot = &accel_nisp->fs_prot[i];
+		mutex_destroy(&fs_prot->prot_mutex);
+		WARN_ON(fs_prot->refcnt);
+	}
+	kfree(fs->rx_fs);
+	fs->rx_fs = NULL;
+}
+
+static int accel_nisp_fs_init_rx(struct mlx5e_nisp_fs *fs)
+{
+	struct mlx5e_accel_fs_nisp_prot *fs_prot;
+	struct mlx5e_accel_fs_nisp *accel_nisp;
+	enum accel_fs_nisp_type i;
+
+	accel_nisp = kzalloc(sizeof(*accel_nisp), GFP_KERNEL);
+	if (!accel_nisp)
+		return -ENOMEM;
+
+	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++) {
+		fs_prot = &accel_nisp->fs_prot[i];
+		mutex_init(&fs_prot->prot_mutex);
+	}
+
+	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++)
+		accel_nisp_fs_rx_ft_get(fs, ACCEL_FS_NISP4);
+
+	fs->rx_fs = accel_nisp;
+	return 0;
+}
+
 static int accel_nisp_fs_tx_create_ft_table(struct mlx5e_nisp_fs *fs)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
@@ -207,6 +669,7 @@ int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv)
 
 void mlx5e_accel_nisp_fs_cleanup(struct mlx5e_nisp_fs *fs)
 {
+	accel_nisp_fs_cleanup_rx(fs);
 	accel_nisp_fs_cleanup_tx(fs);
 	kfree(fs);
 }
@@ -226,8 +689,14 @@ struct mlx5e_nisp_fs *mlx5e_accel_nisp_fs_init(struct mlx5e_priv *priv)
 		goto err_tx;
 
 	fs->fs = priv->fs;
+	err = accel_nisp_fs_init_rx(fs);
+	if (err)
+		goto err_rx;
 
 	return fs;
+
+err_rx:
+	accel_nisp_fs_cleanup_tx(fs);
 err_tx:
 	kfree(fs);
 	return ERR_PTR(err);
-- 
2.45.0



* [RFC net-next 13/15] net/mlx5e: Configure PSP Rx flow steering rules
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (11 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 12/15] net/mlx5e: Add PSP steering in local NIC RX Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-10  3:04 ` [RFC net-next 14/15] net/mlx5e: Add Rx data path offload Jakub Kicinski
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

Set the Rx PSP flow steering rule, which identifies PSP packets by the
dedicated UDP destination port number 1000 and decrypts them. If a packet
is decrypted, a PSP marker and syndrome are added to its metadata so that
SW can use them later on in the Rx data path.

The rule is set as part of the init_rx netdev profile implementation.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../mellanox/mlx5/core/en_accel/en_accel.h    | 14 +++++-
 .../mellanox/mlx5/core/en_accel/nisp_fs.c     | 46 ++++++++++++++++---
 .../mellanox/mlx5/core/en_accel/nisp_fs.h     |  7 +++
 3 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
index cea997847fa4..38d2c73a0f79 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -231,12 +231,24 @@ static inline void mlx5e_accel_tx_finish(struct mlx5e_txqsq *sq,
 
 static inline int mlx5e_accel_init_rx(struct mlx5e_priv *priv)
 {
-	return mlx5e_ktls_init_rx(priv);
+	int err;
+
+	err = mlx5_accel_nisp_fs_init_rx_tables(priv);
+	if (err)
+		goto out;
+
+	err = mlx5e_ktls_init_rx(priv);
+	if (err)
+		mlx5_accel_nisp_fs_cleanup_rx_tables(priv);
+
+out:
+	return err;
 }
 
 static inline void mlx5e_accel_cleanup_rx(struct mlx5e_priv *priv)
 {
 	mlx5e_ktls_cleanup_rx(priv);
+	mlx5_accel_nisp_fs_cleanup_rx_tables(priv);
 }
 
 static inline int mlx5e_accel_init_tx(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
index 11f583d13bdd..0d80f646e5b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.c
@@ -468,9 +468,6 @@ static void accel_nisp_fs_cleanup_rx(struct mlx5e_nisp_fs *fs)
 	if (!fs->rx_fs)
 		return;
 
-	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++)
-		accel_nisp_fs_rx_ft_put(fs, i);
-
 	accel_nisp = fs->rx_fs;
 	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++) {
 		fs_prot = &accel_nisp->fs_prot[i];
@@ -496,13 +493,50 @@ static int accel_nisp_fs_init_rx(struct mlx5e_nisp_fs *fs)
 		mutex_init(&fs_prot->prot_mutex);
 	}
 
-	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++)
-		accel_nisp_fs_rx_ft_get(fs, ACCEL_FS_NISP4);
-
 	fs->rx_fs = accel_nisp;
+
 	return 0;
 }
 
+void mlx5_accel_nisp_fs_cleanup_rx_tables(struct mlx5e_priv *priv)
+{
+	int i;
+
+	if (!priv->nisp)
+		return;
+
+	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++)
+		accel_nisp_fs_rx_ft_put(priv->nisp->fs, i);
+}
+
+int mlx5_accel_nisp_fs_init_rx_tables(struct mlx5e_priv *priv)
+{
+	struct mlx5e_nisp_fs *fs;
+	int err;
+	int i;
+
+	if (!priv->nisp)
+		return 0;
+
+	fs = priv->nisp->fs;
+	for (i = 0; i < ACCEL_FS_NISP_NUM_TYPES; i++) {
+		err = accel_nisp_fs_rx_ft_get(fs, i);
+		if (err)
+			goto out_err;
+	}
+
+	return 0;
+
+out_err:
+	i--;
+	while (i >= 0) {
+		accel_nisp_fs_rx_ft_put(fs, i);
+		--i;
+	}
+
+	return err;
+}
+
 static int accel_nisp_fs_tx_create_ft_table(struct mlx5e_nisp_fs *fs)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h
index 11cdc447a401..8c44dd51317c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_fs.h
@@ -12,6 +12,8 @@ struct mlx5e_nisp_fs *mlx5e_accel_nisp_fs_init(struct mlx5e_priv *priv);
 void mlx5e_accel_nisp_fs_cleanup(struct mlx5e_nisp_fs *fs);
 int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv);
 void mlx5_accel_nisp_fs_cleanup_tx_tables(struct mlx5e_priv *priv);
+int mlx5_accel_nisp_fs_init_rx_tables(struct mlx5e_priv *priv);
+void mlx5_accel_nisp_fs_cleanup_rx_tables(struct mlx5e_priv *priv);
 #else
 static inline int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv)
 {
@@ -19,5 +21,10 @@ static inline int mlx5_accel_nisp_fs_init_tx_tables(struct mlx5e_priv *priv)
 }
 
 static inline void mlx5_accel_nisp_fs_cleanup_tx_tables(struct mlx5e_priv *priv) { }
+static inline int mlx5_accel_nisp_fs_init_rx_tables(struct mlx5e_priv *priv)
+{
+	return 0;
+}
+static inline void mlx5_accel_nisp_fs_cleanup_rx_tables(struct mlx5e_priv *priv) { }
 #endif /* CONFIG_MLX5_EN_PSP */
 #endif /* __MLX5_NISP_FS_H__ */
-- 
2.45.0
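
The error path of mlx5_accel_nisp_fs_init_rx_tables() releases the tables it
already acquired, newest first. A minimal user-space sketch of that unwind
idiom, under stated assumptions: NUM_TYPES, refs, table_get(), table_put() and
init_tables() are hypothetical stand-ins, not driver code. The index is a plain
int on purpose, since with an unsigned index a ">= 0" test can never become
false.

```c
#include <assert.h>

#define NUM_TYPES 2	/* hypothetical stand-in for ACCEL_FS_NISP_NUM_TYPES */

int refs[NUM_TYPES];	/* per-type reference counts, for illustration only */

/* Hypothetical stand-in for accel_nisp_fs_rx_ft_get(); fails for fail_at. */
int table_get(int type, int fail_at)
{
	assert(type >= 0 && type < NUM_TYPES);
	if (type == fail_at)
		return -1;
	refs[type]++;
	return 0;
}

/* Hypothetical stand-in for accel_nisp_fs_rx_ft_put(). */
void table_put(int type)
{
	assert(type >= 0 && type < NUM_TYPES);
	refs[type]--;
}

/* Acquire every table type; on failure release only what was acquired. */
int init_tables(int fail_at)
{
	int err = 0;
	int i;

	for (i = 0; i < NUM_TYPES; i++) {
		err = table_get(i, fail_at);
		if (err)
			goto out_err;
	}
	return 0;

out_err:
	/* i indexes the type that failed; step back before each put so the
	 * failed type itself is never released. When i == 0 nothing is put.
	 */
	while (--i >= 0)
		table_put(i);
	return err;
}
```

In this sketch init_tables(1) fails on the second type and drops the reference
taken on the first, while init_tables(0) fails immediately and releases nothing.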



* [RFC net-next 14/15] net/mlx5e: Add Rx data path offload
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (12 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 13/15] net/mlx5e: Configure PSP Rx flow steering rules Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-13  1:54   ` Willem de Bruijn
  2024-05-10  3:04 ` [RFC net-next 15/15] net/mlx5e: Implement PSP key_rotate operation Jakub Kicinski
  2024-05-29  9:16 ` [RFC net-next 00/15] add basic PSP encryption for TCP connections Boris Pismenny
  15 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

On the receive flow, inspect received packets for a PSP offload indication
in the CQE. For PSP offloaded packets, set the SKB PSP metadata, i.e. SPI,
header length and key generation number, for further processing by the stack.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h  |  2 +-
 .../mellanox/mlx5/core/en_accel/nisp_rxtx.c   | 79 +++++++++++++++++++
 .../mellanox/mlx5/core/en_accel/nisp_rxtx.h   | 28 +++++++
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 10 +++
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
index 82064614846f..9f025c80a6ef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
@@ -40,7 +40,7 @@
 #include "en/txrx.h"
 
 /* Bit31: IPsec marker, Bit30: reserved, Bit29-24: IPsec syndrome, Bit23-0: IPsec obj id */
-#define MLX5_IPSEC_METADATA_MARKER(metadata)  (((metadata) >> 31) & 0x1)
+#define MLX5_IPSEC_METADATA_MARKER(metadata)  ((((metadata) >> 30) & 0x3) == 0x2)
 #define MLX5_IPSEC_METADATA_SYNDROM(metadata) (((metadata) >> 24) & GENMASK(5, 0))
 #define MLX5_IPSEC_METADATA_HANDLE(metadata)  ((metadata) & GENMASK(23, 0))
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
index c719b2916677..17f42b8d9fd8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
@@ -15,6 +15,12 @@
 #include "en_accel/nisp.h"
 #include "lib/psp_defs.h"
 
+enum {
+	MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED,
+	MLX5E_NISP_OFFLOAD_RX_SYNDROME_AUTH_FAILED,
+	MLX5E_NISP_OFFLOAD_RX_SYNDROME_BAD_TRAILER,
+};
+
 static void mlx5e_nisp_set_swp(struct sk_buff *skb,
 			       struct mlx5e_accel_tx_nisp_state *nisp_st,
 			       struct mlx5_wqe_eth_seg *eseg)
@@ -114,6 +120,79 @@ static bool mlx5e_nisp_set_state(struct mlx5e_priv *priv,
 	return ret;
 }
 
+void mlx5e_nisp_csum_complete(struct net_device *netdev, struct sk_buff *skb)
+{
+	pskb_trim(skb, skb->len - PSP_TRL_SIZE);
+}
+
+/* Receive handler for PSP packets.
+ *
+ * Presently it accepts only already-authenticated packets and does not
+ * support optional fields, such as virtualization cookies.
+ */
+static int psp_rcv(struct sk_buff *skb)
+{
+	const struct psphdr *psph;
+	int depth = 0, end_depth;
+	struct psp_skb_ext *pse;
+	struct ipv6hdr *ipv6h;
+	struct ethhdr *eth;
+	__be16 proto;
+	u32 spi;
+
+	eth = (struct ethhdr *)(skb->data);
+	proto = __vlan_get_protocol(skb, eth->h_proto, &depth);
+	if (proto != htons(ETH_P_IPV6))
+		return -EINVAL;
+
+	ipv6h = (struct ipv6hdr *)(skb->data + depth);
+	depth += sizeof(*ipv6h);
+	end_depth = depth + sizeof(struct udphdr) + sizeof(struct psphdr);
+
+	if (unlikely(end_depth > skb_headlen(skb)))
+		return -EINVAL;
+
+	pse = skb_ext_add(skb, SKB_EXT_PSP);
+	if (!pse)
+		return -EINVAL;
+
+	psph = (const struct psphdr *)(skb->data + depth + sizeof(struct udphdr));
+	pse->spi = psph->spi;
+	spi = ntohl(psph->spi);
+	pse->generation = 0;
+	pse->version = FIELD_GET(PSPHDR_VERFL_VERSION, psph->verfl);
+
+	ipv6h->nexthdr = psph->nexthdr;
+	ipv6h->payload_len =
+		htons(ntohs(ipv6h->payload_len) - PSP_ENCAP_HLEN - PSP_TRL_SIZE);
+
+	memmove(skb->data + PSP_ENCAP_HLEN, skb->data, depth);
+	skb_pull(skb, PSP_ENCAP_HLEN);
+
+	return 0;
+}
+
+void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
+				      struct mlx5_cqe64 *cqe)
+{
+	u32 nisp_meta_data = be32_to_cpu(cqe->ft_metadata);
+
+	/* TBD: report errors as SW counters to ethtool, any further handling ? */
+	switch (MLX5_NISP_METADATA_SYNDROM(nisp_meta_data)) {
+	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED:
+		if (psp_rcv(skb))
+			netdev_warn_once(netdev, "PSP handling failed\n");
+		skb->decrypted = 1;
+		break;
+	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_AUTH_FAILED:
+		break;
+	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_BAD_TRAILER:
+		break;
+	default:
+		break;
+	}
+}
+
 void mlx5e_nisp_tx_build_eseg(struct mlx5e_priv *priv, struct sk_buff *skb,
 			      struct mlx5e_accel_tx_nisp_state *nisp_st,
 			      struct mlx5_wqe_eth_seg *eseg)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
index 1350a73c2019..834481232b21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
@@ -10,6 +10,11 @@
 #include "en.h"
 #include "en/txrx.h"
 
+/* Bit31-30: NISP marker, Bit29-23: NISP syndrome, Bit22-0: NISP obj id */
+#define MLX5_NISP_METADATA_MARKER(metadata)  ((((metadata) >> 30) & 0x3) == 0x3)
+#define MLX5_NISP_METADATA_SYNDROM(metadata) (((metadata) >> 23) & GENMASK(6, 0))
+#define MLX5_NISP_METADATA_HANDLE(metadata)  ((metadata) & GENMASK(22, 0))
+
 struct mlx5e_accel_tx_nisp_state {
 	u32 tailen;
 	u32 keyid;
@@ -75,6 +80,16 @@ static inline unsigned int mlx5e_nisp_tx_ids_len(struct mlx5e_accel_tx_nisp_stat
 {
 	return nisp_st->tailen;
 }
+
+static inline bool mlx5e_nisp_is_rx_flow(struct mlx5_cqe64 *cqe)
+{
+	return MLX5_NISP_METADATA_MARKER(be32_to_cpu(cqe->ft_metadata));
+}
+
+void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
+				      struct mlx5_cqe64 *cqe);
+
+void mlx5e_nisp_csum_complete(struct net_device *netdev, struct sk_buff *skb);
 #else
 static inline bool mlx5e_psp_is_offload_state(struct mlx5e_accel_tx_nisp_state *nisp_state)
 {
@@ -92,5 +107,18 @@ static inline bool mlx5e_nisp_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, stru
 {
 	return false;
 }
+
+static inline bool mlx5e_nisp_is_rx_flow(struct mlx5_cqe64 *cqe)
+{
+	return false;
+}
+
+static inline void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev,
+						    struct sk_buff *skb,
+						    struct mlx5_cqe64 *cqe)
+{
+}
+
+static inline void mlx5e_nisp_csum_complete(struct net_device *netdev, struct sk_buff *skb) { }
 #endif
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d601b5faaed5..41a4f8832f2f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -51,6 +51,7 @@
 #include "ipoib/ipoib.h"
 #include "en_accel/ipsec.h"
 #include "en_accel/macsec.h"
+#include "en_accel/nisp_rxtx.h"
 #include "en_accel/ipsec_rxtx.h"
 #include "en_accel/ktls_txrx.h"
 #include "en/xdp.h"
@@ -1517,6 +1518,12 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 		skb->ip_summed = CHECKSUM_COMPLETE;
 		skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
 
+		if (unlikely(mlx5e_nisp_is_rx_flow(cqe))) {
+			/* TBD: PSP csum complete corrections; for now choose the csum_unnecessary path */
+			mlx5e_nisp_csum_complete(netdev, skb);
+			goto csum_unnecessary;
+		}
+
 		if (test_bit(MLX5E_RQ_STATE_CSUM_FULL, &rq->state))
 			return; /* CQE csum covers all received bytes */
 
@@ -1559,6 +1566,9 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 	if (unlikely(get_cqe_tls_offload(cqe)))
 		mlx5e_ktls_handle_rx_skb(rq, skb, cqe, &cqe_bcnt);
 
+	if (unlikely(mlx5e_nisp_is_rx_flow(cqe)))
+		mlx5e_nisp_offload_handle_rx_skb(netdev, skb, cqe);
+
 	if (unlikely(mlx5_ipsec_is_rx_flow(cqe)))
 		mlx5e_ipsec_offload_handle_rx_skb(netdev, skb,
 						  be32_to_cpu(cqe->ft_metadata));
-- 
2.45.0
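
The ft_metadata layout used on the Rx path can be checked in isolation. The
sketch below re-expresses the MLX5_NISP_METADATA_* macros and the updated
IPsec marker test as plain C functions. The function names are illustrative
only and nisp_pack() is a hypothetical helper for building sample values; none
of them exist in the driver. It suggests why the IPsec marker test had to
change: a value carrying the NISP marker (0x3 in bits 31-30) would also have
matched the old single-bit IPsec check on bit 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout as expressed by the MLX5_NISP_METADATA_* macros:
 * bits 31-30 marker (0x3), bits 29-23 syndrome, bits 22-0 object id.
 */
bool nisp_marker(uint32_t md)
{
	return ((md >> 30) & 0x3) == 0x3;
}

uint32_t nisp_syndrome(uint32_t md)
{
	return (md >> 23) & 0x7f;	/* GENMASK(6, 0) */
}

uint32_t nisp_handle(uint32_t md)
{
	return md & 0x7fffff;		/* GENMASK(22, 0) */
}

/* IPsec marker after the change: bits 31-30 must equal 0x2. */
bool ipsec_marker(uint32_t md)
{
	return ((md >> 30) & 0x3) == 0x2;
}

/* IPsec marker before the change: bit 31 alone. */
bool ipsec_marker_old(uint32_t md)
{
	return (md >> 31) & 0x1;
}

/* Hypothetical helper for building sample values; not part of the driver. */
uint32_t nisp_pack(uint32_t syndrome, uint32_t handle)
{
	return (0x3u << 30) | ((syndrome & 0x7f) << 23) | (handle & 0x7fffff);
}
```

For example, nisp_pack(1, 0x1234) yields a value for which nisp_marker() is
true, ipsec_marker() is false and ipsec_marker_old() is true.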



* [RFC net-next 15/15] net/mlx5e: Implement PSP key_rotate operation
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (13 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 14/15] net/mlx5e: Add Rx data path offload Jakub Kicinski
@ 2024-05-10  3:04 ` Jakub Kicinski
  2024-05-29  9:16 ` [RFC net-next 00/15] add basic PSP encryption for TCP connections Boris Pismenny
  15 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-10  3:04 UTC
  To: netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

From: Raed Salem <raeds@nvidia.com>

Implement the .key_rotate operation which, when invoked, causes the HW to use
a new master key to derive PSP spi/key pairs, in compliance with the PSP spec.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/nisp.c   | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
index 1131aa6e9b3d..cab4e79135d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp.c
@@ -96,11 +96,22 @@ static void mlx5e_psp_assoc_del(struct psp_dev *psd, struct psp_assoc *pas)
 	atomic_dec(&nisp->tx_key_cnt);
 }
 
+static int mlx5e_psp_key_rotate(struct psp_dev *psd, struct netlink_ext_ack *extack)
+{
+	struct mlx5e_priv *priv = netdev_priv(psd->main_netdev);
+
+	/* no support for protecting against external rotations */
+	psd->generation = 0;
+
+	return mlx5e_nisp_rotate_key(priv->mdev);
+}
+
 static struct psp_dev_ops mlx5_psp_ops = {
 	.set_config   = mlx5e_psp_set_config,
 	.rx_spi_alloc = mlx5e_psp_rx_spi_alloc,
 	.tx_key_add   = mlx5e_psp_assoc_add,
 	.tx_key_del   = mlx5e_psp_assoc_del,
+	.key_rotate   = mlx5e_psp_key_rotate,
 };
 
 void mlx5e_nisp_unregister(struct mlx5e_priv *priv)
-- 
2.45.0



* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-10  3:04 ` [RFC net-next 01/15] psp: add documentation Jakub Kicinski
@ 2024-05-10 22:19   ` Saeed Mahameed
  2024-05-11  0:11     ` Jakub Kicinski
  2024-05-13  1:24   ` Willem de Bruijn
  1 sibling, 1 reply; 44+ messages in thread
From: Saeed Mahameed @ 2024-05-10 22:19 UTC
  To: Jakub Kicinski
  Cc: netdev, pabeni, willemdebruijn.kernel, borisp, gal, cratiu,
	rrameshbabu, steffen.klassert, tariqt

On 09 May 20:04, Jakub Kicinski wrote:
>Add documentation of things which belong in the docs rather
>than commit messages.
>
>Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>---
> Documentation/networking/index.rst |   1 +
> Documentation/networking/psp.rst   | 138 +++++++++++++++++++++++++++++
> 2 files changed, 139 insertions(+)
> create mode 100644 Documentation/networking/psp.rst
>
>diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
>index 7664c0bfe461..0376029ecbdf 100644
>--- a/Documentation/networking/index.rst
>+++ b/Documentation/networking/index.rst
>@@ -94,6 +94,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
>    ppp_generic
>    proc_net_tcp
>    pse-pd/index
>+   psp
>    radiotap-headers
>    rds
>    regulatory
>diff --git a/Documentation/networking/psp.rst b/Documentation/networking/psp.rst
>new file mode 100644
>index 000000000000..a39b464813ab
>--- /dev/null
>+++ b/Documentation/networking/psp.rst
>@@ -0,0 +1,138 @@
>+.. SPDX-License-Identifier: GPL-2.0-only
>+
>+=====================
>+PSP Security Protocol
>+=====================
>+
>+Protocol
>+========
>+
>+PSP Security Protocol (PSP) was defined at Google and published in:
>+
>+https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf
>+
>+This section briefly covers protocol aspects crucial for understanding
>+the kernel API. Refer to the protocol specification for further details.
>+
>+Note that the kernel implementation and documentation uses the term
>+"secret state" in place of "master key", it is both less confusing
>+to an average developer and is less likely to run afoul of any naming
>+guidelines.
>+

[ ... ] 

>+User facing API
>+===============
>+
>+PSP is designed primarily for hardware offloads. There is currently
>+no software fallback for systems which do not have PSP capable NICs.
>+There is also no standard (or otherwise defined) way of establishing
>+a PSP-secured connection or exchanging the symmetric keys.
>+
>+The expectation is that higher layer protocols will take care of
>+protocol and key negotiation. For example one may use TLS key exchange,
>+announce the PSP capability, and switch to PSP if both endpoints
>+are PSP-capable.
>+

The documentation doesn't include anything about userspace, other than
high-level remarks on how this is expected to work.
What are we planning for userspace? I know we have basic kperf support and
some experimental python library, but nothing official or PSP-centric.

I propose to start a community-driven project with a well-established
library, with some concrete sample implementation for key negotiation,
as a plugin maybe, so anyone can implement their own key-exchange
mechanisms on top of the official psp library.




* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-10 22:19   ` Saeed Mahameed
@ 2024-05-11  0:11     ` Jakub Kicinski
  2024-05-11  9:41       ` Vadim Fedorenko
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-11  0:11 UTC
  To: Saeed Mahameed
  Cc: netdev, pabeni, willemdebruijn.kernel, borisp, gal, cratiu,
	rrameshbabu, steffen.klassert, tariqt, mingtao, knekritz

On Fri, 10 May 2024 15:19:23 -0700 Saeed Mahameed wrote:
> >+PSP is designed primarily for hardware offloads. There is currently
> >+no software fallback for systems which do not have PSP capable NICs.
> >+There is also no standard (or otherwise defined) way of establishing
> >+a PSP-secured connection or exchanging the symmetric keys.
> >+
> >+The expectation is that higher layer protocols will take care of
> >+protocol and key negotiation. For example one may use TLS key exchange,
> >+announce the PSP capability, and switch to PSP if both endpoints
> >+are PSP-capable.
> 
> The documentation doesn't include anything about userspace, other than
> highlevel remarks on how this is expected to work.

The cover letter does.

> What are we planning for userspace? I know we have kperf basic support and
> some experimental python library, but nothing official or psp centric. 

Remind me, how long did it take for kernel TLS support to be merged
into OpenSSL? ;)

> I propose to start community driven project with a well established
> library, with some concrete sample implementation for key negotiation,
> as a plugin maybe, so anyone can implement their own key-exchange
> mechanisms on top of the official psp library.

Yes, I should have CCed Meta's folks who work on TLS [1]. Adding them
now. More than happy to facilitate the discussion, maybe Willem can
CC the right Google folks, IDK who else...

We should start moving with the kernel support, IMO; until we do,
the user space implementation is stalled. I don't expect that the
way we install keys in the kernel would be impacted by the handshake.

[1] https://github.com/facebookincubator/fizz


* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-11  0:11     ` Jakub Kicinski
@ 2024-05-11  9:41       ` Vadim Fedorenko
  2024-05-11 16:25         ` David Ahern
  0 siblings, 1 reply; 44+ messages in thread
From: Vadim Fedorenko @ 2024-05-11  9:41 UTC
  To: Jakub Kicinski, Saeed Mahameed
  Cc: netdev, pabeni, willemdebruijn.kernel, borisp, gal, cratiu,
	rrameshbabu, steffen.klassert, tariqt, mingtao, knekritz

On 11.05.2024 01:11, Jakub Kicinski wrote:
> On Fri, 10 May 2024 15:19:23 -0700 Saeed Mahameed wrote:
>>> +PSP is designed primarily for hardware offloads. There is currently
>>> +no software fallback for systems which do not have PSP capable NICs.
>>> +There is also no standard (or otherwise defined) way of establishing
>>> +a PSP-secured connection or exchanging the symmetric keys.
>>> +
>>> +The expectation is that higher layer protocols will take care of
>>> +protocol and key negotiation. For example one may use TLS key exchange,
>>> +announce the PSP capability, and switch to PSP if both endpoints
>>> +are PSP-capable.
>>
>> The documentation doesn't include anything about userspace, other than
>> highlevel remarks on how this is expected to work.
> 
> The cover letter does.
> 
>> What are we planning for userspace? I know we have kperf basic support and
>> some experimental python library, but nothing official or psp centric.
> 
> Remind me, how long did it take for kernel TLS support to be merged
> into OpenSSL? ;)

I believe it was bad timing for OpenSSL. The patches with kTLS support were
sitting in the main branch for a long time; the problem was a postponed release
with a jump to a new versioning scheme.

But I agree, there is no easy way to start coding a user-space lib without
initial support from the kernel.

>> I propose to start community driven project with a well established
>> library, with some concrete sample implementation for key negotiation,
>> as a plugin maybe, so anyone can implement their own key-exchange
>> mechanisms on top of the official psp library.
> 
> Yes, I should have CCed Meta's folks who work on TLS [1]. Adding them
> now. More than happy to facilitate the discussion, maybe Willem can
> CC the right Google folks, IDK who else...
> 
> We should start moving with the kernel support, IMO, until we do
> the user space implementation is stalled. I don't expect that the
> way we install keys in the kernel would be impacted by the handshake.
> 
> [1] https://github.com/facebookincubator/fizz



* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-11  9:41       ` Vadim Fedorenko
@ 2024-05-11 16:25         ` David Ahern
  0 siblings, 0 replies; 44+ messages in thread
From: David Ahern @ 2024-05-11 16:25 UTC
  To: Vadim Fedorenko, Jakub Kicinski, Saeed Mahameed
  Cc: netdev, pabeni, willemdebruijn.kernel, borisp, gal, cratiu,
	rrameshbabu, steffen.klassert, tariqt, mingtao, knekritz

On 5/11/24 3:41 AM, Vadim Fedorenko wrote:
> But I agree, there is no easy way to start coding user-space lib without
> initial support from kernel.


The 2 sides should be co-developed; it is the only way to merge a sane uapi.


* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-10  3:04 ` [RFC net-next 01/15] psp: add documentation Jakub Kicinski
  2024-05-10 22:19   ` Saeed Mahameed
@ 2024-05-13  1:24   ` Willem de Bruijn
  2024-05-29 17:35     ` Jakub Kicinski
  1 sibling, 1 reply; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-13  1:24 UTC
  To: Jakub Kicinski, netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Jakub Kicinski wrote:
> Add documentation of things which belong in the docs rather
> than commit messages.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  Documentation/networking/index.rst |   1 +
>  Documentation/networking/psp.rst   | 138 +++++++++++++++++++++++++++++
>  2 files changed, 139 insertions(+)
>  create mode 100644 Documentation/networking/psp.rst
> 
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 7664c0bfe461..0376029ecbdf 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -94,6 +94,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
>     ppp_generic
>     proc_net_tcp
>     pse-pd/index
> +   psp
>     radiotap-headers
>     rds
>     regulatory
> diff --git a/Documentation/networking/psp.rst b/Documentation/networking/psp.rst
> new file mode 100644
> index 000000000000..a39b464813ab
> --- /dev/null
> +++ b/Documentation/networking/psp.rst
> @@ -0,0 +1,138 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=====================
> +PSP Security Protocol
> +=====================
> +
> +Protocol
> +========
> +
> +PSP Security Protocol (PSP) was defined at Google and published in:
> +
> +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf
> +
> +This section briefly covers protocol aspects crucial for understanding
> +the kernel API. Refer to the protocol specification for further details.
> +
> +Note that the kernel implementation and documentation uses the term
> +"secret state" in place of "master key", it is both less confusing
> +to an average developer and is less likely to run afoul of any naming
> +guidelines.

There is some value in using the same terminology in the code as in
the spec.

And the session keys are derived from a key. That is more precise than
state. Specifically, counter-mode KDF from an AES key.

Perhaps device key, instead of master key? 

> +Derived Rx keys
> +---------------
> +
> +PSP borrows some terms and mechanisms from IPsec. PSP was designed
> +with HW offloads in mind. The key feature of PSP is that Rx keys for every
> +connection do not have to be stored by the receiver but can be derived
> +from secret state and information present in packet headers.

A second less obvious, but neat, feature is that it supports an
encryption offset, such that (say) the L4 ports are integrity
protected, but not encrypted, to allow for in-network telemetry.

> +This makes it possible to implement receivers which require a constant
> +amount of memory regardless of the number of connections (``O(1)`` scaling).
> +
> +Tx keys have to be stored like with any other protocol,

Keys can optionally be passed in descriptor.

> +The expectation is that higher layer protocols will take care of
> +protocol and key negotiation. For example one may use TLS key exchange,
> +announce the PSP capability, and switch to PSP if both endpoints
> +are PSP-capable.

> +Securing a connection
> +---------------------
> +
> +PSP encryption is currently only supported for TCP connections.
> +Rx and Tx keys are allocated separately. First the ``rx-assoc``
> +Netlink command needs to be issued, specifying a target TCP socket.
> +Kernel will allocate a new PSP Rx key from the NIC and associate it
> +with given socket. At this stage socket will accept both PSP-secured
> +and plain text TCP packets.
> +
> +Tx keys are installed using the ``tx-assoc`` Netlink command.
> +Once the Tx keys are installed all data read from the socket will
> +be PSP-secured. In other words act of installing Tx keys has the secondary
> +effect on the Rx direction, requiring all received packets to be encrypted.

Consider clarifying the entire state diagram from when one pair
initiates upgrade.

And some edge cases:

- retransmits
- TCP fin handshake, if only one peer succeeds
- TCP control socket response to encrypted pkt

What is the expectation for data already queued for transmission when
the tx association is made?

More generally, what happens for data in flight. One possible
simplification is to only allow an upgrade sequence (possibly
including in-band exchange of keys) when no other data is in
flight.

> +Since packet reception is asynchronous, to make it possible for the
> +application to trust that any data read from the socket after the ``tx-assoc``
> +call returns success has been encrypted, the kernel will scan the receive
> +queue of the socket at ``tx-assoc`` time. If any enqueued packet was received
> +in clear text the Tx association will fail, and application should retry
> +installing the Tx key after draining the socket (this should not be necessary
> +if both endpoints are well behaved).
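
A rough sketch of the receive-queue check described in the quoted paragraph,
assuming a simplified singly linked queue. struct pkt,
rx_queue_all_decrypted() and tx_assoc_check() are hypothetical names, and the
error value is a placeholder; this is not the kernel's implementation or its
actual return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a packet on the receive queue. */
struct pkt {
	bool decrypted;		/* set when the packet arrived PSP-secured */
	struct pkt *next;
};

/* Returns true only if every queued packet arrived encrypted.
 * An empty queue trivially qualifies.
 */
bool rx_queue_all_decrypted(const struct pkt *head)
{
	const struct pkt *p;

	for (p = head; p; p = p->next)
		if (!p->decrypted)
			return false;
	return true;
}

/* Hypothetical tx-assoc gate: refuse while clear-text data is still queued,
 * so the caller can drain the socket and retry.
 */
int tx_assoc_check(const struct pkt *head)
{
	return rx_queue_all_decrypted(head) ? 0 : -1;
}
```

With one clear-text packet anywhere in the queue the check fails; once the
queue holds only decrypted packets, or is empty, it succeeds.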
> +
> +Rotation notifications
> +----------------------
> +
> +The rotations of secret state happen asynchornously and are usually

typo: asynchronously

> +performed by management daemons, not under application control.
> +The PSP netlink family will generate a notification whenever keys
> +are rotated. The applications are expected to re-establish connections
> +before keys are rotated again.

Connection key rotation is not supported? I did notice that tx key
insertion fails if a key is already present, so this does appear to be
the behavior.



* Re: [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead
  2024-05-10  3:04 ` [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead Jakub Kicinski
@ 2024-05-13  1:47   ` Willem de Bruijn
  2024-05-29 17:48     ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-13  1:47 UTC
  To: Jakub Kicinski, netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Jakub Kicinski

Jakub Kicinski wrote:
> PSP eats 32B of header space. Adjust MSS appropriately.
> 
> We can either modify tcp_mtu_to_mss() / tcp_mss_to_mtu()
> or reuse icsk_ext_hdr_len. The former option is more TCP
> specific and has runtime overhead. The latter is a bit
> of a hack as PSP is not an ext_hdr. If one squints hard
> enough, UDP encap is just a more practical version of
> IPv6 exthdr, so go with the latter. Happy to change.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

> +static inline unsigned int psp_sk_overhead(const struct sock *sk)
> +{
> +	bool has_psp = rcu_access_pointer(sk->psp_assoc);
> +
> +	return has_psp ? PSP_HDR_SIZE + PSP_TRL_SIZE : 0;
> +}

> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 6991464511c3..c67700fc49a1 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -299,10 +299,10 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
>  	sk->sk_gso_type = SKB_GSO_TCPV6;
>  	ip6_dst_store(sk, dst, NULL, NULL);
>  
> -	icsk->icsk_ext_hdr_len = 0;
> +	icsk->icsk_ext_hdr_len = psp_sk_overhead(sk);
>  	if (opt)
> -		icsk->icsk_ext_hdr_len = opt->opt_flen +
> -					 opt->opt_nflen;
> +		icsk->icsk_ext_hdr_len += opt->opt_flen +
> +					  opt->opt_nflen;
>  
>  	tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
>  
> @@ -1500,10 +1500,10 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
>  		opt = ipv6_dup_options(newsk, opt);
>  		RCU_INIT_POINTER(newnp->opt, opt);
>  	}
> -	inet_csk(newsk)->icsk_ext_hdr_len = 0;
> +	inet_csk(newsk)->icsk_ext_hdr_len = psp_sk_overhead(sk);
>  	if (opt)
> -		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
> -						    opt->opt_flen;
> +		inet_csk(newsk)->icsk_ext_hdr_len += opt->opt_nflen +
> +						     opt->opt_flen;
>  
>  	tcp_ca_openreq_child(newsk, dst);

The below code adjusts ext_hdr_len and recalculates mss when
setting the tx association.

Why already include it at connect and syn_recv, above?

My assumption was that the upgrade to PSP only happens during
TCP_ESTABLISHED. But perhaps I'm wrong.

Is it allowed to set rx and tx association even from as early as the
initial socket(), when still in TCP_CLOSE, client-side?

Server-side, there is no connection fd to pass to netlink commands
before TCP_ESTABLISHED.
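
For reference, the effect of the overhead on the MSS can be sketched in
plain userspace C. This only illustrates the arithmetic and is not the
kernel's MSS code: it takes the 32B figure from the commit message at
face value, the 16B/16B split between header and trailer is an
assumption, mss_for_mtu() is a hypothetical helper, and TCP options are
ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed split of the 32B overhead quoted in the commit message. */
#define PSP_HDR_SIZE	16	/* assumption: PSP header, no optional fields */
#define PSP_TRL_SIZE	16	/* assumption: integrity trailer */

#define IPV6_HDR_LEN	40
#define TCP_HDR_LEN	20

/* Mirrors psp_sk_overhead() from the patch, minus the socket. */
static unsigned int psp_overhead(bool has_psp)
{
	return has_psp ? PSP_HDR_SIZE + PSP_TRL_SIZE : 0;
}

/* Hypothetical helper: MSS for an IPv6 path MTU, ignoring TCP options. */
static unsigned int mss_for_mtu(unsigned int pmtu, unsigned int ext_hdr_len)
{
	unsigned int fixed = IPV6_HDR_LEN + TCP_HDR_LEN + ext_hdr_len;

	assert(pmtu > fixed);
	return pmtu - fixed;
}
```

With a 1500B path MTU this gives 1440 without PSP and 1408 once the
overhead is folded into ext_hdr_len.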

> diff --git a/net/psp/psp_sock.c b/net/psp/psp_sock.c
> index 42b881e681b9..bcef042cb8a5 100644
> --- a/net/psp/psp_sock.c
> +++ b/net/psp/psp_sock.c
> @@ -170,6 +170,7 @@ int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
>  			  u32 version, struct psp_key_parsed *key,
>  			  struct netlink_ext_ack *extack)
>  {
> +	struct inet_connection_sock *icsk;
>  	struct psp_assoc *pas, *dummy;
>  	int err;
>  
> @@ -220,6 +221,10 @@ int psp_sock_assoc_set_tx(struct sock *sk, struct psp_dev *psd,
>  
>  	WRITE_ONCE(sk->sk_validate_xmit_skb, psp_validate_xmit);
>  
> +	icsk = inet_csk(sk);
> +	icsk->icsk_ext_hdr_len += psp_sk_overhead(sk);
> +	icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
> +
>  exit_free_dummy:
>  	kfree(dummy);
>  exit_clear_rx:
> -- 
> 2.45.0
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 12/15] net/mlx5e: Add PSP steering in local NIC RX
  2024-05-10  3:04 ` [RFC net-next 12/15] net/mlx5e: Add PSP steering in local NIC RX Jakub Kicinski
@ 2024-05-13  1:52   ` Willem de Bruijn
  0 siblings, 0 replies; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-13  1:52 UTC (permalink / raw
  To: Jakub Kicinski, netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

Jakub Kicinski wrote:
> From: Raed Salem <raeds@nvidia.com>
> 
> Introduce decrypt FT, the RX error FT, and the default rules.
> 
> The PSP (PSP) RX decrypt flow table is pointed by the TTC
> (Traffic Type Classifier) UDP steering rules.
> The decrypt flow table has two flow groups. The first flow group
> keeps the decrypt steering rule programmed always when PSP packet is
> recognized using the dedicated udp destenation port number 1000, if

typo: destination

> packet is decrypted then a PSP marker is set in metadata_regB[30].
> The second flow group has a default rule to forward all non-offloaded
> PSP packet to the TTC UDP default RSS TIR.
> 
> The RX error flow table is the destination of the decrypt steering rules in
> the PSP RX decrypt flow table. It has two fixed rule one with single copy
> action that copies nisp_syndrome to metadata_regB[23:29]. The PSP marker
> and syndrome is used to filter out non-nisp packet and to return the PSP
> crypto offload status in Rx flow. The marker is used to identify such
> packet in driver so the driver could set SKB PSP metadata. The destination
> of RX error flow table is the TTC UDP default RSS TIR. The second rule will
> drop packets that failed to be decrypted (like in case illegal SPI or
> expired SPI is used).
> 
> Signed-off-by: Raed Salem <raeds@nvidia.com>
> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 14/15] net/mlx5e: Add Rx data path offload
  2024-05-10  3:04 ` [RFC net-next 14/15] net/mlx5e: Add Rx data path offload Jakub Kicinski
@ 2024-05-13  1:54   ` Willem de Bruijn
  2024-05-29 18:38     ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-13  1:54 UTC (permalink / raw
  To: Jakub Kicinski, netdev
  Cc: pabeni, willemdebruijn.kernel, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, Raed Salem, Jakub Kicinski

Jakub Kicinski wrote:
> From: Raed Salem <raeds@nvidia.com>
> 
> On receive flow inspect received packets for PSP offload indication using
> the cqe, for PSP offloaded packets set SKB PSP metadata i.e spi, header
> length and key generation number to stack for further processing.
> 
> Signed-off-by: Raed Salem <raeds@nvidia.com>
> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h  |  2 +-
>  .../mellanox/mlx5/core/en_accel/nisp_rxtx.c   | 79 +++++++++++++++++++
>  .../mellanox/mlx5/core/en_accel/nisp_rxtx.h   | 28 +++++++
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 10 +++
>  4 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
> index 82064614846f..9f025c80a6ef 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
> @@ -40,7 +40,7 @@
>  #include "en/txrx.h"
>  
>  /* Bit31: IPsec marker, Bit30: reserved, Bit29-24: IPsec syndrome, Bit23-0: IPsec obj id */
> -#define MLX5_IPSEC_METADATA_MARKER(metadata)  (((metadata) >> 31) & 0x1)
> +#define MLX5_IPSEC_METADATA_MARKER(metadata)  ((((metadata) >> 30) & 0x3) == 0x2)
>  #define MLX5_IPSEC_METADATA_SYNDROM(metadata) (((metadata) >> 24) & GENMASK(5, 0))
>  #define MLX5_IPSEC_METADATA_HANDLE(metadata)  ((metadata) & GENMASK(23, 0))
>  
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
> index c719b2916677..17f42b8d9fd8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
> @@ -15,6 +15,12 @@
>  #include "en_accel/nisp.h"
>  #include "lib/psp_defs.h"
>  
> +enum {
> +	MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED,
> +	MLX5E_NISP_OFFLOAD_RX_SYNDROME_AUTH_FAILED,
> +	MLX5E_NISP_OFFLOAD_RX_SYNDROME_BAD_TRAILER,
> +};
> +
>  static void mlx5e_nisp_set_swp(struct sk_buff *skb,
>  			       struct mlx5e_accel_tx_nisp_state *nisp_st,
>  			       struct mlx5_wqe_eth_seg *eseg)
> @@ -114,6 +120,79 @@ static bool mlx5e_nisp_set_state(struct mlx5e_priv *priv,
>  	return ret;
>  }
>  
> +void mlx5e_nisp_csum_complete(struct net_device *netdev, struct sk_buff *skb)
> +{
> +	pskb_trim(skb, skb->len - PSP_TRL_SIZE);
> +}
> +
> +/* Receive handler for PSP packets.
> + *
> + * Presently it accepts only already-authenticated packets and does not
> + * support optional fields, such as virtualization cookies.
> + */
> +static int psp_rcv(struct sk_buff *skb)
> +{
> +	const struct psphdr *psph;
> +	int depth = 0, end_depth;
> +	struct psp_skb_ext *pse;
> +	struct ipv6hdr *ipv6h;
> +	struct ethhdr *eth;
> +	__be16 proto;
> +	u32 spi;
> +
> +	eth = (struct ethhdr *)(skb->data);
> +	proto = __vlan_get_protocol(skb, eth->h_proto, &depth);
> +	if (proto != htons(ETH_P_IPV6))
> +		return -EINVAL;
> +
> +	ipv6h = (struct ipv6hdr *)(skb->data + depth);
> +	depth += sizeof(*ipv6h);
> +	end_depth = depth + sizeof(struct udphdr) + sizeof(struct psphdr);
> +
> +	if (unlikely(end_depth > skb_headlen(skb)))
> +		return -EINVAL;
> +
> +	pse = skb_ext_add(skb, SKB_EXT_PSP);
> +	if (!pse)
> +		return -EINVAL;
> +
> +	psph = (const struct psphdr *)(skb->data + depth + sizeof(struct udphdr));
> +	pse->spi = psph->spi;
> +	spi = ntohl(psph->spi);
> +	pse->generation = 0;
> +	pse->version = FIELD_GET(PSPHDR_VERFL_VERSION, psph->verfl);
> +
> +	ipv6h->nexthdr = psph->nexthdr;
> +	ipv6h->payload_len =
> +		htons(ntohs(ipv6h->payload_len) - PSP_ENCAP_HLEN - PSP_TRL_SIZE);
> +
> +	memmove(skb->data + PSP_ENCAP_HLEN, skb->data, depth);
> +	skb_pull(skb, PSP_ENCAP_HLEN);
> +
> +	return 0;
> +}
> +
> +void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
> +				      struct mlx5_cqe64 *cqe)
> +{
> +	u32 nisp_meta_data = be32_to_cpu(cqe->ft_metadata);
> +
> +	/* TBD: report errors as SW counters to ethtool, any further handling ? */
> +	switch (MLX5_NISP_METADATA_SYNDROM(nisp_meta_data)) {
> +	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED:
> +		if (psp_rcv(skb))
> +			netdev_warn_once(netdev, "PSP handling failed");
> +		skb->decrypted = 1;

Do not set skb->decrypted if psp_rcv failed? But drop the packet and
account the drop, likely.

> +		break;
> +	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_AUTH_FAILED:
> +		break;
> +	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_BAD_TRAILER:
> +		break;
> +	default:
> +		break;
> +	}
> +}


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 05/15] psp: add op for rotation of secret state
  2024-05-10  3:04 ` [RFC net-next 05/15] psp: add op for rotation of secret state Jakub Kicinski
@ 2024-05-16 19:59   ` Lance Richardson
  2024-05-29 17:43     ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Lance Richardson @ 2024-05-16 19:59 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, pabeni, willemdebruijn.kernel, borisp, gal, cratiu,
	rrameshbabu, steffen.klassert, tariqt, Lance Richardson

On Thu, May 9, 2024 at 11:05 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Rotating the secret state is a key part of the PSP protocol design.
> Some external daemon needs to do it once a day, or so.
> Add a netlink op to perform this operation.
> Add a notification group for informing users that key has been
> rotated and they should rekey (next rotation will cut them off).
>
Does this allow for the possibility of NIC firmware or the driver initiating
a rotation? E.g. during key generation if the SPI space has been
exhausted a rotation will be required in order to generate a new
derived key.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 00/15] add basic PSP encryption for TCP connections
  2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
                   ` (14 preceding siblings ...)
  2024-05-10  3:04 ` [RFC net-next 15/15] net/mlx5e: Implement PSP key_rotate operation Jakub Kicinski
@ 2024-05-29  9:16 ` Boris Pismenny
  2024-05-29 18:50   ` Jakub Kicinski
  15 siblings, 1 reply; 44+ messages in thread
From: Boris Pismenny @ 2024-05-29  9:16 UTC (permalink / raw
  To: Jakub Kicinski, netdev
  Cc: pabeni, willemdebruijn.kernel, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, jgg



On 10.05.2024 05:04, Jakub Kicinski wrote:
> Hi!
>
> Add support for PSP encryption of TCP connections.
>
> PSP is a protocol out of Google:
> https://github.com/google/psp/blob/main/doc/PSP_Arch_Spec.pdf
> which shares some similarities with IPsec. I added some more info
> in the first patch so I'll keep it short here.
>
> The protocol can work in multiple modes including tunneling.
> But I'm mostly interested in using it as TLS replacement because
> of its superior offload characteristics. 

Hi!

Thank you for doing this. I agree that TLS-like socket support
is a main use-case. I'd like to hear what you think on a few
other use-cases that I think should be considered as well
since it may be difficult to add them as an afterthought:
- Tunnel mode. What are your plans for tunnel mode? Clearly it
is different from the current approach in some aspects, for
example, no sockets will be involved.
- RDMA. The ultra ethernet group has mentioned RDMA encryption
using PSP. Do you think that RDMA verbs will support PSP in
a similar manner to sockets? i.e., using netlink to pass
parameters to the device and linking QPs to PSP SAs?
- Virtualization. How does PSP work from a VM? is the key
shared with the hypervisor or is it private per-VM?
and what about containers?


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-13  1:24   ` Willem de Bruijn
@ 2024-05-29 17:35     ` Jakub Kicinski
  2024-05-30  0:47       ` Willem de Bruijn
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-29 17:35 UTC (permalink / raw
  To: Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > +PSP Security Protocol (PSP) was defined at Google and published in:
> > +
> > +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf
> > +
> > +This section briefly covers protocol aspects crucial for understanding
> > +the kernel API. Refer to the protocol specification for further details.
> > +
> > +Note that the kernel implementation and documentation uses the term
> > +"secret state" in place of "master key", it is both less confusing
> > +to an average developer and is less likely to run afoul any naming
> > +guidelines.  
> 
> There is some value in using the same terminology in the code as in
> the spec.
> 
> And the session keys are derived from a key. That is more precise than
> state. Specifically, counter-mode KDF from an AES key.
> 
> Perhaps device key, instead of master key? 

Weak preference towards secret state, but device key works, too.

> > +Derived Rx keys
> > +---------------
> > +
> > +PSP borrows some terms and mechanisms from IPsec. PSP was designed
> > +with HW offloads in mind. The key feature of PSP is that Rx keys for every
> > +connection do not have to be stored by the receiver but can be derived
> > +from secret state and information present in packet headers.  
> 
> A second less obvious, but neat, feature is that it supports an
> encryption offset, such that (say) the L4 ports are integrity
> protected, but not encrypted, to allow for in-network telemetry.

I know, but the opening paragraph has:

   This section briefly covers protocol aspects crucial for understanding
   the kernel API. Refer to the protocol specification for further details.

:) .. and I didn't implement the offset, yet. (It's trivial to add and
ETOOMANYPATCHES.)

> > +This makes it possible to implement receivers which require a constant
> > +amount of memory regardless of the number of connections (``O(1)`` scaling).
> > +
> > +Tx keys have to be stored like with any other protocol,  
> 
> Keys can optionally be passed in descriptor.

Added: Preferably, the Tx keys should be provided with the packet (e.g.
as part of the descriptors).

> > +The expectation is that higher layer protocols will take care of
> > +protocol and key negotiation. For example one may use TLS key exchange,
> > +announce the PSP capability, and switch to PSP if both endpoints
> > +are PSP-capable.  
> 
> > +Securing a connection
> > +---------------------
> > +
> > +PSP encryption is currently only supported for TCP connections.
> > +Rx and Tx keys are allocated separately. First the ``rx-assoc``
> > +Netlink command needs to be issued, specifying a target TCP socket.
> > +Kernel will allocate a new PSP Rx key from the NIC and associate it
> > +with given socket. At this stage socket will accept both PSP-secured
> > +and plain text TCP packets.
> > +
> > +Tx keys are installed using the ``tx-assoc`` Netlink command.
> > +Once the Tx keys are installed all data read from the socket will
> > +be PSP-secured. In other words act of installing Tx keys has the secondary
> > +effect on the Rx direction, requring all received packets to be encrypted.  
> 
> Consider clarifying the entire state diagram from when one pair
> initiates upgrade.

Not sure about state diagram, there are only 3 states. Or do you mean
extend TCP state diagrams? I think a table may be better:

Event         | Normal TCP      | Rx PSP key present | Tx PSP key present |
---------------------------------------------------------------------------
Rx plain text | accept          | accept             | drop               |
Rx PSP (good) | drop            | accept             | accept             |
Rx PSP (bad)  | drop            | drop               | drop               |
Tx            | plain text      | plain text         | encrypted *        |

* data enqueued before the Tx key is installed will not be encrypted
  (neither the initial send nor retransmissions)


What should I add?
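
The table can also be read as a small decision function. A userspace
sketch, with hypothetical enum and function names that do not exist in
the patches:

```c
#include <assert.h>
#include <stdbool.h>

enum psp_sk_state {	/* hypothetical names for the three columns */
	SK_PLAIN_TCP,
	SK_RX_KEY,
	SK_TX_KEY,
};

enum rx_pkt {		/* hypothetical names for the Rx rows */
	RX_PLAIN_TEXT,
	RX_PSP_GOOD,
	RX_PSP_BAD,
};

/* Returns true if the packet is accepted, false if it is dropped. */
static bool psp_rx_accept(enum psp_sk_state st, enum rx_pkt pkt)
{
	switch (pkt) {
	case RX_PLAIN_TEXT:
		return st != SK_TX_KEY;
	case RX_PSP_GOOD:
		return st != SK_PLAIN_TCP;
	case RX_PSP_BAD:
		return false;
	}
	assert(0);	/* unreachable for valid enum values */
	return false;
}

/* Tx row: only data queued after the Tx key is installed is encrypted. */
static bool psp_tx_encrypt(enum psp_sk_state st, bool queued_after_tx_key)
{
	return st == SK_TX_KEY && queued_after_tx_key;
}
```

The only column where plain text is rejected is the last one, which is
the "lock down" that installing the Tx key implies.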

> And some edge cases:
> 
> - retransmits
> - TCP fin handshake, if only one peer succeeds

So FIN when one end is "locked down" and the other isn't?

> - TCP control socket response to encrypted pkt

Control sock ignores PSP.

> What is the expectation for data already queued for transmission when
> the tx assocation is made?
> 
> More generally, what happens for data in flight. One possible
> simplification is to only allow an upgrade sequence (possibly
> including in-band exchange of keys) when no other data is in
> flight.

Like TLS offload, the data is annotated "for encryption" when queued.
So data queued earlier or retransmits of such data will never be
encrypted.
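
A toy model of that rule, in userspace C. The struct and function names
are hypothetical stand-ins for the socket and skb state, not kernel
API; the point is only that the decision is taken once, at enqueue
time, and retransmits reuse it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TXQ_MAX 8

struct seg {			/* hypothetical stand-in for a queued skb */
	unsigned int seq;
	bool encrypt;		/* decided once, at enqueue time */
};

struct txq {
	struct seg segs[TXQ_MAX];
	size_t len;
	bool tx_key_installed;
};

/* Queue new data; the flag is a snapshot of the socket state right now. */
static struct seg *txq_enqueue(struct txq *q, unsigned int seq)
{
	struct seg *s;

	assert(q->len < TXQ_MAX);
	s = &q->segs[q->len++];
	s->seq = seq;
	s->encrypt = q->tx_key_installed;
	return s;
}

/* (Re)transmit: consult the per-segment flag, never the socket state. */
static bool txq_xmit_encrypted(const struct seg *s)
{
	return s->encrypt;
}
```

A segment queued before the Tx key goes out in plain text even if it is
retransmitted after the key is installed.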

> > +performed by management daemons, not under application control.
> > +The PSP netlink family will generate a notification whenever keys
> > +are rotated. The applications are expected to re-establish connections
> > +before keys are rotated again.  
> 
> Connection key rotation is not supported? I did notice that tx key
> insertion fails if a key is already present, so this does appear to be
> the behavior.

Correct, for now connections need to be re-established once a day.
Rx should be easy, Tx we can make easy by only supporting rotation
when there's no data queued.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 05/15] psp: add op for rotation of secret state
  2024-05-16 19:59   ` Lance Richardson
@ 2024-05-29 17:43     ` Jakub Kicinski
  0 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-29 17:43 UTC (permalink / raw
  To: Lance Richardson
  Cc: netdev, pabeni, willemdebruijn.kernel, borisp, gal, cratiu,
	rrameshbabu, steffen.klassert, tariqt, Lance Richardson

On Thu, 16 May 2024 15:59:14 -0400 Lance Richardson wrote:
> On Thu, May 9, 2024 at 11:05 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Rotating the secret state is a key part of the PSP protocol design.
> > Some external daemon needs to do it once a day, or so.
> > Add a netlink op to perform this operation.
> > Add a notification group for informing users that key has been
> > rotated and they should rekey (next rotation will cut them off).
> >  
> Does this allow for the possibility of NIC firmware or the driver initiating
> a rotation? E.g. during key generation if the SPI space has been
> exhausted a rotation will be required in order to generate a new
> derived key.

I think it should be fine - I was designing with that use case in mind,
but ended up not needing it. We can export a driver facing function
which will basically perform the equivalent of psp_nl_key_rotate_doit().

I added the "key generation", because I worried that if unsynchronized
agents on multiple hosts can rotate the key - the chances of
double-rotation and immediate reuse of a SPI are much higher.
Not sure if the extra key generation bits are really necessary..
it seemed like a good idea at the time :)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead
  2024-05-13  1:47   ` Willem de Bruijn
@ 2024-05-29 17:48     ` Jakub Kicinski
  2024-05-30  0:52       ` Willem de Bruijn
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-29 17:48 UTC (permalink / raw
  To: Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

On Sun, 12 May 2024 21:47:16 -0400 Willem de Bruijn wrote:
> > -	inet_csk(newsk)->icsk_ext_hdr_len = 0;
> > +	inet_csk(newsk)->icsk_ext_hdr_len = psp_sk_overhead(sk);
> >  	if (opt)
> > -		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
> > -						    opt->opt_flen;
> > +		inet_csk(newsk)->icsk_ext_hdr_len += opt->opt_nflen +
> > +						     opt->opt_flen;
> >  
> >  	tcp_ca_openreq_child(newsk, dst);  
> 
> The below code adjusts ext_hdr_len and recalculates mss when
> setting the tx association.
> 
> Why already include it at connect and syn_recv, above?
> 
> My assumption was that the upgrade to PSP only happens during
> TCP_ESTABLISHED. But perhaps I'm wrong.
> 
> Is it allowed to set rx and tx association even from as early as the
> initial socket(), when still in TCP_CLOSE, client-side?
> 
> Server-side, there is no connection fd to pass to netlink commands
> before TCP_ESTABLISHED.

Mostly for symmetry, really. IDK what's worse, the dead code or that
someone may be surprised it's not there.. Should I delete it?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 14/15] net/mlx5e: Add Rx data path offload
  2024-05-13  1:54   ` Willem de Bruijn
@ 2024-05-29 18:38     ` Jakub Kicinski
  2024-05-30  9:04       ` Cosmin Ratiu
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-29 18:38 UTC (permalink / raw
  To: cratiu, rrameshbabu, Raed Salem
  Cc: Willem de Bruijn, netdev, pabeni, borisp, gal, steffen.klassert,
	tariqt

On Sun, 12 May 2024 21:54:38 -0400 Willem de Bruijn wrote:
> > +	/* TBD: report errors as SW counters to ethtool, any further handling ? */
> > +	switch (MLX5_NISP_METADATA_SYNDROM(nisp_meta_data)) {
> > +	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED:
> > +		if (psp_rcv(skb))
> > +			netdev_warn_once(netdev, "PSP handling failed");
> > +		skb->decrypted = 1;  
> 
> Do not set skb->decrypted if psp_rcv failed? But drop the packet and
> account the drop, likely.

NVIDIA folks, does this seem reasonable?

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
index 7ae3e8246d8f..8cf6a8daf721 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.c
@@ -172,22 +172,24 @@ static int psp_rcv(struct sk_buff *skb)
 	return 0;
 }
 
-void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
+bool mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
 				      struct mlx5_cqe64 *cqe)
 {
 	u32 nisp_meta_data = be32_to_cpu(cqe->ft_metadata);
 
 	/* TBD: report errors as SW counters to ethtool, any further handling ? */
-	switch (MLX5_NISP_METADATA_SYNDROM(nisp_meta_data)) {
-	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED:
-		if (psp_rcv(skb))
-			netdev_warn_once(netdev, "PSP handling failed");
-		skb->decrypted = 1;
-		break;
-	default:
-		WARN_ON_ONCE(true);
-		break;
-	}
+	if (MLX5_NISP_METADATA_SYNDROM(nisp_meta_data) != MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED)
+		goto drop;
+
+	if (psp_rcv(skb))
+		goto drop;
+
+	skb->decrypted = 1;
+	return false;
+
+drop:
+	kfree_skb(skb);
+	return true;
 }
 
 void mlx5e_nisp_tx_build_eseg(struct mlx5e_priv *priv, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
index 834481232b21..1e13b09b3522 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/nisp_rxtx.h
@@ -86,7 +86,7 @@ static inline bool mlx5e_nisp_is_rx_flow(struct mlx5_cqe64 *cqe)
 	return MLX5_NISP_METADATA_MARKER(be32_to_cpu(cqe->ft_metadata));
 }
 
-void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
+bool mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
 				      struct mlx5_cqe64 *cqe);
 
 void mlx5e_nisp_csum_complete(struct net_device *netdev, struct sk_buff *skb);
@@ -113,10 +113,11 @@ static inline bool mlx5e_nisp_is_rx_flow(struct mlx5_cqe64 *cqe)
 	return false;
 }
 
-static inline void mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev,
+static inline bool mlx5e_nisp_offload_handle_rx_skb(struct net_device *netdev,
 						    struct sk_buff *skb,
 						    struct mlx5_cqe64 *cqe)
 {
+	return false;
 }
 
 static inline void mlx5e_nisp_csum_complete(struct net_device *netdev, struct sk_buff *skb) { }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ed3c7d8cf99d..22cf1c563844 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1552,7 +1552,7 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 
 #define MLX5E_CE_BIT_MASK 0x80
 
-static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+static inline bool mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 				      u32 cqe_bcnt,
 				      struct mlx5e_rq *rq,
 				      struct sk_buff *skb)
@@ -1566,8 +1566,10 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 	if (unlikely(get_cqe_tls_offload(cqe)))
 		mlx5e_ktls_handle_rx_skb(rq, skb, cqe, &cqe_bcnt);
 
-	if (unlikely(mlx5e_nisp_is_rx_flow(cqe)))
-		mlx5e_nisp_offload_handle_rx_skb(netdev, skb, cqe);
+	if (unlikely(mlx5e_nisp_is_rx_flow(cqe))) {
+		if (mlx5e_nisp_offload_handle_rx_skb(netdev, skb, cqe))
+			return true;
+	}
 
 	if (unlikely(mlx5_ipsec_is_rx_flow(cqe)))
 		mlx5e_ipsec_offload_handle_rx_skb(netdev, skb,
@@ -1612,9 +1614,11 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 
 	if (unlikely(mlx5e_skb_is_multicast(skb)))
 		stats->mcast_packets++;
+
+	return false;
 }
 
-static void mlx5e_shampo_complete_rx_cqe(struct mlx5e_rq *rq,
+static bool mlx5e_shampo_complete_rx_cqe(struct mlx5e_rq *rq,
 					 struct mlx5_cqe64 *cqe,
 					 u32 cqe_bcnt,
 					 struct sk_buff *skb)
@@ -1626,16 +1630,20 @@ static void mlx5e_shampo_complete_rx_cqe(struct mlx5e_rq *rq,
 	stats->bytes += cqe_bcnt;
 	stats->gro_bytes += cqe_bcnt;
 	if (NAPI_GRO_CB(skb)->count != 1)
-		return;
-	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+		return false;
+
+	if (mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb))
+		return true;
+
 	skb_reset_network_header(skb);
 	if (!skb_flow_dissect_flow_keys(skb, &rq->hw_gro_data->fk, 0)) {
 		napi_gro_receive(rq->cq.napi, skb);
 		rq->hw_gro_data->skb = NULL;
 	}
+	return false;
 }
 
-static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
+static inline bool mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 					 struct mlx5_cqe64 *cqe,
 					 u32 cqe_bcnt,
 					 struct sk_buff *skb)
@@ -1644,7 +1652,7 @@ static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 
 	stats->packets++;
 	stats->bytes += cqe_bcnt;
-	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+	return mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
 }
 
 static inline
@@ -1858,7 +1866,8 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 		goto wq_cyc_pop;
 	}
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	if (mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb))
+		goto wq_cyc_pop;
 
 	if (mlx5e_cqe_regb_chain(cqe))
 		if (!mlx5e_tc_update_skb_nic(cqe, skb)) {
@@ -1905,7 +1914,8 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 		goto wq_cyc_pop;
 	}
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	if (mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb))
+		goto wq_cyc_pop;
 
 	if (rep->vlan && skb_vlan_tag_present(skb))
 		skb_vlan_pop(skb);
@@ -1954,7 +1964,8 @@ static void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq, struct mlx5_cqe64
 	if (!skb)
 		goto mpwrq_cqe_out;
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	if (mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb))
+		goto mpwrq_cqe_out;
 
 	mlx5e_rep_tc_receive(cqe, rq, skb);
 
@@ -2375,7 +2386,10 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 		mlx5e_fill_skb_data(*skb, rq, frag_page, data_bcnt, data_offset);
 	}
 
-	mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb);
+	if (mlx5e_shampo_complete_rx_cqe(rq, cqe, cqe_bcnt, *skb)) {
+		*skb = NULL;
+		goto free_hd_entry;
+	}
 	if (flush)
 		mlx5e_shampo_flush_skb(rq, cqe, match);
 free_hd_entry:
@@ -2429,7 +2443,8 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 	if (!skb)
 		goto mpwrq_cqe_out;
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	if (mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb))
+		goto mpwrq_cqe_out;
 
 	if (mlx5e_cqe_regb_chain(cqe))
 		if (!mlx5e_tc_update_skb_nic(cqe, skb)) {
@@ -2762,7 +2777,8 @@ static void mlx5e_trap_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe
 	if (!skb)
 		goto wq_cyc_pop;
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	if (mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb))
+		goto wq_cyc_pop;
 	skb_push(skb, ETH_HLEN);
 
 	mlx5_devlink_trap_report(rq->mdev, trap_id, skb,

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 00/15] add basic PSP encryption for TCP connections
  2024-05-29  9:16 ` [RFC net-next 00/15] add basic PSP encryption for TCP connections Boris Pismenny
@ 2024-05-29 18:50   ` Jakub Kicinski
  2024-05-29 20:01     ` Boris Pismenny
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-29 18:50 UTC (permalink / raw
  To: Boris Pismenny
  Cc: netdev, pabeni, willemdebruijn.kernel, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, jgg

On Wed, 29 May 2024 11:16:12 +0200 Boris Pismenny wrote:
> Thank you for doing this. I agree that TLS-like socket support
> is a main use-case. I'd like to hear what you think on a few
> other use-cases that I think should be considered as well
> since it may be difficult to add them as an afterthought:
> - Tunnel mode. What are your plans for tunnel mode? Clearly it
> is different from the current approach in some aspects, for
> example, no sockets will be involved.

The drivers should only decap for known L4 protos, I think that's
the only catch when we add tunnel support. Otherwise it should be
fairly straightforward. Open a UDP socket in the kernel. Get a key
+ SPI using existing ops. Demux within the UDP socket using SPI.
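
The demux key extraction could look roughly like the userspace sketch
below. It assumes the PSP header directly follows the UDP header, that
the SPI sits after 4 bytes of fixed fields, and that the header is 16B
without optional fields; psp_parse_spi() is a hypothetical helper and
the lookup of tunnel state by SPI is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HLEN	8
#define PSP_SPI_OFF	4	/* assumption: SPI follows 4 bytes of fixed fields */
#define PSP_MIN_HLEN	16	/* assumption: header without optional fields */

/*
 * Extract the SPI (network byte order on the wire) from a packet that
 * starts at the UDP header. Returns 0 on success, -1 if the packet is
 * too short to hold a PSP header.
 */
static int psp_parse_spi(const uint8_t *udp, size_t len, uint32_t *spi)
{
	const uint8_t *p;

	assert(udp && spi);
	if (len < UDP_HLEN + PSP_MIN_HLEN)
		return -1;

	p = udp + UDP_HLEN + PSP_SPI_OFF;
	*spi = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return 0;
}
```

The returned SPI would then index whatever per-tunnel state the UDP
socket keeps.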

> - RDMA. The ultra ethernet group has mentioned RDMA encryption
> using PSP. Do you think that RDMA verbs will support PSP in
> a similar manner to sockets? i.e., using netlink to pass
> parameters to the device and linking QPs to PSP SAs?
> - Virtualization. How does PSP work from a VM? is the key
> shared with the hypervisor or is it private per-VM?

Depends on the deployment and security model, really, but I'd
expect the device key is shared, hypervisor is responsible for
rotations, and mediates all key ops from the guests.

> and what about containers?

I tried to apply some of the lessons learned from TLS offload and made
the "PSP device" a separate object. This should make it easy to
"forward" the offload to software/container netdevs.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 00/15] add basic PSP encryption for TCP connections
  2024-05-29 18:50   ` Jakub Kicinski
@ 2024-05-29 20:01     ` Boris Pismenny
  2024-05-29 20:38       ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Boris Pismenny @ 2024-05-29 20:01 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, pabeni, willemdebruijn.kernel, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, jgg

On 29.05.2024 20:50, Jakub Kicinski wrote:
> On Wed, 29 May 2024 11:16:12 +0200 Boris Pismenny wrote:
>> Thank you for doing this. I agree that TLS-like socket support
>> is a main use-case. I'd like to hear what you think on a few
>> other use-cases that I think should be considered as well
>> since it may be difficult to add them as an afterthought:
>> - Tunnel mode. What are your plans for tunnel mode? Clearly it
>> is different from the current approach in some aspects, for
>> example, no sockets will be involved.
> The drivers should only decap for known L4 protos, I think that's
> the only catch when we add tunnel support. Otherwise it should be
> fairly straightforward. Open a UDP socket in the kernel. Get a key
> + SPI using existing ops. Demux within the UDP socket using SPI.

IIUC, you refer to tunnel mode as if it offloads
encryption alone while keeping headers intact. But,
what I had in mind is a fully offloaded tunnel.
This is called packet offload mode in IPsec,
and with encryption such offloads rely on TC.

Note that the main use-case for PSP tunnel mode,
unlike transport mode, is carrying VM traffic as
indicated by the spec:
"The tunnel mode packet format is typically used in
virtualized environments." With virtualization, encap/decap
offload is an implicit assumption if not a performance necessity.

>> - RDMA. The ultra ethernet group has mentioned RDMA encryption
>> using PSP. Do you think that RDMA verbs will support PSP in
>> a similar manner to sockets? i.e., using netlink to pass
>> parameters to the device and linking QPs to PSP SAs?
>> - Virtualization. How does PSP work from a VM? is the key
>> shared with the hypervisor or is it private per-VM?
> Depends on the deployment and security model, really, but I'd
> expect the device key is shared, hypervisor is responsible for
> rotations, and mediates all key ops from the guests.

I can imagine how this will work, but there are a few issues:
- Guests may run out of Tx keys, but they can't initiate key
rotation without affecting others. This fate sharing between
VMs seems undesirable.
- Unclear what sort of mediation is the hypervisor expected
to provide: on the one hand, block a key rotation request and
the requesting guest is denied service, on the other hand,
allow key rotation and a guest may spam these requests to
the hypervisor, which will also spam other VMs with
notifications of key rotation.

>
>> and what about containers?
> I tried to apply some of the lessons learned from TLS offload and made
> the "PSP device" a separate object. This should make it easy to
> "forward" the offload to software/container netdevs.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 00/15] add basic PSP encryption for TCP connections
  2024-05-29 20:01     ` Boris Pismenny
@ 2024-05-29 20:38       ` Jakub Kicinski
  0 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-29 20:38 UTC (permalink / raw
  To: Boris Pismenny
  Cc: netdev, pabeni, willemdebruijn.kernel, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, jgg

On Wed, 29 May 2024 22:01:52 +0200 Boris Pismenny wrote:
> > The drivers should only decap for known L4 protos, I think that's
> > the only catch when we add tunnel support. Otherwise it should be
> > fairly straightforward. Open a UDP socket in the kernel. Get a key
> > + SPI using existing ops. Demux within the UDP socket using SPI.
> 
> IIUC, you refer to tunnel mode as if it offloads
> encryption alone while keeping headers intact. But,
> what I had in mind is a fully offloaded tunnel.
> This is called packet offload mode in IPsec,
> and with encryption such offloads rely on TC.

Do you see any challenge?

> > Depends on the deployment and security model, really, but I'd
> > expect the device key is shared, hypervisor is responsible for
> > rotations, and mediates all key ops from the guests.
> 
> I can imagine how this will work, but there are a few issues:
> - Guests may run out of Tx keys, but they can't initiate key
> rotation without affecting others. This fate sharing between
> VMs seems undesirable.
> - Unclear what sort of mediation is the hypervisor expected
> to provide: on the one hand, block a key rotation request and
> the requesting guest is denied service, on the other hand,
> allow key rotation and a guest may spam these requests to
> the hypervisor, which will also spam other VMs with
> notifications of key rotation.

I don't have much experience working with VMs, or a good understanding
of what mlx5 does internally. Without access to the details of even a
single device which does PSP - any comment I'd make would be too much
of a speculation for my taste :(

On the NFP I'm pretty sure we could have given every VM a separate
device key, no problem.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-29 17:35     ` Jakub Kicinski
@ 2024-05-30  0:47       ` Willem de Bruijn
  2024-05-30 19:51         ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-30  0:47 UTC (permalink / raw
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

Jakub Kicinski wrote:
> On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > +PSP Security Protocol (PSP) was defined at Google and published in:
> > > +
> > > +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf
> > > +
> > > +This section briefly covers protocol aspects crucial for understanding
> > > +the kernel API. Refer to the protocol specification for further details.
> > > +
> > > +Note that the kernel implementation and documentation uses the term
> > > +"secret state" in place of "master key", it is both less confusing
> > > +to an average developer and is less likely to run afoul any naming
> > > +guidelines.  
> > 
> > There is some value in using the same terminology in the code as in
> > the spec.
> > 
> > And the session keys are derived from a key. That is more precise than
> > state. Specifically, counter-mode KDF from an AES key.
> > 
> > Perhaps device key, instead of master key? 
> 
> Weak preference towards secret state, but device key works, too.

Totally your choice. I just wanted to make sure this was considered.
 
> > > +Derived Rx keys
> > > +---------------
> > > +
> > > +PSP borrows some terms and mechanisms from IPsec. PSP was designed
> > > +with HW offloads in mind. The key feature of PSP is that Rx keys for every
> > > +connection do not have to be stored by the receiver but can be derived
> > > +from secret state and information present in packet headers.  
> > 
> > A second less obvious, but neat, feature is that it supports an
> > encryption offset, such that (say) the L4 ports are integrity
> > protected, but not encrypted, to allow for in-network telemetry.
> 
> I know, but the opening paragraph has:
> 
>    This section briefly covers protocol aspects crucial for
>    understanding the kernel API. Refer to the protocol specification for further details.
> 
> :) .. and I didn't implement the offset, yet. (It's trivial to add and
> ETOOMANYPATCHES.)

Ack, sounds good.

> 
> > > +This makes it possible to implement receivers which require a constant
> > > +amount of memory regardless of the number of connections (``O(1)`` scaling).
> > > +
> > > +Tx keys have to be stored like with any other protocol,  
> > 
> > Keys can optionally be passed in descriptor.
> 
> Added: Preferably, the Tx keys should be provided with the packet (e.g.
> as part of the descriptors).
> 
> > > +The expectation is that higher layer protocols will take care of
> > > +protocol and key negotiation. For example one may use TLS key exchange,
> > > +announce the PSP capability, and switch to PSP if both endpoints
> > > +are PSP-capable.  
> > 
> > > +Securing a connection
> > > +---------------------
> > > +
> > > +PSP encryption is currently only supported for TCP connections.
> > > +Rx and Tx keys are allocated separately. First the ``rx-assoc``
> > > +Netlink command needs to be issued, specifying a target TCP socket.
> > > +Kernel will allocate a new PSP Rx key from the NIC and associate it
> > > +with given socket. At this stage socket will accept both PSP-secured
> > > +and plain text TCP packets.
> > > +
> > > +Tx keys are installed using the ``tx-assoc`` Netlink command.
> > > +Once the Tx keys are installed all data read from the socket will
> > > +be PSP-secured. In other words act of installing Tx keys has the secondary
> > > +effect on the Rx direction, requiring all received packets to be encrypted.  
> > 
> > Consider clarifying the entire state diagram from when one pair
> > initiates upgrade.
> 
> Not sure about state diagram, there are only 3 states. Or do you mean
> extend TCP state diagrams? I think a table may be better:
> 
> Event         | Normal TCP      | Rx PSP key present | Tx PSP key present |
> ---------------------------------------------------------------------------
> Rx plain text | accept          | accept             | drop               |
> 
> Rx PSP (good) | drop            | accept             | accept             |
> 
> Rx PSP (bad)  | drop            | drop               | drop               |
> 
> Tx            | plain text      | plain text         | encrypted *        |
> 
> * data enqueued before Tx key is installed will not be encrypted
>   (neither initial send nor retransmissions)
> 
> 
> What should I add?

I've mostly been concerned about the below edge cases.

If both peers are in TCP_ESTABLISHED for the duration of the upgrade,
and data is aligned on message boundary, things are straightforward.

The retransmit logic is clear, as this is controlled by skb->decrypted
on the individual skbs on the retransmit queue.

That also solves another edge case: skb geometry changes on retransmit
(due to different MSS or segs, using tcp_fragment, tso_fragment,
tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not
possible that skb is accidentally created that combines plaintext and
ciphertext content.

Although.. does this require adding that skb->decrypted check to
tcp_skb_can_collapse?
 
> > And some edge cases:
> > 
> > - retransmits
> > - TCP fin handshake, if only one peer succeeds
> 
> So FIN when one end is "locked down" and the other isn't?

If one peer can enter the state where it drops all plaintext, while
the other decides to close the connection before completing the
upgrade, and thus sends a plaintext FIN.

If (big if) that can happen, then the connection cannot be cleanly
closed.
 
> > - TCP control socket response to encrypted pkt
> 
> Control sock ignores PSP.

Another example where a peer stays open and stays retrying if it has
upgraded and drops all plaintext.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead
  2024-05-29 17:48     ` Jakub Kicinski
@ 2024-05-30  0:52       ` Willem de Bruijn
  0 siblings, 0 replies; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-30  0:52 UTC (permalink / raw
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

Jakub Kicinski wrote:
> On Sun, 12 May 2024 21:47:16 -0400 Willem de Bruijn wrote:
> > > -	inet_csk(newsk)->icsk_ext_hdr_len = 0;
> > > +	inet_csk(newsk)->icsk_ext_hdr_len = psp_sk_overhead(sk);
> > >  	if (opt)
> > > -		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
> > > -						    opt->opt_flen;
> > > +		inet_csk(newsk)->icsk_ext_hdr_len += opt->opt_nflen +
> > > +						     opt->opt_flen;
> > >  
> > >  	tcp_ca_openreq_child(newsk, dst);  
> > 
> > The below code adjusts ext_hdr_len and recalculates mss when
> > setting the tx association.
> > 
> > Why already include it at connect and syn_recv, above?
> > 
> > My assumption was that the upgrade to PSP only happens during
> > TCP_ESTABLISHED. But perhaps I'm wrong.
> > 
> > Is it allowed to set rx and tx association even from as early as the
> > initial socket(), when still in TCP_CLOSE, client-side?
> > 
> > Server-side, there is no connection fd to pass to netlink commands
> > before TCP_ESTABLISHED.
> 
> Mostly for symmetry, really. IDK what's worse, the dead code or that
> someone may be surprised it's not there.. Should I delete it?

Symmetry with what?

This dead code had me scratching my head over what it was doing, so my
vote is to drop it. If you want something, maybe a code comment instead?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 14/15] net/mlx5e: Add Rx data path offload
  2024-05-29 18:38     ` Jakub Kicinski
@ 2024-05-30  9:04       ` Cosmin Ratiu
  0 siblings, 0 replies; 44+ messages in thread
From: Cosmin Ratiu @ 2024-05-30  9:04 UTC (permalink / raw
  To: kuba@kernel.org, Rahul Rameshbabu, Raed Salem
  Cc: steffen.klassert@secunet.com, Tariq Toukan, Gal Pressman,
	willemdebruijn.kernel@gmail.com, netdev@vger.kernel.org,
	Boris Pismenny, pabeni@redhat.com

On Wed, 2024-05-29 at 11:38 -0700, Jakub Kicinski wrote:
> On Sun, 12 May 2024 21:54:38 -0400 Willem de Bruijn wrote:
> > > +	/* TBD: report errors as SW counters to ethtool, any further handling ? */
> > > +	switch (MLX5_NISP_METADATA_SYNDROM(nisp_meta_data)) {
> > > +	case MLX5E_NISP_OFFLOAD_RX_SYNDROME_DECRYPTED:
> > > +		if (psp_rcv(skb))
> > > +			netdev_warn_once(netdev, "PSP handling failed");
> > > +		skb->decrypted = 1;  
> > 
> > Do not set skb->decrypted if psp_rcv failed? But drop the packet and
> > account the drop, likely.
> 
> nVidia folks does this seem reasonable?

This seems reasonable. It's also what the comment above the switch
suggests should be done.
psp_rcv unrefs the skb on errors (doesn't even return error in all
cases) and I think it's no longer safe to touch it, unless there's
another ref held somewhere.
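
A minimal sketch of the ordering being suggested, using mock types in
place of the real skb and driver structures. Everything named mock_* is
hypothetical, and the stand-in for psp_rcv() simply assumes that a
failure consumes the caller's reference:

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for struct sk_buff and driver counters (hypothetical). */
struct mock_skb {
	int refcnt;
	bool decrypted;
};

struct mock_stats {
	unsigned long psp_rx_drop;
};

/* Stand-in for psp_rcv(): assumed to drop the caller's reference on failure. */
static int mock_psp_rcv(struct mock_skb *skb, bool ok)
{
	if (!ok) {
		skb->refcnt--;
		return -1;
	}
	return 0;
}

/* Returns the skb if it should continue up the stack, NULL if it was
 * consumed. On failure only the drop counter is updated; the skb is not
 * touched again and ->decrypted is never set.
 */
static struct mock_skb *mock_rx_decrypted(struct mock_skb *skb, bool ok,
					  struct mock_stats *stats)
{
	if (mock_psp_rcv(skb, ok)) {
		stats->psp_rx_drop++;
		return NULL;
	}

	skb->decrypted = true;
	return skb;
}
```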

I've tried implementing this tweak in the shared repo we have, but it
seems it doesn't have the versions of this patch that you sent.

Cosmin.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-30  0:47       ` Willem de Bruijn
@ 2024-05-30 19:51         ` Jakub Kicinski
  2024-05-30 20:15           ` Jakub Kicinski
  2024-05-31 13:56           ` Willem de Bruijn
  0 siblings, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-30 19:51 UTC (permalink / raw
  To: Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

On Wed, 29 May 2024 20:47:02 -0400 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote:  
> > > There is some value in using the same terminology in the code as in
> > > the spec.
> > > 
> > > And the session keys are derived from a key. That is more precise than
> > > state. Specifically, counter-mode KDF from an AES key.
> > > 
> > > Perhaps device key, instead of master key?   
> > 
> > Weak preference towards secret state, but device key works, too.  
> 
> Totally your choice. I just wanted to make sure this was considered.

Already ran the sed, device key it is :)

> > > Consider clarifying the entire state diagram from when one pair
> > > initiates upgrade.  
> > 
> > Not sure about state diagram, there are only 3 states. Or do you mean
> > extend TCP state diagrams? I think a table may be better:
> > 
> > Event         | Normal TCP      | Rx PSP key present | Tx PSP key present |
> > ---------------------------------------------------------------------------
> > Rx plain text | accept          | accept             | drop               |
> > 
> > Rx PSP (good) | drop            | accept             | accept             |
> > 
> > Rx PSP (bad)  | drop            | drop               | drop               |
> > 
> > Tx            | plain text      | plain text         | encrypted *        |
> > 
> > * data enqueued before Tx key is installed will not be encrypted
> >   (neither initial send nor retransmissions)
> > 
> > 
> > What should I add?  
> 
> I've mostly been concerned about the below edge cases.
> 
> If both peers are in TCP_ESTABLISHED for the duration of the upgrade,
> and data is aligned on message boundary, things are straightforward.
> 
> The retransmit logic is clear, as this is controlled by skb->decrypted
> on the individual skbs on the retransmit queue.
> 
> That also solves another edge case: skb geometry changes on retransmit
> (due to different MSS or segs, using tcp_fragment, tso_fragment,
> tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not
> possible that skb is accidentally created that combines plaintext and
> ciphertext content.
> 
> Although.. does this require adding that skb->decrypted check to
> tcp_skb_can_collapse?

Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP).
We've grown the check in tcp_shift_skb_data() and the logic
in tcp_grow_skb(), both missing the decrypted check.

I'll send some fixes, these are existing bugs :(

> > > And some edge cases:
> > > 
> > > - retransmits
> > > - TCP fin handshake, if only one peer succeeds  
> > 
> > So FIN when one end is "locked down" and the other isn't?  
> 
> If one peer can enter the state where it drops all plaintext, while
> the other decides to close the connection before completing the
> upgrade, and thus sends a plaintext FIN.
> 
> If (big if) that can happen, then the connection cannot be cleanly
> closed.

Hm. And we can avoid this by only enforcing encryption of data-less
segments once we've seen some encrypted data?

> > > - TCP control socket response to encrypted pkt  
> > 
> > Control sock ignores PSP.  
> 
> Another example where a peer stays open and stays retrying if it has
> upgraded and drops all plaintext.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-30 19:51         ` Jakub Kicinski
@ 2024-05-30 20:15           ` Jakub Kicinski
  2024-05-30 21:03             ` Willem de Bruijn
  2024-05-31 13:56           ` Willem de Bruijn
  1 sibling, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-05-30 20:15 UTC (permalink / raw
  To: Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

On Thu, 30 May 2024 12:51:20 -0700 Jakub Kicinski wrote:
> > I've mostly been concerned about the below edge cases.
> > 
> > If both peers are in TCP_ESTABLISHED for the duration of the upgrade,
> > and data is aligned on message boundary, things are straightforward.
> > 
> > The retransmit logic is clear, as this is controlled by skb->decrypted
> > on the individual skbs on the retransmit queue.
> > 
> > That also solves another edge case: skb geometry changes on retransmit
> > (due to different MSS or segs, using tcp_fragment, tso_fragment,
> > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not
> > possible that skb is accidentally created that combines plaintext and
> > ciphertext content.
> > 
> > Although.. does this require adding that skb->decrypted check to
> > tcp_skb_can_collapse?  
> 
> Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP).
> We've grown the check in tcp_shift_skb_data() and the logic
> in tcp_grow_skb(), both missing the decrypted check.
> 
> I'll send some fixes, these are existing bugs :(

I take that back, we can depend on EOR like TLS does.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-30 20:15           ` Jakub Kicinski
@ 2024-05-30 21:03             ` Willem de Bruijn
  0 siblings, 0 replies; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-30 21:03 UTC (permalink / raw
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

Jakub Kicinski wrote:
> On Thu, 30 May 2024 12:51:20 -0700 Jakub Kicinski wrote:
> > > I've mostly been concerned about the below edge cases.
> > > 
> > > If both peers are in TCP_ESTABLISHED for the duration of the upgrade,
> > > and data is aligned on message boundary, things are straightforward.
> > > 
> > > The retransmit logic is clear, as this is controlled by skb->decrypted
> > > on the individual skbs on the retransmit queue.
> > > 
> > > That also solves another edge case: skb geometry changes on retransmit
> > > (due to different MSS or segs, using tcp_fragment, tso_fragment,
> > > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not
> > > possible that skb is accidentally created that combines plaintext and
> > > ciphertext content.
> > > 
> > > Although.. does this require adding that skb->decrypted check to
> > > tcp_skb_can_collapse?  
> > 
> > Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP).
> > We've grown the check in tcp_shift_skb_data() and the logic
> > in tcp_grow_skb(), both missing the decrypted check.
> > 
> > I'll send some fixes, these are existing bugs :(
> 
> I take that back, we can depend on EOR like TLS does.

Oh yes. Neat solution.

This relies on userspace doing the right thing by passing MSG_EOR,
right? That is easy to get wrong. Should we still add a check or
a WARN_ONCE? That would be net-next material.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-30 19:51         ` Jakub Kicinski
  2024-05-30 20:15           ` Jakub Kicinski
@ 2024-05-31 13:56           ` Willem de Bruijn
  2024-06-05  0:08             ` Jakub Kicinski
  1 sibling, 1 reply; 44+ messages in thread
From: Willem de Bruijn @ 2024-05-31 13:56 UTC (permalink / raw
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt

Jakub Kicinski wrote:
> On Wed, 29 May 2024 20:47:02 -0400 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote:  
> > > > There is some value in using the same terminology in the code as in
> > > > the spec.
> > > > 
> > > > And the session keys are derived from a key. That is more precise than
> > > > state. Specifically, counter-mode KDF from an AES key.
> > > > 
> > > > Perhaps device key, instead of master key?   
> > > 
> > > Weak preference towards secret state, but device key works, too.  
> > 
> > Totally your choice. I just wanted to make sure this was considered.
> 
> Already ran the sed, device key it is :)
> 
> > > > Consider clarifying the entire state diagram from when one pair
> > > > initiates upgrade.  
> > > 
> > > Not sure about state diagram, there are only 3 states. Or do you mean
> > > extend TCP state diagrams? I think a table may be better:
> > > 
> > > Event         | Normal TCP      | Rx PSP key present | Tx PSP key present |
> > > ---------------------------------------------------------------------------
> > > Rx plain text | accept          | accept             | drop               |
> > > 
> > > Rx PSP (good) | drop            | accept             | accept             |
> > > 
> > > Rx PSP (bad)  | drop            | drop               | drop               |
> > > 
> > > Tx            | plain text      | plain text         | encrypted *        |
> > > 
> > > * data enqueued before Tx key is installed will not be encrypted
> > >   (neither initial send nor retransmissions)
> > > 
> > > 
> > > What should I add?  
> > 
> > I've mostly been concerned about the below edge cases.
> > 
> > If both peers are in TCP_ESTABLISHED for the duration of the upgrade,
> > and data is aligned on message boundary, things are straightforward.
> > 
> > The retransmit logic is clear, as this is controlled by skb->decrypted
> > on the individual skbs on the retransmit queue.
> > 
> > That also solves another edge case: skb geometry changes on retransmit
> > (due to different MSS or segs, using tcp_fragment, tso_fragment,
> > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not
> > possible that skb is accidentally created that combines plaintext and
> > ciphertext content.
> > 
> > Although.. does this require adding that skb->decrypted check to
> > tcp_skb_can_collapse?
> 
> Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP).
> We've grown the check in tcp_shift_skb_data() and the logic
> in tcp_grow_skb(), both missing the decrypted check.
> 
> I'll send some fixes, these are existing bugs :(
> 
> > > > And some edge cases:
> > > > 
> > > > - retransmits
> > > > - TCP fin handshake, if only one peer succeeds  
> > > 
> > > So FIN when one end is "locked down" and the other isn't?  
> > 
> > If one peer can enter the state where it drops all plaintext, while
> > the other decides to close the connection before completing the
> > upgrade, and thus sends a plaintext FIN.
> > 
> > If (big if) that can happen, then the connection cannot be cleanly
> > closed.
> 
> Hm. And we can avoid this by only enforcing encryption of data-less
> segments once we've seen some encrypted data?

That would help. It may also be needed to accept a pure ACK right at
the upgrade seqno. Depends on the upgrade process.

Which may be worth documenting explicitly: the system call and network
packet exchange from when one peer initiates (by generating its local
key) until the connection is fully encrypted. That also allows poking
at the various edge cases that may happen if packets are lost, or when
actions can race.

One unexpected example of the latter that I came across was Tx SADB
key insertion in tail edge cases taking longer than network RTT, for
instance.

The kernel API can be exercised in a variety of ways, not all of them
will uphold the correctness. Documenting how it should be used should
help.

Even better when it reduces the option space. As it already does by
failing a Tx key install until Rx is configured.

> > > > - TCP control socket response to encrypted pkt  
> > > 
> > > Control sock ignores PSP.  
> > 
> > Another example where a peer stays open and stays retrying if it has
> > upgraded and drops all plaintext.

May want to always allow plaintext RSTs. This is a potential DoS
vector. In all these cases, I suppose this has already been figured
out for TLS.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-05-31 13:56           ` Willem de Bruijn
@ 2024-06-05  0:08             ` Jakub Kicinski
  2024-06-05 20:11               ` Willem de Bruijn
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-06-05  0:08 UTC (permalink / raw
  To: Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, mingtao, knekritz, Lance Richardson

On Fri, 31 May 2024 09:56:42 -0400 Willem de Bruijn wrote:
> > > If one peer can enter the state where it drops all plaintext, while
> > > the other decides to close the connection before completing the
> > > upgrade, and thus sends a plaintext FIN.
> > > 
> > > If (big if) that can happen, then the connection cannot be cleanly
> > > closed.  
> > 
> > Hm. And we can avoid this by only enforcing encryption of data-less
> > segments once we've seen some encrypted data?  
> 
> That would help. It may also be needed to accept a pure ACK right at
> the upgrade seqno. Depends on the upgrade process.
> 
> Which may be worth documenting explicitly: the system call and network
> packet exchange from when one peer initiates (by generating its local
> key) until the connection is fully encrypted. That also allows poking
> at the various edge cases that may happen if packets are lost, or when
> actions can race.

Dunno if the format below is good, but you're very right.
At least to me writing the diagram was an hour well spent :)

> One unexpected example of the latter that I came across was Tx SADB
> key insertion in tail edge cases taking longer than network RTT, for
> instance.
> 
> The kernel API can be exercised in a variety of ways, not all of them
> will uphold the correctness. Documenting how it should be used should
> help.
> 
> Even better when it reduces the option space. As it already does by
> failing a Tx key install until Rx is configured.

Something along these lines?

"Sequence" diagram of the worst case scenario:

01 p       Host A                         Host B
02 l t        ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~
03 a e        ~~~~~~[crypto negotiation]~~~~~~
04 i x                             [Rx key alloc = K-B]
05 n t                          <--- [app] K-B key send 
06 ------[Rx key alloc = K-A]-
07     [app] K-A key send -->|
08        [TCP] K-B input <-----
08 P      [TCP] K-B ACK ---->|
09 S R [app] recv(K-B)       |
10 P x [app] [Tx key set]    |  
11 -------------------------- 
12 P T [app] send(RPC) #####>|   
13 S x                       |<----    [TCP] Seq OoO! queue RPC, SACK
14 P      [TCP] retr K-A --->|   
15                           |  `->    [TCP] K-A input
16                           | <---    [TCP] K-A ACK (or FIN) 
17                           |      [app] recv(K-A)
18                           |      [app] [Tx key set]
19                            -----------------------------------
20

There is a causal dependency between Host B allocating the key (line 4),
sending it (line 5) and Host A receiving it (line 8). Since Host B will
accept PSP packets as soon as it allocated the key, Host A does not
need to wait to start using the key (line 12). Host B will queue the
RPC to the socket (line 13).

[Problem #1]

However, because Host B does not have a Tx key, the ACK / SACK packet
(line 13) will not be encrypted. (Similarly if Host B decided to close
the connection at this point, the resulting FIN packet would not be
encrypted.) Host A needs to accept unencrypted non-data segments
(pure ACKs, pure FIN) until it sees an encrypted packet from Host B.

[Problem #2]

The retransmissions of K-A are unencrypted, to avoid sending the same
data in encrypted and unencrypted form. This poses a risk if an ACK
gets lost but both hosts end up in the PSP Tx state. Assume that Host A
did not send the RPC (line 12), and the retransmission (line 14)
happens as an RTO or TLP. Host B may already reach PSP Tx state (line
"20") and expect encrypted data. Plain text retransmissions (with
sequence number before rcv_nxt) must be accepted until Host B sees
encrypted data from Host A.


With that I think the state machine needs to be amended:

Event          | Normal TCP  | Rx PSP      | Tx PSP      | PSP full    |
-----------------------------------------------------------------------
Rx plain (new) | accept      | accept      | drop        | drop        |

Rx plain       | accept      | accept      | accept      | drop        |
(ACK|FIN|rtx)  |             |             |             |             |

Rx PSP (good)  | drop        | accept      | accept      | accept      |

Rx PSP (bad)   | drop        | drop        | drop        | drop        |
(crypt, !=SPI) |             |             |             |             |

Tx             | plain text  | plain text  | encrypted   | encrypted   |
               |             |             | (excl. rtx) | (excl. rtx) |
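
The Rx rows of the table can be written down as a small decision
function. This is only a sketch: the enum and function names are made
up, "PSP full" is taken to mean "Tx key installed and encrypted data
already seen from the peer", and a plain text segment counts as
"ACK|FIN|rtx" when it carries no payload or ends at or before rcv_nxt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Columns of the table (names are hypothetical). */
enum psp_sk_state { PSP_ST_NONE, PSP_ST_RX, PSP_ST_TX, PSP_ST_FULL };

/* Rx rows of the table (names are hypothetical). */
enum psp_rx_kind {
	PSP_RX_PLAIN_NEW,	/* plain text carrying new data */
	PSP_RX_PLAIN_OLD,	/* plain text pure ACK, pure FIN or rtx */
	PSP_RX_GOOD,		/* decrypted and authenticated, SPI matches */
	PSP_RX_BAD,		/* crypto failure or SPI mismatch */
};

/* Classify a plain text segment; the comparison is wrap-safe in the
 * same way as the kernel's before()/after() sequence helpers.
 */
static enum psp_rx_kind psp_classify_plain(uint32_t payload_len,
					   uint32_t end_seq, uint32_t rcv_nxt)
{
	if (payload_len == 0)
		return PSP_RX_PLAIN_OLD;
	if ((int32_t)(end_seq - rcv_nxt) <= 0)
		return PSP_RX_PLAIN_OLD;
	return PSP_RX_PLAIN_NEW;
}

/* true = accept, false = drop */
static bool psp_rx_accept(enum psp_sk_state st, enum psp_rx_kind kind)
{
	switch (kind) {
	case PSP_RX_PLAIN_NEW:
		return st == PSP_ST_NONE || st == PSP_ST_RX;
	case PSP_RX_PLAIN_OLD:
		return st != PSP_ST_FULL;
	case PSP_RX_GOOD:
		return st != PSP_ST_NONE;
	case PSP_RX_BAD:
		return false;
	}
	return false;
}
```

With this, the plain text ACK / SACK from line 13 of the diagram is
PSP_RX_PLAIN_OLD and is accepted by Host A while it is in the Tx PSP
state, and dropped once it reaches PSP full.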

> > > Another example where a peer stays open and stays retrying if it has
> > > upgraded and drops all plaintext.  
> 
> May want to always allow plaintext RSTs. This is a potential DoS
> vector.

Because of key exhaustion? Or we can be tricked into spamming someone
with retransmissions and ignoring their RST?

> In all these cases, I suppose this has already been figured
> out for TLS.

Assuming the answer above is "key exhaustion" - I wouldn't be surprised
if it wasn't :(

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC net-next 01/15] psp: add documentation
  2024-06-05  0:08             ` Jakub Kicinski
@ 2024-06-05 20:11               ` Willem de Bruijn
  2024-06-05 22:24                 ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Willem de Bruijn @ 2024-06-05 20:11 UTC (permalink / raw
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, mingtao, knekritz, Lance Richardson

Jakub Kicinski wrote:
> On Fri, 31 May 2024 09:56:42 -0400 Willem de Bruijn wrote:
> > > > If one peer can enter the state where it drops all plaintext, while
> > > > the other decides to close the connection before completing the
> > > > upgrade, and thus sends a plaintext FIN.
> > > > 
> > > > If (big if) that can happen, then the connection cannot be cleanly
> > > > closed.  
> > > 
> > > Hm. And we can avoid this by only enforcing encryption of data-less
> > > segments once we've seen some encrypted data?  
> > 
> > That would help. It may also be needed to accept a pure ACK right at
> > the upgrade seqno. Depends on the upgrade process.
> > 
> > Which may be worth documenting explicitly: the system call and network
> > packet exchange from when one peer initiates (by generating its local
> > key) until the connection is fully encrypted. That also allows poking
> > at the various edge cases that may happen if packets are lost, or when
> > actions can race.
> 
> Dunno if the format below is good, but you're very right.
> At least to me writing the diagram was an hour well spent :)

Great :)
 
> > One unexpected example of the latter that I came across was Tx SADB
> > key insertion in tail edge cases taking longer than network RTT, for
> > instance.
> > 
> > The kernel API can be exercised in a variety of ways, not all of them
> > will uphold the correctness. Documenting how it should be used should
> > help.
> > 
> > Even better when it reduces the option space. As it already does by
> > failing a Tx key install until Rx is configured.
> 
> Something along these lines?
> 
> "Sequence" diagram of the worst case scenario:
> 
> 01 p       Host A                         Host B
> 02 l t        ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~
> 03 a e        ~~~~~~[crypto negotiation]~~~~~~
> 04 i x                             [Rx key alloc = K-B]
> 05 n t                          <--- [app] K-B key send 
> 06 ------[Rx key alloc = K-A]-
> 07     [app] K-A key send -->|
> 08        [TCP] K-B input <-----
> 08 P      [TCP] K-B ACK ---->|
> 09 S R [app] recv(K-B)       |
> 10 P x [app] [Tx key set]    |  
> 11 -------------------------- 
> 12 P T [app] send(RPC) #####>|   
> 13 S x                       |<----    [TCP] Seq OoO! queue RPC, SACK
> 14 P      [TCP] retr K-A --->|   
> 15                           |  `->    [TCP] K-A input
> 16                           | <---    [TCP] K-A ACK (or FIN) 
> 17                           |      [app] recv(K-A)
> 18                           |      [app] [Tx key set]
> 19                            -----------------------------------
> 20
> 
> There is a causal dependency between Host B allocating the key (line 4),
> sending it (line 5) and Host A receiving it (line 8). Since Host B will
> accept PSP packets as soon as it allocated the key, Host A does not
> need to wait to start using the key (line 12). Host B will queue the
> RPC to the socket (line 13).
> 
> [Problem #1]
> 
> However, because Host B does not have a Tx key, the ACK / SACK packet
> (line 13) will not be encrypted. (Similarly if Host B decided to close
> the connection at this point, the resulting FIN packet would not be
> encrypted.)

Or if it plays SO_LINGER games the resulting RST.

> Host A needs to accept unencrypted non-data segments
> (pure ACKs, pure FIN) until it sees an encrypted packet from Host B.
>
> [Problem #2]
> 
> The retransmissions of K-A are unencrypted, to avoid sending the same
> data in encrypted and unencrypted form. This poses a risk if an ACK
> gets lost but both hosts end up in the PSP Tx state. Assume that Host A
> did not send the RPC (line 12), and the retransmission (line 14)
> happens as an RTO or TLP. Host B may already reach PSP Tx state (line
> "20") and expect encrypted data. Plain text retransmissions (with
> sequence number before rcv_nxt) must be accepted until Host B sees
> encrypted data from Host A.

Is that sufficient if an initial encrypted packet could get reordered
by the network to arrive before a plaintext retransmit of a lower
seqno?

Both scenarios make sense. It is unfortunately harder to be sure that
we have captured all edge cases.

An issue related to the rcv_nxt cut-point, not sure how important: the
plaintext packet contents are protected by user crypto before upgrade.
But the TCP headers are not. PSP relies on TCP PAWS for replay
protection. It is possible for a MITM to offset all seqno from the
start of connection establishment. I don't see an immediate issue. But
at a minimum it could be possible to insert or delete before PSP is
upgraded.

> 
> With that I think the state machine needs to be amended:
> 
> Event          | Normal TCP  | Rx PSP      | Tx PSP      | PSP full    |
> -----------------------------------------------------------------------
> Rx plain (new) | accept      | accept      | drop        | drop        |
> 
> Rx plain       | accept      | accept      | accept      | drop        |
> (ACK|FIN|rtx)  |             |             |             |             |
> 
> Rx PSP (good)  | drop        | accept      | accept      | accept      |
> 
> Rx PSP (bad    | drop        | drop        | drop        | drop        |
> (crypt, !=SPI) |             |             |             |             |
> 
> Tx             | plain text  | plain text  | encrypted   | encrypted   |
>                |             |             | (excl. rtx) | (excl. rtx) |
> 
> > > > Another example where a peer stays open and stays retrying if it has
> > > > upgraded and drops all plaintext.  
> > 
> > May want to always allow plaintext RSTs. This is a potential DoS
> > vector.
> 
> Because of key exhaustion? Or we can be tricked into spamming someone
> with retransmissions and ignoring their RST?

Simpler: this falls back onto unencrypted TCP where someone capable of
spoofing valid data is capable of terminating a connection.

If denying all plaintext after upgrade, PSP protects against this.

It is arguably low on the list of concerns, especially in a closed-world
hyperscaler setting, as it is hardly the only DoS vector.

> > In all these cases, I suppose this has already been figured
> > out for TLS.
> 
> Assuming the answer above is "key exhaustion" - I wouldn't be surprised
> if it wasn't :(




* Re: [RFC net-next 01/15] psp: add documentation
  2024-06-05 20:11               ` Willem de Bruijn
@ 2024-06-05 22:24                 ` Jakub Kicinski
  2024-06-06  2:40                   ` Willem de Bruijn
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2024-06-05 22:24 UTC
  To: Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, mingtao, knekritz, Lance Richardson

On Wed, 05 Jun 2024 16:11:31 -0400 Willem de Bruijn wrote:
> > The retransmissions of K-A are unencrypted, to avoid sending the same
> > data in encrypted and unencrypted form. This poses a risk if an ACK
> > gets lost but both hosts end up in the PSP Tx state. Assume that Host A
> > did not send the RPC (line 12), and the retransmission (line 14)
> > happens as an RTO or TLP. Host B may already reach PSP Tx state (line
> > "20") and expect encrypted data. Plain text retransmissions (with
> > sequence number before rcv_nxt) must be accepted until Host B sees
> > encrypted data from Host A.  
> 
> Is that sufficient if an initial encrypted packet could get reordered
> by the network to arrive before a plaintext retransmit of a lower
> seqno?

Yes, I believe that's fine. 

I will document this more clearly, but both sides must be pretty precise in
their understanding of when the switchover happens. They must read what
they expect to be clear text, and then install the Tx key thus locking
down the socket.

1. If they under-read and clear text data is already queued - the kernel
   will error out.
2. If they under-read and clear text arrives later - the connection will
   hang.
3. If they over-read they will presumably get PSP-protected data
   which they have no way of validating, since it won't be secured by
   user crypto.

We could protect from over-read (case 3) by refusing to give out
PSP-protected data until keys are installed. But it adds to the fast
path and I don't think it's all that beneficial, since there's no way
to protect a sloppy application from under-read (case 2).
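
As a rough sketch of the three cases above, here is a toy classifier for
what happens when the application installs the Tx key. It is not kernel
code; the function name, parameters and return strings are hypothetical and
only restate the list.

```python
def switchover_outcome(expected_clear, consumed, queued_clear):
    """Classify a Tx key install attempt (toy model, hypothetical names).

    expected_clear: bytes of clear text the protocol defines before the
                    switchover point
    consumed:       bytes the application read before installing the Tx key
    queued_clear:   clear text bytes already sitting unread in the receive
                    queue at install time
    """
    if consumed > expected_clear:
        # Case 3: the app already consumed PSP-protected bytes that it
        # cannot validate, since user crypto no longer covers them.
        return "over-read: unvalidated PSP data consumed"
    if consumed == expected_clear:
        # All clear text was read out, so locking down the socket is safe.
        return "ok"
    if queued_clear > 0:
        # Case 1: under-read with clear text already queued.
        return "under-read: kernel errors out"
    # Case 2: under-read, the remaining clear text arrives after lock down
    # and gets dropped, so the stream never makes progress.
    return "under-read: connection hangs"
```

For example, switchover_outcome(100, 80, 0) lands in case 2: the last 20
bytes of clear text show up only after the socket is locked down.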

Back to your question about reordering plain text with cipher text:
the application should not lock down the socket until it gets all
its clear text. So clear text retransmissions _after_ lock down must be
spurious. The only worry is that we lose an ACK and never tell
the other side that we got all the clear text. But we're guaranteed
to successfully ACK any PSP-protected data, so if we receive some
there is no way to get stuck.  Let me copy / paste the diagram:

01 p       Host A                         Host B
02 l t        ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~
03 a e        ~~~~~~[crypto negotiation]~~~~~~
04 i x                             [Rx key alloc = K-B]
05 n t                          <--- [app] K-B key send 
06 ------[Rx key alloc = K-A]-
07     [app] K-A key send -->|
08        [TCP] K-B input <-----
08 P      [TCP] K-B ACK ---->|
09 S R [app] recv(K-B)       |
10 P x [app] [Tx key set]    |  
11 -------------------------- 
12 P T [app] send(RPC) #####>|   
13 S x                       |<----    [TCP] Seq OoO! queue RPC, SACK
14 P      [TCP] retr K-A --->|   
15                           |  `->    [TCP] K-A input
16                           | <---    [TCP] K-A ACK (or FIN) 
17                           |      [app] recv(K-A)
18                           |      [app] [Tx key set]
19                            -----------------------------------
20

Looking at it as Host A, if we receive encrypted data, we must have
allocated and sent the key (line 7), so we will be accepting encrypted
data. But at this point we are also accepting plain text (until we
reach line 9). We will send a plain text (S)ACK to encrypted data, 
but that's fine too since Host B hasn't seen any encrypted data from us
and will accept such ACKs.

> Both scenarios make sense. It is unfortunately harder to be sure that
> we have captured all edge cases.

Are you trying to say packetdrill without saying packetdrill? :)

> An issue related to the rcv_nxt cut-point, not sure how important: the
> plaintext packet contents are protected by user crypto before upgrade.
> But the TCP headers are not. PSP relies on TCP PAWS for replay
> protection. It is possible for a MITM to offset all seqno from the
> start of connection establishment. I don't see an immediate issue. But
> at a minimum it could be possible to insert or delete before PSP is
> upgraded.

Yes, the "cut off" point must be quite clearly defined, because both
sides must precisely read out all the clear text. Then they install 
the Tx key and anything they read must have been PSP-protected.

Hope I understood the point.


* Re: [RFC net-next 01/15] psp: add documentation
  2024-06-05 22:24                 ` Jakub Kicinski
@ 2024-06-06  2:40                   ` Willem de Bruijn
  0 siblings, 0 replies; 44+ messages in thread
From: Willem de Bruijn @ 2024-06-06  2:40 UTC
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, pabeni, borisp, gal, cratiu, rrameshbabu,
	steffen.klassert, tariqt, mingtao, knekritz, Lance Richardson

Jakub Kicinski wrote:
> On Wed, 05 Jun 2024 16:11:31 -0400 Willem de Bruijn wrote:
> > > The retransmissions of K-A are unencrypted, to avoid sending the same
> > > data in encrypted and unencrypted form. This poses a risk if an ACK
> > > gets lost but both hosts end up in the PSP Tx state. Assume that Host A
> > > did not send the RPC (line 12), and the retransmission (line 14)
> > > happens as an RTO or TLP. Host B may already reach PSP Tx state (line
> > > "20") and expect encrypted data. Plain text retransmissions (with
> > > sequence number before rcv_nxt) must be accepted until Host B sees
> > > encrypted data from Host A.  
> > 
> > Is that sufficient if an initial encrypted packet could get reordered
> > by the network to arrive before a plaintext retransmit of a lower
> > seqno?
> 
> Yes, I believe that's fine. 
> 
> I will document this clearer but both sides must be pretty precise in
> their understanding when the switchover happens. They must read what 
> they expect to be clear text, and then install the Tx key thus locking
> down the socket.
> 
> 1. If they under-read and clear text data is already queued - the kernel
>    will error out.
> 2. If they under-read and clear text arrives later - the connection will
>    hang.
> 3. If they over-read they will presumably get PSP-protected data
>    which they have no way of validating, since it won't be secured by
>    user crypto.
> 
> We could protect from over-read (case 3) by refusing to give out
> PSP-protected data until keys are installed. But it adds to the fast
> path and I don't think it's all that beneficial, since there's no way
> to protect a sloppy application from under-read (case 2).
> 
> Back to your question about reordering plain text with cipher text:
> the application should not lock down the socket until it gets all
> its clear text. So clear text retransmissions _after_ lock down must be
> spurious.

Ah yes, good point.

> The only worry is that we lose an ACK and never tell
> the other side that we got all the clear text. But we're guaranteed
> to successfully ACK any PSP-protected data, so if we receive some
> there is no way to get stuck.  Let me copy / paste the diagram:
> 
> 01 p       Host A                         Host B
> 02 l t        ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~
> 03 a e        ~~~~~~[crypto negotiation]~~~~~~
> 04 i x                             [Rx key alloc = K-B]
> 05 n t                          <--- [app] K-B key send 
> 06 ------[Rx key alloc = K-A]-
> 07     [app] K-A key send -->|
> 08        [TCP] K-B input <-----
> 08 P      [TCP] K-B ACK ---->|
> 09 S R [app] recv(K-B)       |
> 10 P x [app] [Tx key set]    |  
> 11 -------------------------- 
> 12 P T [app] send(RPC) #####>|   
> 13 S x                       |<----    [TCP] Seq OoO! queue RPC, SACK
> 14 P      [TCP] retr K-A --->|   
> 15                           |  `->    [TCP] K-A input
> 16                           | <---    [TCP] K-A ACK (or FIN) 
> 17                           |      [app] recv(K-A)
> 18                           |      [app] [Tx key set]
> 19                            -----------------------------------
> 20
> 
> Looking as Host A, if we receive encrypted data, we must have
> allocated and sent key (line 7) so we will start accepting encrypted
> data. But at this point we are also accepting plain text (until we
> reach line 9). We will send a plain text (S)ACK to encrypted data, 
> but that's fine too since Host B hasn't seen any encrypted data from us
> and will accept such ACKs.
> 
> > Both scenarios make sense. It is unfortunately harder to be sure that
> > we have captured all edge cases.
> 
> Are you trying to say packetdrill without saying packetdrill? :)

Ha, no, no such hint implied.

I did expand packetdrill to PSP to exercise the cases that I could
come up with, at a minimum to ensure coverage of all branches.

But does that cover all edge cases possible? Including drops,
reorders, geometry changes from MTU changes, SO_LINGER 0, races with
slow OS operations (like that slow SADB insertion I mentioned)? The
unknown unknowns. Stuck connections are low risk: bugs that can be
fixed later, as long as it is easy to reason that actual crypto issues
like plaintext leaks are not reachable.

Extending packetdrill to netlink would be quite some work, I suspect.
A quick scan shows that it knows NLA, but only for OPT_STATS decoding.

> > An issue related to the rcv_nxt cut-point, not sure how important: the
> > plaintext packet contents are protected by user crypto before upgrade.
> > But the TCP headers are not. PSP relies on TCP PAWS for replay
> > protection. It is possible for a MITM to offset all seqno from the
> > start of connection establishment. I don't see an immediate issue. But
> > at a minimum it could be possible to insert or delete before PSP is
> > upgraded.
> 
> Yes, the "cut off" point must be quite clearly defined, because both
> sides must precisely read out all the clear text. Then they install 
> the Tx key and anything they read must have been PSP-protected.
> 
> Hope I understood the point.

I think the issue, if any, is that there may be a gap between the two
methods of integrity protection. What we call "cleartext" here is
integrity protected such that no insertion or deletion attacks are
possible. And PSP ensures the same. But is a deletion of the last
plaintext or first ciphertext possible?

An insertion is not an issue as it will be protected by neither,
while PSP is expected, so it is dropped.

As long as the application (or is it presentation?) layer has a clear
definition of at what point in the stream it must insert the Tx key,
plaintext deletion is not possible, as the key is not inserted until
all plaintext has been received.

Which leaves: is it possible for a MITM to offset the seqno, such that
the first PSP encrypted packet can be removed from the stream and this
goes undetected?


Thread overview: 44+ messages
2024-05-10  3:04 [RFC net-next 00/15] add basic PSP encryption for TCP connections Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 01/15] psp: add documentation Jakub Kicinski
2024-05-10 22:19   ` Saeed Mahameed
2024-05-11  0:11     ` Jakub Kicinski
2024-05-11  9:41       ` Vadim Fedorenko
2024-05-11 16:25         ` David Ahern
2024-05-13  1:24   ` Willem de Bruijn
2024-05-29 17:35     ` Jakub Kicinski
2024-05-30  0:47       ` Willem de Bruijn
2024-05-30 19:51         ` Jakub Kicinski
2024-05-30 20:15           ` Jakub Kicinski
2024-05-30 21:03             ` Willem de Bruijn
2024-05-31 13:56           ` Willem de Bruijn
2024-06-05  0:08             ` Jakub Kicinski
2024-06-05 20:11               ` Willem de Bruijn
2024-06-05 22:24                 ` Jakub Kicinski
2024-06-06  2:40                   ` Willem de Bruijn
2024-05-10  3:04 ` [RFC net-next 02/15] psp: base PSP device support Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 03/15] net: modify core data structures for PSP datapath support Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 04/15] tcp: add datapath logic for PSP with inline key exchange Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 05/15] psp: add op for rotation of secret state Jakub Kicinski
2024-05-16 19:59   ` Lance Richardson
2024-05-29 17:43     ` Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 06/15] net: psp: add socket security association code Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 07/15] net: psp: update the TCP MSS to reflect PSP packet overhead Jakub Kicinski
2024-05-13  1:47   ` Willem de Bruijn
2024-05-29 17:48     ` Jakub Kicinski
2024-05-30  0:52       ` Willem de Bruijn
2024-05-10  3:04 ` [RFC net-next 08/15] psp: track generations of secret state Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 09/15] net/mlx5e: Support PSP offload functionality Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 10/15] net/mlx5e: Implement PSP operations .assoc_add and .assoc_del Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 11/15] net/mlx5e: Implement PSP Tx data path Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 12/15] net/mlx5e: Add PSP steering in local NIC RX Jakub Kicinski
2024-05-13  1:52   ` Willem de Bruijn
2024-05-10  3:04 ` [RFC net-next 13/15] net/mlx5e: Configure PSP Rx flow steering rules Jakub Kicinski
2024-05-10  3:04 ` [RFC net-next 14/15] net/mlx5e: Add Rx data path offload Jakub Kicinski
2024-05-13  1:54   ` Willem de Bruijn
2024-05-29 18:38     ` Jakub Kicinski
2024-05-30  9:04       ` Cosmin Ratiu
2024-05-10  3:04 ` [RFC net-next 15/15] net/mlx5e: Implement PSP key_rotate operation Jakub Kicinski
2024-05-29  9:16 ` [RFC net-next 00/15] add basic PSP encryption for TCP connections Boris Pismenny
2024-05-29 18:50   ` Jakub Kicinski
2024-05-29 20:01     ` Boris Pismenny
2024-05-29 20:38       ` Jakub Kicinski
