Netdev Archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next v3 0/2] Replace mono_delivery_time with tstamp_type
@ 2024-04-12 21:01 Abhishek Chauhan
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type Abhishek Chauhan
  0 siblings, 2 replies; 13+ messages in thread
From: Abhishek Chauhan @ 2024-04-12 21:01 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, Andrew Halaney, Willem de Bruijn,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel

Patch 1 :- This patch takes care of only renaming the mono delivery
timestamp to tstamp_type with no change in functionality of 
existing available code in kernel also  
Starts assigning tstamp_type with either mono or real and 
introduces a new enum in the skbuff.h, again no change in functionality 
of the existing available code in kernel , just making the code scalable.

Patch 2 :- Additional bit was added to support userspace timestamp to 
avoid tstamp drops in the forwarding path when testing TC-ETF. 
With this patch i am not sure what impacts it has towards BPF code. 
I need upstream BPF community to help me in adding the necessary BPF 
changes to avoid any BPF test case failures. 
I haven't changed any of the BPF functionalities and hence i need 
upstream BPF help to assist me with those changes so i can make them as 
part of this patch.  

Abhishek Chauhan (2):
  net: Rename mono_delivery_time to tstamp_type for scalabilty
  net: Add additional bit to support userspace timestamp type

 include/linux/skbuff.h                        | 50 ++++++++++++++-----
 include/net/inet_frag.h                       |  4 +-
 net/bridge/netfilter/nf_conntrack_bridge.c    |  6 +--
 net/core/dev.c                                |  2 +-
 net/core/filter.c                             |  8 +--
 net/ipv4/inet_fragment.c                      |  2 +-
 net/ipv4/ip_fragment.c                        |  2 +-
 net/ipv4/ip_output.c                          | 10 ++--
 net/ipv4/raw.c                                |  2 +-
 net/ipv4/tcp_output.c                         | 14 +++---
 net/ipv6/ip6_output.c                         |  8 +--
 net/ipv6/netfilter.c                          |  6 +--
 net/ipv6/netfilter/nf_conntrack_reasm.c       |  2 +-
 net/ipv6/raw.c                                |  2 +-
 net/ipv6/reassembly.c                         |  2 +-
 net/ipv6/tcp_ipv6.c                           |  2 +-
 net/packet/af_packet.c                        |  7 ++-
 net/sched/act_bpf.c                           |  4 +-
 net/sched/cls_bpf.c                           |  4 +-
 .../selftests/bpf/prog_tests/ctx_rewrite.c    |  8 +--
 20 files changed, 84 insertions(+), 61 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-12 21:01 [RFC PATCH bpf-next v3 0/2] Replace mono_delivery_time with tstamp_type Abhishek Chauhan
@ 2024-04-12 21:01 ` Abhishek Chauhan
  2024-04-13  0:37   ` Abhishek Chauhan (ABC)
  2024-04-13 18:54   ` Willem de Bruijn
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type Abhishek Chauhan
  1 sibling, 2 replies; 13+ messages in thread
From: Abhishek Chauhan @ 2024-04-12 21:01 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, Andrew Halaney, Willem de Bruijn,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel

mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.

Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.

As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based out if enum defined as
part of this commit

Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts enum to distinguish between mono and real time

Bridge driver today has no support to forward the userspace timestamp
packets and ends up resetting the timestamp. ETF qdisc checks the
packet coming from userspace and encounters to be 0 thereby dropping
time sensitive packets. These changes will allow userspace timestamps
packets to be forwarded from the bridge to NIC drivers.

In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.

Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
Changes since v2
- Minor changes to commit subject

Changes since v1
- Squashed the two commits into one as mentioned by Willem.
- Introduced switch in skb_set_delivery_time.
- Renamed and removed directionality aspects w.r.t tstamp_type 
  as mentioned by Willem.

 include/linux/skbuff.h                     | 33 +++++++++++++++-------
 include/net/inet_frag.h                    |  4 +--
 net/bridge/netfilter/nf_conntrack_bridge.c |  6 ++--
 net/core/dev.c                             |  2 +-
 net/core/filter.c                          |  8 +++---
 net/ipv4/inet_fragment.c                   |  2 +-
 net/ipv4/ip_fragment.c                     |  2 +-
 net/ipv4/ip_output.c                       |  8 +++---
 net/ipv4/tcp_output.c                      | 14 ++++-----
 net/ipv6/ip6_output.c                      |  6 ++--
 net/ipv6/netfilter.c                       |  6 ++--
 net/ipv6/netfilter/nf_conntrack_reasm.c    |  2 +-
 net/ipv6/reassembly.c                      |  2 +-
 net/ipv6/tcp_ipv6.c                        |  2 +-
 net/sched/act_bpf.c                        |  4 +--
 net/sched/cls_bpf.c                        |  4 +--
 16 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7135a3e94afd..a83a2120b57f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -702,6 +702,11 @@ typedef unsigned int sk_buff_data_t;
 typedef unsigned char *sk_buff_data_t;
 #endif
 
+enum skb_tstamp_type {
+	CLOCK_REAL = 0, /* Time base is realtime */
+	CLOCK_MONO = 1, /* Time base is Monotonic */
+};
+
 /**
  * DOC: Basic sk_buff geometry
  *
@@ -819,7 +824,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@dst_pending_confirm: need to confirm neighbour
  *	@decrypted: Decrypted SKB
  *	@slow_gro: state present at GRO time, slower prepare step required
- *	@mono_delivery_time: When set, skb->tstamp has the
+ *	@tstamp_type: When set, skb->tstamp has the
  *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
  *		skb->tstamp has the (rcv) timestamp at ingress and
  *		delivery_time at egress.
@@ -950,7 +955,7 @@ struct sk_buff {
 	/* private: */
 	__u8			__mono_tc_offset[0];
 	/* public: */
-	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
+	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
 #ifdef CONFIG_NET_XGRESS
 	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
 	__u8			tc_skip_classify:1;
@@ -4237,7 +4242,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
 static inline void __net_timestamp(struct sk_buff *skb)
 {
 	skb->tstamp = ktime_get_real();
-	skb->mono_delivery_time = 0;
+	skb->tstamp_type = CLOCK_REAL;
 }
 
 static inline ktime_t net_timedelta(ktime_t t)
@@ -4246,10 +4251,18 @@ static inline ktime_t net_timedelta(ktime_t t)
 }
 
 static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
-					 bool mono)
+					  u8 tstamp_type)
 {
 	skb->tstamp = kt;
-	skb->mono_delivery_time = kt && mono;
+
+	switch (tstamp_type) {
+	case CLOCK_REAL:
+		skb->tstamp_type = CLOCK_REAL;
+		break;
+	case CLOCK_MONO:
+		skb->tstamp_type = kt && tstamp_type;
+		break;
+	}
 }
 
 DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
@@ -4259,8 +4272,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
  */
 static inline void skb_clear_delivery_time(struct sk_buff *skb)
 {
-	if (skb->mono_delivery_time) {
-		skb->mono_delivery_time = 0;
+	if (skb->tstamp_type) {
+		skb->tstamp_type = CLOCK_REAL;
 		if (static_branch_unlikely(&netstamp_needed_key))
 			skb->tstamp = ktime_get_real();
 		else
@@ -4270,7 +4283,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)
 
 static inline void skb_clear_tstamp(struct sk_buff *skb)
 {
-	if (skb->mono_delivery_time)
+	if (skb->tstamp_type)
 		return;
 
 	skb->tstamp = 0;
@@ -4278,7 +4291,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
 
 static inline ktime_t skb_tstamp(const struct sk_buff *skb)
 {
-	if (skb->mono_delivery_time)
+	if (skb->tstamp_type == CLOCK_MONO)
 		return 0;
 
 	return skb->tstamp;
@@ -4286,7 +4299,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb)
 
 static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
 {
-	if (!skb->mono_delivery_time && skb->tstamp)
+	if (skb->tstamp_type != CLOCK_MONO && skb->tstamp)
 		return skb->tstamp;
 
 	if (static_branch_unlikely(&netstamp_needed_key) || cond)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 153960663ce4..5af6eb14c5db 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -76,7 +76,7 @@ struct frag_v6_compare_key {
  * @stamp: timestamp of the last received fragment
  * @len: total length of the original datagram
  * @meat: length of received fragments so far
- * @mono_delivery_time: stamp has a mono delivery time (EDT)
+ * @tstamp_type: stamp has a mono delivery time (EDT)
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @fqdir: pointer to struct fqdir
@@ -97,7 +97,7 @@ struct inet_frag_queue {
 	ktime_t			stamp;
 	int			len;
 	int			meat;
-	u8			mono_delivery_time;
+	u8			tstamp_type;
 	__u8			flags;
 	u16			max_size;
 	struct fqdir		*fqdir;
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index c3c51b9a6826..816bb0fde718 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 					   struct sk_buff *))
 {
 	int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
-	bool mono_delivery_time = skb->mono_delivery_time;
+	u8 tstamp_type = skb->tstamp_type;
 	unsigned int hlen, ll_rs, mtu;
 	ktime_t tstamp = skb->tstamp;
 	struct ip_frag_state state;
@@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 			if (iter.frag)
 				ip_fraglist_prepare(skb, &iter);
 
-			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+			skb_set_delivery_time(skb, tstamp, tstamp_type);
 			err = output(net, sk, data, skb);
 			if (err || !iter.frag)
 				break;
@@ -113,7 +113,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 			goto blackhole;
 		}
 
-		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+		skb_set_delivery_time(skb2, tstamp, tstamp_type);
 		err = output(net, sk, data, skb2);
 		if (err)
 			goto blackhole;
diff --git a/net/core/dev.c b/net/core/dev.c
index 854a3a28a8d8..77a43c05dfe3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2146,7 +2146,7 @@ EXPORT_SYMBOL(net_disable_timestamp);
 static inline void net_timestamp_set(struct sk_buff *skb)
 {
 	skb->tstamp = 0;
-	skb->mono_delivery_time = 0;
+	skb->tstamp_type = CLOCK_REAL;
 	if (static_branch_unlikely(&netstamp_needed_key))
 		skb->tstamp = ktime_get_real();
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 8d185d99a643..8bb45423df52 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7709,13 +7709,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
 		if (!tstamp)
 			return -EINVAL;
 		skb->tstamp = tstamp;
-		skb->mono_delivery_time = 1;
+		skb->tstamp_type = CLOCK_MONO;
 		break;
 	case BPF_SKB_TSTAMP_UNSPEC:
 		if (tstamp)
 			return -EINVAL;
 		skb->tstamp = 0;
-		skb->mono_delivery_time = 0;
+		skb->tstamp_type = CLOCK_REAL;
 		break;
 	default:
 		return -EINVAL;
@@ -9422,7 +9422,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
 					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
 		*insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
 					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
-		/* skb->tc_at_ingress && skb->mono_delivery_time,
+		/* skb->tc_at_ingress && skb->tstamp_type:1,
 		 * read 0 as the (rcv) timestamp.
 		 */
 		*insn++ = BPF_MOV64_IMM(value_reg, 0);
@@ -9447,7 +9447,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
 	 * the bpf prog is aware the tstamp could have delivery time.
 	 * Thus, write skb->tstamp as is if tstamp_type_access is true.
 	 * Otherwise, writing at ingress will have to clear the
-	 * mono_delivery_time bit also.
+	 * mono_delivery_time (skb->tstamp_type:1)bit also.
 	 */
 	if (!prog->tstamp_type_access) {
 		__u8 tmp_reg = BPF_REG_AX;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index faaec92a46ac..d179a2c84222 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -619,7 +619,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
 	skb_mark_not_on_list(head);
 	head->prev = NULL;
 	head->tstamp = q->stamp;
-	head->mono_delivery_time = q->mono_delivery_time;
+	head->tstamp_type = q->tstamp_type;
 
 	if (sk)
 		refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fb947d1613fe..787aa86800f5 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 		qp->iif = dev->ifindex;
 
 	qp->q.stamp = skb->tstamp;
-	qp->q.mono_delivery_time = skb->mono_delivery_time;
+	qp->q.tstamp_type = skb->tstamp_type;
 	qp->q.meat += skb->len;
 	qp->ecn |= ecn;
 	add_frag_mem_limit(qp->q.fqdir, skb->truesize);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 1fe794967211..62e457f7c02c 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 {
 	struct iphdr *iph;
 	struct sk_buff *skb2;
-	bool mono_delivery_time = skb->mono_delivery_time;
+	u8 tstamp_type = skb->tstamp_type;
 	struct rtable *rt = skb_rtable(skb);
 	unsigned int mtu, hlen, ll_rs;
 	struct ip_fraglist_iter iter;
@@ -856,7 +856,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 				}
 			}
 
-			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+			skb_set_delivery_time(skb, tstamp, tstamp_type);
 			err = output(net, sk, skb);
 
 			if (!err)
@@ -912,7 +912,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		/*
 		 *	Put this fragment into the sending queue.
 		 */
-		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+		skb_set_delivery_time(skb2, tstamp, tstamp_type);
 		err = output(net, sk, skb2);
 		if (err)
 			goto fail;
@@ -1649,7 +1649,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 			  arg->csumoffset) = csum_fold(csum_add(nskb->csum,
 								arg->csum));
 		nskb->ip_summed = CHECKSUM_NONE;
-		nskb->mono_delivery_time = !!transmit_time;
+		nskb->tstamp_type = !!transmit_time;
 		if (txhash)
 			skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4);
 		ip_push_pending_frames(sk, &fl4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9282fafc0e61..42e6ed1decf4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1299,7 +1299,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 	tp = tcp_sk(sk);
 	prior_wstamp = tp->tcp_wstamp_ns;
 	tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
-	skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
+	skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONO);
 	if (clone_it) {
 		oskb = skb;
 
@@ -1649,7 +1649,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 
 	skb_split(skb, buff, len);
 
-	skb_set_delivery_time(buff, skb->tstamp, true);
+	skb_set_delivery_time(buff, skb->tstamp, CLOCK_MONO);
 	tcp_fragment_tstamp(skb, buff);
 
 	old_factor = tcp_skb_pcount(skb);
@@ -2730,7 +2730,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
 			/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
 			tp->tcp_wstamp_ns = tp->tcp_clock_cache;
-			skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
+			skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONO);
 			list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
 			tcp_init_tso_segs(skb, mss_now);
 			goto repair; /* Skip network transmission */
@@ -3713,11 +3713,11 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 #ifdef CONFIG_SYN_COOKIES
 	if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok))
 		skb_set_delivery_time(skb, cookie_init_timestamp(req, now),
-				      true);
+				      CLOCK_MONO);
 	else
 #endif
 	{
-		skb_set_delivery_time(skb, now, true);
+		skb_set_delivery_time(skb, now, CLOCK_MONO);
 		if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */
 			tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb);
 	}
@@ -3804,7 +3804,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
 				synack_type, &opts);
 
-	skb_set_delivery_time(skb, now, true);
+	skb_set_delivery_time(skb, now, CLOCK_MONO);
 	tcp_add_tx_delay(skb, tp);
 
 	return skb;
@@ -3988,7 +3988,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 
 	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-	skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true);
+	skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, CLOCK_MONO);
 
 	/* Now full SYN+DATA was cloned and sent (or not),
 	 * remove the SYN from the original skb (syn_data)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index b9dd3a66e423..a9e819115622 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
 	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
 				inet6_sk(skb->sk) : NULL;
-	bool mono_delivery_time = skb->mono_delivery_time;
+	u8 tstamp_type = skb->tstamp_type;
 	struct ip6_frag_state state;
 	unsigned int mtu, hlen, nexthdr_offset;
 	ktime_t tstamp = skb->tstamp;
@@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 			if (iter.frag)
 				ip6_fraglist_prepare(skb, &iter);
 
-			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+			skb_set_delivery_time(skb, tstamp, tstamp_type);
 			err = output(net, sk, skb);
 			if (!err)
 				IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
@@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		/*
 		 *	Put this fragment into the sending queue.
 		 */
-		skb_set_delivery_time(frag, tstamp, mono_delivery_time);
+		skb_set_delivery_time(frag, tstamp, tstamp_type);
 		err = output(net, sk, frag);
 		if (err)
 			goto fail;
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 53d255838e6a..e0c2347b4dc6 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 				  struct sk_buff *))
 {
 	int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
-	bool mono_delivery_time = skb->mono_delivery_time;
+	u8 tstamp_type = skb->tstamp_type;
 	ktime_t tstamp = skb->tstamp;
 	struct ip6_frag_state state;
 	u8 *prevhdr, nexthdr = 0;
@@ -192,7 +192,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 			if (iter.frag)
 				ip6_fraglist_prepare(skb, &iter);
 
-			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+			skb_set_delivery_time(skb, tstamp, tstamp_type);
 			err = output(net, sk, data, skb);
 			if (err || !iter.frag)
 				break;
@@ -225,7 +225,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 			goto blackhole;
 		}
 
-		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+		skb_set_delivery_time(skb2, tstamp, tstamp_type);
 		err = output(net, sk, data, skb2);
 		if (err)
 			goto blackhole;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index d0dcbaca1994..5cc5d823d33f 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 		fq->iif = dev->ifindex;
 
 	fq->q.stamp = skb->tstamp;
-	fq->q.mono_delivery_time = skb->mono_delivery_time;
+	fq->q.tstamp_type = skb->tstamp_type;
 	fq->q.meat += skb->len;
 	fq->ecn |= ecn;
 	if (payload_len > fq->q.max_size)
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index acb4f119e11f..ea724ff558b4 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 		fq->iif = dev->ifindex;
 
 	fq->q.stamp = skb->tstamp;
-	fq->q.mono_delivery_time = skb->mono_delivery_time;
+	fq->q.tstamp_type = skb->tstamp_type;
 	fq->q.meat += skb->len;
 	fq->ecn |= ecn;
 	add_frag_mem_limit(fq->q.fqdir, skb->truesize);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 3aa9da5c9a66..b60196061489 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -975,7 +975,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 			mark = inet_twsk(sk)->tw_mark;
 		else
 			mark = READ_ONCE(sk->sk_mark);
-		skb_set_delivery_time(buff, tcp_transmit_time(sk), true);
+		skb_set_delivery_time(buff, tcp_transmit_time(sk), CLOCK_MONO);
 	}
 	if (txhash) {
 		/* autoflowlabel/skb_get_hash_flowi6 rely on buff->hash */
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 0e3cf11ae5fc..1f8b5a3f065e 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
 		bpf_compute_data_pointers(skb);
 		filter_res = bpf_prog_run(filter, skb);
 	}
-	if (unlikely(!skb->tstamp && skb->mono_delivery_time))
-		skb->mono_delivery_time = 0;
+	if (unlikely(!skb->tstamp && skb->tstamp_type))
+		skb->tstamp_type = CLOCK_REAL;
 	if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
 		skb_orphan(skb);
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 5e83e890f6a4..3f843e0eea3c 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb,
 			bpf_compute_data_pointers(skb);
 			filter_res = bpf_prog_run(prog->filter, skb);
 		}
-		if (unlikely(!skb->tstamp && skb->mono_delivery_time))
-			skb->mono_delivery_time = 0;
+		if (unlikely(!skb->tstamp && skb->tstamp_type))
+			skb->tstamp_type = CLOCK_REAL;
 
 		if (prog->exts_integrated) {
 			res->class   = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type
  2024-04-12 21:01 [RFC PATCH bpf-next v3 0/2] Replace mono_delivery_time with tstamp_type Abhishek Chauhan
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan
@ 2024-04-12 21:01 ` Abhishek Chauhan
  2024-04-13 19:07   ` Willem de Bruijn
  1 sibling, 1 reply; 13+ messages in thread
From: Abhishek Chauhan @ 2024-04-12 21:01 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, Andrew Halaney, Willem de Bruijn,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel

tstamp_type can be real, mono or userspace timestamp.

This commit adds userspace timestamp and sets it if there is
valid transmit_time available in socket coming from userspace.

To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
userspace timestamp.

Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
Changes since v2
- Minor changes to commit subject

Changes since v1 
- identified additional changes in BPF framework.
- Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
- Made changes in skb_set_delivery_time to keep changes similar to 
  previous code for mono_delivery_time and just setting tstamp_type
  bit 1 for userspace timestamp.


 include/linux/skbuff.h                        | 19 +++++++++++++++----
 net/ipv4/ip_output.c                          |  2 +-
 net/ipv4/raw.c                                |  2 +-
 net/ipv6/ip6_output.c                         |  2 +-
 net/ipv6/raw.c                                |  2 +-
 net/packet/af_packet.c                        |  7 +++----
 .../selftests/bpf/prog_tests/ctx_rewrite.c    |  8 ++++----
 7 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a83a2120b57f..b6346c21c3d4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -827,7 +827,8 @@ enum skb_tstamp_type {
  *	@tstamp_type: When set, skb->tstamp has the
  *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
  *		skb->tstamp has the (rcv) timestamp at ingress and
- *		delivery_time at egress.
+ *		delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid
+ *		coming from userspace
  *	@napi_id: id of the NAPI struct this skb came from
  *	@sender_cpu: (aka @napi_id) source CPU in XPS
  *	@alloc_cpu: CPU which did the skb allocation.
@@ -955,7 +956,7 @@ struct sk_buff {
 	/* private: */
 	__u8			__mono_tc_offset[0];
 	/* public: */
-	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
+	__u8			tstamp_type:2;	/* See SKB_MONO_DELIVERY_TIME_MASK */
 #ifdef CONFIG_NET_XGRESS
 	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
 	__u8			tc_skip_classify:1;
@@ -1090,10 +1091,10 @@ struct sk_buff {
  */
 #ifdef __BIG_ENDIAN_BITFIELD
 #define SKB_MONO_DELIVERY_TIME_MASK	(1 << 7)
-#define TC_AT_INGRESS_MASK		(1 << 6)
+#define TC_AT_INGRESS_MASK		(1 << 5)
 #else
 #define SKB_MONO_DELIVERY_TIME_MASK	(1 << 0)
-#define TC_AT_INGRESS_MASK		(1 << 1)
+#define TC_AT_INGRESS_MASK		(1 << 2)
 #endif
 #define SKB_BF_MONO_TC_OFFSET		offsetof(struct sk_buff, __mono_tc_offset)
 
@@ -4262,6 +4263,16 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
 	case CLOCK_MONO:
 		skb->tstamp_type = kt && tstamp_type;
 		break;
+	/* if any other time base, must be from userspace
+	 * so set userspace tstamp_type bit
+	 * See skbuff tstamp_type:2
+	 * 0x0 => real timestamp_type
+	 * 0x1 => mono timestamp_type
+	 * 0x2 => timestamp_type set from userspace
+	 */
+	default:
+		if (kt && tstamp_type)
+			skb->tstamp_type = 0x2;
 	}
 }
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 62e457f7c02c..c9317d4addce 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 
 	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
 	skb->mark = cork->mark;
-	skb->tstamp = cork->transmit_time;
+	skb_set_delivery_time(skb, cork->transmit_time, sk->sk_clockid);
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
 	 * on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index dcb11f22cbf2..bbc46a40c8b6 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -360,7 +360,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb->protocol = htons(ETH_P_IP);
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
-	skb->tstamp = sockc->transmit_time;
+	skb_set_delivery_time(skb, sockc->transmit_time, sk->sk_clockid);
 	skb_dst_set(skb, &rt->dst);
 	*rtp = NULL;
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a9e819115622..0b8193bdd98f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1924,7 +1924,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = cork->base.mark;
-	skb->tstamp = cork->base.transmit_time;
+	skb_set_delivery_time(skb, cork->base.transmit_time, sk->sk_clockid);
 
 	ip6_cork_steal_dst(skb, cork);
 	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 0d896ca7b589..625f3a917e50 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -621,7 +621,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
-	skb->tstamp = sockc->transmit_time;
+	skb_set_delivery_time(skb, sockc->transmit_time, sk->sk_clockid);
 
 	skb_put(skb, length);
 	skb_reset_network_header(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8c6d3fbb4ed8..356c96f23370 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2056,8 +2056,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	skb->dev = dev;
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = READ_ONCE(sk->sk_mark);
-	skb->tstamp = sockc.transmit_time;
-
+	skb_set_delivery_time(skb, sockc.transmit_time, sk->sk_clockid);
 	skb_setup_tx_timestamp(skb, sockc.tsflags);
 
 	if (unlikely(extra_len == 4))
@@ -2585,7 +2584,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->dev = dev;
 	skb->priority = READ_ONCE(po->sk.sk_priority);
 	skb->mark = READ_ONCE(po->sk.sk_mark);
-	skb->tstamp = sockc->transmit_time;
+	skb_set_delivery_time(skb, sockc->transmit_time, po->sk.sk_clockid);
 	skb_setup_tx_timestamp(skb, sockc->tsflags);
 	skb_zcopy_set_nouarg(skb, ph.raw);
 
@@ -3063,7 +3062,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->dev = dev;
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc.mark;
-	skb->tstamp = sockc.transmit_time;
+	skb_set_delivery_time(skb, sockc.transmit_time, sk->sk_clockid);
 
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;
diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
index 3b7c57fe55a5..d7f58d9671f7 100644
--- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
+++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
@@ -69,15 +69,15 @@ static struct test_case test_cases[] = {
 	{
 		N(SCHED_CLS, struct __sk_buff, tstamp),
 		.read  = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
-			 "w11 &= 3;"
-			 "if w11 != 0x3 goto pc+2;"
+			 "w11 &= 5;"
+			 "if w11 != 0x5 goto pc+2;"
 			 "$dst = 0;"
 			 "goto pc+1;"
 			 "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
 		.write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
-			 "if w11 & 0x2 goto pc+1;"
+			 "if w11 & 0x4 goto pc+1;"
 			 "goto pc+2;"
-			 "w11 &= -2;"
+			 "w11 &= -4;"
 			 "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
 			 "*(u64 *)($ctx + sk_buff::tstamp) = $src;",
 	},
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan
@ 2024-04-13  0:37   ` Abhishek Chauhan (ABC)
  2024-04-13 18:54   ` Willem de Bruijn
  1 sibling, 0 replies; 13+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-04-13  0:37 UTC (permalink / raw
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, Andrew Halaney, Willem de Bruijn,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel



On 4/12/2024 2:01 PM, Abhishek Chauhan wrote:
> mono_delivery_time was added to check if skb->tstamp has delivery
> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
> timestamp in ingress and delivery_time at egress.
> 
> Renaming the bitfield from mono_delivery_time to tstamp_type is for
> extensibilty for other timestamps such as userspace timestamp
> (i.e. SO_TXTIME) set via sock opts.
> 
> As we are renaming the mono_delivery_time to tstamp_type, it makes
> sense to start assigning tstamp_type based out if enum defined as
> part of this commit
> 
> Earlier we used bool arg flag to check if the tstamp is mono in
> function skb_set_delivery_time, Now the signature of the functions
> accepts enum to distinguish between mono and real time
> 
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
> 
> In future tstamp_type:1 can be extended to support userspace timestamp
> by increasing the bitfield.
> 
> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
> ---
> Changes since v2
> - Minor changes to commit subject
> 
> Changes since v1
> - Squashed the two commits into one as mentioned by Willem.
> - Introduced switch in skb_set_delivery_time.
> - Renamed and removed directionality aspects w.r.t tstamp_type 
>   as mentioned by Willem.
> 
>  include/linux/skbuff.h                     | 33 +++++++++++++++-------
>  include/net/inet_frag.h                    |  4 +--
>  net/bridge/netfilter/nf_conntrack_bridge.c |  6 ++--
>  net/core/dev.c                             |  2 +-
>  net/core/filter.c                          |  8 +++---
>  net/ipv4/inet_fragment.c                   |  2 +-
>  net/ipv4/ip_fragment.c                     |  2 +-
>  net/ipv4/ip_output.c                       |  8 +++---
>  net/ipv4/tcp_output.c                      | 14 ++++-----
>  net/ipv6/ip6_output.c                      |  6 ++--
>  net/ipv6/netfilter.c                       |  6 ++--
>  net/ipv6/netfilter/nf_conntrack_reasm.c    |  2 +-
>  net/ipv6/reassembly.c                      |  2 +-
>  net/ipv6/tcp_ipv6.c                        |  2 +-
>  net/sched/act_bpf.c                        |  4 +--
>  net/sched/cls_bpf.c                        |  4 +--
>  16 files changed, 59 insertions(+), 46 deletions(-)
> 
self review :- 
 
One more file needs to be updated here net/ieee802154/6lowpan/reassembly.c  :( 

Unfortunately when i compile kdev with defconfig this file never gets compiled on my internal workspace. 

I will find all other instances and update accordingly. 


> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 7135a3e94afd..a83a2120b57f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -702,6 +702,11 @@ typedef unsigned int sk_buff_data_t;
>  typedef unsigned char *sk_buff_data_t;
>  #endif
>  
> +enum skb_tstamp_type {
> +	CLOCK_REAL = 0, /* Time base is realtime */
> +	CLOCK_MONO = 1, /* Time base is Monotonic */
> +};
> +
>  /**
>   * DOC: Basic sk_buff geometry
>   *
> @@ -819,7 +824,7 @@ typedef unsigned char *sk_buff_data_t;
>   *	@dst_pending_confirm: need to confirm neighbour
>   *	@decrypted: Decrypted SKB
>   *	@slow_gro: state present at GRO time, slower prepare step required
> - *	@mono_delivery_time: When set, skb->tstamp has the
> + *	@tstamp_type: When set, skb->tstamp has the
>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>   *		skb->tstamp has the (rcv) timestamp at ingress and
>   *		delivery_time at egress.
> @@ -950,7 +955,7 @@ struct sk_buff {
>  	/* private: */
>  	__u8			__mono_tc_offset[0];
>  	/* public: */
> -	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
> +	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>  #ifdef CONFIG_NET_XGRESS
>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>  	__u8			tc_skip_classify:1;
> @@ -4237,7 +4242,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
>  static inline void __net_timestamp(struct sk_buff *skb)
>  {
>  	skb->tstamp = ktime_get_real();
> -	skb->mono_delivery_time = 0;
> +	skb->tstamp_type = CLOCK_REAL;
>  }
>  
>  static inline ktime_t net_timedelta(ktime_t t)
> @@ -4246,10 +4251,18 @@ static inline ktime_t net_timedelta(ktime_t t)
>  }
>  
>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> -					 bool mono)
> +					  u8 tstamp_type)
>  {
>  	skb->tstamp = kt;
> -	skb->mono_delivery_time = kt && mono;
> +
> +	switch (tstamp_type) {
> +	case CLOCK_REAL:
> +		skb->tstamp_type = CLOCK_REAL;
> +		break;
> +	case CLOCK_MONO:
> +		skb->tstamp_type = kt && tstamp_type;
> +		break;
> +	}
>  }
>  
>  DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
> @@ -4259,8 +4272,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
>   */
>  static inline void skb_clear_delivery_time(struct sk_buff *skb)
>  {
> -	if (skb->mono_delivery_time) {
> -		skb->mono_delivery_time = 0;
> +	if (skb->tstamp_type) {
> +		skb->tstamp_type = CLOCK_REAL;
>  		if (static_branch_unlikely(&netstamp_needed_key))
>  			skb->tstamp = ktime_get_real();
>  		else
> @@ -4270,7 +4283,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)
>  
>  static inline void skb_clear_tstamp(struct sk_buff *skb)
>  {
> -	if (skb->mono_delivery_time)
> +	if (skb->tstamp_type)
>  		return;
>  
>  	skb->tstamp = 0;
> @@ -4278,7 +4291,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
>  
>  static inline ktime_t skb_tstamp(const struct sk_buff *skb)
>  {
> -	if (skb->mono_delivery_time)
> +	if (skb->tstamp_type == CLOCK_MONO)
>  		return 0;
>  
>  	return skb->tstamp;
> @@ -4286,7 +4299,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb)
>  
>  static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
>  {
> -	if (!skb->mono_delivery_time && skb->tstamp)
> +	if (skb->tstamp_type != CLOCK_MONO && skb->tstamp)
>  		return skb->tstamp;
>  
>  	if (static_branch_unlikely(&netstamp_needed_key) || cond)
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 153960663ce4..5af6eb14c5db 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -76,7 +76,7 @@ struct frag_v6_compare_key {
>   * @stamp: timestamp of the last received fragment
>   * @len: total length of the original datagram
>   * @meat: length of received fragments so far
> - * @mono_delivery_time: stamp has a mono delivery time (EDT)
> + * @tstamp_type: stamp has a mono delivery time (EDT)
>   * @flags: fragment queue flags
>   * @max_size: maximum received fragment size
>   * @fqdir: pointer to struct fqdir
> @@ -97,7 +97,7 @@ struct inet_frag_queue {
>  	ktime_t			stamp;
>  	int			len;
>  	int			meat;
> -	u8			mono_delivery_time;
> +	u8			tstamp_type;
>  	__u8			flags;
>  	u16			max_size;
>  	struct fqdir		*fqdir;
> diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
> index c3c51b9a6826..816bb0fde718 100644
> --- a/net/bridge/netfilter/nf_conntrack_bridge.c
> +++ b/net/bridge/netfilter/nf_conntrack_bridge.c
> @@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
>  					   struct sk_buff *))
>  {
>  	int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
> -	bool mono_delivery_time = skb->mono_delivery_time;
> +	u8 tstamp_type = skb->tstamp_type;
>  	unsigned int hlen, ll_rs, mtu;
>  	ktime_t tstamp = skb->tstamp;
>  	struct ip_frag_state state;
> @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
>  			if (iter.frag)
>  				ip_fraglist_prepare(skb, &iter);
>  
> -			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
> +			skb_set_delivery_time(skb, tstamp, tstamp_type);
>  			err = output(net, sk, data, skb);
>  			if (err || !iter.frag)
>  				break;
> @@ -113,7 +113,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
>  			goto blackhole;
>  		}
>  
> -		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
> +		skb_set_delivery_time(skb2, tstamp, tstamp_type);
>  		err = output(net, sk, data, skb2);
>  		if (err)
>  			goto blackhole;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 854a3a28a8d8..77a43c05dfe3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2146,7 +2146,7 @@ EXPORT_SYMBOL(net_disable_timestamp);
>  static inline void net_timestamp_set(struct sk_buff *skb)
>  {
>  	skb->tstamp = 0;
> -	skb->mono_delivery_time = 0;
> +	skb->tstamp_type = CLOCK_REAL;
>  	if (static_branch_unlikely(&netstamp_needed_key))
>  		skb->tstamp = ktime_get_real();
>  }
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8d185d99a643..8bb45423df52 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7709,13 +7709,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
>  		if (!tstamp)
>  			return -EINVAL;
>  		skb->tstamp = tstamp;
> -		skb->mono_delivery_time = 1;
> +		skb->tstamp_type = CLOCK_MONO;
>  		break;
>  	case BPF_SKB_TSTAMP_UNSPEC:
>  		if (tstamp)
>  			return -EINVAL;
>  		skb->tstamp = 0;
> -		skb->mono_delivery_time = 0;
> +		skb->tstamp_type = CLOCK_REAL;
>  		break;
>  	default:
>  		return -EINVAL;
> @@ -9422,7 +9422,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
>  					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
>  		*insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
>  					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
> -		/* skb->tc_at_ingress && skb->mono_delivery_time,
> +		/* skb->tc_at_ingress && skb->tstamp_type:1,
>  		 * read 0 as the (rcv) timestamp.
>  		 */
>  		*insn++ = BPF_MOV64_IMM(value_reg, 0);
> @@ -9447,7 +9447,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
>  	 * the bpf prog is aware the tstamp could have delivery time.
>  	 * Thus, write skb->tstamp as is if tstamp_type_access is true.
>  	 * Otherwise, writing at ingress will have to clear the
> -	 * mono_delivery_time bit also.
> +	 * mono_delivery_time (skb->tstamp_type:1)bit also.
>  	 */
>  	if (!prog->tstamp_type_access) {
>  		__u8 tmp_reg = BPF_REG_AX;
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index faaec92a46ac..d179a2c84222 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -619,7 +619,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
>  	skb_mark_not_on_list(head);
>  	head->prev = NULL;
>  	head->tstamp = q->stamp;
> -	head->mono_delivery_time = q->mono_delivery_time;
> +	head->tstamp_type = q->tstamp_type;
>  
>  	if (sk)
>  		refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index fb947d1613fe..787aa86800f5 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  		qp->iif = dev->ifindex;
>  
>  	qp->q.stamp = skb->tstamp;
> -	qp->q.mono_delivery_time = skb->mono_delivery_time;
> +	qp->q.tstamp_type = skb->tstamp_type;
>  	qp->q.meat += skb->len;
>  	qp->ecn |= ecn;
>  	add_frag_mem_limit(qp->q.fqdir, skb->truesize);
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 1fe794967211..62e457f7c02c 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  {
>  	struct iphdr *iph;
>  	struct sk_buff *skb2;
> -	bool mono_delivery_time = skb->mono_delivery_time;
> +	u8 tstamp_type = skb->tstamp_type;
>  	struct rtable *rt = skb_rtable(skb);
>  	unsigned int mtu, hlen, ll_rs;
>  	struct ip_fraglist_iter iter;
> @@ -856,7 +856,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  				}
>  			}
>  
> -			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
> +			skb_set_delivery_time(skb, tstamp, tstamp_type);
>  			err = output(net, sk, skb);
>  
>  			if (!err)
> @@ -912,7 +912,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  		/*
>  		 *	Put this fragment into the sending queue.
>  		 */
> -		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
> +		skb_set_delivery_time(skb2, tstamp, tstamp_type);
>  		err = output(net, sk, skb2);
>  		if (err)
>  			goto fail;
> @@ -1649,7 +1649,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>  			  arg->csumoffset) = csum_fold(csum_add(nskb->csum,
>  								arg->csum));
>  		nskb->ip_summed = CHECKSUM_NONE;
> -		nskb->mono_delivery_time = !!transmit_time;
> +		nskb->tstamp_type = !!transmit_time;
>  		if (txhash)
>  			skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4);
>  		ip_push_pending_frames(sk, &fl4);
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 9282fafc0e61..42e6ed1decf4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1299,7 +1299,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
>  	tp = tcp_sk(sk);
>  	prior_wstamp = tp->tcp_wstamp_ns;
>  	tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
> -	skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
> +	skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONO);
>  	if (clone_it) {
>  		oskb = skb;
>  
> @@ -1649,7 +1649,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
>  
>  	skb_split(skb, buff, len);
>  
> -	skb_set_delivery_time(buff, skb->tstamp, true);
> +	skb_set_delivery_time(buff, skb->tstamp, CLOCK_MONO);
>  	tcp_fragment_tstamp(skb, buff);
>  
>  	old_factor = tcp_skb_pcount(skb);
> @@ -2730,7 +2730,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>  		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
>  			/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
>  			tp->tcp_wstamp_ns = tp->tcp_clock_cache;
> -			skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
> +			skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONO);
>  			list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
>  			tcp_init_tso_segs(skb, mss_now);
>  			goto repair; /* Skip network transmission */
> @@ -3713,11 +3713,11 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
>  #ifdef CONFIG_SYN_COOKIES
>  	if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok))
>  		skb_set_delivery_time(skb, cookie_init_timestamp(req, now),
> -				      true);
> +				      CLOCK_MONO);
>  	else
>  #endif
>  	{
> -		skb_set_delivery_time(skb, now, true);
> +		skb_set_delivery_time(skb, now, CLOCK_MONO);
>  		if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */
>  			tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb);
>  	}
> @@ -3804,7 +3804,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
>  	bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
>  				synack_type, &opts);
>  
> -	skb_set_delivery_time(skb, now, true);
> +	skb_set_delivery_time(skb, now, CLOCK_MONO);
>  	tcp_add_tx_delay(skb, tp);
>  
>  	return skb;
> @@ -3988,7 +3988,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
>  
>  	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
>  
> -	skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true);
> +	skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, CLOCK_MONO);
>  
>  	/* Now full SYN+DATA was cloned and sent (or not),
>  	 * remove the SYN from the original skb (syn_data)
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index b9dd3a66e423..a9e819115622 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
>  	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
>  				inet6_sk(skb->sk) : NULL;
> -	bool mono_delivery_time = skb->mono_delivery_time;
> +	u8 tstamp_type = skb->tstamp_type;
>  	struct ip6_frag_state state;
>  	unsigned int mtu, hlen, nexthdr_offset;
>  	ktime_t tstamp = skb->tstamp;
> @@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  			if (iter.frag)
>  				ip6_fraglist_prepare(skb, &iter);
>  
> -			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
> +			skb_set_delivery_time(skb, tstamp, tstamp_type);
>  			err = output(net, sk, skb);
>  			if (!err)
>  				IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
> @@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  		/*
>  		 *	Put this fragment into the sending queue.
>  		 */
> -		skb_set_delivery_time(frag, tstamp, mono_delivery_time);
> +		skb_set_delivery_time(frag, tstamp, tstamp_type);
>  		err = output(net, sk, frag);
>  		if (err)
>  			goto fail;
> diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
> index 53d255838e6a..e0c2347b4dc6 100644
> --- a/net/ipv6/netfilter.c
> +++ b/net/ipv6/netfilter.c
> @@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  				  struct sk_buff *))
>  {
>  	int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
> -	bool mono_delivery_time = skb->mono_delivery_time;
> +	u8 tstamp_type = skb->tstamp_type;
>  	ktime_t tstamp = skb->tstamp;
>  	struct ip6_frag_state state;
>  	u8 *prevhdr, nexthdr = 0;
> @@ -192,7 +192,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  			if (iter.frag)
>  				ip6_fraglist_prepare(skb, &iter);
>  
> -			skb_set_delivery_time(skb, tstamp, mono_delivery_time);
> +			skb_set_delivery_time(skb, tstamp, tstamp_type);
>  			err = output(net, sk, data, skb);
>  			if (err || !iter.frag)
>  				break;
> @@ -225,7 +225,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  			goto blackhole;
>  		}
>  
> -		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
> +		skb_set_delivery_time(skb2, tstamp, tstamp_type);
>  		err = output(net, sk, data, skb2);
>  		if (err)
>  			goto blackhole;
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index d0dcbaca1994..5cc5d823d33f 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
>  		fq->iif = dev->ifindex;
>  
>  	fq->q.stamp = skb->tstamp;
> -	fq->q.mono_delivery_time = skb->mono_delivery_time;
> +	fq->q.tstamp_type = skb->tstamp_type;
>  	fq->q.meat += skb->len;
>  	fq->ecn |= ecn;
>  	if (payload_len > fq->q.max_size)
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index acb4f119e11f..ea724ff558b4 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  		fq->iif = dev->ifindex;
>  
>  	fq->q.stamp = skb->tstamp;
> -	fq->q.mono_delivery_time = skb->mono_delivery_time;
> +	fq->q.tstamp_type = skb->tstamp_type;
>  	fq->q.meat += skb->len;
>  	fq->ecn |= ecn;
>  	add_frag_mem_limit(fq->q.fqdir, skb->truesize);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 3aa9da5c9a66..b60196061489 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -975,7 +975,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
>  			mark = inet_twsk(sk)->tw_mark;
>  		else
>  			mark = READ_ONCE(sk->sk_mark);
> -		skb_set_delivery_time(buff, tcp_transmit_time(sk), true);
> +		skb_set_delivery_time(buff, tcp_transmit_time(sk), CLOCK_MONO);
>  	}
>  	if (txhash) {
>  		/* autoflowlabel/skb_get_hash_flowi6 rely on buff->hash */
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index 0e3cf11ae5fc..1f8b5a3f065e 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
>  		bpf_compute_data_pointers(skb);
>  		filter_res = bpf_prog_run(filter, skb);
>  	}
> -	if (unlikely(!skb->tstamp && skb->mono_delivery_time))
> -		skb->mono_delivery_time = 0;
> +	if (unlikely(!skb->tstamp && skb->tstamp_type))
> +		skb->tstamp_type = CLOCK_REAL;
>  	if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
>  		skb_orphan(skb);
>  
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 5e83e890f6a4..3f843e0eea3c 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb,
>  			bpf_compute_data_pointers(skb);
>  			filter_res = bpf_prog_run(prog->filter, skb);
>  		}
> -		if (unlikely(!skb->tstamp && skb->mono_delivery_time))
> -			skb->mono_delivery_time = 0;
> +		if (unlikely(!skb->tstamp && skb->tstamp_type))
> +			skb->tstamp_type = CLOCK_REAL;
>  
>  		if (prog->exts_integrated) {
>  			res->class   = 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan
  2024-04-13  0:37   ` Abhishek Chauhan (ABC)
@ 2024-04-13 18:54   ` Willem de Bruijn
  2024-04-15 20:27     ` Abhishek Chauhan (ABC)
  1 sibling, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-04-13 18:54 UTC (permalink / raw
  To: Abhishek Chauhan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau,
	Daniel Borkmann, bpf
  Cc: kernel

Abhishek Chauhan wrote:
> mono_delivery_time was added to check if skb->tstamp has delivery
> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
> timestamp in ingress and delivery_time at egress.
> 
> Renaming the bitfield from mono_delivery_time to tstamp_type is for
> extensibilty for other timestamps such as userspace timestamp
> (i.e. SO_TXTIME) set via sock opts.
> 
> As we are renaming the mono_delivery_time to tstamp_type, it makes
> sense to start assigning tstamp_type based out if enum defined as
> part of this commit
> 
> Earlier we used bool arg flag to check if the tstamp is mono in
> function skb_set_delivery_time, Now the signature of the functions
> accepts enum to distinguish between mono and real time
> 
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
> 
> In future tstamp_type:1 can be extended to support userspace timestamp
> by increasing the bitfield.
> 
> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
> ---
> Changes since v2
> - Minor changes to commit subject
> 
> Changes since v1
> - Squashed the two commits into one as mentioned by Willem.
> - Introduced switch in skb_set_delivery_time.
> - Renamed and removed directionality aspects w.r.t tstamp_type 
>   as mentioned by Willem.
> 
>  include/linux/skbuff.h                     | 33 +++++++++++++++-------
>  include/net/inet_frag.h                    |  4 +--
>  net/bridge/netfilter/nf_conntrack_bridge.c |  6 ++--
>  net/core/dev.c                             |  2 +-
>  net/core/filter.c                          |  8 +++---
>  net/ipv4/inet_fragment.c                   |  2 +-
>  net/ipv4/ip_fragment.c                     |  2 +-
>  net/ipv4/ip_output.c                       |  8 +++---
>  net/ipv4/tcp_output.c                      | 14 ++++-----
>  net/ipv6/ip6_output.c                      |  6 ++--
>  net/ipv6/netfilter.c                       |  6 ++--
>  net/ipv6/netfilter/nf_conntrack_reasm.c    |  2 +-
>  net/ipv6/reassembly.c                      |  2 +-
>  net/ipv6/tcp_ipv6.c                        |  2 +-
>  net/sched/act_bpf.c                        |  4 +--
>  net/sched/cls_bpf.c                        |  4 +--
>  16 files changed, 59 insertions(+), 46 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 7135a3e94afd..a83a2120b57f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -702,6 +702,11 @@ typedef unsigned int sk_buff_data_t;
>  typedef unsigned char *sk_buff_data_t;
>  #endif
>  
> +enum skb_tstamp_type {
> +	CLOCK_REAL = 0, /* Time base is realtime */
> +	CLOCK_MONO = 1, /* Time base is Monotonic */
> +};

Minor: inconsistent capitalization

> +
>  /**
>   * DOC: Basic sk_buff geometry
>   *
> @@ -819,7 +824,7 @@ typedef unsigned char *sk_buff_data_t;
>   *	@dst_pending_confirm: need to confirm neighbour
>   *	@decrypted: Decrypted SKB
>   *	@slow_gro: state present at GRO time, slower prepare step required
> - *	@mono_delivery_time: When set, skb->tstamp has the
> + *	@tstamp_type: When set, skb->tstamp has the
>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>   *		skb->tstamp has the (rcv) timestamp at ingress and
>   *		delivery_time at egress.
> @@ -950,7 +955,7 @@ struct sk_buff {
>  	/* private: */
>  	__u8			__mono_tc_offset[0];
>  	/* public: */
> -	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
> +	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */

Also remove reference to MONO_DELIVERY_TIME_MASK, or instead refer to
skb_tstamp_type.

>  #ifdef CONFIG_NET_XGRESS
>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>  	__u8			tc_skip_classify:1;
> @@ -4237,7 +4242,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
>  static inline void __net_timestamp(struct sk_buff *skb)
>  {
>  	skb->tstamp = ktime_get_real();
> -	skb->mono_delivery_time = 0;
> +	skb->tstamp_type = CLOCK_REAL;
>  }
>  
>  static inline ktime_t net_timedelta(ktime_t t)
> @@ -4246,10 +4251,18 @@ static inline ktime_t net_timedelta(ktime_t t)
>  }
>  
>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> -					 bool mono)
> +					  u8 tstamp_type)
>  {
>  	skb->tstamp = kt;
> -	skb->mono_delivery_time = kt && mono;
> +
> +	switch (tstamp_type) {
> +	case CLOCK_REAL:
> +		skb->tstamp_type = CLOCK_REAL;
> +		break;
> +	case CLOCK_MONO:
> +		skb->tstamp_type = kt && tstamp_type;
> +		break;
> +	}

Technically this leaves the tstamp_type undefined if (skb, 0, CLOCK_REAL)
>  }
>  
>  DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
> @@ -4259,8 +4272,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
>   */
>  static inline void skb_clear_delivery_time(struct sk_buff *skb)
>  {
> -	if (skb->mono_delivery_time) {
> -		skb->mono_delivery_time = 0;
> +	if (skb->tstamp_type) {
> +		skb->tstamp_type = CLOCK_REAL;
>  		if (static_branch_unlikely(&netstamp_needed_key))
>  			skb->tstamp = ktime_get_real();
>  		else
> @@ -4270,7 +4283,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)
>  
>  static inline void skb_clear_tstamp(struct sk_buff *skb)
>  {
> -	if (skb->mono_delivery_time)
> +	if (skb->tstamp_type)
>  		return;
>  
>  	skb->tstamp = 0;
> @@ -4278,7 +4291,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
>  
>  static inline ktime_t skb_tstamp(const struct sk_buff *skb)
>  {
> -	if (skb->mono_delivery_time)
> +	if (skb->tstamp_type == CLOCK_MONO)
>  		return 0;

Should this be if (skb->tstamp_type), in line with skb_clear_tstamp,
right above?

> @@ -1649,7 +1649,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>  			  arg->csumoffset) = csum_fold(csum_add(nskb->csum,
>  								arg->csum));
>  		nskb->ip_summed = CHECKSUM_NONE;
> -		nskb->mono_delivery_time = !!transmit_time;
> +		nskb->tstamp_type = !!transmit_time;

In anticipation of more tstamp_types, explicitly set to CLOCK_MONO.

>  		if (txhash)
>  			skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4);
>  		ip_push_pending_frames(sk, &fl4);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type
  2024-04-12 21:01 ` [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type Abhishek Chauhan
@ 2024-04-13 19:07   ` Willem de Bruijn
  2024-04-15 20:00     ` Martin KaFai Lau
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-04-13 19:07 UTC (permalink / raw
  To: Abhishek Chauhan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau,
	Daniel Borkmann, bpf
  Cc: kernel

Abhishek Chauhan wrote:
> tstamp_type can be real, mono or userspace timestamp.
> 
> This commit adds userspace timestamp and sets it if there is
> valid transmit_time available in socket coming from userspace.

Comment is outdated: we now set the actual clockid_t (compressed
into fewer bits), rather than an abstract "go see sk_clockid".
 
> To make the design scalable for future needs this commit bring in
> the change to extend the tstamp_type:1 to tstamp_type:2 to support
> userspace timestamp.
> 
> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
> ---
> Changes since v2
> - Minor changes to commit subject
> 
> Changes since v1 
> - identified additional changes in BPF framework.
> - Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
> - Made changes in skb_set_delivery_time to keep changes similar to 
>   previous code for mono_delivery_time and just setting tstamp_type
>   bit 1 for userspace timestamp.
> 
> 
>  include/linux/skbuff.h                        | 19 +++++++++++++++----
>  net/ipv4/ip_output.c                          |  2 +-
>  net/ipv4/raw.c                                |  2 +-
>  net/ipv6/ip6_output.c                         |  2 +-
>  net/ipv6/raw.c                                |  2 +-
>  net/packet/af_packet.c                        |  7 +++----
>  .../selftests/bpf/prog_tests/ctx_rewrite.c    |  8 ++++----
>  7 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index a83a2120b57f..b6346c21c3d4 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -827,7 +827,8 @@ enum skb_tstamp_type {
>   *	@tstamp_type: When set, skb->tstamp has the
>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>   *		skb->tstamp has the (rcv) timestamp at ingress and
> - *		delivery_time at egress.
> + *		delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid
> + *		coming from userspace
>   *	@napi_id: id of the NAPI struct this skb came from
>   *	@sender_cpu: (aka @napi_id) source CPU in XPS
>   *	@alloc_cpu: CPU which did the skb allocation.
> @@ -955,7 +956,7 @@ struct sk_buff {
>  	/* private: */
>  	__u8			__mono_tc_offset[0];
>  	/* public: */
> -	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
> +	__u8			tstamp_type:2;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>  #ifdef CONFIG_NET_XGRESS
>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>  	__u8			tc_skip_classify:1;

A quick pahole for a fairly standard .config that I had laying around
shows a hole after this list of bits, so no huge concerns there from
adding a bit:

           __u8               slow_gro:1;           /*     3: 4  1 */
           __u8               csum_not_inet:1;      /*     3: 5  1 */

           /* XXX 2 bits hole, try to pack */

           __u16              tc_index;             /*     4     2 */

> @@ -1090,10 +1091,10 @@ struct sk_buff {
>   */
>  #ifdef __BIG_ENDIAN_BITFIELD
>  #define SKB_MONO_DELIVERY_TIME_MASK	(1 << 7)
> -#define TC_AT_INGRESS_MASK		(1 << 6)
> +#define TC_AT_INGRESS_MASK		(1 << 5)

Have to be careful when adding a new 2 bit tstamp_type with both bits
set, that this does not incorrectly get interpreted as MONO.

I haven't looked closely at the BPF API, but hopefully it can be
extensible to return the specific type. If it is hardcoded to return
either MONO or not, then only 0x1 should match, not 0x3.

>  #else
>  #define SKB_MONO_DELIVERY_TIME_MASK	(1 << 0)
> -#define TC_AT_INGRESS_MASK		(1 << 1)
> +#define TC_AT_INGRESS_MASK		(1 << 2)
>  #endif
>  #define SKB_BF_MONO_TC_OFFSET		offsetof(struct sk_buff, __mono_tc_offset)
>  
> @@ -4262,6 +4263,16 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>  	case CLOCK_MONO:

Come to think of it, these CLOCK_* names are too generic and shadow
existing ones like CLOCK_MONOTONIC.

Instead, define SKB_CLOCK_.

>  		skb->tstamp_type = kt && tstamp_type;
>  		break;
> +	/* if any other time base, must be from userspace
> +	 * so set userspace tstamp_type bit
> +	 * See skbuff tstamp_type:2
> +	 * 0x0 => real timestamp_type
> +	 * 0x1 => mono timestamp_type
> +	 * 0x2 => timestamp_type set from userspace
> +	 */
> +	default:
> +		if (kt && tstamp_type)
> +			skb->tstamp_type = 0x2;

Needs a constant.

Plan is to add SKB_CLOCK_TAI, rather than SKB_CLOCK_USER that
requires a further lookup to sk_clockid.

>  	}
>  }
>  
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 62e457f7c02c..c9317d4addce 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>  
>  	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>  	skb->mark = cork->mark;
> -	skb->tstamp = cork->transmit_time;
> +	skb_set_delivery_time(skb, cork->transmit_time, sk->sk_clockid);

If adding 1 or 2 specific clock types, like SKB_CLOCK_TAI, then
skb_set_delivery_time will have to detect unsupported sk_clockid
values and fail for those.

The function does not return an error, so just fail to set the
delivery time and WARN_ONCE.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type
  2024-04-13 19:07   ` Willem de Bruijn
@ 2024-04-15 20:00     ` Martin KaFai Lau
  2024-04-16 23:40       ` Abhishek Chauhan (ABC)
  0 siblings, 1 reply; 13+ messages in thread
From: Martin KaFai Lau @ 2024-04-15 20:00 UTC (permalink / raw
  To: Abhishek Chauhan, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, Daniel Borkmann, bpf

On 4/13/24 12:07 PM, Willem de Bruijn wrote:
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index a83a2120b57f..b6346c21c3d4 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -827,7 +827,8 @@ enum skb_tstamp_type {
>>    *	@tstamp_type: When set, skb->tstamp has the
>>    *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>>    *		skb->tstamp has the (rcv) timestamp at ingress and
>> - *		delivery_time at egress.
>> + *		delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid
>> + *		coming from userspace
>>    *	@napi_id: id of the NAPI struct this skb came from
>>    *	@sender_cpu: (aka @napi_id) source CPU in XPS
>>    *	@alloc_cpu: CPU which did the skb allocation.
>> @@ -955,7 +956,7 @@ struct sk_buff {
>>   	/* private: */
>>   	__u8			__mono_tc_offset[0];
>>   	/* public: */
>> -	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>> +	__u8			tstamp_type:2;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>>   #ifdef CONFIG_NET_XGRESS
>>   	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>>   	__u8			tc_skip_classify:1;
> 
> A quick pahole for a fairly standard .config that I had laying around
> shows a hole after this list of bits, so no huge concerns there from
> adding a bit:
> 
>             __u8               slow_gro:1;           /*     3: 4  1 */
>             __u8               csum_not_inet:1;      /*     3: 5  1 */
> 
>             /* XXX 2 bits hole, try to pack */
> 
>             __u16              tc_index;             /*     4     2 */
> 
>> @@ -1090,10 +1091,10 @@ struct sk_buff {
>>    */
>>   #ifdef __BIG_ENDIAN_BITFIELD
>>   #define SKB_MONO_DELIVERY_TIME_MASK	(1 << 7)
>> -#define TC_AT_INGRESS_MASK		(1 << 6)
>> +#define TC_AT_INGRESS_MASK		(1 << 5)
> 
> Have to be careful when adding a new 2 bit tstamp_type with both bits
> set, that this does not incorrectly get interpreted as MONO.
> 
> I haven't looked closely at the BPF API, but hopefully it can be
> extensible to return the specific type. If it is hardcoded to return
> either MONO or not, then only 0x1 should match, not 0x3.

Good point. I believe it is the best to have bpf to consider both bits in 
tstamp_type:2 in filter.c to avoid the 0x3 surprise in the future. The BPF API 
can be extended to support SKB_CLOCK_TAI.

Regardless, in bpf_convert_tstamp_write(), it still needs to clear both bits in 
tstamp_type when it is at ingress. Right now it only clears the mono bit.

Then it may as well consider both tstamp_type:2 bits in 
bpf_convert_tstamp_read() and bpf_convert_tstamp_type_read(). e.g. 
bpf_convert_tstamp_type_read(), it should be a pretty straight forward change 
because the SKB_CLOCK_* enum value should be a 1:1 mapping to the BPF_SKB_TSTAMP_*.

> 
>>   #else
>>   #define SKB_MONO_DELIVERY_TIME_MASK	(1 << 0)
>> -#define TC_AT_INGRESS_MASK		(1 << 1)
>> +#define TC_AT_INGRESS_MASK		(1 << 2)
>>   #endif
>>   #define SKB_BF_MONO_TC_OFFSET		offsetof(struct sk_buff, __mono_tc_offset)
>>   


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-13 18:54   ` Willem de Bruijn
@ 2024-04-15 20:27     ` Abhishek Chauhan (ABC)
  2024-04-15 20:46       ` Willem de Bruijn
  0 siblings, 1 reply; 13+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-04-15 20:27 UTC (permalink / raw
  To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel



On 4/13/2024 11:54 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> mono_delivery_time was added to check if skb->tstamp has delivery
>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
>> timestamp in ingress and delivery_time at egress.
>>
>> Renaming the bitfield from mono_delivery_time to tstamp_type is for
>> extensibilty for other timestamps such as userspace timestamp
>> (i.e. SO_TXTIME) set via sock opts.
>>
>> As we are renaming the mono_delivery_time to tstamp_type, it makes
>> sense to start assigning tstamp_type based out if enum defined as
>> part of this commit
>>
>> Earlier we used bool arg flag to check if the tstamp is mono in
>> function skb_set_delivery_time, Now the signature of the functions
>> accepts enum to distinguish between mono and real time
>>
>> Bridge driver today has no support to forward the userspace timestamp
>> packets and ends up resetting the timestamp. ETF qdisc checks the
>> packet coming from userspace and encounters to be 0 thereby dropping
>> time sensitive packets. These changes will allow userspace timestamps
>> packets to be forwarded from the bridge to NIC drivers.
>>
>> In future tstamp_type:1 can be extended to support userspace timestamp
>> by increasing the bitfield.
>>
>> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
>> ---
>> Changes since v2
>> - Minor changes to commit subject
>>
>> Changes since v1
>> - Squashed the two commits into one as mentioned by Willem.
>> - Introduced switch in skb_set_delivery_time.
>> - Renamed and removed directionality aspects w.r.t tstamp_type 
>>   as mentioned by Willem.
>>
>>  include/linux/skbuff.h                     | 33 +++++++++++++++-------
>>  include/net/inet_frag.h                    |  4 +--
>>  net/bridge/netfilter/nf_conntrack_bridge.c |  6 ++--
>>  net/core/dev.c                             |  2 +-
>>  net/core/filter.c                          |  8 +++---
>>  net/ipv4/inet_fragment.c                   |  2 +-
>>  net/ipv4/ip_fragment.c                     |  2 +-
>>  net/ipv4/ip_output.c                       |  8 +++---
>>  net/ipv4/tcp_output.c                      | 14 ++++-----
>>  net/ipv6/ip6_output.c                      |  6 ++--
>>  net/ipv6/netfilter.c                       |  6 ++--
>>  net/ipv6/netfilter/nf_conntrack_reasm.c    |  2 +-
>>  net/ipv6/reassembly.c                      |  2 +-
>>  net/ipv6/tcp_ipv6.c                        |  2 +-
>>  net/sched/act_bpf.c                        |  4 +--
>>  net/sched/cls_bpf.c                        |  4 +--
>>  16 files changed, 59 insertions(+), 46 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 7135a3e94afd..a83a2120b57f 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -702,6 +702,11 @@ typedef unsigned int sk_buff_data_t;
>>  typedef unsigned char *sk_buff_data_t;
>>  #endif
>>  
>> +enum skb_tstamp_type {
>> +	CLOCK_REAL = 0, /* Time base is realtime */
>> +	CLOCK_MONO = 1, /* Time base is Monotonic */
>> +};
> 
> Minor: inconsistent capitalization
> 
I will fix this. 

>> +
>>  /**
>>   * DOC: Basic sk_buff geometry
>>   *
>> @@ -819,7 +824,7 @@ typedef unsigned char *sk_buff_data_t;
>>   *	@dst_pending_confirm: need to confirm neighbour
>>   *	@decrypted: Decrypted SKB
>>   *	@slow_gro: state present at GRO time, slower prepare step required
>> - *	@mono_delivery_time: When set, skb->tstamp has the
>> + *	@tstamp_type: When set, skb->tstamp has the
>>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>>   *		skb->tstamp has the (rcv) timestamp at ingress and
>>   *		delivery_time at egress.
>> @@ -950,7 +955,7 @@ struct sk_buff {
>>  	/* private: */
>>  	__u8			__mono_tc_offset[0];
>>  	/* public: */
>> -	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>> +	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
> 
> Also remove reference to MONO_DELIVERY_TIME_MASK, or instead refer to
> skb_tstamp_type.
> 
>>  #ifdef CONFIG_NET_XGRESS
>>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>>  	__u8			tc_skip_classify:1;
>> @@ -4237,7 +4242,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
>>  static inline void __net_timestamp(struct sk_buff *skb)
>>  {
>>  	skb->tstamp = ktime_get_real();
>> -	skb->mono_delivery_time = 0;
>> +	skb->tstamp_type = CLOCK_REAL;
>>  }
>>  
>>  static inline ktime_t net_timedelta(ktime_t t)
>> @@ -4246,10 +4251,18 @@ static inline ktime_t net_timedelta(ktime_t t)
>>  }
>>  
>>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>> -					 bool mono)
>> +					  u8 tstamp_type)
>>  {
>>  	skb->tstamp = kt;
>> -	skb->mono_delivery_time = kt && mono;
>> +
>> +	switch (tstamp_type) {
>> +	case CLOCK_REAL:
>> +		skb->tstamp_type = CLOCK_REAL;
>> +		break;
>> +	case CLOCK_MONO:
>> +		skb->tstamp_type = kt && tstamp_type;
>> +		break;
>> +	}
> 
> Technically this leaves the tstamp_type undefined if (skb, 0, CLOCK_REAL)
Do you think i should be checking for valid value of tstamp before setting the tstamp_type ? Only then set it. 

>>  }
>>  
>>  DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
>> @@ -4259,8 +4272,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
>>   */
>>  static inline void skb_clear_delivery_time(struct sk_buff *skb)
>>  {
>> -	if (skb->mono_delivery_time) {
>> -		skb->mono_delivery_time = 0;
>> +	if (skb->tstamp_type) {
>> +		skb->tstamp_type = CLOCK_REAL;
>>  		if (static_branch_unlikely(&netstamp_needed_key))
>>  			skb->tstamp = ktime_get_real();
>>  		else
>> @@ -4270,7 +4283,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)
>>  
>>  static inline void skb_clear_tstamp(struct sk_buff *skb)
>>  {
>> -	if (skb->mono_delivery_time)
>> +	if (skb->tstamp_type)
>>  		return;
>>  
>>  	skb->tstamp = 0;
>> @@ -4278,7 +4291,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
>>  
>>  static inline ktime_t skb_tstamp(const struct sk_buff *skb)
>>  {
>> -	if (skb->mono_delivery_time)
>> +	if (skb->tstamp_type == CLOCK_MONO)
>>  		return 0;
> 
> Should this be if (skb->tstamp_type), in line with skb_clear_tstamp,
> right above?
I think so too. I thought i will not alter the existing functionality and keep it as close to the previous working changes. 
> 
>> @@ -1649,7 +1649,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>>  			  arg->csumoffset) = csum_fold(csum_add(nskb->csum,
>>  								arg->csum));
>>  		nskb->ip_summed = CHECKSUM_NONE;
>> -		nskb->mono_delivery_time = !!transmit_time;
>> +		nskb->tstamp_type = !!transmit_time;
> 
> In anticipation of more tstamp_types, explicitly set to CLOCK_MONO.
I will set it to SKB_CLOCK_MONO 
> 
>>  		if (txhash)
>>  			skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4);
>>  		ip_push_pending_frames(sk, &fl4);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-15 20:27     ` Abhishek Chauhan (ABC)
@ 2024-04-15 20:46       ` Willem de Bruijn
  2024-04-15 21:06         ` Abhishek Chauhan (ABC)
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-04-15 20:46 UTC (permalink / raw
  To: Abhishek Chauhan (ABC), Willem de Bruijn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau,
	Daniel Borkmann, bpf
  Cc: kernel

Abhishek Chauhan (ABC) wrote:
> 
> 
> On 4/13/2024 11:54 AM, Willem de Bruijn wrote:
> > Abhishek Chauhan wrote:
> >> mono_delivery_time was added to check if skb->tstamp has delivery
> >> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
> >> timestamp in ingress and delivery_time at egress.
> >>
> >> Renaming the bitfield from mono_delivery_time to tstamp_type is for
> >> extensibilty for other timestamps such as userspace timestamp
> >> (i.e. SO_TXTIME) set via sock opts.
> >>
> >> As we are renaming the mono_delivery_time to tstamp_type, it makes
> >> sense to start assigning tstamp_type based out if enum defined as
> >> part of this commit
> >>
> >> Earlier we used bool arg flag to check if the tstamp is mono in
> >> function skb_set_delivery_time, Now the signature of the functions
> >> accepts enum to distinguish between mono and real time
> >>
> >> Bridge driver today has no support to forward the userspace timestamp
> >> packets and ends up resetting the timestamp. ETF qdisc checks the
> >> packet coming from userspace and encounters to be 0 thereby dropping
> >> time sensitive packets. These changes will allow userspace timestamps
> >> packets to be forwarded from the bridge to NIC drivers.
> >>
> >> In future tstamp_type:1 can be extended to support userspace timestamp
> >> by increasing the bitfield.
> >>
> >> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
> >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
> >> ---
> >> Changes since v2
> >> - Minor changes to commit subject
> >>
> >> Changes since v1
> >> - Squashed the two commits into one as mentioned by Willem.
> >> - Introduced switch in skb_set_delivery_time.
> >> - Renamed and removed directionality aspects w.r.t tstamp_type 
> >>   as mentioned by Willem.
> >>
> >>  include/linux/skbuff.h                     | 33 +++++++++++++++-------
> >>  include/net/inet_frag.h                    |  4 +--
> >>  net/bridge/netfilter/nf_conntrack_bridge.c |  6 ++--
> >>  net/core/dev.c                             |  2 +-
> >>  net/core/filter.c                          |  8 +++---
> >>  net/ipv4/inet_fragment.c                   |  2 +-
> >>  net/ipv4/ip_fragment.c                     |  2 +-
> >>  net/ipv4/ip_output.c                       |  8 +++---
> >>  net/ipv4/tcp_output.c                      | 14 ++++-----
> >>  net/ipv6/ip6_output.c                      |  6 ++--
> >>  net/ipv6/netfilter.c                       |  6 ++--
> >>  net/ipv6/netfilter/nf_conntrack_reasm.c    |  2 +-
> >>  net/ipv6/reassembly.c                      |  2 +-
> >>  net/ipv6/tcp_ipv6.c                        |  2 +-
> >>  net/sched/act_bpf.c                        |  4 +--
> >>  net/sched/cls_bpf.c                        |  4 +--
> >>  16 files changed, 59 insertions(+), 46 deletions(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 7135a3e94afd..a83a2120b57f 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -702,6 +702,11 @@ typedef unsigned int sk_buff_data_t;
> >>  typedef unsigned char *sk_buff_data_t;
> >>  #endif
> >>  
> >> +enum skb_tstamp_type {
> >> +	CLOCK_REAL = 0, /* Time base is realtime */
> >> +	CLOCK_MONO = 1, /* Time base is Monotonic */
> >> +};
> > 
> > Minor: inconsistent capitalization
> > 
> I will fix this. 
> 
> >> +
> >>  /**
> >>   * DOC: Basic sk_buff geometry
> >>   *
> >> @@ -819,7 +824,7 @@ typedef unsigned char *sk_buff_data_t;
> >>   *	@dst_pending_confirm: need to confirm neighbour
> >>   *	@decrypted: Decrypted SKB
> >>   *	@slow_gro: state present at GRO time, slower prepare step required
> >> - *	@mono_delivery_time: When set, skb->tstamp has the
> >> + *	@tstamp_type: When set, skb->tstamp has the
> >>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
> >>   *		skb->tstamp has the (rcv) timestamp at ingress and
> >>   *		delivery_time at egress.
> >> @@ -950,7 +955,7 @@ struct sk_buff {
> >>  	/* private: */
> >>  	__u8			__mono_tc_offset[0];
> >>  	/* public: */
> >> -	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
> >> +	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
> > 
> > Also remove reference to MONO_DELIVERY_TIME_MASK, or instead refer to
> > skb_tstamp_type.
> > 
> >>  #ifdef CONFIG_NET_XGRESS
> >>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
> >>  	__u8			tc_skip_classify:1;
> >> @@ -4237,7 +4242,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
> >>  static inline void __net_timestamp(struct sk_buff *skb)
> >>  {
> >>  	skb->tstamp = ktime_get_real();
> >> -	skb->mono_delivery_time = 0;
> >> +	skb->tstamp_type = CLOCK_REAL;
> >>  }
> >>  
> >>  static inline ktime_t net_timedelta(ktime_t t)
> >> @@ -4246,10 +4251,18 @@ static inline ktime_t net_timedelta(ktime_t t)
> >>  }
> >>  
> >>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> >> -					 bool mono)
> >> +					  u8 tstamp_type)
> >>  {
> >>  	skb->tstamp = kt;
> >> -	skb->mono_delivery_time = kt && mono;
> >> +
> >> +	switch (tstamp_type) {
> >> +	case CLOCK_REAL:
> >> +		skb->tstamp_type = CLOCK_REAL;
> >> +		break;
> >> +	case CLOCK_MONO:
> >> +		skb->tstamp_type = kt && tstamp_type;
> >> +		break;
> >> +	}
> > 
> > Technically this leaves the tstamp_type undefined if (skb, 0, CLOCK_REAL)
> Do you think i should be checking for valid value of tstamp before setting the tstamp_type ? Only then set it. 

A kt of 0 is interpreted as resetting the type. That should probably
be maintained.

For SO_TIMESTAMPING, a mono delivery time of 0 does have some meaning.
In __sock_recv_timestamp:

        /* Race occurred between timestamp enabling and packet
           receiving.  Fill in the current time for now. */
        if (need_software_tstamp && skb->tstamp == 0) {
                __net_timestamp(skb);
                false_tstamp = 1;
        }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-15 20:46       ` Willem de Bruijn
@ 2024-04-15 21:06         ` Abhishek Chauhan (ABC)
  2024-04-15 21:22           ` Willem de Bruijn
  0 siblings, 1 reply; 13+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-04-15 21:06 UTC (permalink / raw
  To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel



On 4/15/2024 1:46 PM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 4/13/2024 11:54 AM, Willem de Bruijn wrote:
>>> Abhishek Chauhan wrote:
>>>> mono_delivery_time was added to check if skb->tstamp has delivery
>>>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
>>>> timestamp in ingress and delivery_time at egress.
>>>>
>>>> Renaming the bitfield from mono_delivery_time to tstamp_type is for
>>>> extensibilty for other timestamps such as userspace timestamp
>>>> (i.e. SO_TXTIME) set via sock opts.
>>>>
>>>> As we are renaming the mono_delivery_time to tstamp_type, it makes
>>>> sense to start assigning tstamp_type based out if enum defined as
>>>> part of this commit
>>>>
>>>> Earlier we used bool arg flag to check if the tstamp is mono in
>>>> function skb_set_delivery_time, Now the signature of the functions
>>>> accepts enum to distinguish between mono and real time
>>>>
>>>> Bridge driver today has no support to forward the userspace timestamp
>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>> time sensitive packets. These changes will allow userspace timestamps
>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>
>>>> In future tstamp_type:1 can be extended to support userspace timestamp
>>>> by increasing the bitfield.
>>>>
>>>> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
>>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
>>>> ---
>>>> Changes since v2
>>>> - Minor changes to commit subject
>>>>
>>>> Changes since v1
>>>> - Squashed the two commits into one as mentioned by Willem.
>>>> - Introduced switch in skb_set_delivery_time.
>>>> - Renamed and removed directionality aspects w.r.t tstamp_type 
>>>>   as mentioned by Willem.
>>>>
>>>>  include/linux/skbuff.h                     | 33 +++++++++++++++-------
>>>>  include/net/inet_frag.h                    |  4 +--
>>>>  net/bridge/netfilter/nf_conntrack_bridge.c |  6 ++--
>>>>  net/core/dev.c                             |  2 +-
>>>>  net/core/filter.c                          |  8 +++---
>>>>  net/ipv4/inet_fragment.c                   |  2 +-
>>>>  net/ipv4/ip_fragment.c                     |  2 +-
>>>>  net/ipv4/ip_output.c                       |  8 +++---
>>>>  net/ipv4/tcp_output.c                      | 14 ++++-----
>>>>  net/ipv6/ip6_output.c                      |  6 ++--
>>>>  net/ipv6/netfilter.c                       |  6 ++--
>>>>  net/ipv6/netfilter/nf_conntrack_reasm.c    |  2 +-
>>>>  net/ipv6/reassembly.c                      |  2 +-
>>>>  net/ipv6/tcp_ipv6.c                        |  2 +-
>>>>  net/sched/act_bpf.c                        |  4 +--
>>>>  net/sched/cls_bpf.c                        |  4 +--
>>>>  16 files changed, 59 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 7135a3e94afd..a83a2120b57f 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -702,6 +702,11 @@ typedef unsigned int sk_buff_data_t;
>>>>  typedef unsigned char *sk_buff_data_t;
>>>>  #endif
>>>>  
>>>> +enum skb_tstamp_type {
>>>> +	CLOCK_REAL = 0, /* Time base is realtime */
>>>> +	CLOCK_MONO = 1, /* Time base is Monotonic */
>>>> +};
>>>
>>> Minor: inconsistent capitalization
>>>
>> I will fix this. 
>>
>>>> +
>>>>  /**
>>>>   * DOC: Basic sk_buff geometry
>>>>   *
>>>> @@ -819,7 +824,7 @@ typedef unsigned char *sk_buff_data_t;
>>>>   *	@dst_pending_confirm: need to confirm neighbour
>>>>   *	@decrypted: Decrypted SKB
>>>>   *	@slow_gro: state present at GRO time, slower prepare step required
>>>> - *	@mono_delivery_time: When set, skb->tstamp has the
>>>> + *	@tstamp_type: When set, skb->tstamp has the
>>>>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>>>>   *		skb->tstamp has the (rcv) timestamp at ingress and
>>>>   *		delivery_time at egress.
>>>> @@ -950,7 +955,7 @@ struct sk_buff {
>>>>  	/* private: */
>>>>  	__u8			__mono_tc_offset[0];
>>>>  	/* public: */
>>>> -	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>>>> +	__u8			tstamp_type:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
>>>
>>> Also remove reference to MONO_DELIVERY_TIME_MASK, or instead refer to
>>> skb_tstamp_type.
>>>
>>>>  #ifdef CONFIG_NET_XGRESS
>>>>  	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
>>>>  	__u8			tc_skip_classify:1;
>>>> @@ -4237,7 +4242,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
>>>>  static inline void __net_timestamp(struct sk_buff *skb)
>>>>  {
>>>>  	skb->tstamp = ktime_get_real();
>>>> -	skb->mono_delivery_time = 0;
>>>> +	skb->tstamp_type = CLOCK_REAL;
>>>>  }
>>>>  
>>>>  static inline ktime_t net_timedelta(ktime_t t)
>>>> @@ -4246,10 +4251,18 @@ static inline ktime_t net_timedelta(ktime_t t)
>>>>  }
>>>>  
>>>>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>>>> -					 bool mono)
>>>> +					  u8 tstamp_type)
>>>>  {
>>>>  	skb->tstamp = kt;
>>>> -	skb->mono_delivery_time = kt && mono;
>>>> +
>>>> +	switch (tstamp_type) {
>>>> +	case CLOCK_REAL:
>>>> +		skb->tstamp_type = CLOCK_REAL;
>>>> +		break;
>>>> +	case CLOCK_MONO:
>>>> +		skb->tstamp_type = kt && tstamp_type;
>>>> +		break;
>>>> +	}
>>>
>>> Technically this leaves the tstamp_type undefined if (skb, 0, CLOCK_REAL)
>> Do you think i should be checking for valid value of tstamp before setting the tstamp_type ? Only then set it. 
> 
> A kt of 0 is interpreted as resetting the type. That should probably
> be maintained.
> 
> For SO_TIMESTAMPING, a mono delivery time of 0 does have some meaning.
> In __sock_recv_timestamp:
> 
>         /* Race occurred between timestamp enabling and packet
>            receiving.  Fill in the current time for now. */
>         if (need_software_tstamp && skb->tstamp == 0) {
>                 __net_timestamp(skb);
>                 false_tstamp = 1;
>         }

Well in that case the above logic still resets the tstamp and sets the tstamp_type to CLOCK_REAL(value 0). 
Anyway the tstamp_type will be 0 to begin with. 
The logic is still inline with previous implementation, because previously if kt was 0 then kt && mono sets the tstamp_type (previously called as mono_delivery_time) to 0 (i.e SKB_CLOCK_REAL). 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-15 21:06         ` Abhishek Chauhan (ABC)
@ 2024-04-15 21:22           ` Willem de Bruijn
  2024-04-15 21:26             ` Abhishek Chauhan (ABC)
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2024-04-15 21:22 UTC (permalink / raw
  To: Abhishek Chauhan (ABC), Willem de Bruijn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau,
	Daniel Borkmann, bpf
  Cc: kernel

> >>>>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> >>>> -					 bool mono)
> >>>> +					  u8 tstamp_type)
> >>>>  {
> >>>>  	skb->tstamp = kt;
> >>>> -	skb->mono_delivery_time = kt && mono;
> >>>> +
> >>>> +	switch (tstamp_type) {
> >>>> +	case CLOCK_REAL:
> >>>> +		skb->tstamp_type = CLOCK_REAL;
> >>>> +		break;
> >>>> +	case CLOCK_MONO:
> >>>> +		skb->tstamp_type = kt && tstamp_type;
> >>>> +		break;
> >>>> +	}
> >>>
> >>> Technically this leaves the tstamp_type undefined if (skb, 0, CLOCK_REAL)
> >> Do you think i should be checking for valid value of tstamp before setting the tstamp_type ? Only then set it. 
> > 
> > A kt of 0 is interpreted as resetting the type. That should probably
> > be maintained.
> > 
> > For SO_TIMESTAMPING, a mono delivery time of 0 does have some meaning.
> > In __sock_recv_timestamp:
> > 
> >         /* Race occurred between timestamp enabling and packet
> >            receiving.  Fill in the current time for now. */
> >         if (need_software_tstamp && skb->tstamp == 0) {
> >                 __net_timestamp(skb);
> >                 false_tstamp = 1;
> >         }
> 
> Well in that case the above logic still resets the tstamp and sets the tstamp_type to CLOCK_REAL(value 0). 
> Anyway the tstamp_type will be 0 to begin with. 
> The logic is still inline with previous implementation, because previously if kt was 0 then kt && mono sets the tstamp_type (previously called as mono_delivery_time) to 0 (i.e SKB_CLOCK_REAL). 

Sorry, I got my defaults confused. If we maintain that a zero tstamp
resets the type, then here should be no case with skb->tstamp 0 and
skb->tstamp_type SKB_CLOCK_REAL (or SKB_CLOCK_TAI or whatever). I
think it's preferable to make that obvious in the
skb_set_delivery_time implementation, rather than depend on knowledge
of its callers.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty
  2024-04-15 21:22           ` Willem de Bruijn
@ 2024-04-15 21:26             ` Abhishek Chauhan (ABC)
  0 siblings, 0 replies; 13+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-04-15 21:26 UTC (permalink / raw
  To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf
  Cc: kernel



On 4/15/2024 2:22 PM, Willem de Bruijn wrote:
>>>>>>  static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>>>>>> -					 bool mono)
>>>>>> +					  u8 tstamp_type)
>>>>>>  {
>>>>>>  	skb->tstamp = kt;
>>>>>> -	skb->mono_delivery_time = kt && mono;
>>>>>> +
>>>>>> +	switch (tstamp_type) {
>>>>>> +	case CLOCK_REAL:
>>>>>> +		skb->tstamp_type = CLOCK_REAL;
>>>>>> +		break;
>>>>>> +	case CLOCK_MONO:
>>>>>> +		skb->tstamp_type = kt && tstamp_type;
>>>>>> +		break;
>>>>>> +	}
>>>>>
>>>>> Technically this leaves the tstamp_type undefined if (skb, 0, CLOCK_REAL)
>>>> Do you think i should be checking for valid value of tstamp before setting the tstamp_type ? Only then set it. 
>>>
>>> A kt of 0 is interpreted as resetting the type. That should probably
>>> be maintained.
>>>
>>> For SO_TIMESTAMPING, a mono delivery time of 0 does have some meaning.
>>> In __sock_recv_timestamp:
>>>
>>>         /* Race occurred between timestamp enabling and packet
>>>            receiving.  Fill in the current time for now. */
>>>         if (need_software_tstamp && skb->tstamp == 0) {
>>>                 __net_timestamp(skb);
>>>                 false_tstamp = 1;
>>>         }
>>
>> Well in that case the above logic still resets the tstamp and sets the tstamp_type to CLOCK_REAL(value 0). 
>> Anyway the tstamp_type will be 0 to begin with. 
>> The logic is still inline with previous implementation, because previously if kt was 0 then kt && mono sets the tstamp_type (previously called as mono_delivery_time) to 0 (i.e SKB_CLOCK_REAL). 
> 
> Sorry, I got my defaults confused. If we maintain that a zero tstamp
> resets the type, then here should be no case with skb->tstamp 0 and
> skb->tstamp_type SKB_CLOCK_REAL (or SKB_CLOCK_TAI or whatever). I
> think it's preferable to make that obvious in the
> skb_set_delivery_time implementation, rather than depend on knowledge
> of its callers.

Noted!. I will do the same as part of the next patchset. 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type
  2024-04-15 20:00     ` Martin KaFai Lau
@ 2024-04-16 23:40       ` Abhishek Chauhan (ABC)
  0 siblings, 0 replies; 13+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-04-16 23:40 UTC (permalink / raw
  To: Martin KaFai Lau, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, Daniel Borkmann, bpf



On 4/15/2024 1:00 PM, Martin KaFai Lau wrote:
> On 4/13/24 12:07 PM, Willem de Bruijn wrote:
>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>> index a83a2120b57f..b6346c21c3d4 100644
>>> --- a/include/linux/skbuff.h
>>> +++ b/include/linux/skbuff.h
>>> @@ -827,7 +827,8 @@ enum skb_tstamp_type {
>>>    *    @tstamp_type: When set, skb->tstamp has the
>>>    *        delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>>>    *        skb->tstamp has the (rcv) timestamp at ingress and
>>> - *        delivery_time at egress.
>>> + *        delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid
>>> + *        coming from userspace
>>>    *    @napi_id: id of the NAPI struct this skb came from
>>>    *    @sender_cpu: (aka @napi_id) source CPU in XPS
>>>    *    @alloc_cpu: CPU which did the skb allocation.
>>> @@ -955,7 +956,7 @@ struct sk_buff {
>>>       /* private: */
>>>       __u8            __mono_tc_offset[0];
>>>       /* public: */
>>> -    __u8            tstamp_type:1;    /* See SKB_MONO_DELIVERY_TIME_MASK */
>>> +    __u8            tstamp_type:2;    /* See SKB_MONO_DELIVERY_TIME_MASK */
>>>   #ifdef CONFIG_NET_XGRESS
>>>       __u8            tc_at_ingress:1;    /* See TC_AT_INGRESS_MASK */
>>>       __u8            tc_skip_classify:1;
>>
>> A quick pahole for a fairly standard .config that I had laying around
>> shows a hole after this list of bits, so no huge concerns there from
>> adding a bit:
>>
>>             __u8               slow_gro:1;           /*     3: 4  1 */
>>             __u8               csum_not_inet:1;      /*     3: 5  1 */
>>
>>             /* XXX 2 bits hole, try to pack */
>>
>>             __u16              tc_index;             /*     4     2 */
>>
>>> @@ -1090,10 +1091,10 @@ struct sk_buff {
>>>    */
>>>   #ifdef __BIG_ENDIAN_BITFIELD
>>>   #define SKB_MONO_DELIVERY_TIME_MASK    (1 << 7)
>>> -#define TC_AT_INGRESS_MASK        (1 << 6)
>>> +#define TC_AT_INGRESS_MASK        (1 << 5)
>>
>> Have to be careful when adding a new 2 bit tstamp_type with both bits
>> set, that this does not incorrectly get interpreted as MONO.
>>
>> I haven't looked closely at the BPF API, but hopefully it can be
>> extensible to return the specific type. If it is hardcoded to return
>> either MONO or not, then only 0x1 should match, not 0x3.
> 
> Good point. I believe it is the best to have bpf to consider both bits in tstamp_type:2 in filter.c to avoid the 0x3 surprise in the future. The BPF API can be extended to support SKB_CLOCK_TAI.
> 
> Regardless, in bpf_convert_tstamp_write(), it still needs to clear both bits in tstamp_type when it is at ingress. Right now it only clears the mono bit.
> 
> Then it may as well consider both tstamp_type:2 bits in bpf_convert_tstamp_read() and bpf_convert_tstamp_type_read(). e.g. bpf_convert_tstamp_type_read(), it should be a pretty straight forward change because the SKB_CLOCK_* enum value should be a 1:1 mapping to the BPF_SKB_TSTAMP_*.
> 
>>
>>>   #else
>>>   #define SKB_MONO_DELIVERY_TIME_MASK    (1 << 0)
>>> -#define TC_AT_INGRESS_MASK        (1 << 1)
>>> +#define TC_AT_INGRESS_MASK        (1 << 2)
>>>   #endif
>>>   #define SKB_BF_MONO_TC_OFFSET        offsetof(struct sk_buff, __mono_tc_offset)
>>>   
> 
Hi Martin and Willem,

I have made the changes as per your guidelines . I will be raising RFC patch bpf-next v4 soon. Giving you heads up of the changes i am bringing in the BPF code. If you feel i have done something incorrectly, please do correct me here.  
I apologies for adding the code here and making the content of the email huge for other upstream reviewers. 

//Introduce a new BPF tstamp type mask in bpf.h 

#ifdef __BIG_ENDIAN_BITFIELD
#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 7)
+ #define SKB_TAI_DELIVERY_TIME_MASK	(1 << 6) (new)
#define TC_AT_INGRESS_MASK		(1 << 5)
#else
#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 0)
+ #define SKB_TAI_DELIVERY_TIME_MASK	(1 << 1) (new)
#define TC_AT_INGRESS_MASK		(1 << 2)
#endif

//changes in the filter.c (bpf_convert_tstamp_{read,write}, bpf_convert_tstamp_type_read())code are accordingly taken care since now we have 3 bits instead of 2 

Value 3 => unspec
Value 2 => tai 
Value 1 => mono

static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
						     struct bpf_insn *insn)
{
	__u8 value_reg = si->dst_reg;
	__u8 skb_reg = si->src_reg;
	/* AX is needed because src_reg and dst_reg could be the same */
	__u8 tmp_reg = BPF_REG_AX;

	*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
			      SKB_BF_MONO_TC_OFFSET);
	*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
				SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); <== check for both bits are set (if so make it tstamp_unspec)
	*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, 
				SKB_MONO_DELIVERY_TIME_MASK, 3); <== if mono is set then its mono base and jump to 3rd instruction from here.  
	*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
				SKB_TAI_DELIVERY_TIME_MASK, 4); <== if tai is set then its tai base and jump to 4th instruction from here. 
	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
	*insn++ = BPF_JMP_A(1);
	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
	*insn++ = BPF_JMP_A(1);
	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI);

	return insn;
}

//Values 7, 6 and 5 as input will set the value reg to 0 
//otherwise the skb_reg has correct configuration and will store the content in value reg. 
                 
static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
						const struct bpf_insn *si,
						struct bpf_insn *insn)
{
	__u8 value_reg = si->dst_reg;
	__u8 skb_reg = si->src_reg;

#ifdef CONFIG_NET_XGRESS
	/* If the tstamp_type is read,
	 * the bpf prog is aware the tstamp could have delivery time.
	 * Thus, read skb->tstamp as is if tstamp_type_access is true.
	 */
	if (!prog->tstamp_type_access) {
		/* AX is needed because src_reg and dst_reg could be the same */
		__u8 tmp_reg = BPF_REG_AX;

		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
		/*check if all three bits are set*/
		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK |
					SKB_TAI_DELIVERY_TIME_MASK);  <== check if all 3 bits are set which is value 7 
		/*if all 3 bits are set jump 3 instructions and clear the register */
		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg,
					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | 
					SKB_TAI_DELIVERY_TIME_MASK, 4); <== if all 3 bits are set then value reg is set to 0 
		/*Now check Mono is set with ingress mask if so clear*/
		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg,
					TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); <== check if value is 5 (mono + ingress) if so then value reg is set to 0 
		/*Now Check tai is set with ingress mask if so clear*/
		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg,
					TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); <== check if value is 6 (tai + ingress) if so then value reg is set to 0 
		/*Now Check tai and mono are set if so clear */
		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg,
					SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 1); <== check if value 3 (tai + mono) if so then value reg is set to 0
		/* goto <store> */
		*insn++ = BPF_JMP_A(2);
		/* skb->tc_at_ingress && skb->tstamp_type:1,
		 * read 0 as the (rcv) timestamp.
		 */
		*insn++ = BPF_MOV64_IMM(value_reg, 0);
		*insn++ = BPF_JMP_A(1);
	}
#endif

	*insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg,
			      offsetof(struct sk_buff, tstamp));
	return insn;
}

//this was pretty straight forward 
//if ingress mask is set just go ahead and unset both tai and mono delivery time. 

static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
						 const struct bpf_insn *si,
						 struct bpf_insn *insn)
{
	__u8 value_reg = si->src_reg;
	__u8 skb_reg = si->dst_reg;

#ifdef CONFIG_NET_XGRESS
	/* If the tstamp_type is read,
	 * the bpf prog is aware the tstamp could have delivery time.
	 * Thus, write skb->tstamp as is if tstamp_type_access is true.
	 * Otherwise, writing at ingress will have to clear the
	 * mono_delivery_time (skb->tstamp_type:1)bit also.
	 */
	if (!prog->tstamp_type_access) {
		__u8 tmp_reg = BPF_REG_AX;

		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
		/* Writing __sk_buff->tstamp as ingress, goto <clear> */
		*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
		/* goto <store> */
		*insn++ = BPF_JMP_A(3);
		/* <clear>: mono_delivery_time or (skb->tstamp_type:1) */
		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
		/* <clear>: tai delivery_time or (skb->tstamp_type:2) */ (new)
		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TAI_DELIVERY_TIME_MASK); (new) <== reset tai delivery mask if ingress bit is set. 
		*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET); 
	}
#endif

	/* <store>: skb->tstamp = tstamp */
	*insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_DW | BPF_MEM,
			       skb_reg, value_reg, offsetof(struct sk_buff, tstamp), si->imm);
	return insn;


And the ctx_rewrite will be as follows 

		.read  = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
			 "w11 &= 7;"
			 "if w11 == 0x7 goto pc+4;"
			 "if w11 == 0x5 goto pc+3;"
			 "if w11 == 0x6 goto pc+2;"
			 "if w11 == 0x3 goto pc+1;"
			 "goto pc+2"
			 "$dst = 0;"
			 "goto pc+1;"
			 "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
		.write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
			 "if w11 & 0x4 goto pc+1;"
			 "goto pc+3;"
			 "w11 &= -2;"
			 "w11 &= -3;"
			 "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
			 "*(u64 *)($ctx + sk_buff::tstamp) = $src;",

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-16 23:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-12 21:01 [RFC PATCH bpf-next v3 0/2] Replace mono_delivery_time with tstamp_type Abhishek Chauhan
2024-04-12 21:01 ` [RFC PATCH bpf-next v3 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan
2024-04-13  0:37   ` Abhishek Chauhan (ABC)
2024-04-13 18:54   ` Willem de Bruijn
2024-04-15 20:27     ` Abhishek Chauhan (ABC)
2024-04-15 20:46       ` Willem de Bruijn
2024-04-15 21:06         ` Abhishek Chauhan (ABC)
2024-04-15 21:22           ` Willem de Bruijn
2024-04-15 21:26             ` Abhishek Chauhan (ABC)
2024-04-12 21:01 ` [RFC PATCH bpf-next v3 2/2] net: Add additional bit to support userspace timestamp type Abhishek Chauhan
2024-04-13 19:07   ` Willem de Bruijn
2024-04-15 20:00     ` Martin KaFai Lau
2024-04-16 23:40       ` Abhishek Chauhan (ABC)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).