[PATCH] netlink: enable skb header refcounting before sending first broadcast

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] netlink: enable skb header refcounting before sending first broadcast
@ 2015-07-10 11:51 Konstantin Khlebnikov
  2015-07-10 13:49 ` Eric Dumazet
  2015-07-13  7:23 ` Herbert Xu
  0 siblings, 2 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-10 11:51 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Eric Dumazet, Herbert Xu

This fixes race between non-atomic updates of adjacent bit-fields:
skb->cloned could be lost because netlink broadcast clones skb after
sending it to the first listener who sets skb->peeked at the same skb.
As a result atomic refcounting of skb header stays disabled and
skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
---
 net/netlink/af_netlink.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dea925388a5b..921e0d8dfe3a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
 	info.tx_filter = filter;
 	info.tx_data = filter_data;
 
+	/* Enable atomic refcounting in skb_release_data() before first send:
+	 * non-atomic set of that bit-field in __skb_clone() could race with
+	 * __skb_recv_datagram() which touches the same set of bit-fields.
+	 */
+	skb->cloned = 1;
+
 	/* While we sleep in clone, do not allow to change socket list */
 
 	netlink_lock_table();

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-10 11:51 [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
@ 2015-07-10 13:49 ` Eric Dumazet
  2015-07-10 14:08   ` Konstantin Khlebnikov
  2015-07-13  7:23 ` Herbert Xu
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2015-07-10 13:49 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: netdev, David S. Miller, Eric Dumazet, Herbert Xu

On Fri, 2015-07-10 at 14:51 +0300, Konstantin Khlebnikov wrote:
> This fixes race between non-atomic updates of adjacent bit-fields:
> skb->cloned could be lost because netlink broadcast clones skb after
> sending it to the first listener who sets skb->peeked at the same skb.
> As a result atomic refcounting of skb header stays disabled and
> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
> ---
>  net/netlink/af_netlink.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index dea925388a5b..921e0d8dfe3a 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>  	info.tx_filter = filter;
>  	info.tx_data = filter_data;
>  
> +	/* Enable atomic refcounting in skb_release_data() before first send:
> +	 * non-atomic set of that bit-field in __skb_clone() could race with
> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
> +	 */
> +	skb->cloned = 1;
> +
>  	/* While we sleep in clone, do not allow to change socket list */
>  
>  	netlink_lock_table();

Wow, this is tricky.

I wonder how you found this bug ????

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-10 13:49 ` Eric Dumazet
@ 2015-07-10 14:08   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-10 14:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David S. Miller, Eric Dumazet, Herbert Xu

On 10.07.2015 16:49, Eric Dumazet wrote:
> On Fri, 2015-07-10 at 14:51 +0300, Konstantin Khlebnikov wrote:
>> This fixes race between non-atomic updates of adjacent bit-fields:
>> skb->cloned could be lost because netlink broadcast clones skb after
>> sending it to the first listener who sets skb->peeked at the same skb.
>> As a result atomic refcounting of skb header stays disabled and
>> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
>> ---
>>   net/netlink/af_netlink.c |    6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index dea925388a5b..921e0d8dfe3a 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>>   	info.tx_filter = filter;
>>   	info.tx_data = filter_data;
>>
>> +	/* Enable atomic refcounting in skb_release_data() before first send:
>> +	 * non-atomic set of that bit-field in __skb_clone() could race with
>> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
>> +	 */
>> +	skb->cloned = 1;
>> +
>>   	/* While we sleep in clone, do not allow to change socket list */
>>
>>   	netlink_lock_table();
>
> Wow, this is tricky.
>
> I wonder how you found this bug ????

In some setups race happens quite often: once or twice per hour.
I guess the main trigger was the openvswitch which generates a
lot of netlink traffic. Though debugging was a real pain.

>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
>
>


-- 
Konstantin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-10 11:51 [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
  2015-07-10 13:49 ` Eric Dumazet
@ 2015-07-13  7:23 ` Herbert Xu
  2015-07-13  8:04   ` net: Clone skb before setting peeked flag Herbert Xu
                     ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Herbert Xu @ 2015-07-13  7:23 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: netdev, David S. Miller, Eric Dumazet

On Fri, Jul 10, 2015 at 02:51:41PM +0300, Konstantin Khlebnikov wrote:
> This fixes race between non-atomic updates of adjacent bit-fields:
> skb->cloned could be lost because netlink broadcast clones skb after
> sending it to the first listener who sets skb->peeked at the same skb.
> As a result atomic refcounting of skb header stays disabled and
> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
> ---
>  net/netlink/af_netlink.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index dea925388a5b..921e0d8dfe3a 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>  	info.tx_filter = filter;
>  	info.tx_data = filter_data;
>  
> +	/* Enable atomic refcounting in skb_release_data() before first send:
> +	 * non-atomic set of that bit-field in __skb_clone() could race with
> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
> +	 */
> +	skb->cloned = 1;
> +
>  	/* While we sleep in clone, do not allow to change socket list */
>  
>  	netlink_lock_table();

Your effort in finding this bug is wonderful.  However I think
the fix is a bit dirty.

The real issue here is that the recv path no longer handles shared
skbs.  So either we need to fix the recv path to not touch skbs
without cloning them, or we need to get rid of the use of shared
skbs in netlink.

In fact it looks I introduced the bug way back in

commit a59322be07c964e916d15be3df473fb7ba20c41e
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Wed Dec 5 01:53:40 2007 -0800

    [UDP]: Only increment counter on first peek/recv

I will try to mend this error :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* net: Clone skb before setting peeked flag
  2015-07-13  7:23 ` Herbert Xu
@ 2015-07-13  8:04   ` Herbert Xu
  2015-07-15 23:13     ` David Miller
  2015-07-13  8:05   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Eric Dumazet
  2015-07-13  8:54   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
  2 siblings, 1 reply; 18+ messages in thread
From: Herbert Xu @ 2015-07-13  8:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: netdev, David S. Miller, Eric Dumazet

Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.

The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first.  This causes funky races which leads
to double-free.

This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.

Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b80fb91..4e9a3f6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -131,6 +131,35 @@ out_noerr:
 	goto out;
 }
 
+static int skb_set_peeked(struct sk_buff *skb)
+{
+	struct sk_buff *nskb;
+
+	if (skb->peeked)
+		return 0;
+
+	/* We have to unshare an skb before modifying it. */
+	if (!skb_shared(skb))
+		goto done;
+
+	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (!nskb)
+		return -ENOMEM;
+
+	skb->prev->next = nskb;
+	skb->next->prev = nskb;
+	nskb->prev = skb->prev;
+	nskb->next = skb->next;
+
+	consume_skb(skb);
+	skb = nskb;
+
+done:
+	skb->peeked = 1;
+
+	return 0;
+}
+
 /**
  *	__skb_recv_datagram - Receive a datagram skbuff
  *	@sk: socket
@@ -165,7 +194,9 @@ out_noerr:
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 				    int *peeked, int *off, int *err)
 {
+	struct sk_buff_head *queue = &sk->sk_receive_queue;
 	struct sk_buff *skb, *last;
+	unsigned long cpu_flags;
 	long timeo;
 	/*
 	 * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
@@ -184,8 +215,6 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 		 * Look at current nfs client by the way...
 		 * However, this function was correct in any case. 8)
 		 */
-		unsigned long cpu_flags;
-		struct sk_buff_head *queue = &sk->sk_receive_queue;
 		int _off = *off;
 
 		last = (struct sk_buff *)queue;
@@ -199,7 +228,11 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 					_off -= skb->len;
 					continue;
 				}
-				skb->peeked = 1;
+
+				error = skb_set_peeked(skb);
+				if (error)
+					goto unlock_err;
+
 				atomic_inc(&skb->users);
 			} else
 				__skb_unlink(skb, queue);
@@ -223,6 +256,8 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 
 	return NULL;
 
+unlock_err:
+	spin_unlock_irqrestore(&queue->lock, cpu_flags);
 no_packet:
 	*err = error;
 	return NULL;
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  7:23 ` Herbert Xu
  2015-07-13  8:04   ` net: Clone skb before setting peeked flag Herbert Xu
@ 2015-07-13  8:05   ` Eric Dumazet
  2015-07-13  8:10     ` Herbert Xu
  2015-07-13  8:54   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
  2 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2015-07-13  8:05 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Konstantin Khlebnikov, netdev, David S. Miller, Eric Dumazet

On Mon, 2015-07-13 at 15:23 +0800, Herbert Xu wrote:

> The real issue here is that the recv path no longer handles shared
> skbs.  So either we need to fix the recv path to not touch skbs
> without cloning them, or we need to get rid of the use of shared
> skbs in netlink.
> 
> In fact it looks I introduced the bug way back in
> 
> commit a59322be07c964e916d15be3df473fb7ba20c41e
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Wed Dec 5 01:53:40 2007 -0800
> 
>     [UDP]: Only increment counter on first peek/recv
> 
> I will try to mend this error :)
> 
> Cheers,

Herbert, UDP peek support is very buggy anyway, because of deferred
checksums

__skb_checksum_complete() will happily manipulate csum, ip_summed,
csum_complete_sw & csum_valid

Ideally, peek should never touch skb (but skb->users)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  8:05   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Eric Dumazet
@ 2015-07-13  8:10     ` Herbert Xu
  2015-07-13  8:22       ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Xu @ 2015-07-13  8:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Konstantin Khlebnikov, netdev, David S. Miller, Eric Dumazet

On Mon, Jul 13, 2015 at 10:05:42AM +0200, Eric Dumazet wrote:
>
> Herbert, UDP peek support is very buggy anyway, because of deferred
> checksums
> 
> __skb_checksum_complete() will happily manipulate csum, ip_summed,
> csum_complete_sw & csum_valid
> 
> Ideally, peek should never touch skb (but skb->users)

I think UDP should be OK because the main creator of shared skbs
is af_packet and in that cast the IP stack will clone the skb upon
entry.  AFAIK there aren't any entities doing the shared skb trick
within the IP stack.

IOW the UDP stack does not have to worry about share skbs, unlike
netlink.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  8:10     ` Herbert Xu
@ 2015-07-13  8:22       ` Eric Dumazet
  2015-07-13  8:25         ` Herbert Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2015-07-13  8:22 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Konstantin Khlebnikov, netdev, David S. Miller, Eric Dumazet

On Mon, 2015-07-13 at 16:10 +0800, Herbert Xu wrote:
> On Mon, Jul 13, 2015 at 10:05:42AM +0200, Eric Dumazet wrote:
> >
> > Herbert, UDP peek support is very buggy anyway, because of deferred
> > checksums
> > 
> > __skb_checksum_complete() will happily manipulate csum, ip_summed,
> > csum_complete_sw & csum_valid
> > 
> > Ideally, peek should never touch skb (but skb->users)
> 
> I think UDP should be OK because the main creator of shared skbs
> is af_packet and in that cast the IP stack will clone the skb upon
> entry.  AFAIK there aren't any entities doing the shared skb trick
> within the IP stack.
> 
> IOW the UDP stack does not have to worry about share skbs, unlike
> netlink.

It should worry, in case multiple threads are using MSG_PEEK on same udp
socket ;)

Problem here is not the producer (might be unicast packets btw),
but multiple 'consumers'

It turns out your patch would also solve this problem.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  8:22       ` Eric Dumazet
@ 2015-07-13  8:25         ` Herbert Xu
  2015-07-13  8:28           ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Xu @ 2015-07-13  8:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Konstantin Khlebnikov, netdev, David S. Miller, Eric Dumazet

On Mon, Jul 13, 2015 at 10:22:34AM +0200, Eric Dumazet wrote:
>
> It should worry, in case multiple threads are using MSG_PEEK on same udp
> socket ;)

That should be fine because we already hold a spinlock on the
queue.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  8:25         ` Herbert Xu
@ 2015-07-13  8:28           ` Eric Dumazet
  2015-07-13  8:31             ` Herbert Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2015-07-13  8:28 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Konstantin Khlebnikov, netdev, David S. Miller

On Mon, Jul 13, 2015 at 10:25 AM, Herbert Xu
<herbert@gondor.apana.org.au> wrote:
> On Mon, Jul 13, 2015 at 10:22:34AM +0200, Eric Dumazet wrote:
>>
>> It should worry, in case multiple threads are using MSG_PEEK on same udp
>> socket ;)
>
> That should be fine because we already hold a spinlock on the
> queue.
>

Except that udp checksum are checked outside of spinlock protection.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  8:28           ` Eric Dumazet
@ 2015-07-13  8:31             ` Herbert Xu
  2015-07-13 12:01               ` net: Fix skb csum races when peeking Herbert Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Xu @ 2015-07-13  8:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Konstantin Khlebnikov, netdev, David S. Miller

On Mon, Jul 13, 2015 at 10:28:19AM +0200, Eric Dumazet wrote:
>
> Except that udp checksum are checked outside of spinlock protection.

Good point.  I wonder when this got broken.  I'll do some digging.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  7:23 ` Herbert Xu
  2015-07-13  8:04   ` net: Clone skb before setting peeked flag Herbert Xu
  2015-07-13  8:05   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Eric Dumazet
@ 2015-07-13  8:54   ` Konstantin Khlebnikov
  2015-07-13  9:04     ` Herbert Xu
  2 siblings, 1 reply; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-07-13  8:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, David S. Miller, Eric Dumazet

On 13.07.2015 10:23, Herbert Xu wrote:
> On Fri, Jul 10, 2015 at 02:51:41PM +0300, Konstantin Khlebnikov wrote:
>> This fixes race between non-atomic updates of adjacent bit-fields:
>> skb->cloned could be lost because netlink broadcast clones skb after
>> sending it to the first listener who sets skb->peeked at the same skb.
>> As a result atomic refcounting of skb header stays disabled and
>> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
>> ---
>>   net/netlink/af_netlink.c |    6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index dea925388a5b..921e0d8dfe3a 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>>   	info.tx_filter = filter;
>>   	info.tx_data = filter_data;
>>
>> +	/* Enable atomic refcounting in skb_release_data() before first send:
>> +	 * non-atomic set of that bit-field in __skb_clone() could race with
>> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
>> +	 */
>> +	skb->cloned = 1;
>> +
>>   	/* While we sleep in clone, do not allow to change socket list */
>>
>>   	netlink_lock_table();
>
> Your effort in finding this bug is wonderful.  However I think
> the fix is a bit dirty.
>
> The real issue here is that the recv path no longer handles shared
> skbs.  So either we need to fix the recv path to not touch skbs
> without cloning them, or we need to get rid of the use of shared
> skbs in netlink.

I don't think that recv path should care about shared skb -- skb can be
delivered into only one socket anyway.


Less dirty fix for that: do not send original skb.
That adds one extra clone but makes code much cleaner.


--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1957,17 +1957,16 @@ static void do_one_broadcast(struct sock *sk,
         }

         sock_hold(sk);
-       if (p->skb2 == NULL) {
-               if (skb_shared(p->skb)) {
-                       p->skb2 = skb_clone(p->skb, p->allocation);
-               } else {
-                       p->skb2 = skb_get(p->skb);
-                       /*
-                        * skb ownership may have been set when
-                        * delivered to a previous socket.
-                        */
-                       skb_orphan(p->skb2);
-               }
+       if (p->skb2 == NULL || skb_shared(p->skb2)) {
+               kfree_skb(p->skb2);
+               p->skb2 = skb_clone(p->skb, p->allocation);
+       } else {
+               skb_get(p->skb2);
+               /*
+                * skb ownership may have been set when
+                * delivered to a previous socket.
+                */
+               skb_orphan(p->skb2);
         }
         if (p->skb2 == NULL) {
                 netlink_overrun(sk);
@@ -1997,7 +1996,6 @@ static void do_one_broadcast(struct sock *sk,
         } else {
                 p->congested |= val;
                 p->delivered = 1;
-               p->skb2 = NULL;
         }
  out:
         sock_put(sk);



>
> In fact it looks I introduced the bug way back in
>
> commit a59322be07c964e916d15be3df473fb7ba20c41e
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Wed Dec 5 01:53:40 2007 -0800
>
>      [UDP]: Only increment counter on first peek/recv
>
> I will try to mend this error :)
>
> Cheers,
>


-- 
Konstantin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
  2015-07-13  8:54   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
@ 2015-07-13  9:04     ` Herbert Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Herbert Xu @ 2015-07-13  9:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: netdev, David S. Miller, Eric Dumazet

On Mon, Jul 13, 2015 at 11:54:47AM +0300, Konstantin Khlebnikov wrote:
> 
> I don't think that recv path should care about shared skb -- skb can be
> delivered into only one socket anyway.

The fact is that this is how the original code worked and was
expected to do.  The broadcast side would generate a shared skb
and the recv side is supposed to only read it, not modify it.

In fact apart from the skb->peeked bug netlink_recvmsg does all
the right things and never modifies skb.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* net: Fix skb csum races when peeking
  2015-07-13  8:31             ` Herbert Xu
@ 2015-07-13 12:01               ` Herbert Xu
  2015-07-13 14:25                 ` Herbert Xu
                                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Herbert Xu @ 2015-07-13 12:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Konstantin Khlebnikov, netdev, David S. Miller

On Mon, Jul 13, 2015 at 04:31:00PM +0800, Herbert Xu wrote:
> On Mon, Jul 13, 2015 at 10:28:19AM +0200, Eric Dumazet wrote:
> >
> > Except that udp checksum are checked outside of spinlock protection.
> 
> Good point.  I wonder when this got broken.  I'll do some digging.

OK looks like I can claim credit for this bug too :)

commit fb286bb2990a107009dbf25f6ffebeb7df77f9be
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Nov 10 13:01:24 2005 -0800

    [NET]: Detect hardware rx checksum faults correctly

Although others have made the hole bigger more recently.

PS we seem to no longer use the hardware checksum in case of
CHECKSUM_COMPLETE, I wonder why that is?

---8<---
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b80fb91..4967262 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -622,7 +657,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
 		    !skb->csum_complete_sw)
 			netdev_rx_csum_fault(skb->dev);
 	}
-	skb->csum_valid = !sum;
+	if (!skb_shared(skb))
+		skb->csum_valid = !sum;
 	return sum;
 }
 EXPORT_SYMBOL(__skb_checksum_complete_head);
@@ -642,11 +678,13 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
 			netdev_rx_csum_fault(skb->dev);
 	}
 
-	/* Save full packet checksum */
-	skb->csum = csum;
-	skb->ip_summed = CHECKSUM_COMPLETE;
-	skb->csum_complete_sw = 1;
-	skb->csum_valid = !sum;
+	if (!skb_shared(skb)) {
+		/* Save full packet checksum */
+		skb->csum = csum;
+		skb->ip_summed = CHECKSUM_COMPLETE;
+		skb->csum_complete_sw = 1;
+		skb->csum_valid = !sum;
+	}
 
 	return sum;
 }
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: net: Fix skb csum races when peeking
  2015-07-13 12:01               ` net: Fix skb csum races when peeking Herbert Xu
@ 2015-07-13 14:25                 ` Herbert Xu
  2015-07-14  6:11                 ` Eric Dumazet
  2015-07-15 23:14                 ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Herbert Xu @ 2015-07-13 14:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Konstantin Khlebnikov, netdev, David S. Miller

On Mon, Jul 13, 2015 at 08:01:42PM +0800, Herbert Xu wrote:
> 
> PS we seem to no longer use the hardware checksum in case of
> CHECKSUM_COMPLETE, I wonder why that is?

Nevermind, it's still there.  I was just looking in the wrong place.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: net: Fix skb csum races when peeking
  2015-07-13 12:01               ` net: Fix skb csum races when peeking Herbert Xu
  2015-07-13 14:25                 ` Herbert Xu
@ 2015-07-14  6:11                 ` Eric Dumazet
  2015-07-15 23:14                 ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2015-07-14  6:11 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Konstantin Khlebnikov, netdev, David S. Miller

On Mon, 2015-07-13 at 20:01 +0800, Herbert Xu wrote:

> ---8<---
> When we calculate the checksum on the recv path, we store the
> result in the skb as an optimisation in case we need the checksum
> again down the line.
> 
> This is in fact bogus for the MSG_PEEK case as this is done without
> any locking.  So multiple threads can peek and then store the result
> to the same skb, potentially resulting in bogus skb states.
> 
> This patch fixes this by only storing the result if the skb is not
> shared.  This preserves the optimisations for the few cases where
> it can be done safely due to locking or other reasons, e.g., SIOCINQ.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index b80fb91..4967262 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -622,7 +657,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
>  		    !skb->csum_complete_sw)
>  			netdev_rx_csum_fault(skb->dev);
>  	}
> -	skb->csum_valid = !sum;
> +	if (!skb_shared(skb))
> +		skb->csum_valid = !sum;
>  	return sum;
>  }
>  EXPORT_SYMBOL(__skb_checksum_complete_head);
> @@ -642,11 +678,13 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>  			netdev_rx_csum_fault(skb->dev);
>  	}
>  
> -	/* Save full packet checksum */
> -	skb->csum = csum;
> -	skb->ip_summed = CHECKSUM_COMPLETE;
> -	skb->csum_complete_sw = 1;
> -	skb->csum_valid = !sum;
> +	if (!skb_shared(skb)) {
> +		/* Save full packet checksum */
> +		skb->csum = csum;
> +		skb->ip_summed = CHECKSUM_COMPLETE;
> +		skb->csum_complete_sw = 1;
> +		skb->csum_valid = !sum;
> +	}
>  
>  	return sum;
>  }

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: net: Clone skb before setting peeked flag
  2015-07-13  8:04   ` net: Clone skb before setting peeked flag Herbert Xu
@ 2015-07-15 23:13     ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-07-15 23:13 UTC (permalink / raw)
  To: herbert; +Cc: khlebnikov, netdev, edumazet

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 13 Jul 2015 16:04:13 +0800

> Shared skbs must not be modified and this is crucial for broadcast
> and/or multicast paths where we use it as an optimisation to avoid
> unnecessary cloning.
> 
> The function skb_recv_datagram breaks this rule by setting peeked
> without cloning the skb first.  This causes funky races which leads
> to double-free.
> 
> This patch fixes this by cloning the skb and replacing the skb
> in the list when setting skb->peeked.
> 
> Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv")
> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: net: Fix skb csum races when peeking
  2015-07-13 12:01               ` net: Fix skb csum races when peeking Herbert Xu
  2015-07-13 14:25                 ` Herbert Xu
  2015-07-14  6:11                 ` Eric Dumazet
@ 2015-07-15 23:14                 ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-07-15 23:14 UTC (permalink / raw)
  To: herbert; +Cc: edumazet, eric.dumazet, khlebnikov, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 13 Jul 2015 20:01:42 +0800

> When we calculate the checksum on the recv path, we store the
> result in the skb as an optimisation in case we need the checksum
> again down the line.
> 
> This is in fact bogus for the MSG_PEEK case as this is done without
> any locking.  So multiple threads can peek and then store the result
> to the same skb, potentially resulting in bogus skb states.
> 
> This patch fixes this by only storing the result if the skb is not
> shared.  This preserves the optimisations for the few cases where
> it can be done safely due to locking or other reasons, e.g., SIOCINQ.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Also applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-07-15 23:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-10 11:51 [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
2015-07-10 13:49 ` Eric Dumazet
2015-07-10 14:08   ` Konstantin Khlebnikov
2015-07-13  7:23 ` Herbert Xu
2015-07-13  8:04   ` net: Clone skb before setting peeked flag Herbert Xu
2015-07-15 23:13     ` David Miller
2015-07-13  8:05   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Eric Dumazet
2015-07-13  8:10     ` Herbert Xu
2015-07-13  8:22       ` Eric Dumazet
2015-07-13  8:25         ` Herbert Xu
2015-07-13  8:28           ` Eric Dumazet
2015-07-13  8:31             ` Herbert Xu
2015-07-13 12:01               ` net: Fix skb csum races when peeking Herbert Xu
2015-07-13 14:25                 ` Herbert Xu
2015-07-14  6:11                 ` Eric Dumazet
2015-07-15 23:14                 ` David Miller
2015-07-13  8:54   ` [PATCH] netlink: enable skb header refcounting before sending first broadcast Konstantin Khlebnikov
2015-07-13  9:04     ` Herbert Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.