From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konstantin Khlebnikov
Subject: Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
Date: Fri, 10 Jul 2015 17:08:32 +0300
Message-ID: <559FD1E0.40909@yandex-team.ru>
References: <20150710115141.12980.88829.stgit@buzz> <1436536187.24939.50.camel@edumazet-glaptop2.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, "David S. Miller", Eric Dumazet, Herbert Xu
To: Eric Dumazet
Return-path:
Received: from forward-corp1o.mail.yandex.net ([37.140.190.172]:46614 "EHLO forward-corp1o.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754467AbbGJOIj (ORCPT ); Fri, 10 Jul 2015 10:08:39 -0400
In-Reply-To: <1436536187.24939.50.camel@edumazet-glaptop2.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 10.07.2015 16:49, Eric Dumazet wrote:
> On Fri, 2015-07-10 at 14:51 +0300, Konstantin Khlebnikov wrote:
>> This fixes race between non-atomic updates of adjacent bit-fields:
>> skb->cloned could be lost because netlink broadcast clones skb after
>> sending it to the first listener who sets skb->peeked at the same skb.
>> As a result atomic refcounting of skb header stays disabled and
>> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
>>
>> Signed-off-by: Konstantin Khlebnikov
>> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
>> ---
>>  net/netlink/af_netlink.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index dea925388a5b..921e0d8dfe3a 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>>  	info.tx_filter = filter;
>>  	info.tx_data = filter_data;
>>
>> +	/* Enable atomic refcounting in skb_release_data() before first send:
>> +	 * non-atomic set of that bit-field in __skb_clone() could race with
>> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
>> +	 */
>> +	skb->cloned = 1;
>> +
>>  	/* While we sleep in clone, do not allow to change socket list */
>>
>>  	netlink_lock_table();
>
> Wow, this is tricky.
>
> I wonder how you found this bug ????

In some setups the race happens quite often: once or twice per hour.
I guess the main trigger was openvswitch, which generates a lot of
netlink traffic. Debugging was a real pain, though.

>
> Acked-by: Eric Dumazet
>
>

-- 
Konstantin
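
For readers following the thread, the lost update described in the commit message can be replayed deterministically in user space. This is only a simplified sketch, not kernel code: it assumes the two flags share one storage byte, and the names DEMO_CLONED, DEMO_PEEKED, replay_lost_update() and replay_with_early_set() are hypothetical stand-ins, not the real struct sk_buff layout. Each "CPU" is modelled as an explicit load, modify, store sequence, which is roughly what a non-atomic bit-field assignment amounts to.

```c
#include <stdint.h>

/* Hypothetical stand-ins for two adjacent bit-fields that share a byte.
 * The real struct sk_buff layout is not reproduced here. */
#define DEMO_CLONED 0x01u
#define DEMO_PEEKED 0x02u

/* Replays one unlucky interleaving of two non-atomic updates.
 * Both sides load the shared byte before either one stores it back. */
uint8_t replay_lost_update(uint8_t flags)
{
	uint8_t sender_tmp = flags;	/* clone path loads the byte */
	uint8_t receiver_tmp = flags;	/* receive path loads the same byte */

	sender_tmp |= DEMO_CLONED;
	flags = sender_tmp;		/* "cloned" is now set ... */

	receiver_tmp |= DEMO_PEEKED;
	flags = receiver_tmp;		/* ... and overwritten by the stale copy */

	return flags;
}

/* Same interleaving, but "cloned" is set before the buffer becomes
 * visible to any receiver, as the patch does before the first send. */
uint8_t replay_with_early_set(uint8_t flags)
{
	flags |= DEMO_CLONED;		/* set while still exclusively owned */

	uint8_t sender_tmp = flags;
	uint8_t receiver_tmp = flags;

	sender_tmp |= DEMO_CLONED;	/* redundant, value already present */
	flags = sender_tmp;

	receiver_tmp |= DEMO_PEEKED;
	flags = receiver_tmp;		/* stale copy already carries "cloned" */

	return flags;
}
```

Starting from a zeroed byte, the first function ends with only the "peeked" bit set, while the second keeps both bits. In this model that is why setting the flag before the first send closes the window: every later read-modify-write of the shared byte already sees it.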