From: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
To: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>,
	Johan Hedberg <johan.hedberg@gmail.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"linux-bluetooth@vger.kernel.org" <linux-bluetooth@vger.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	skhan@linuxfoundation.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel-mentees@lists.linuxfoundation.org,
	syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
Subject: Re: [PATCH v4] Bluetooth: schedule SCO timeouts with delayed_work
Date: Thu, 29 Jul 2021 12:28:34 +0800
Message-ID: <6c152b1f-fe15-6ea3-cb96-1d87f0f7dea7@gmail.com>
In-Reply-To: <CABBYNZ+_mYB=r3B-f0Pu214ZmKVAM2EmpSFYQksTDbdm61Q4Bw@mail.gmail.com>

Hi Luiz,

On 29/7/21 7:07 am, Luiz Augusto von Dentz wrote:
> Hi Desmond,
>
> On Wed, Jul 28, 2021 at 12:17 AM Desmond Cheong Zhi Xi
> <desmondcheongzx@gmail.com> wrote:
>>
>> struct sock.sk_timer should be used as a sock cleanup timer. However,
>> SCO uses it to implement sock timeouts.
>>
>> This causes issues because struct sock.sk_timer's callback is run in
>> an IRQ context, and the timer callback function sco_sock_timeout takes
>> a spin lock on the socket. However, other functions such as
>> sco_conn_del, sco_conn_ready, rfcomm_connect_ind, and
>> bt_accept_enqueue also take the spin lock with interrupts enabled.
>>
>> This inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} lock usage could
>> lead to deadlocks as reported by Syzbot [1]:
>>        CPU0
>>        ----
>>   lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>>   <Interrupt>
>>     lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
>>
>> To fix this, we use delayed work to implement SCO sock timouts
>> instead. This allows us to avoid taking the spin lock on the socket in
>> an IRQ context, and corrects the misuse of struct sock.sk_timer.
>>
>> Link: https://syzkaller.appspot.com/bug?id=9089d89de0502e120f234ca0fc8a703f7368b31e [1]
>> Reported-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com
>> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
>> ---
>>
>> Hi,
>>
>> As suggested, this patch addresses the inconsistent lock state while
>> avoiding having to deal with local_bh_disable.
>>
>> Now that sco_sock_timeout is no longer run in IRQ context, it might
>> be the case that bh_lock_sock is no longer needed to sync between
>> SOFTIRQ and user contexts, so we can switch to lock_sock.
>>
>> I'm not too certain about this, or if there's any benefit to using
>> lock_sock instead, so I've left that out of this patch.
>>
>> v3 -> v4:
>> - Switch to using delayed_work to schedule SCO sock timeouts instead
>> of using local_bh_disable. As suggested by Luiz Augusto von Dentz.
>>
>> v2 -> v3:
>> - Split SCO and RFCOMM code changes, as suggested by Luiz Augusto von
>> Dentz.
>> - Simplify local bh disabling in SCO by using local_bh_disable/enable
>> inside sco_chan_del since local_bh_disable/enable pairs are reentrant.
>>
>> v1 -> v2:
>> - Instead of pulling out the clean-up code out from sco_chan_del and
>> using it directly in sco_conn_del, disable local softirqs for relevant
>> sections.
>> - Disable local softirqs more thoroughly for instances of
>> bh_lock_sock/bh_lock_sock_nested in the bluetooth subsystem.
>> Specifically, the calls in af_bluetooth.c and rfcomm/sock.c are now made
>> with local softirqs disabled as well.
>>
>> Best wishes,
>> Desmond
>>
>>  net/bluetooth/sco.c | 39 ++++++++++++++++++++++++---------------
>>  1 file changed, 24 insertions(+), 15 deletions(-)
>>
>> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
>> index 3bd41563f118..b6dd16153d38 100644
>> --- a/net/bluetooth/sco.c
>> +++ b/net/bluetooth/sco.c
>> @@ -48,6 +48,8 @@ struct sco_conn {
>>  	spinlock_t	lock;
>>  	struct sock	*sk;
>>
>> +	struct delayed_work	sk_timer;
>> +
>>  	unsigned int    mtu;
>>  };
>>
>> @@ -74,9 +76,11 @@ struct sco_pinfo {
>>  #define SCO_CONN_TIMEOUT	(HZ * 40)
>>  #define SCO_DISCONN_TIMEOUT	(HZ * 2)
>>
>> -static void sco_sock_timeout(struct timer_list *t)
>> +static void sco_sock_timeout(struct work_struct *work)
>>  {
>> -	struct sock *sk = from_timer(sk, t, sk_timer);
>> +	struct sco_conn *conn = container_of(work, struct sco_conn,
>> +					     sk_timer.work);
>> +	struct sock *sk = conn->sk;
>>
>>  	BT_DBG("sock %p state %d", sk, sk->sk_state);
>>
>> @@ -89,16 +93,18 @@ static void sco_sock_timeout(struct timer_list *t)
>>  	sock_put(sk);
>>  }
>>
>> -static void sco_sock_set_timer(struct sock *sk, long timeout)
>> +static void sco_sock_set_timer(struct sock *sk, struct delayed_work *work,
>> +			       long timeout)
>>  {
>>  	BT_DBG("sock %p state %d timeout %ld", sk, sk->sk_state, timeout);
>> -	sk_reset_timer(sk, &sk->sk_timer, jiffies + timeout);
>> +	cancel_delayed_work(work);
>> +	schedule_delayed_work(work, timeout);
>
> I guess if you want to really guarantee cancel takes effect you must
> call cancel_delayed_work_sync
>

Got it, thanks for catching that.

>>  }
>>
>> -static void sco_sock_clear_timer(struct sock *sk)
>> +static void sco_sock_clear_timer(struct sock *sk, struct delayed_work *work)
>>  {
>>  	BT_DBG("sock %p state %d", sk, sk->sk_state);
>> -	sk_stop_timer(sk, &sk->sk_timer);
>> +	cancel_delayed_work(work);
>>  }
>>
>>  /* ---- SCO connections ---- */
>> @@ -174,7 +180,7 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
>>  	if (sk) {
>>  		sock_hold(sk);
>>  		bh_lock_sock(sk);
>> -		sco_sock_clear_timer(sk);
>> +		sco_sock_clear_timer(sk, &conn->sk_timer);
>>  		sco_chan_del(sk, err);
>>  		bh_unlock_sock(sk);
>>  		sco_sock_kill(sk);
>> @@ -193,6 +199,8 @@ static void __sco_chan_add(struct sco_conn *conn, struct sock *sk,
>>  	sco_pi(sk)->conn = conn;
>>  	conn->sk = sk;
>>
>> +	INIT_DELAYED_WORK(&conn->sk_timer, sco_sock_timeout);
>> +
>>  	if (parent)
>>  		bt_accept_enqueue(parent, sk, true);
>>  }
>> @@ -260,11 +268,11 @@ static int sco_connect(struct sock *sk)
>>  		goto done;
>>
>>  	if (hcon->state == BT_CONNECTED) {
>> -		sco_sock_clear_timer(sk);
>> +		sco_sock_clear_timer(sk, &conn->sk_timer);
>>  		sk->sk_state = BT_CONNECTED;
>>  	} else {
>>  		sk->sk_state = BT_CONNECT;
>> -		sco_sock_set_timer(sk, sk->sk_sndtimeo);
>> +		sco_sock_set_timer(sk, &conn->sk_timer, sk->sk_sndtimeo);
>>  	}
>>
>>  done:
>> @@ -419,7 +427,8 @@ static void __sco_sock_close(struct sock *sk)
>>  	case BT_CONFIG:
>>  		if (sco_pi(sk)->conn->hcon) {
>>  			sk->sk_state = BT_DISCONN;
>> -			sco_sock_set_timer(sk, SCO_DISCONN_TIMEOUT);
>> +			sco_sock_set_timer(sk, &sco_pi(sk)->conn->sk_timer,
>> +					   SCO_DISCONN_TIMEOUT);
>>  			sco_conn_lock(sco_pi(sk)->conn);
>>  			hci_conn_drop(sco_pi(sk)->conn->hcon);
>>  			sco_pi(sk)->conn->hcon = NULL;
>> @@ -443,7 +452,8 @@ static void __sco_sock_close(struct sock *sk)
>>  /* Must be called on unlocked socket. */
>>  static void sco_sock_close(struct sock *sk)
>>  {
>> -	sco_sock_clear_timer(sk);
>> +	if (sco_pi(sk)->conn)
>> +		sco_sock_clear_timer(sk, &sco_pi(sk)->conn->sk_timer);
>>  	lock_sock(sk);
>>  	__sco_sock_close(sk);
>>  	release_sock(sk);
>> @@ -500,8 +510,6 @@ static struct sock *sco_sock_alloc(struct net *net, struct socket *sock,
>>
>>  	sco_pi(sk)->setting = BT_VOICE_CVSD_16BIT;
>>
>> -	timer_setup(&sk->sk_timer, sco_sock_timeout, 0);
>> -
>>  	bt_sock_link(&sco_sk_list, sk);
>>  	return sk;
>>  }
>> @@ -1036,7 +1044,8 @@ static int sco_sock_shutdown(struct socket *sock, int how)
>>
>>  	if (!sk->sk_shutdown) {
>>  		sk->sk_shutdown = SHUTDOWN_MASK;
>> -		sco_sock_clear_timer(sk);
>> +		if (sco_pi(sk)->conn)
>> +			sco_sock_clear_timer(sk, &sco_pi(sk)->conn->sk_timer);
>
> It probably makes it simpler if we can have the check for
> sco_pi(sk)->conn inside sco_sock_{clear,set}_timer, that way we don't
> need to keep checking like in the code above.
>

Makes sense, I'll make the change.

Re: testing, this patch passes some local tests I set up to trigger the
lockdep warning, but I'll run the updated patch through Syzbot again to
double-check.

Best wishes,
Desmond

>>  		__sco_sock_close(sk);
>>
>>  		if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
>> @@ -1083,7 +1092,7 @@ static void sco_conn_ready(struct sco_conn *conn)
>>  	BT_DBG("conn %p", conn);
>>
>>  	if (sk) {
>> -		sco_sock_clear_timer(sk);
>> +		sco_sock_clear_timer(sk, &conn->sk_timer);
>>  		bh_lock_sock(sk);
>>  		sk->sk_state = BT_CONNECTED;
>>  		sk->sk_state_change(sk);
>> --
>> 2.25.1
>>
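
For anyone following the sco_sock_timeout hunk: the work callback is handed
only a struct work_struct pointer, so the patch walks back to the enclosing
sco_conn with container_of() and reads conn->sk from there. Below is a
minimal userspace sketch of that recovery step. The struct definitions are
simplified stand-ins and not the kernel's, and sock_from_work() and
demo_roundtrip() are hypothetical helpers named here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; the real kernel structs carry more state. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; long delay; };

/* Same idea as the kernel macro: subtract the member offset from the
 * member pointer to get back to the start of the containing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sock { int sk_state; };

struct sco_conn {
	struct sock *sk;
	struct delayed_work sk_timer;	/* embedded, as in the patch */
	unsigned int mtu;
};

/* Hypothetical helper mirroring the first lines of the patched
 * sco_sock_timeout(): work_struct -> sco_conn -> sock. */
static struct sock *sock_from_work(struct work_struct *work)
{
	struct sco_conn *conn = container_of(work, struct sco_conn,
					     sk_timer.work);
	return conn->sk;
}

/* Hypothetical check: build a conn, pass only the embedded work pointer,
 * and confirm the original sock comes back. Returns the sock's state. */
static int demo_roundtrip(void)
{
	struct sock sk = { .sk_state = 7 };
	struct sco_conn conn = { .sk = &sk, .mtu = 60 };
	struct sock *found = sock_from_work(&conn.sk_timer.work);

	assert(found == &sk);
	return found->sk_state;
}
```

The sketch covers only the pointer arithmetic. It says nothing about
whether conn or conn->sk is still valid when the work runs, which in the
real code depends on cancelling the work and on socket reference counting.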