Re: [PATCH net v6 3/3] net: sched: fix tx action reschedule issue with stopped queue

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: <davem@davemloft.net>, <olteanv@gmail.com>, <ast@kernel.org>,
	<daniel@iogearbox.net>, <andriin@fb.com>, <edumazet@google.com>,
	<weiwan@google.com>, <cong.wang@bytedance.com>,
	<ap420073@gmail.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linuxarm@openeuler.org>,
	<mkl@pengutronix.de>, <linux-can@vger.kernel.org>,
	<jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>,
	<jiri@resnulli.us>, <andrii@kernel.org>, <kafai@fb.com>,
	<songliubraving@fb.com>, <yhs@fb.com>, <john.fastabend@gmail.com>,
	<kpsingh@kernel.org>, <bpf@vger.kernel.org>,
	<jonas.bonn@netrounds.com>, <pabeni@redhat.com>,
	<mzhivich@akamai.com>, <johunt@akamai.com>, <albcamus@gmail.com>,
	<kehuan.feng@gmail.com>, <a.fatoum@pengutronix.de>,
	<atenart@kernel.org>, <alexander.duyck@gmail.com>,
	<hdanton@sina.com>, <jgross@suse.com>, <JKosina@suse.com>,
	<mkubecek@suse.cz>, <bjorn@kernel.org>, <alobakin@pm.me>
Subject: Re: [PATCH net v6 3/3] net: sched: fix tx action reschedule issue with stopped queue
Date: Tue, 11 May 2021 16:30:49 -0700	[thread overview]
Message-ID: <20210511163049.37d2cba0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> (raw)
In-Reply-To: <8db8e594-9606-2c93-7274-1c180afaadb2@huawei.com>

On Tue, 11 May 2021 20:13:56 +0800 Yunsheng Lin wrote:
> On 2021/5/11 17:04, Yunsheng Lin wrote:
> > On 2021/5/11 12:22, Jakub Kicinski wrote:  
> >> The queues are woken asynchronously without holding any locks via
> >> netif_tx_wake_queue(). Theoretically we can have a situation where:
> >>
> >> CPU 0                            CPU 1   
> >>   .                                .
> >> dequeue_skb()                      .
> >>   netif_xmit_frozen..() # true     .
> >>   .                              [IRQ]
> >>   .                              netif_tx_wake_queue()
> >>   .                              <end of IRQ>
> >>   .                              netif_tx_action()
> >>   .                              set MISSED
> >>   clear MISSED
> >>   return NULL
> >> ret from qdisc_restart()
> >> ret from __qdisc_run()
> >> qdisc_run_end()  
>  [...]  
> > 
> > Yes, the above does seems to have the above data race.
> > 
> > As my understanding, there is two ways to fix the above data race:
> > 1. do not clear the STATE_MISSED for netif_xmit_frozen_or_stopped()
> >    case, just check the netif_xmit_frozen_or_stopped() before
> >    calling __netif_schedule() at the end of qdisc_run_end(). This seems
> >    to only work with qdisc with TCQ_F_ONETXQUEUE flag because it seems
> >    we can only check the netif_xmit_frozen_or_stopped() with q->dev_queue,
> >    I am not sure q->dev_queue is pointint to which netdev queue when qdisc
> >    is not set with TCQ_F_ONETXQUEUE flag.

Isn't the case where we have a NOLOCK qdisc without TCQ_F_ONETXQUEUE
rather unexpected? It'd have to be a single pfifo on multi-queue
netdev, right? Sounds not worth optimizing for. How about:

 static inline void qdisc_run_end(struct Qdisc *qdisc)
 {
 	write_seqcount_end(&qdisc->running);
	if (qdisc->flags & TCQ_F_NOLOCK) {
 		spin_unlock(&qdisc->seqlock);

		if (unlikely(test_bit(__QDISC_STATE_MISSED,
				      &qdisc->state))) {
			clear_bit(__QDISC_STATE_MISSED, &qdisc->state);
			if (!(q->flags & TCQ_F_ONETXQUEUE) ||
			    !netif_xmit_frozen_or_stopped(q->dev_queue))
				__netif_schedule(qdisc);
		}
	}
 }

For the strange non-ONETXQUEUE case we'd have an occasional unnecessary
net_tx_action, but no infinite loop possible.

> > 2. clearing the STATE_MISSED for netif_xmit_frozen_or_stopped() case
> >    as this patch does, and protect the __netif_schedule() with q->seqlock
> >    for netif_tx_wake_queue(), which might bring unnecessary overhead for
> >    non-stopped queue case
> > 
> > Any better idea?  
> 
> 3. Or check the netif_xmit_frozen_or_stopped() again after clearing
>    STATE_MISSED, like below:
> 
>    if (netif_xmit_frozen_or_stopped(txq)) {
> 	  clear_bit(__QDISC_STATE_MISSED, &q->state);
> 
> 	  /* Make sure the below netif_xmit_frozen_or_stopped()
> 	   * checking happens after clearing STATE_MISSED.
> 	   */
> 	  smp_mb__after_atomic();
> 
> 	  /* Checking netif_xmit_frozen_or_stopped() again to
> 	   * make sure __QDISC_STATE_MISSED is set if the
> 	   * __QDISC_STATE_MISSED set by netif_tx_wake_queue()'s
> 	   * rescheduling of net_tx_action() is cleared by the
> 	   * above clear_bit().
> 	   */
> 	  if (!netif_xmit_frozen_or_stopped(txq))
> 	  	set_bit(__QDISC_STATE_MISSED, &q->state);
>   }
> 
>   It is kind of ugly, but it does seem to fix the above data race too.
>   And it seems like a common pattern to deal with the concurrency between
>   xmit and NAPI polling, as below:
> 
> https://elixir.bootlin.com/linux/v5.12-rc2/source/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c#L1409

This is indeed the idiomatic way of dealing with Tx queue stopping race,
but it's a bit of code to sprinkle around. My vote would be option 1.

next prev parent reply	other threads:[~2021-05-11 23:30 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-10  1:42 [PATCH net v6 0/3] fix packet stuck problem for lockless qdisc Yunsheng Lin
2021-05-10  1:42 ` [PATCH net v6 1/3] net: sched: " Yunsheng Lin
2021-05-10  1:42 ` [PATCH net v6 2/3] net: sched: fix endless tx action reschedule during deactivation Yunsheng Lin
2021-05-10  1:42 ` [PATCH net v6 3/3] net: sched: fix tx action reschedule issue with stopped queue Yunsheng Lin
2021-05-11  4:22   ` Jakub Kicinski
2021-05-11  9:04     ` Yunsheng Lin
2021-05-11 12:13       ` Yunsheng Lin
2021-05-11 23:30         ` Jakub Kicinski [this message]
2021-05-12  3:34           ` [Linuxarm] " Yunsheng Lin
2021-05-12 19:47             ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210511163049.37d2cba0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com \
    --to=kuba@kernel.org \
    --cc=JKosina@suse.com \
    --cc=a.fatoum@pengutronix.de \
    --cc=albcamus@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alobakin@pm.me \
    --cc=andrii@kernel.org \
    --cc=andriin@fb.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=atenart@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jgross@suse.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=johunt@akamai.com \
    --cc=jonas.bonn@netrounds.com \
    --cc=kafai@fb.com \
    --cc=kehuan.feng@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=linyunsheng@huawei.com \
    --cc=mkl@pengutronix.de \
    --cc=mkubecek@suse.cz \
    --cc=mzhivich@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=weiwan@google.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.