Re: r8169: transmit queue timeouts and IRQ masking

Netdev Archive mirror
 help / color / mirror / Atom feed

From: Ken Milmore <ken.milmore@gmail.com>
To: Heiner Kallweit <hkallweit1@gmail.com>, netdev@vger.kernel.org
Cc: nic_swsd@realtek.com
Subject: Re: r8169: transmit queue timeouts and IRQ masking
Date: Sun, 12 May 2024 20:49:26 +0100	[thread overview]
Message-ID: <c71a960f-16d3-41f0-9899-0040116b30ee@gmail.com> (raw)
In-Reply-To: <940faa90-81db-40dc-8773-1720520b10ed@gmail.com>

On 11/05/2024 17:31, Heiner Kallweit wrote:
> On 11.05.2024 00:29, Ken Milmore wrote:
>>
>> Reading this worries me though:
>>
>> https://docs.kernel.org/networking/napi.html
>> "napi_disable() and subsequent calls to the poll method only wait for the ownership of the instance to be released, not for the poll method to exit.
>> This means that drivers should avoid accessing any data structures after calling napi_complete_done()."
>>
> According to kernel doc napi_disable() waits.
> 
> /**
>  *	napi_disable - prevent NAPI from scheduling
>  *	@n: NAPI context
>  *
>  * Stop NAPI from being scheduled on this context.
>  * Waits till any outstanding processing completes.
>  */
> 
>> Which seems to imply that the IRQ enable following napi_complete_done() is unguarded, and might race with the disable on an incoming poll.
>> Is that a possibility?
> 
> Same documents states in section "Scheduling and IRQ masking":
> IRQ should only be unmasked after a successful call to napi_complete_done()
> So I think we should be fine.
> 

Nevertheless, it would be good if we could get away without the flag.

I had started out with the assumption that an interrupt acknowledgement coinciding with some part of the work being done in rtl8169_poll() might be the cause of the problem.
So it seemed natural to try guarding the whole block by disabling interrupts at the beginning.
But this seems to work just as well:

diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
index 6e34177..353ce99 100644
--- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
+++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
@@ -4659,8 +4659,10 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)

 	work_done = rtl_rx(dev, tp, budget);

-	if (work_done < budget && napi_complete_done(napi, work_done))
+	if (work_done < budget && napi_complete_done(napi, work_done)) {
+		rtl_irq_disable(tp);
 		rtl_irq_enable(tp);
+	}

 	return work_done;
 }

On this basis, I assume the problem may actually involve some subtlety with the behaviour of the interrupt mask and status registers.

In addition, I'm not sure it is such a good idea to do away with disabling interrupts from within rtl8169_interrupt().
This causes a modest, but noticeable increase in IRQ rate which I measured at around 3 to 7%, depending on whether the load is Tx or Rx heavy and also on the setting of gro_flush_timeout and napi_defer_hard_irqs.

e.g.
Tx only test with iperf3, gro_flush_timeout=20000, napi_defer_hard_irqs=1:
Averaged 32343 vs 30165 interrupts per second, an increase of about 7%.

Bidirectional test with with gro_flush_timeout=0, napi_defer_hard_irqs=0:
Averaged 82118 vs 79689 interrupts per second, an increase of about 3%.

Given that these NICs are already fairly heavy on interrupt rate, it seems a shame to make them even worse!

All in all I preferred the solution where we do all the interrupt disabling in rtl8169_interrupt(), notwithstanding that it may require a change to the interface of napi_schedule_prep().

next prev parent reply	other threads:[~2024-05-12 19:49 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-06 21:28 r8169: transmit queue timeouts and IRQ masking Ken Milmore
2024-05-08 21:14 ` Heiner Kallweit
2024-05-09 22:24   ` Ken Milmore
2024-05-10 22:06     ` Heiner Kallweit
2024-05-10 22:29       ` Ken Milmore
2024-05-11 16:31         ` Heiner Kallweit
2024-05-12 19:49           ` Ken Milmore [this message]
2024-05-12 22:08             ` Heiner Kallweit
2024-05-13  0:27               ` Ken Milmore

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6e34177 dfblob:353ce99 )
 OR (
bs:"Re: r8169: transmit queue timeouts and IRQ masking" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c71a960f-16d3-41f0-9899-0040116b30ee@gmail.com \
    --to=ken.milmore@gmail.com \
    --cc=hkallweit1@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=nic_swsd@realtek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).