xenomai.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "Chen, Hongzhan" <hongzhan.chen@intel.com>
To: "Bezdeka, Florian" <florian.bezdeka@siemens.com>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>,
	Philippe Gerum <rpm@xenomai.org>
Cc: "xenomai@lists.linux.dev" <xenomai@lists.linux.dev>
Subject: RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
Date: Mon, 4 Sep 2023 13:35:52 +0000	[thread overview]
Message-ID: <MN2PR11MB4285C468A329950FDF2FF90FF2E9A@MN2PR11MB4285.namprd11.prod.outlook.com> (raw)
In-Reply-To: <2b0c0831b35897ea2d9ea0564f03fb13276ce603.camel@siemens.com>



>-----Original Message-----
>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>Sent: Monday, September 4, 2023 5:35 PM
>To: Kiszka, Jan <jan.kiszka@siemens.com>; Chen, Hongzhan
><hongzhan.chen@intel.com>; Philippe Gerum <rpm@xenomai.org>
>Cc: xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>On Mon, 2023-09-04 at 10:56 +0200, Jan Kiszka wrote:
>> On 04.09.23 10:20, Chen, Hongzhan wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: Philippe Gerum <rpm@xenomai.org>
>> > > Sent: Friday, September 1, 2023 2:35 PM
>> > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>> > > pinctrl device
>> > >
>> > >
>> > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>> > >
>> > > > > -----Original Message-----
>> > > > > From: Philippe Gerum <rpm@xenomai.org>
>> > > > > Sent: Thursday, August 31, 2023 4:57 PM
>> > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> > > > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>> > > > > pinctrl device
>> > > > >
>> > > > >
>> > > > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>> > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: Philippe Gerum <rpm@xenomai.org>
>> > > > > > > Sent: Thursday, August 31, 2023 4:06 PM
>> > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> > > > > > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced from
>> > > > > > > pinctrl device
>> > > > > > >
>> > > > > > >
>> > > > > > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>> > > > > > >
>> > > > > > > > > -----Original Message-----
>> > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > > > > > Sent: Thursday, August 31, 2023 3:10 PM
>> > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > > > > > Cc: Kiszka, Jan <jan.kiszka@siemens.com>;
>xenomai@lists.linux.dev;
>> > > > > > > Philippe
>> > > > > > > > > Gerum <rpm@xenomai.org>
>> > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced
>> > > from
>> > > > > > > > > pinctrl device
>> > > > > > > > >
>> > > > > > > > > On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>> > > > > > > > > >
>> > > > > > > > > > > -----Original Message-----
>> > > > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > > > > > > > Sent: Wednesday, August 30, 2023 9:23 PM
>> > > > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > > > > > > > Cc: Kiszka, Jan <jan.kiszka@siemens.com>;
>xenomai@lists.linux.dev;
>> > > > > > > > > Philippe
>> > > > > > > > > > > Gerum <rpm@xenomai.org>
>> > > > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio
>interrupt alloced
>> > > > > > > from
>> > > > > > > > > > > pinctrl device
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan
>wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > -----Original Message-----
>> > > > > > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > > > > > > > > > Sent: Wednesday, August 30, 2023 3:20 PM
>> > > > > > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>> > > > > > > > > > > xenomai@lists.linux.dev
>> > > > > > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio
>interrupt
>> > > > > alloced
>> > > > > > > > > from
>> > > > > > > > > > > > > pinctrl device
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan
>wrote:
>> > > > > > > > > > > > > > Hi Philippe/Jan
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > We found there is irq storm issue like [1] on sd/mmc
>card
>> > > detect
>> > > > > > > > > > > interrupt
>> > > > > > > > > > > > > alloced from
>> > > > > > > > > > > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>> > > > > > > > > > > > > > According to our test,  the patch [2] works for such
>issue in
>> > > our
>> > > > > > > case.
>> > > > > > > > > It
>> > > > > > > > > > > > > seems that handle_edge_irq
>> > > > > > > > > > > > > > never ack the irq to clear the interrupt status for the
>Linux
>> > > gpio
>> > > > > > > > > interrupt.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I am wondering if there is any side-effect for this
>patch, Please
>> > > > > > > > > suggest.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Regards
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hongzhan Chen
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [1]:
>> > > > > > > > > > > > > > cat /proc/interrupts
>> > > > > > > > > > > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -
>fasteoi
>> > > > > > > > > > > INTC1020:00,
>> > > > > > > > > > > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>> > > > > > > > > > > > > > 128:    2440190          0          0          0  INTC1020:01  112
>> > > > > > > 0000:00:1a.1
>> > > > > > > > > cd
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [2]:
>> > > > > > > > > > > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> > > > > > > > > > > > > > index 6aca6c036ada..4d08eaf076e9 100644
>> > > > > > > > > > > > > > --- a/kernel/irq/chip.c
>> > > > > > > > > > > > > > +++ b/kernel/irq/chip.c
>> > > > > > > > > > > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct
>irq_desc
>> > > > > > > *desc)
>> > > > > > > > > > > > > >         kstat_incr_irqs_this_cpu(desc);
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >         /* Start handling the irq */
>> > > > > > > > > > > > > > -       if (!irqs_pipelined())
>> > > > > > > > > > > > > >                 chip->irq_ack(&desc->irq_data);
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > That doesn't make much sense. A IRQ storm means
>that we
>> > > > > > > unmasked
>> > > > > > > > > the
>> > > > > > > > > > > > > IRQ to early or forgot to mask it.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > A forgotten ACK would mean the IRQ stalls, so it would
>never be
>> > > > > > > fired
>> > > > > > > > > > > > > again. No?
>> > > > > > > > > > > >
>> > > > > > > > > > > > According to the test, it seems forgotten ACK will keep
>firing
>> > > > > interrupt
>> > > > > > > > > here
>> > > > > > > > > > > like
>> > > > > > > > > > > > following comparing:
>> > > > > > > > > > >
>> > > > > > > > > > > This doesn't make sense (IMHO).
>> > > > > > > > > > > Can you provide a callstack from handle_edge_irq() in case
>> > > > > > > > > > > on_pipeline_entry() returns false?
>> > > > > > > > > > >
>> > > > > > > > > > > The IRQ should be ACKed on pipeline entry already, even if
>> > > > > > > > > > > handle_edge_irq() is called again (from Linux/Inband
>context) the
>> > > IRQ
>> > > > > > > > > > > should already be ACKed.
>> > > > > > > > > >
>> > > > > > > > > > Following stack is dumped after patching my patch. Without
>patching
>> > > > > > > there
>> > > > > > > > > is no intel_gpio_irq_ack. It is called in
>handle_irq_pipelined_finish from
>> > > > > > > which
>> > > > > > > > > it already called irq_exit_pipeline before
>sync_current_irq_stage. That
>> > > is
>> > > > > > > why
>> > > > > > > > > on_pipeline_entry returned false in handle_edge_irq
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G
>U
>> > > > > > > > > 5.10.179-intel-ese-standard-lts-dovetail+ #1
>> > > > > > > > > > [  291.009331] Hardware name: Intel Corporation Elkhart
>Lake
>> > > > > Embedded
>> > > > > > > > > Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>> > > > > > > > > EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>> > > > > > > > > > [  291.009332] IRQ stage: Linux
>> > > > > > > > > > [  291.009332] Call Trace:
>> > > > > > > > > > [  291.009332]  <IRQ>
>> > > > > > > > > > [  291.009333]  dump_stack+0x7b/0x93
>> > > > > > > > > > [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>> > > > > > > > > > [  291.009334]  handle_edge_irq+0xe4/0x310
>> > > > > > > > > > [  291.009334]  generic_handle_irq+0x4b/0x60
>> > > > > > > > > > [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>> > > > > > > > > > [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>> > > > > > > > > > [  291.009336]  ? handle_level_irq+0x200/0x200
>> > > > > > > > > > [  291.009336]  handle_irq_event+0x6c/0x100
>> > > > > > > > > > [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>> > > > > > > > > > [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>> > > > > > > > > > [  291.009338]  </IRQ>
>> > > > > > > > > > [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>> > > > > > > > > > [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>> > > > > > > > >                   ^^
>> > > > > > > > > My understanding is that you are not handling an HW IRQ,
>you are in
>> > > the
>> > > > > > > > > stage where the inband IRQ log is being processed. HW IRQs
>are
>> > > already
>> > > > > > > > > ACKed at that stage.
>> > > > > > > > >
>> > > > > > > > > Maybe we have to understand your IRQ chip routing. Is there
>an IO
>> > > APIC
>> > > > > > > > > involved? handle_fasteoi_irq() would be the first level IRQ
>entry point
>> > > > > > > > > which takes care of masking + ack.
>> > > > > > > >
>> > > > > > > > Yes. IOAPIC is the first level for external devices.
>handle_fasteoi_irq() is
>> > > > > > > registered in io_apic driver. There is second level pinctrl device
>acting as
>> > > > > > > gpiochip which has irq domains where sd/mmc cd interrupt
>alloced from ,
>> > > > > > > which need to be acked in handle_edge_irq.
>> > > > > > > >
>> > > > > > > > Regards
>> > > > > > > >
>> > > > > > > > Hongzhan Chen
>> > > > > > > > >
>> > > > > > > > > > [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>> > > > > > > > > > [  291.009340]  arch_pipeline_entry+0xdb/0x120
>> > > > > > > > > > [  291.009340]  asm_common_interrupt+0x1e/0x40
>> > > > > > > > > > [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>> > > > > > > > > > [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff
>65 48 8b
>> > > 1c
>> > > > > > > 25
>> > > > > > > > > 00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48
>8b 03 <a8>
>> > > > > 08
>> > > > > > > 75
>> > > > > > > > > 13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>> > > > > > > > > > [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>> > > > > > > > > >
>> > > > > > > > > > Regards
>> > > > > > > > > >
>> > > > > > > > > > Hongzhan Chen
>> > > > > > > > > > >
>> > > > > > > > > > > But: If we use the dovetail sources to build a kernel with
>pipelining
>> > > > > > > > > > > disabled we have to ACK the IRQ. I assume pipelining is
>active on
>> > > your
>> > > > > > > > > > > end.
>> > > > > > > > > > >
>> > > > > > > > > > > As Jan said, I'm searching for a similar problem for some
>time now
>> > > > > that
>> > > > > > > > > > > I can only reproduce on VirtualBox VMM. In my case I'm
>seeing a
>> > > stall,
>> > > > > > > > > > > not a storm. I will give it a try, let's see.
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > After patching:
>> > > > > > > > > > > > 128:          2          0          0          0  INTC1020:01  112
>0000:00:1a.1
>> > > > > cd
>> > > > > > > > > > > >
>> > > > > > > > > > > > Before patching:
>> > > > > > > > > > > > 128:    2440190          0          0          0  INTC1020:01  112
>> > > > > 0000:00:1a.1
>> > > > > > > cd
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regards
>> > > > > > > > > > > >
>> > > > > > > > > > > > Hongzhan Chen
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Could you please check if we call unmask_irq() with HW
>IRQs
>> > > > > > > enabled?
>> > > > > > > > > If
>> > > > > > > > > > > > > so, I have an idea what is going on...
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Florian
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > >
>> > > > > > > To start with, prior to any further analysis:
>> > > > > > >
>> > > > > > > diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>> > > b/drivers/pinctrl/intel/pinctrl-
>> > > > > > > intel.h
>> > > > > > > index 65628423bf639..b7afa763ee079 100644
>> > > > > > > --- a/drivers/pinctrl/intel/pinctrl-intel.h
>> > > > > > > +++ b/drivers/pinctrl/intel/pinctrl-intel.h
>> > > > > > > @@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>> > > > > > >  */
>> > > > > > > struct intel_pinctrl {
>> > > > > > > 	struct device *dev;
>> > > > > > > -	raw_spinlock_t lock;
>> > > > > > > +	hard_spinlock_t lock;
>> > > > > >
>> > > > > > I already modified it at first when we found the issue.  But it does
>not
>> > > work
>> > > > > with it.
>> > > > > >
>> > > > > > After patching my patch[2] I mentioned, the issue disappear finally.
>> > > > > >
>> > > > >
>> > > > > Ok, but the patch is papering over the real issue I believe.
>> > > >
>> > > > What kind of real issue it would be? What checkpoint I need to check to
>find
>> > > the real issue out? Please suggest.
>> > > > Continuous fired irq is obviously caused by forgotten ack behavior for
>the
>> > > issue. The ack is supposed to happen elsewhere?
>> > > >
>> > >
>> > > Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>> > > entry is detected a few lines earlier, your patch would cause a double
>> >
>> > Following patch is OK for you? It should avoid double ACK.  Because
>on_pipeline_entry after  ack
>> > it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set
>to avoid double ack.
>> >
>> > index 6aca6c036ada..d06cf4a79f2a 10064
>> > --- a/kernel/irq/chip.c
>> > +++ b/kernel/irq/chip.c
>> > @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>> >         if (on_pipeline_entry()) {
>> >                 chip->irq_ack(&desc->irq_data);
>> >                 desc->istate |= IRQS_EDGE;
>> >                 handle_oob_irq(desc);
>> >                 goto out_unlock;
>> >         }
>> >
>> >         kstat_incr_irqs_this_cpu(desc);
>> >
>> > -       /* Start handling the irq */
>> > -       if (!irqs_pipelined())
>> > +       /* Start handling the irq
>> > +        * Avoid duplicating ack when IRQS_EDGE set
>> > +        */
>> > +       if (!(desc->istate & IRQS_EDGE))
>> >                 chip->irq_ack(&desc->irq_data);
>> >
>>
>> Doesn't answer why we get here without an ack while the pipeline is
>> enabled, does it?
>
>The problem in this case is the "stacking" of IRQ chips. Philippe
>already confirmed that we have a problem here.
>
>This patch is still papering over the real issue.
>
>As already pointed out by Philippe we have two work items here:
> - Missing pipelining support within the Intel GPIO pin control

Yes, the patches already submitted to review.

> - Dovetail logic regarding stacked IRQ chips needs reworking

Per my understanding , there is only one side-effect need to fix is double ack issue based on my
first patch as Philippe explained. I do not know if my last patch of adding new flag address double ack issue.

Regards

Hongzhan Chen
>
>Florian
>
>>
>> Jan
>>


  reply	other threads:[~2023-09-04 13:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-30  3:05 irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Chen, Hongzhan
2023-08-30  7:11 ` Jan Kiszka
2023-08-30  7:32   ` Chen, Hongzhan
2023-08-30  7:20 ` Florian Bezdeka
2023-08-30  7:54   ` Chen, Hongzhan
2023-08-30 13:23     ` Florian Bezdeka
2023-08-31  0:45       ` Chen, Hongzhan
2023-08-31  7:09         ` Florian Bezdeka
2023-08-31  7:39           ` Chen, Hongzhan
2023-08-31  8:06             ` Philippe Gerum
2023-08-31  8:35               ` Chen, Hongzhan
2023-08-31  8:56                 ` Philippe Gerum
2023-09-01  2:28                   ` Chen, Hongzhan
2023-09-01  6:35                     ` Philippe Gerum
2023-09-04  8:20                       ` Chen, Hongzhan
2023-09-04  8:56                         ` Jan Kiszka
2023-09-04  9:34                           ` Florian Bezdeka
2023-09-04 13:35                             ` Chen, Hongzhan [this message]
2023-09-04  9:21                         ` Philippe Gerum
2023-09-04 13:26                           ` Chen, Hongzhan
2023-09-04  9:36                         ` Chen, Hongzhan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN2PR11MB4285C468A329950FDF2FF90FF2E9A@MN2PR11MB4285.namprd11.prod.outlook.com \
    --to=hongzhan.chen@intel.com \
    --cc=florian.bezdeka@siemens.com \
    --cc=jan.kiszka@siemens.com \
    --cc=rpm@xenomai.org \
    --cc=xenomai@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).