* [PATCH v3 1/2] can: m_can: set init flag earlier in probe
@ 2024-09-23 15:32 Matthias Schiffer
2024-09-23 15:32 ` [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci Matthias Schiffer
2024-09-23 16:49 ` [PATCH v3 1/2] can: m_can: set init flag earlier in probe Markus Schneider-Pargmann
0 siblings, 2 replies; 7+ messages in thread
From: Matthias Schiffer @ 2024-09-23 15:32 UTC (permalink / raw)
To: Chandrasekar Ramakrishnan, Marc Kleine-Budde, Vincent Mailhol
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Martin Hundebøll, Markus Schneider-Pargmann,
Felipe Balbi (Intel), Raymond Tan, Jarkko Nikula, linux-can,
netdev, linux-kernel, linux, Matthias Schiffer
While an m_can controller usually already has the init flag from a
hardware reset, no such reset happens on the integrated m_can_pci of the
Intel Elkhart Lake. If the CAN controller is found in an active state,
m_can_dev_setup() would fail because m_can_niso_supported() calls
m_can_cccr_update_bits(), which refuses to modify any other configuration
bits when CCCR_INIT is not set.
To avoid this issue, set CCCR_INIT before attempting to modify any other
configuration flags.
Fixes: cd5a46ce6fa6 ("can: m_can: don't enable transceiver when probing")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
---
v2: no changes
v3: updated comment to mention Elkhart Lake
drivers/net/can/m_can/m_can.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 012c3d22b01dd..c85ac1b15f723 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1681,6 +1681,14 @@ static int m_can_dev_setup(struct m_can_classdev *cdev)
return -EINVAL;
}
+ /* Write the INIT bit, in case no hardware reset has happened before
+ * the probe (for example, it was observed that the Intel Elkhart Lake
+ * SoCs do not properly reset the CAN controllers on reboot)
+ */
+ err = m_can_cccr_update_bits(cdev, CCCR_INIT, CCCR_INIT);
+ if (err)
+ return err;
+
if (!cdev->is_peripheral)
netif_napi_add(dev, &cdev->napi, m_can_poll);
@@ -1732,11 +1740,7 @@ static int m_can_dev_setup(struct m_can_classdev *cdev)
return -EINVAL;
}
- /* Forcing standby mode should be redundant, as the chip should be in
- * standby after a reset. Write the INIT bit anyways, should the chip
- * be configured by previous stage.
- */
- return m_can_cccr_update_bits(cdev, CCCR_INIT, CCCR_INIT);
+ return 0;
}
static void m_can_stop(struct net_device *dev)
--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci
2024-09-23 15:32 [PATCH v3 1/2] can: m_can: set init flag earlier in probe Matthias Schiffer
@ 2024-09-23 15:32 ` Matthias Schiffer
2024-09-24 6:08 ` Markus Schneider-Pargmann
2024-09-23 16:49 ` [PATCH v3 1/2] can: m_can: set init flag earlier in probe Markus Schneider-Pargmann
1 sibling, 1 reply; 7+ messages in thread
From: Matthias Schiffer @ 2024-09-23 15:32 UTC (permalink / raw)
To: Chandrasekar Ramakrishnan, Marc Kleine-Budde, Vincent Mailhol
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Martin Hundebøll, Markus Schneider-Pargmann,
Felipe Balbi (Intel), Raymond Tan, Jarkko Nikula, linux-can,
netdev, linux-kernel, linux, Matthias Schiffer
The interrupt line of PCI devices is interpreted as edge-triggered,
however the interrupt signal of the m_can controller integrated in Intel
Elkhart Lake CPUs appears to be generated level-triggered.
Consider the following sequence of events:
- IR register is read, interrupt X is set
- A new interrupt Y is triggered in the m_can controller
- IR register is written to acknowledge interrupt X. Y remains set in IR
As at no point in this sequence no interrupt flag is set in IR, the
m_can interrupt line will never become deasserted, and no edge will ever
be observed to trigger another run of the ISR. This was observed to
result in the TX queue of the EHL m_can to get stuck under high load,
because frames were queued to the hardware in m_can_start_xmit(), but
m_can_finish_tx() was never run to account for their successful
transmission.
To fix the issue, repeatedly read and acknowledge interrupts at the
start of the ISR until no interrupt flags are set, so the next incoming
interrupt will also result in an edge on the interrupt line.
Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
---
v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci
v3:
- rename flag to irq_edge_triggered
- update comment to describe the issue more generically as one of systems with
edge-triggered interrupt line. m_can_pci is mentioned as an example, as it
is the only m_can variant that currently sets the irq_edge_triggered flag.
drivers/net/can/m_can/m_can.c | 22 +++++++++++++++++-----
drivers/net/can/m_can/m_can.h | 1 +
drivers/net/can/m_can/m_can_pci.c | 1 +
3 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index c85ac1b15f723..24e348f677714 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1207,20 +1207,32 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
static int m_can_interrupt_handler(struct m_can_classdev *cdev)
{
struct net_device *dev = cdev->net;
- u32 ir;
+ u32 ir = 0, ir_read;
int ret;
if (pm_runtime_suspended(cdev->dev))
return IRQ_NONE;
- ir = m_can_read(cdev, M_CAN_IR);
+ /* The m_can controller signals its interrupt status as a level, but
+ * depending in the integration the CPU may interpret the signal as
+ * edge-triggered (for example with m_can_pci).
+ * We must observe that IR is 0 at least once to be sure that the next
+ * interrupt will generate an edge.
+ */
+ while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
+ ir |= ir_read;
+
+ /* ACK all irqs */
+ m_can_write(cdev, M_CAN_IR, ir);
+
+ if (!cdev->irq_edge_triggered)
+ break;
+ }
+
m_can_coalescing_update(cdev, ir);
if (!ir)
return IRQ_NONE;
- /* ACK all irqs */
- m_can_write(cdev, M_CAN_IR, ir);
-
if (cdev->ops->clear_interrupts)
cdev->ops->clear_interrupts(cdev);
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index 92b2bd8628e6b..ef39e8e527ab6 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -99,6 +99,7 @@ struct m_can_classdev {
int pm_clock_support;
int pm_wake_source;
int is_peripheral;
+ bool irq_edge_triggered;
// Cached M_CAN_IE register content
u32 active_interrupts;
diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
index d72fe771dfc7a..9ad7419f88f83 100644
--- a/drivers/net/can/m_can/m_can_pci.c
+++ b/drivers/net/can/m_can/m_can_pci.c
@@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
mcan_class->pm_clock_support = 1;
mcan_class->pm_wake_source = 0;
mcan_class->can.clock.freq = id->driver_data;
+ mcan_class->irq_edge_triggered = true;
mcan_class->ops = &m_can_pci_ops;
pci_set_drvdata(pci, mcan_class);
--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 1/2] can: m_can: set init flag earlier in probe
2024-09-23 15:32 [PATCH v3 1/2] can: m_can: set init flag earlier in probe Matthias Schiffer
2024-09-23 15:32 ` [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci Matthias Schiffer
@ 2024-09-23 16:49 ` Markus Schneider-Pargmann
1 sibling, 0 replies; 7+ messages in thread
From: Markus Schneider-Pargmann @ 2024-09-23 16:49 UTC (permalink / raw)
To: Matthias Schiffer
Cc: Chandrasekar Ramakrishnan, Marc Kleine-Budde, Vincent Mailhol,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Martin Hundebøll, Felipe Balbi (Intel), Raymond Tan,
Jarkko Nikula, linux-can, netdev, linux-kernel, linux
On Mon, Sep 23, 2024 at 05:32:15PM GMT, Matthias Schiffer wrote:
> While an m_can controller usually already has the init flag from a
> hardware reset, no such reset happens on the integrated m_can_pci of the
> Intel Elkhart Lake. If the CAN controller is found in an active state,
> m_can_dev_setup() would fail because m_can_niso_supported() calls
> m_can_cccr_update_bits(), which refuses to modify any other configuration
> bits when CCCR_INIT is not set.
>
> To avoid this issue, set CCCR_INIT before attempting to modify any other
> configuration flags.
>
> Fixes: cd5a46ce6fa6 ("can: m_can: don't enable transceiver when probing")
> Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Best
Markus
> ---
>
> v2: no changes
> v3: updated comment to mention Elkhart Lake
>
> drivers/net/can/m_can/m_can.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index 012c3d22b01dd..c85ac1b15f723 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1681,6 +1681,14 @@ static int m_can_dev_setup(struct m_can_classdev *cdev)
> return -EINVAL;
> }
>
> + /* Write the INIT bit, in case no hardware reset has happened before
> + * the probe (for example, it was observed that the Intel Elkhart Lake
> + * SoCs do not properly reset the CAN controllers on reboot)
> + */
> + err = m_can_cccr_update_bits(cdev, CCCR_INIT, CCCR_INIT);
> + if (err)
> + return err;
> +
> if (!cdev->is_peripheral)
> netif_napi_add(dev, &cdev->napi, m_can_poll);
>
> @@ -1732,11 +1740,7 @@ static int m_can_dev_setup(struct m_can_classdev *cdev)
> return -EINVAL;
> }
>
> - /* Forcing standby mode should be redundant, as the chip should be in
> - * standby after a reset. Write the INIT bit anyways, should the chip
> - * be configured by previous stage.
> - */
> - return m_can_cccr_update_bits(cdev, CCCR_INIT, CCCR_INIT);
> + return 0;
> }
>
> static void m_can_stop(struct net_device *dev)
> --
> TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> Amtsgericht München, HRB 105018
> Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> https://www.tq-group.com/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci
2024-09-23 15:32 ` [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci Matthias Schiffer
@ 2024-09-24 6:08 ` Markus Schneider-Pargmann
2024-09-26 9:19 ` Matthias Schiffer
0 siblings, 1 reply; 7+ messages in thread
From: Markus Schneider-Pargmann @ 2024-09-24 6:08 UTC (permalink / raw)
To: Matthias Schiffer
Cc: Chandrasekar Ramakrishnan, Marc Kleine-Budde, Vincent Mailhol,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Martin Hundebøll, Felipe Balbi (Intel), Raymond Tan,
Jarkko Nikula, linux-can, netdev, linux-kernel, linux
On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> The interrupt line of PCI devices is interpreted as edge-triggered,
> however the interrupt signal of the m_can controller integrated in Intel
> Elkhart Lake CPUs appears to be generated level-triggered.
>
> Consider the following sequence of events:
>
> - IR register is read, interrupt X is set
> - A new interrupt Y is triggered in the m_can controller
> - IR register is written to acknowledge interrupt X. Y remains set in IR
>
> As at no point in this sequence no interrupt flag is set in IR, the
> m_can interrupt line will never become deasserted, and no edge will ever
> be observed to trigger another run of the ISR. This was observed to
> result in the TX queue of the EHL m_can to get stuck under high load,
> because frames were queued to the hardware in m_can_start_xmit(), but
> m_can_finish_tx() was never run to account for their successful
> transmission.
>
> To fix the issue, repeatedly read and acknowledge interrupts at the
> start of the ISR until no interrupt flags are set, so the next incoming
> interrupt will also result in an edge on the interrupt line.
>
> Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Just a few comment nitpicks below. Otherwise:
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
> ---
>
> v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci
> v3:
> - rename flag to irq_edge_triggered
> - update comment to describe the issue more generically as one of systems with
> edge-triggered interrupt line. m_can_pci is mentioned as an example, as it
> is the only m_can variant that currently sets the irq_edge_triggered flag.
>
> drivers/net/can/m_can/m_can.c | 22 +++++++++++++++++-----
> drivers/net/can/m_can/m_can.h | 1 +
> drivers/net/can/m_can/m_can_pci.c | 1 +
> 3 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index c85ac1b15f723..24e348f677714 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1207,20 +1207,32 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
> static int m_can_interrupt_handler(struct m_can_classdev *cdev)
> {
> struct net_device *dev = cdev->net;
> - u32 ir;
> + u32 ir = 0, ir_read;
> int ret;
>
> if (pm_runtime_suspended(cdev->dev))
> return IRQ_NONE;
>
> - ir = m_can_read(cdev, M_CAN_IR);
> + /* The m_can controller signals its interrupt status as a level, but
> + * depending in the integration the CPU may interpret the signal as
^ on?
> + * edge-triggered (for example with m_can_pci).
> + * We must observe that IR is 0 at least once to be sure that the next
As the loop has a break for non edge-triggered chips, I think you should
include that in the comment, like 'For these edge-triggered
integrations, we must observe...' or something similar.
Best
Markus
> + * interrupt will generate an edge.
> + */
> + while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
> + ir |= ir_read;
> +
> + /* ACK all irqs */
> + m_can_write(cdev, M_CAN_IR, ir);
> +
> + if (!cdev->irq_edge_triggered)
> + break;
> + }
> +
> m_can_coalescing_update(cdev, ir);
> if (!ir)
> return IRQ_NONE;
>
> - /* ACK all irqs */
> - m_can_write(cdev, M_CAN_IR, ir);
> -
> if (cdev->ops->clear_interrupts)
> cdev->ops->clear_interrupts(cdev);
>
> diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
> index 92b2bd8628e6b..ef39e8e527ab6 100644
> --- a/drivers/net/can/m_can/m_can.h
> +++ b/drivers/net/can/m_can/m_can.h
> @@ -99,6 +99,7 @@ struct m_can_classdev {
> int pm_clock_support;
> int pm_wake_source;
> int is_peripheral;
> + bool irq_edge_triggered;
>
> // Cached M_CAN_IE register content
> u32 active_interrupts;
> diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
> index d72fe771dfc7a..9ad7419f88f83 100644
> --- a/drivers/net/can/m_can/m_can_pci.c
> +++ b/drivers/net/can/m_can/m_can_pci.c
> @@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
> mcan_class->pm_clock_support = 1;
> mcan_class->pm_wake_source = 0;
> mcan_class->can.clock.freq = id->driver_data;
> + mcan_class->irq_edge_triggered = true;
> mcan_class->ops = &m_can_pci_ops;
>
> pci_set_drvdata(pci, mcan_class);
> --
> TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> Amtsgericht München, HRB 105018
> Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> https://www.tq-group.com/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci
2024-09-24 6:08 ` Markus Schneider-Pargmann
@ 2024-09-26 9:19 ` Matthias Schiffer
2024-09-26 9:43 ` Marc Kleine-Budde
0 siblings, 1 reply; 7+ messages in thread
From: Matthias Schiffer @ 2024-09-26 9:19 UTC (permalink / raw)
To: Markus Schneider-Pargmann, Marc Kleine-Budde
Cc: Chandrasekar Ramakrishnan, Vincent Mailhol, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Martin Hundebøll,
Felipe Balbi (Intel), Raymond Tan, Jarkko Nikula, linux-can,
netdev, linux-kernel, linux
On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
>
> On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > The interrupt line of PCI devices is interpreted as edge-triggered,
> > however the interrupt signal of the m_can controller integrated in Intel
> > Elkhart Lake CPUs appears to be generated level-triggered.
> >
> > Consider the following sequence of events:
> >
> > - IR register is read, interrupt X is set
> > - A new interrupt Y is triggered in the m_can controller
> > - IR register is written to acknowledge interrupt X. Y remains set in IR
> >
> > As at no point in this sequence no interrupt flag is set in IR, the
> > m_can interrupt line will never become deasserted, and no edge will ever
> > be observed to trigger another run of the ISR. This was observed to
> > result in the TX queue of the EHL m_can to get stuck under high load,
> > because frames were queued to the hardware in m_can_start_xmit(), but
> > m_can_finish_tx() was never run to account for their successful
> > transmission.
> >
> > To fix the issue, repeatedly read and acknowledge interrupts at the
> > start of the ISR until no interrupt flags are set, so the next incoming
> > interrupt will also result in an edge on the interrupt line.
> >
> > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
>
> Just a few comment nitpicks below. Otherwise:
>
> Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
We have received a report that while this patch fixes a stuck queue issue reproducible with cangen,
the problem has not disappeared with our customer's application. I will hold off sending a new
version of the patch while we're investigating whether there is a separate issue with the same
symptoms or the patch is insufficient.
Patch 1/2 should be good to go and could be applied independently.
Matthias
>
> > ---
> >
> > v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci
> > v3:
> > - rename flag to irq_edge_triggered
> > - update comment to describe the issue more generically as one of systems with
> > edge-triggered interrupt line. m_can_pci is mentioned as an example, as it
> > is the only m_can variant that currently sets the irq_edge_triggered flag.
> >
> > drivers/net/can/m_can/m_can.c | 22 +++++++++++++++++-----
> > drivers/net/can/m_can/m_can.h | 1 +
> > drivers/net/can/m_can/m_can_pci.c | 1 +
> > 3 files changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> > index c85ac1b15f723..24e348f677714 100644
> > --- a/drivers/net/can/m_can/m_can.c
> > +++ b/drivers/net/can/m_can/m_can.c
> > @@ -1207,20 +1207,32 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
> > static int m_can_interrupt_handler(struct m_can_classdev *cdev)
> > {
> > struct net_device *dev = cdev->net;
> > - u32 ir;
> > + u32 ir = 0, ir_read;
> > int ret;
> >
> > if (pm_runtime_suspended(cdev->dev))
> > return IRQ_NONE;
> >
> > - ir = m_can_read(cdev, M_CAN_IR);
> > + /* The m_can controller signals its interrupt status as a level, but
> > + * depending in the integration the CPU may interpret the signal as
> ^ on?
>
> > + * edge-triggered (for example with m_can_pci).
> > + * We must observe that IR is 0 at least once to be sure that the next
>
> As the loop has a break for non edge-triggered chips, I think you should
> include that in the comment, like 'For these edge-triggered
> integrations, we must observe...' or something similar.
>
> Best
> Markus
>
> > + * interrupt will generate an edge.
> > + */
> > + while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
> > + ir |= ir_read;
> > +
> > + /* ACK all irqs */
> > + m_can_write(cdev, M_CAN_IR, ir);
> > +
> > + if (!cdev->irq_edge_triggered)
> > + break;
> > + }
> > +
> > m_can_coalescing_update(cdev, ir);
> > if (!ir)
> > return IRQ_NONE;
> >
> > - /* ACK all irqs */
> > - m_can_write(cdev, M_CAN_IR, ir);
> > -
> > if (cdev->ops->clear_interrupts)
> > cdev->ops->clear_interrupts(cdev);
> >
> > diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
> > index 92b2bd8628e6b..ef39e8e527ab6 100644
> > --- a/drivers/net/can/m_can/m_can.h
> > +++ b/drivers/net/can/m_can/m_can.h
> > @@ -99,6 +99,7 @@ struct m_can_classdev {
> > int pm_clock_support;
> > int pm_wake_source;
> > int is_peripheral;
> > + bool irq_edge_triggered;
> >
> > // Cached M_CAN_IE register content
> > u32 active_interrupts;
> > diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
> > index d72fe771dfc7a..9ad7419f88f83 100644
> > --- a/drivers/net/can/m_can/m_can_pci.c
> > +++ b/drivers/net/can/m_can/m_can_pci.c
> > @@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
> > mcan_class->pm_clock_support = 1;
> > mcan_class->pm_wake_source = 0;
> > mcan_class->can.clock.freq = id->driver_data;
> > + mcan_class->irq_edge_triggered = true;
> > mcan_class->ops = &m_can_pci_ops;
> >
> > pci_set_drvdata(pci, mcan_class);
> > --
> > TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> > Amtsgericht München, HRB 105018
> > Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> > https://www.tq-group.com/
--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci
2024-09-26 9:19 ` Matthias Schiffer
@ 2024-09-26 9:43 ` Marc Kleine-Budde
2024-09-26 9:57 ` Matthias Schiffer
0 siblings, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2024-09-26 9:43 UTC (permalink / raw)
To: Matthias Schiffer
Cc: Markus Schneider-Pargmann, Chandrasekar Ramakrishnan,
Vincent Mailhol, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Martin Hundebøll, Felipe Balbi (Intel),
Raymond Tan, Jarkko Nikula, linux-can, netdev, linux-kernel,
linux
[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]
On 26.09.2024 11:19:53, Matthias Schiffer wrote:
> On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
> >
> > On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > > The interrupt line of PCI devices is interpreted as edge-triggered,
> > > however the interrupt signal of the m_can controller integrated in Intel
> > > Elkhart Lake CPUs appears to be generated level-triggered.
> > >
> > > Consider the following sequence of events:
> > >
> > > - IR register is read, interrupt X is set
> > > - A new interrupt Y is triggered in the m_can controller
> > > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > >
> > > As at no point in this sequence no interrupt flag is set in IR, the
> > > m_can interrupt line will never become deasserted, and no edge will ever
> > > be observed to trigger another run of the ISR. This was observed to
> > > result in the TX queue of the EHL m_can to get stuck under high load,
> > > because frames were queued to the hardware in m_can_start_xmit(), but
> > > m_can_finish_tx() was never run to account for their successful
> > > transmission.
> > >
> > > To fix the issue, repeatedly read and acknowledge interrupts at the
> > > start of the ISR until no interrupt flags are set, so the next incoming
> > > interrupt will also result in an edge on the interrupt line.
> > >
> > > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > > Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
> >
> > Just a few comment nitpicks below. Otherwise:
> >
> > Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
>
>
> We have received a report that while this patch fixes a stuck queue issue reproducible with cangen,
> the problem has not disappeared with our customer's application. I will hold off sending a new
> version of the patch while we're investigating whether there is a separate issue with the same
> symptoms or the patch is insufficient.
>
> Patch 1/2 should be good to go and could be applied independently.
Can you post the reproducer here, too. So that we can add it to the
patch or at least reference to it.
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci
2024-09-26 9:43 ` Marc Kleine-Budde
@ 2024-09-26 9:57 ` Matthias Schiffer
0 siblings, 0 replies; 7+ messages in thread
From: Matthias Schiffer @ 2024-09-26 9:57 UTC (permalink / raw)
To: Marc Kleine-Budde
Cc: Markus Schneider-Pargmann, Chandrasekar Ramakrishnan,
Vincent Mailhol, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Martin Hundebøll, Felipe Balbi (Intel),
Raymond Tan, Jarkko Nikula, linux-can, netdev, linux-kernel,
linux
On Thu, 2024-09-26 at 11:43 +0200, Marc Kleine-Budde wrote:
> On 26.09.2024 11:19:53, Matthias Schiffer wrote:
> > On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
> > >
> > > On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > > > The interrupt line of PCI devices is interpreted as edge-triggered,
> > > > however the interrupt signal of the m_can controller integrated in Intel
> > > > Elkhart Lake CPUs appears to be generated level-triggered.
> > > >
> > > > Consider the following sequence of events:
> > > >
> > > > - IR register is read, interrupt X is set
> > > > - A new interrupt Y is triggered in the m_can controller
> > > > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > > >
> > > > As at no point in this sequence no interrupt flag is set in IR, the
> > > > m_can interrupt line will never become deasserted, and no edge will ever
> > > > be observed to trigger another run of the ISR. This was observed to
> > > > result in the TX queue of the EHL m_can to get stuck under high load,
> > > > because frames were queued to the hardware in m_can_start_xmit(), but
> > > > m_can_finish_tx() was never run to account for their successful
> > > > transmission.
> > > >
> > > > To fix the issue, repeatedly read and acknowledge interrupts at the
> > > > start of the ISR until no interrupt flags are set, so the next incoming
> > > > interrupt will also result in an edge on the interrupt line.
> > > >
> > > > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > > > Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
> > >
> > > Just a few comment nitpicks below. Otherwise:
> > >
> > > Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
> >
> >
> > We have received a report that while this patch fixes a stuck queue issue reproducible with cangen,
> > the problem has not disappeared with our customer's application. I will hold off sending a new
> > version of the patch while we're investigating whether there is a separate issue with the same
> > symptoms or the patch is insufficient.
> >
> > Patch 1/2 should be good to go and could be applied independently.
>
> Can you post the reproducer here, too. So that we can add it to the
> patch or at least reference to it.
>
> regards,
> Marc
Something like the following results in a stuck queue after a few minutes without this patch, and
ran without issue for 2.5h with the patch (with can0 and can1 of the Elkhart Lake connected to each
other):
---
ip link set can0 up type can bitrate 1000000
ip link set can1 up type can bitrate 1000000
cangen can1 -g 2 -I 100 -L 8 &
cangen can1 -g 2 -I 101 -L 8 &
cangen can1 -g 2 -I 102 -L 8 &
cangen can1 -g 2 -I 103 -L 8 &
cangen can1 -g 2 -I 104 -L 8 &
cangen can1 -g 2 -I 105 -L 8 &
cangen can1 -g 2 -I 106 -L 8 &
cangen can1 -g 2 -I 107 -L 8 &
cangen can0 -g 2 -I 000 -L 8 &
cangen can0 -g 2 -I 001 -L 8 &
cangen can0 -g 2 -I 002 -L 8 &
cangen can0 -g 2 -I 003 -L 8 &
cangen can0 -g 2 -I 004 -L 8 &
cangen can0 -g 2 -I 005 -L 8 &
cangen can0 -g 2 -I 006 -L 8 &
cangen can0 -g 2 -I 007 -L 8 &
stress-ng --matrix 0 &
---
I will add the reproducer to the commit message in v4. I'm also not sure if the stress-ng actually
has any effect, I'll verify that before the next version of the patch.
Matthias
>
--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-09-26 9:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-23 15:32 [PATCH v3 1/2] can: m_can: set init flag earlier in probe Matthias Schiffer
2024-09-23 15:32 ` [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci Matthias Schiffer
2024-09-24 6:08 ` Markus Schneider-Pargmann
2024-09-26 9:19 ` Matthias Schiffer
2024-09-26 9:43 ` Marc Kleine-Budde
2024-09-26 9:57 ` Matthias Schiffer
2024-09-23 16:49 ` [PATCH v3 1/2] can: m_can: set init flag earlier in probe Markus Schneider-Pargmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).