LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context
@ 2024-04-30  8:39 Jason-JH.Lin
  2024-04-30  9:39 ` AngeloGioacchino Del Regno
  0 siblings, 1 reply; 4+ messages in thread
From: Jason-JH.Lin @ 2024-04-30  8:39 UTC (permalink / raw
  To: Jassi Brar, Chun-Kuang Hu, Matthias Brugger
  Cc: AngeloGioacchino Del Regno, Jason-ch Chen, Jason-JH . Lin,
	Singo Chang, Nancy Lin, Shawn Sung, linux-kernel,
	linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group

When we run kernel with lockdebug option, we will get the BUG below:
[  106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164
[  106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3
[  106.692226] preempt_count: 1, expected: 0
[  106.692254] RCU nest depth: 0, expected: 0
[  106.692282] INFO: lockdep is turned off.
[  106.692306] irq event stamp: 0
[  106.692331] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0
[  106.692429] softirqs last  enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0
[  106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9
[  106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT)
[  106.692586] Workqueue: imgsys_runner imgsys_runner_func
[  106.692638] Call trace:
[  106.692662]  dump_backtrace+0x100/0x120
[  106.692702]  show_stack+0x20/0x2c
[  106.692737]  dump_stack_lvl+0x84/0xb4
[  106.692775]  dump_stack+0x18/0x48
[  106.692809]  __might_resched+0x354/0x4c0
[  106.692847]  __might_sleep+0x98/0xe4
[  106.692883]  __pm_runtime_resume+0x70/0x124
[  106.692921]  cmdq_mbox_send_data+0xe4/0xb1c
[  106.692964]  msg_submit+0x194/0x2dc
[  106.693003]  mbox_send_message+0x190/0x330
[  106.693043]  imgsys_cmdq_sendtask+0x1618/0x2224
[  106.693082]  imgsys_runner_func+0xac/0x11c
[  106.693118]  process_one_work+0x638/0xf84
[  106.693158]  worker_thread+0x808/0xcd0
[  106.693196]  kthread+0x24c/0x324
[  106.693231]  ret_from_fork+0x10/0x20

We found that there is a spin_lock_irqsave protection in msg_submit()
of mailbox.c.
So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(),
it will get this BUG report.

Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic
context with interrupts disabled.

Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend")
Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com>
---
 drivers/mailbox/mtk-cmdq-mailbox.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c
index ead2200f39ba..3b4f19633c83 100644
--- a/drivers/mailbox/mtk-cmdq-mailbox.c
+++ b/drivers/mailbox/mtk-cmdq-mailbox.c
@@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device *pdev)
 
 	pm_runtime_set_autosuspend_delay(dev, CMDQ_MBOX_AUTOSUSPEND_DELAY_MS);
 	pm_runtime_use_autosuspend(dev);
+	pm_runtime_irq_safe(dev);
 
 	return 0;
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context
  2024-04-30  8:39 [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context Jason-JH.Lin
@ 2024-04-30  9:39 ` AngeloGioacchino Del Regno
  2024-05-01  0:57   ` Jassi Brar
  2024-05-01  7:09   ` Jason-JH Lin (林睿祥)
  0 siblings, 2 replies; 4+ messages in thread
From: AngeloGioacchino Del Regno @ 2024-04-30  9:39 UTC (permalink / raw
  To: Jason-JH.Lin, Jassi Brar, Chun-Kuang Hu, Matthias Brugger
  Cc: Jason-ch Chen, Singo Chang, Nancy Lin, Shawn Sung, linux-kernel,
	linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group

Il 30/04/24 10:39, Jason-JH.Lin ha scritto:
> When we run kernel with lockdebug option, we will get the BUG below:
> [  106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164
> [  106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3
> [  106.692226] preempt_count: 1, expected: 0
> [  106.692254] RCU nest depth: 0, expected: 0
> [  106.692282] INFO: lockdep is turned off.
> [  106.692306] irq event stamp: 0
> [  106.692331] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [  106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0
> [  106.692429] softirqs last  enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0
> [  106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9
> [  106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT)
> [  106.692586] Workqueue: imgsys_runner imgsys_runner_func
> [  106.692638] Call trace:
> [  106.692662]  dump_backtrace+0x100/0x120
> [  106.692702]  show_stack+0x20/0x2c
> [  106.692737]  dump_stack_lvl+0x84/0xb4
> [  106.692775]  dump_stack+0x18/0x48
> [  106.692809]  __might_resched+0x354/0x4c0
> [  106.692847]  __might_sleep+0x98/0xe4
> [  106.692883]  __pm_runtime_resume+0x70/0x124
> [  106.692921]  cmdq_mbox_send_data+0xe4/0xb1c
> [  106.692964]  msg_submit+0x194/0x2dc
> [  106.693003]  mbox_send_message+0x190/0x330
> [  106.693043]  imgsys_cmdq_sendtask+0x1618/0x2224
> [  106.693082]  imgsys_runner_func+0xac/0x11c
> [  106.693118]  process_one_work+0x638/0xf84
> [  106.693158]  worker_thread+0x808/0xcd0
> [  106.693196]  kthread+0x24c/0x324
> [  106.693231]  ret_from_fork+0x10/0x20
> 
> We found that there is a spin_lock_irqsave protection in msg_submit()
> of mailbox.c.
> So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(),
> it will get this BUG report.
> 
> Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic
> context with interrupts disabled.
> 

I see. The problem with this is that pm_runtime_irq_safe() will raise the refcount
of the parent device "forever", which isn't the best and partially defeats what we
are trying to do here (keeping stuff off unless really needed).

At this point I wonder if it'd be just better to really fix this instead of working
around the problem.

I'd say that one of the options would be to perform a change to msg_submit() so
that it will take into account that a .send_data() callback might be using RPM
functions.

Ideas? :-)

Cheers,
Angelo

> Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend")
> Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com>
> ---
>   drivers/mailbox/mtk-cmdq-mailbox.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c
> index ead2200f39ba..3b4f19633c83 100644
> --- a/drivers/mailbox/mtk-cmdq-mailbox.c
> +++ b/drivers/mailbox/mtk-cmdq-mailbox.c
> @@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device *pdev)
>   
>   	pm_runtime_set_autosuspend_delay(dev, CMDQ_MBOX_AUTOSUSPEND_DELAY_MS);
>   	pm_runtime_use_autosuspend(dev);
> +	pm_runtime_irq_safe(dev);
>   
>   	return 0;
>   }




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context
  2024-04-30  9:39 ` AngeloGioacchino Del Regno
@ 2024-05-01  0:57   ` Jassi Brar
  2024-05-01  7:09   ` Jason-JH Lin (林睿祥)
  1 sibling, 0 replies; 4+ messages in thread
From: Jassi Brar @ 2024-05-01  0:57 UTC (permalink / raw
  To: AngeloGioacchino Del Regno
  Cc: Jason-JH.Lin, Chun-Kuang Hu, Matthias Brugger, Jason-ch Chen,
	Singo Chang, Nancy Lin, Shawn Sung, linux-kernel,
	linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group

On Tue, Apr 30, 2024 at 4:39 AM AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com> wrote:
>
> Il 30/04/24 10:39, Jason-JH.Lin ha scritto:
> > When we run kernel with lockdebug option, we will get the BUG below:
> > [  106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164
> > [  106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3
> > [  106.692226] preempt_count: 1, expected: 0
> > [  106.692254] RCU nest depth: 0, expected: 0
> > [  106.692282] INFO: lockdep is turned off.
> > [  106.692306] irq event stamp: 0
> > [  106.692331] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > [  106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0
> > [  106.692429] softirqs last  enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0
> > [  106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [  106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9
> > [  106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT)
> > [  106.692586] Workqueue: imgsys_runner imgsys_runner_func
> > [  106.692638] Call trace:
> > [  106.692662]  dump_backtrace+0x100/0x120
> > [  106.692702]  show_stack+0x20/0x2c
> > [  106.692737]  dump_stack_lvl+0x84/0xb4
> > [  106.692775]  dump_stack+0x18/0x48
> > [  106.692809]  __might_resched+0x354/0x4c0
> > [  106.692847]  __might_sleep+0x98/0xe4
> > [  106.692883]  __pm_runtime_resume+0x70/0x124
> > [  106.692921]  cmdq_mbox_send_data+0xe4/0xb1c
> > [  106.692964]  msg_submit+0x194/0x2dc
> > [  106.693003]  mbox_send_message+0x190/0x330
> > [  106.693043]  imgsys_cmdq_sendtask+0x1618/0x2224
> > [  106.693082]  imgsys_runner_func+0xac/0x11c
> > [  106.693118]  process_one_work+0x638/0xf84
> > [  106.693158]  worker_thread+0x808/0xcd0
> > [  106.693196]  kthread+0x24c/0x324
> > [  106.693231]  ret_from_fork+0x10/0x20
> >
> > We found that there is a spin_lock_irqsave protection in msg_submit()
> > of mailbox.c.
> > So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(),
> > it will get this BUG report.
> >
> > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic
> > context with interrupts disabled.
> >
>
> I see. The problem with this is that pm_runtime_irq_safe() will raise the refcount
> of the parent device "forever", which isn't the best and partially defeats what we
> are trying to do here (keeping stuff off unless really needed).
>
> At this point I wonder if it'd be just better to really fix this instead of working
> around the problem.
>
> I'd say that one of the options would be to perform a change to msg_submit() so
> that it will take into account that a .send_data() callback might be using RPM
> functions.
>
> Ideas? :-)
>
@send_data:  The API asks the MBOX controller driver, in atomic
 *      context try to transmit a message on the bus. Returns 0 if
 *      data is accepted for transmission, -EBUSY while rejecting
 *      if the remote hasn't yet read the last data sent. Actual
 *      transmission of data is reported by the controller via
 *      mbox_chan_txdone (if it has some TX ACK irq). It must not
 *      sleep.

As long as send_data() does not sleep, ideas are welcome.

Cheers.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context
  2024-04-30  9:39 ` AngeloGioacchino Del Regno
  2024-05-01  0:57   ` Jassi Brar
@ 2024-05-01  7:09   ` Jason-JH Lin (林睿祥)
  1 sibling, 0 replies; 4+ messages in thread
From: Jason-JH Lin (林睿祥) @ 2024-05-01  7:09 UTC (permalink / raw
  To: jassisinghbrar@gmail.com, matthias.bgg@gmail.com,
	angelogioacchino.delregno@collabora.com, chunkuang.hu@kernel.org
  Cc: linux-mediatek@lists.infradead.org,
	Singo Chang (張興國),
	linux-kernel@vger.kernel.org,
	Jason-ch Chen (陳建豪),
	Shawn Sung (宋孝謙),
	Nancy Lin (林欣螢),
	Project_Global_Chrome_Upstream_Group,
	linux-arm-kernel@lists.infradead.org

Hi Angelo,

Thanks for the reviews.

On Tue, 2024-04-30 at 11:39 +0200, AngeloGioacchino Del Regno wrote:
> Il 30/04/24 10:39, Jason-JH.Lin ha scritto:
> > When we run kernel with lockdebug option, we will get the BUG
> > below:
> > [  106.692124] BUG: sleeping function called from invalid context
> > at drivers/base/power/runtime.c:1164
> > [  106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0,
> > pid: 3616, name: kworker/u17:3
> > [  106.692226] preempt_count: 1, expected: 0
> > [  106.692254] RCU nest depth: 0, expected: 0
> > [  106.692282] INFO: lockdep is turned off.
> > [  106.692306] irq event stamp: 0
> > [  106.692331] hardirqs last  enabled at (0): [<0000000000000000>]
> > 0x0
> > [  106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>]
> > copy_process+0xc90/0x2ac0
> > [  106.692429] softirqs last  enabled at (0): [<ffffffee15d37fc4>]
> > copy_process+0xcb4/0x2ac0
> > [  106.692473] softirqs last disabled at (0): [<0000000000000000>]
> > 0x0
> > [  106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted
> > 6.1.87-lockdep-14133-g26e933aca785 #1
> > 6839942e1cf34914b0a366137843dd2366f52aa9
> > [  106.692556] Hardware name: Google Ciri sku0/unprovisioned board
> > (DT)
> > [  106.692586] Workqueue: imgsys_runner imgsys_runner_func
> > [  106.692638] Call trace:
> > [  106.692662]  dump_backtrace+0x100/0x120
> > [  106.692702]  show_stack+0x20/0x2c
> > [  106.692737]  dump_stack_lvl+0x84/0xb4
> > [  106.692775]  dump_stack+0x18/0x48
> > [  106.692809]  __might_resched+0x354/0x4c0
> > [  106.692847]  __might_sleep+0x98/0xe4
> > [  106.692883]  __pm_runtime_resume+0x70/0x124
> > [  106.692921]  cmdq_mbox_send_data+0xe4/0xb1c
> > [  106.692964]  msg_submit+0x194/0x2dc
> > [  106.693003]  mbox_send_message+0x190/0x330
> > [  106.693043]  imgsys_cmdq_sendtask+0x1618/0x2224
> > [  106.693082]  imgsys_runner_func+0xac/0x11c
> > [  106.693118]  process_one_work+0x638/0xf84
> > [  106.693158]  worker_thread+0x808/0xcd0
> > [  106.693196]  kthread+0x24c/0x324
> > [  106.693231]  ret_from_fork+0x10/0x20
> > 
> > We found that there is a spin_lock_irqsave protection in
> > msg_submit()
> > of mailbox.c.
> > So when cmdq driver calls pm_runtime_get_sync() in
> > cmdq_mbox_send_data(),
> > it will get this BUG report.
> > 
> > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in
> > atomic
> > context with interrupts disabled.
> > 
> 
> I see. The problem with this is that pm_runtime_irq_safe() will raise
> the refcount
> of the parent device "forever", which isn't the best and partially
> defeats what we
> are trying to do here (keeping stuff off unless really needed).
> 
> At this point I wonder if it'd be just better to really fix this
> instead of working
> around the problem.
> 
> I'd say that one of the options would be to perform a change to
> msg_submit() so
> that it will take into account that a .send_data() callback might be
> using RPM
> functions.
> 
> Ideas? :-)
> 
> Cheers,
> Angelo
> 

As Jassi mentioned, .send_data() can not use sleep, so we can not use
pm_runtime_get_sync() in cmdq_mbox_send_data().

Because we have to make sure the clocks is enabled before setting GCE
registers inside the cmdq_mbox_send_data(). I think we can't use the
ASYNC pm_runtime_get() instead of pm_runtime_get_sync() either.

And you're right, I didn't notice that raising the irq_safe flag will
make the parent device can not be suspended.

How about this:
1. Use clk_bluk_enable() + pm_runtime_get() instead of 
pm_runtime_get_sync() everywhere in mtk-cmdq-mailbox.c.

2. Use refcount++ instead of clk_bluk_enable() in
cmdq_runtime_resume().

3. Call clk_bluk_disable() when refcount > 0 in cmdq_runtime_suspend().

Is this an option?

Regards,
Jason-JH.Lin

> > Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime
> > PM with autosuspend")
> > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com>
> > ---
> >   drivers/mailbox/mtk-cmdq-mailbox.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c
> > b/drivers/mailbox/mtk-cmdq-mailbox.c
> > index ead2200f39ba..3b4f19633c83 100644
> > --- a/drivers/mailbox/mtk-cmdq-mailbox.c
> > +++ b/drivers/mailbox/mtk-cmdq-mailbox.c
> > @@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device
> > *pdev)
> >   
> >   	pm_runtime_set_autosuspend_delay(dev,
> > CMDQ_MBOX_AUTOSUSPEND_DELAY_MS);
> >   	pm_runtime_use_autosuspend(dev);
> > +	pm_runtime_irq_safe(dev);
> >   
> >   	return 0;
> >   }
> 
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-05-01  7:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-30  8:39 [PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context Jason-JH.Lin
2024-04-30  9:39 ` AngeloGioacchino Del Regno
2024-05-01  0:57   ` Jassi Brar
2024-05-01  7:09   ` Jason-JH Lin (林睿祥)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).