Hi Krzysztof, Thanks for the reviews. On Wed, 2024-05-08 at 22:36 +0200, Krzysztof Kozlowski wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 08/05/2024 11:51, Jason-JH.Lin wrote: > > When we run kernel with lockdebug option, we will get the BUG > below: > > [ 106.692124] BUG: sleeping function called from invalid context > at drivers/base/power/runtime.c:1164 > > [ 106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, > pid: 3616, name: kworker/u17:3 > > Trim logs from irrelevant context. timestamps are useless his. > Registers > as well. Addresses of stacktrace as well and better just decode it. > OK, I'll remove the useless timestamps and registers. > > > [ 106.692226] preempt_count: 1, expected: 0 > > [ 106.692254] RCU nest depth: 0, expected: 0 > > [ 106.692282] INFO: lockdep is turned off. > > [ 106.692306] irq event stamp: 0 > > [ 106.692331] hardirqs last enabled at (0): [<0000000000000000>] > 0x0 > > [ 106.692376] hardirqs last disabled at (0): [] > copy_process+0xc90/0x2ac0 > > [ 106.692429] softirqs last enabled at (0): [] > copy_process+0xcb4/0x2ac0 > > [ 106.692473] softirqs last disabled at (0): [<0000000000000000>] > 0x0 > > [ 106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted > 6.1.87-lockdep-14133-g26e933aca785 #1 > 6839942e1cf34914b0a366137843dd2366f52aa9 > > [ 106.692556] Hardware name: Google Ciri sku0/unprovisioned board > (DT) > > [ 106.692586] Workqueue: imgsys_runner imgsys_runner_func > > [ 106.692638] Call trace: > > [ 106.692662] dump_backtrace+0x100/0x120 > > [ 106.692702] show_stack+0x20/0x2c > > [ 106.692737] dump_stack_lvl+0x84/0xb4 > > [ 106.692775] dump_stack+0x18/0x48 > > [ 106.692809] __might_resched+0x354/0x4c0 > > [ 106.692847] __might_sleep+0x98/0xe4 > > [ 106.692883] __pm_runtime_resume+0x70/0x124 > > [ 106.692921] cmdq_mbox_send_data+0xe4/0xb1c > > [ 106.692964] msg_submit+0x194/0x2dc > > [ 106.693003] mbox_send_message+0x190/0x330 > > [ 106.693043] imgsys_cmdq_sendtask+0x1618/0x2224 > > [ 106.693082] imgsys_runner_func+0xac/0x11c > > [ 106.693118] process_one_work+0x638/0xf84 > > [ 106.693158] worker_thread+0x808/0xcd0 > > [ 106.693196] kthread+0x24c/0x324 > > [ 106.693231] ret_from_fork+0x10/0x20 > > > > We found that there is a spin_lock_irqsave protection in > msg_submit() > > of mailbox.c and it is in the atomic context. > > So when cmdq driver calls pm_runtime_get_sync() in > cmdq_mbox_send_data(), > > it will get this BUG report. > > > > 1. Change pm_runtime_get_sync() to pm_runtime_get() to avoid using > sleep > > in atomic context. > > 2. Move clk_bulk_enable() outside cmdq_runtime_resume() to ensure > GCE > > clocks are enabled before configuring GCE register. > > 3. Add used_count to avoid cmdq_runtime_suspend() being called > before > > calling cmdq_runtime_resume(). > > > > Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime > PM with autosuspend") > > Signed-off-by: Jason-JH.Lin > > --- > > drivers/mailbox/mtk-cmdq-mailbox.c | 24 +++++++++++++++--------- > > 1 file changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c > b/drivers/mailbox/mtk-cmdq-mailbox.c > > index 033aff11f87c..b50f42e69aab 100644 > > --- a/drivers/mailbox/mtk-cmdq-mailbox.c > > +++ b/drivers/mailbox/mtk-cmdq-mailbox.c > > @@ -82,6 +82,7 @@ struct cmdq { > > const struct gce_plat*pdata; > > struct cmdq_thread*thread; > > struct clk_bulk_dataclocks[CMDQ_GCE_NUM_MAX]; > > +atomic_tused_count; > > boolsuspended; > > }; > > > > @@ -317,14 +318,21 @@ static int cmdq_runtime_resume(struct device > *dev) > > { > > struct cmdq *cmdq = dev_get_drvdata(dev); > > > > -return clk_bulk_enable(cmdq->pdata->gce_num, cmdq->clocks); > > +atomic_inc(&cmdq->used_count); > > Do not implement own runtime PM counter... > > > +return 0; > > } > > > > static int cmdq_runtime_suspend(struct device *dev) > > { > > struct cmdq *cmdq = dev_get_drvdata(dev); > > > > +if (atomic_read(&cmdq->used_count) == 0) { > > +dev_warn(dev, "%s when used_count is 0!", __func__); > > +return -EINVAL; > > +} > > + > > clk_bulk_disable(cmdq->pdata->gce_num, cmdq->clocks); > > +atomic_dec(&cmdq->used_count); > > return 0; > > } > > > > @@ -392,9 +400,8 @@ static int cmdq_mbox_send_data(struct mbox_chan > *chan, void *data) > > /* Client should not flush new tasks if suspended. */ > > WARN_ON(cmdq->suspended); > > > > -ret = pm_runtime_get_sync(cmdq->mbox.dev); > > -if (ret < 0) > > -return ret; > > +WARN_ON(pm_runtime_get(cmdq->mbox.dev) < 0); > > +WARN_ON(clk_bulk_enable(cmdq->pdata->gce_num, cmdq->clocks)); > > That's a no... First, you remove error handling. Second, you add WARN > and code should not have WARNs for error handling. It should not use WARNs instead of error return. I'll revert this. > > All this looks like terrible hacky workaround just to make warning go > away, without actually addressing real problem. > I think Angelo's idea is far better, so I'll try to implement the PM function hook to achieve this. Regards, Jason-JH.Lin > Best regards, > Krzysztof > >